Selected Methods for non-Gaussian Data Analysis

23 Nov 2018  ·  Krzysztof Domino ·

The basic goal of computer engineering is the analysis of data. Such data are often large data sets distributed according to various distribution models. In this manuscript we focus on the analysis of non-Gaussian distributed data. In the case of univariate data analysis we discuss stochastic processes with auto-correlated increments and univariate distributions derived from specific stochastic processes, i.e. Levy and Tsallis distributions. Deep investigation of multivariate non-Gaussian distributions requires the copula approach. A copula is an component of multivariate distribution that models the mutual interdependence between marginals. There are many copula families characterised by various measures of the dependence between marginals. Importantly, one of those are `tail' dependencies that model the simultaneous appearance of extreme values in many marginals. Those extreme events may reflect a crisis given financial data, outliers in machine learning, or a traffic congestion. Next we discuss higher order multivariate cumulants that are non-zero if multivariate distribution is non-Gaussian. However, the relation between cumulants and copulas is not straight forward and rather complicated. We discuss the application of those cumulants to extract information about non-Gaussian multivariate distributions, such that information about non-Gaussian copulas. The use of higher order multivariate cumulants in computer science is inspired by financial data analysis, especially by the safe investment portfolio evaluation. There are many other applications of higher order multivariate cumulants in data engineering, especially in: signal processing, non-linear system identification, blind sources separation, and direction finding algorithms of multi-source signals.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper