Estimation of the number of spikes using a generalized spike population model and application to RNA-seq data

27 Apr 2018 · Choi Hyo Young, Marron J. S.

Although the generalized spike population model has been actively studied in random matrix theory, its application to real data has rarely been explored. We find that most methods for determining the number of spikes based on Johnstone's spike population model either choose far too many spikes in RNA-seq gene expression data or fail to determine the number at all, indicating that all components are spikes. In this paper, we propose a new algorithm for estimating the number of spikes based on a generalized spike population model. We also suggest a new noise model for RNA-seq data based on population spectral distribution ideas; combined with the proposed algorithm, it yields a biologically reasonable number of spikes. Furthermore, we propose a graphical tool for assessing the performance of the underlying noise model.
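
For orientation, the kind of baseline the abstract contrasts against can be sketched as follows. This is not the algorithm proposed in the paper, and it is not claimed to be one of the methods the authors evaluated. It is a minimal illustration that assumes Johnstone's spike model with white noise of known variance, and counts sample covariance eigenvalues above the Marchenko-Pastur upper bulk edge sigma^2 (1 + sqrt(p/n))^2. The function name `count_spikes_mp_edge` and the parameters `sigma2` and `tol` are hypothetical names introduced here.

```python
import numpy as np


def count_spikes_mp_edge(X, sigma2=1.0, tol=0.0):
    """Count sample eigenvalues above the Marchenko-Pastur upper edge.

    Illustrative baseline only (an assumption of this sketch, not the
    paper's method): noise is taken to be white with known variance sigma2.

    X      : (n, p) array, rows are samples, columns are features.
    sigma2 : assumed noise variance.
    tol    : optional relative slack added to the edge, since the largest
             pure-noise eigenvalue fluctuates around the edge in finite samples.
    Returns (number of eigenvalues above the threshold, the threshold).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Center each column so that X^T X / n is a sample covariance matrix.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Nonzero eigenvalues of X^T X / n are the squared singular values over n.
    s = np.linalg.svd(Xc, compute_uv=False)
    eigvals = s ** 2 / n
    # Upper edge of the Marchenko-Pastur bulk for aspect ratio p / n.
    edge = sigma2 * (1.0 + np.sqrt(p / n)) ** 2
    threshold = edge * (1.0 + tol)
    return int(np.sum(eigvals > threshold)), threshold
```

When the noise covariance is far from a multiple of the identity, the bulk of the sample spectrum no longer follows the Marchenko-Pastur law with a single variance, so a threshold of this form can flag many non-spike components. That is consistent with the over-selection described above and is the motivation for a generalized spike population model.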


Categories


Methodology
