Grouped Heterogeneous Mixture Modeling for Clustered Data

3 Apr 2018 · Shonosuke Sugasawa

Clustered data are ubiquitous across scientific fields. In this paper, we propose a flexible and interpretable approach for clustered data, called grouped heterogeneous mixture modeling, which models each cluster's conditional distribution as a mixture of latent conditional distributions shared by all clusters. The model assumes that clusters are divided into a finite number of groups and that clusters in the same group share the same mixing proportions. We provide a simple generalized EM algorithm for computing the maximum likelihood estimator, and an information criterion for selecting the numbers of groups and latent distributions. We also propose structured grouping strategies that introduce penalties on the grouping parameters in the likelihood function. In the setting where both the number of clusters and the within-cluster sample sizes tend to infinity, we establish asymptotic properties of the maximum likelihood estimator and of the selection of the numbers of groups and latent distributions. We demonstrate the proposed method through simulation studies and an application to crime risk modeling in Tokyo.
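As a rough illustration of the model and the fitting idea described above, the Python sketch below fits a simplified version. It assumes that each latent distribution f_k is a univariate normal N(mu_k, sigma_k^2) with no covariates, that cluster i belongs to one group g_i, and that its observations follow sum_k pi[g_i, k] f_k(y). The generalized-EM loop alternates an E-step, M-steps for the shared components and the group-wise mixing proportions, and a hard reassignment of each cluster to the group that maximizes its likelihood. The normal components, the initialization, and the function names (`fit_grouped_mixture`, `cluster_loglik`, `normal_logpdf`) are hypothetical choices for illustration, not the author's implementation; the information criterion and the structured-grouping penalties are not included.

```python
import numpy as np


def normal_logpdf(y, mu, sigma):
    """Log-density of N(mu, sigma^2); broadcasts over array arguments."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - (y - mu) ** 2 / (2.0 * sigma**2)


def cluster_loglik(y, weights, mu, sigma):
    """Log-likelihood of one cluster's data under the mixing weights `weights`."""
    logw = np.log(weights + 1e-300) + normal_logpdf(y[:, None], mu, sigma)
    top = logw.max(axis=1, keepdims=True)  # stabilise the log-sum-exp
    return float((top[:, 0] + np.log(np.exp(logw - top).sum(axis=1))).sum())


def fit_grouped_mixture(clusters, n_groups, n_latent, n_iter=200, seed=0):
    """Hypothetical GEM sketch: normal latent components, hard group labels.

    clusters: list of 1-D arrays, one per cluster.
    Returns (group labels, mixing proportions [G x L], mu, sigma, log-likelihood).
    """
    rng = np.random.default_rng(seed)
    y_all = np.concatenate(clusters)
    # Assumed initialisation: spread-out quantiles, common scale, random groups
    mu = np.quantile(y_all, np.linspace(0.1, 0.9, n_latent))
    sigma = np.full(n_latent, y_all.std())
    pi = np.full((n_groups, n_latent), 1.0 / n_latent)
    groups = rng.integers(n_groups, size=len(clusters))

    for _ in range(n_iter):
        # E-step: responsibilities, using each cluster's group-level weights
        resp = []
        for y, g in zip(clusters, groups):
            logw = np.log(pi[g] + 1e-300) + normal_logpdf(y[:, None], mu, sigma)
            w = np.exp(logw - logw.max(axis=1, keepdims=True))
            resp.append(w / w.sum(axis=1, keepdims=True))

        # M-step for the latent distributions, which are shared by all clusters
        R = np.vstack(resp)  # rows follow the order of y_all
        nk = R.sum(axis=0) + 1e-12
        mu = (R * y_all[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((R * (y_all[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-3)  # guard against degenerate components

        # M-step for the mixing proportions, pooled within each group
        for g in range(n_groups):
            members = [r for r, gi in zip(resp, groups) if gi == g]
            if members:  # an empty group keeps its previous proportions
                tot = np.vstack(members).sum(axis=0)
                pi[g] = tot / tot.sum()

        # Grouping step: move each cluster to its best-fitting group
        groups = np.array([
            np.argmax([cluster_loglik(y, pi[g], mu, sigma) for g in range(n_groups)])
            for y in clusters
        ])

    loglik = sum(cluster_loglik(y, pi[g], mu, sigma) for y, g in zip(clusters, groups))
    return groups, pi, mu, sigma, loglik
```

For example, with `clusters` a list of NumPy arrays, `fit_grouped_mixture(clusters, n_groups=2, n_latent=2)` returns group labels (defined only up to relabeling), the matrix of group-wise mixing proportions, the shared component parameters, and the final log-likelihood. Each step does not decrease the log-likelihood for fixed numbers of groups and components; choosing those numbers would require comparing fits across a grid, for instance with an information criterion, which this sketch does not reproduce.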
