Heterogeneous Matrix Factorization: When Features Differ by Datasets

28 May 2023  ·  Naichen Shi, Raed Al Kontar, Salar Fattahi ·

In myriad statistical applications, data are collected from related but heterogeneous sources. These sources share some commonalities while containing idiosyncratic characteristics. One of the most fundamental challenges in such scenarios is to recover the shared and source-specific factors. Despite the existence of a few heuristic approaches, a generic algorithm with theoretical guarantees has yet to be established. In this paper, we tackle the problem by proposing a method called Heterogeneous Matrix Factorization to separate the shared and unique factors for a class of problems. HMF maintains the orthogonality between the shared and unique factors by leveraging an invariance property in the objective. The algorithm is easy to implement and intrinsically distributed. On the theoretic side, we show that for the square error loss, HMF will converge into the optimal solutions, which are close to the ground truth. HMF can be integrated auto-encoders to learn nonlinear feature mappings. Through a variety of case studies, we showcase HMF's benefits and applicability in video segmentation, time-series feature extraction, and recommender systems.

PDF Abstract

Datasets