
Forums and Lectures

[Department Colloquium] 2024, No. 27 || Locally Dependent Mixed Membership Estimation for High-dimensional Categorical Data

Title: Locally Dependent Mixed Membership Estimation for High-dimensional Categorical Data

Speaker: Yuqi Gu (Columbia University)

Time: June 21, 2024, 10:00-11:30

Venue: Room A404, Science Building (理科楼)

Abstract: Mixed membership models are popular individual-level mixture models widely used in various fields including network analysis, topic modeling, and multivariate categorical data analysis. This work focuses on mixed membership models for multivariate categorical data, which are also called Grade of Membership (GoM) models. GoM models drastically increase the modeling flexibility of latent class models by allowing each individual to partially belong to multiple extreme latent profiles. However, such flexibility also comes with challenging identifiability and estimation issues, especially for high-dimensional polytomous data (categorical data with more than two categories). Such data take the form of a three-way (quasi)-tensor, with N subjects responding to J items, each with C categories in the simplest case. Existing estimation methods based on maximum likelihood or Bayesian MCMC inference are not computationally efficient and lack theoretical guarantees in high dimensions. We propose an SVD-based spectral method for high-dimensional polytomous GoM models with potential local dependence. We flatten the three-way (quasi)-tensor into a "fat" matrix and exploit the singular subspace geometry based on the matrix SVD for estimation. We establish fine-grained finite-sample entrywise error bounds for all parameters. Moreover, we develop a novel two-to-infinity singular subspace perturbation theory under arbitrary locally dependent noise, which is of independent interest. Simulations and applications to real-world data in genetics, political science, and single-cell sequencing demonstrate the merit of the proposed method.
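As a rough illustration of the flattening step described in the abstract, the Python sketch below one-hot encodes simulated responses from N subjects to J items with C categories, reshapes the resulting N x J x C array into an N x (J*C) "fat" matrix, and computes its leading singular vectors. This is not the speaker's estimator: the function names, the simulated data, and the dimensions are hypothetical, and the steps that would recover mixed memberships from the singular subspace are omitted.

```python
import numpy as np


def flatten_responses(responses, num_categories):
    """One-hot encode an (N, J) integer array and flatten it to (N, J * C).

    Hypothetical helper for illustration only.
    """
    n, j = responses.shape
    one_hot = np.zeros((n, j, num_categories))
    rows = np.arange(n)[:, None]
    cols = np.arange(j)[None, :]
    # Set a single 1 per (subject, item) pair at the observed category.
    one_hot[rows, cols, responses] = 1.0
    return one_hot.reshape(n, j * num_categories)


def top_k_singular_subspace(matrix, k):
    """Return the top-k left singular vectors and singular values."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k]


# Simulated data; these sizes are assumptions, not values from the talk.
rng = np.random.default_rng(0)
N, J, C, K = 200, 30, 3, 2
responses = rng.integers(0, C, size=(N, J))

fat = flatten_responses(responses, C)
U, s = top_k_singular_subspace(fat, K)
print(fat.shape, U.shape)
```

Each row of `U` is a K-dimensional embedding of one subject. In a spectral approach of the kind the abstract outlines, the geometry of these rows is what would be exploited for estimation.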

Bio: Yuqi Gu is an Assistant Professor in the Department of Statistics at Columbia University. She is also a member of Columbia’s Data Science Institute. Before joining Columbia in 2021, she completed a one-year postdoc at Duke University. She received a PhD in Statistics from the University of Michigan in 2020 and a BS in Mathematics from Tsinghua University in 2015. Yuqi’s research centers on investigating unobserved latent structures widely present in statistics, machine learning, psychometrics, and other applications. Specifically, her recent work focuses on identifiability and estimation of deep generative models, high-dimensional statistics and spectral methods for latent structures, and latent variable modeling for educational, psychological, and biomedical data. Her work has been published in the Journal of the American Statistical Association, the Journal of the Royal Statistical Society Series B, the Annals of Statistics, the Journal of Machine Learning Research, and Psychometrika, among others.

Host: 杨瑛 (Yang Ying)