Title: Optimal subsampling for high-dimensional partially linear models via machine learning methods
Speaker: Prof. Lei Wang (Nankai University)
Time: Friday, November 21, 2025, 16:00-18:00
Venue: Conference Room 2-327, Teaching Building 2
Organizer: 太阳成集团tyc
Host: Jing Zhang
Abstract:
In this paper, we study optimal subsampling strategies for estimating the parametric regression coefficients in partially linear models whose unknown nuisance functions involve high-dimensional and potentially endogenous covariates. To guard against model misspecification and the curse of dimensionality, we use flexible machine learning (ML) techniques to estimate the unknown nuisance functions. By constructing an unbiased subsampling Neyman-orthogonal score function, we remove the regularization bias. A two-step algorithm then produces suitable ML estimators of the nuisance functions while mitigating the risk of overfitting. Using martingale techniques, we establish the unconditional consistency and asymptotic normality of the subsample estimators. We further derive optimal subsampling probabilities, which include A-optimal and L-optimal probabilities as special cases. The approach is also extended to partially linear instrumental variable models to handle potential endogeneity. Simulation studies and an empirical analysis of the Physicochemical Properties of Protein Tertiary Structure dataset demonstrate the superior performance of the proposed subsample estimators.
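For readers unfamiliar with the general idea, the following is a minimal sketch of inverse-probability-weighted subsampling with a Neyman-orthogonal (partialling-out) score in the model Y = D*theta + g(X) + eps. It is NOT the speaker's algorithm: cubic polynomial least squares is a hypothetical stand-in for the ML nuisance learners, the nuisances are fit once on a uniform pilot subsample, and the probabilities pi_i proportional to |psi_i(theta_pilot)| (mixed with uniform for stability) are only a heuristic in the spirit of A-/L-optimal designs for a scalar theta, not the optimal probabilities derived in the paper. All function names and the simulated data are illustrative assumptions.

```python
import numpy as np

def poly_features(x, degree=3):
    # Additive polynomial basis in each covariate (hypothetical stand-in for an ML learner)
    cols = [np.ones(len(x))] + [x ** k for k in range(1, degree + 1)]
    return np.column_stack(cols)

def fit_predict(x_train, y_train, x_new, degree=3):
    # Least-squares fit on the training rows, predictions on the new rows
    coef, *_ = np.linalg.lstsq(poly_features(x_train, degree), y_train, rcond=None)
    return poly_features(x_new, degree) @ coef

def orthogonal_theta(y_res, d_res, weights):
    # Solves sum_i w_i * (y_res_i - theta * d_res_i) * d_res_i = 0 for theta
    return np.sum(weights * y_res * d_res) / np.sum(weights * d_res ** 2)

def subsample_plm(y, d, x, r0, r, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 1: uniform pilot subsample -> nuisance fits and a pilot estimate of theta
    pilot = rng.choice(n, size=r0, replace=False)
    l_hat = fit_predict(x[pilot], y[pilot], x)  # estimate of E[Y | X]
    m_hat = fit_predict(x[pilot], d[pilot], x)  # estimate of E[D | X]
    y_res, d_res = y - l_hat, d - m_hat
    theta0 = orthogonal_theta(y_res[pilot], d_res[pilot], np.ones(r0))
    # Step 2: heuristic probabilities proportional to |psi_i(theta0)|, mixed with uniform
    psi = np.abs((y_res - theta0 * d_res) * d_res)
    pi = (1 - alpha) * psi / psi.sum() + alpha / n
    idx = rng.choice(n, size=r, replace=True, p=pi)
    # Weights 1/pi make the subsample score unbiased for the full-data score
    # (up to a constant factor), given the fitted nuisances
    return orthogonal_theta(y_res[idx], d_res[idx], 1.0 / pi[idx])

# Simulated example (illustrative data only), true theta = 1.5
rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(-1, 1, size=(n, 2))
d = np.sin(np.pi * x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)
y = 1.5 * d + np.cos(np.pi * x[:, 1]) + x[:, 0] + rng.normal(size=n)
theta_hat = subsample_plm(y, d, x, r0=1_000, r=5_000)
print(theta_hat)
```

Only r0 + r rows enter the estimating equations, while the full data are touched just for cheap predictions and probability computation; the uniform mixing weight alpha keeps the inverse-probability weights bounded.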
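Speaker Biography:
Lei Wang is a Professor and doctoral supervisor at the School of Statistics and Data Science, Nankai University, and one of Nankai University's "Hundred Young Academic Leaders". His research focuses on statistical learning and complex data analysis. He has published numerous papers in statistics journals including Biometrika, JMLR, IEEE TIT, AOAS, Bernoulli, JCGS, and Statistica Sinica, and has led three projects funded by the National Natural Science Foundation of China and one funded by the Tianjin Natural Science Foundation.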