Prediction when fitting simple models to high-dimensional data

Author(s)
Lukas Steinberger, Hannes Leeb
Abstract

We study linear subset regression in the context of a high-dimensional linear model. Consider y = ϑ + θ′z + ϵ with univariate response y and a d-vector of random regressors z, and a submodel where y is regressed on a set of p explanatory variables that are given by x = M′z, for some d×p matrix M. Here, “high-dimensional” means that the number d of available explanatory variables in the overall model is much larger than the number p of variables in the submodel. In this paper, we present Pinsker-type results for prediction of y given x. In particular, we show that the mean squared prediction error of the best linear predictor of y given x is close to the mean squared prediction error of the corresponding Bayes predictor E[y∥x], provided only that p/log d is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from n independent observations of (y, x) is close to that of the Bayes predictor, provided only that both p/log d and p/n are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables z.
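The setting in the abstract can be made concrete with a small simulation. The sketch below is illustrative and not the authors' code: the helper names `solve` and `simulate`, and all parameter values (d, p, n, σ, ϑ), are hypothetical choices. It assumes a mean-zero design with identity covariance, here i.i.d. ±1 entries so that z is non-Gaussian, and a random Gaussian matrix M. Under that assumption the best linear predictor of y given x = M′z has the closed form ϑ + β′x with β = (M′M)⁻¹M′θ. The sketch compares its mean squared prediction error with that of a least-squares fit computed from n observations of (y, x). It does not estimate the Bayes predictor E[y∥x] itself; according to the abstract, the best linear predictor is in turn close to E[y∥x] when p/log d is small.

```python
import random


def solve(A, b):
    """Solve the small square system A w = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]  # work on a copy of [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (aug[i][n] - sum(aug[i][j] * w[j] for j in range(i + 1, n))) / aug[i][i]
    return w


def simulate(d=100, p=3, n_train=2000, n_test=5000, sigma=0.5, seed=1):
    """Hypothetical illustration: best linear vs. least-squares prediction of y from x = M'z."""
    rng = random.Random(seed)
    vartheta = 1.0                                               # intercept (assumed value)
    theta = [rng.gauss(0, 1) / d ** 0.5 for _ in range(d)]       # theta, with |theta|^2 about 1
    M = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(d)]  # d x p matrix M

    def draw():
        # Assumed non-Gaussian design: z has i.i.d. +-1 entries (mean 0, covariance I_d).
        z = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        x = [sum(M[i][k] * z[i] for i in range(d)) for k in range(p)]  # x = M'z
        y = vartheta + sum(t * zi for t, zi in zip(theta, z)) + sigma * rng.gauss(0, 1)
        return x, y

    # Best linear predictor: with E z = 0 and Cov z = I_d, Cov x = M'M and
    # Cov(x, y) = M'theta, so the slope is beta = (M'M)^{-1} M'theta.
    MtM = [[sum(M[i][j] * M[i][k] for i in range(d)) for k in range(p)] for j in range(p)]
    Mttheta = [sum(M[i][k] * theta[i] for i in range(d)) for k in range(p)]
    beta = solve(MtM, Mttheta)
    # Its MSPE: Var y - Cov(y, x)(Cov x)^{-1}Cov(x, y) = sigma^2 + theta'theta - (M'theta)'beta.
    mspe_blp = sigma ** 2 + sum(t * t for t in theta) - sum(a * b for a, b in zip(Mttheta, beta))

    # Feasible least-squares predictor (with intercept) from n_train observations of (y, x).
    train = [draw() for _ in range(n_train)]
    rows = [[1.0] + x for x, _ in train]
    XtX = [[sum(r[j] * r[k] for r in rows) for k in range(p + 1)] for j in range(p + 1)]
    Xty = [sum(r[j] * y for r, (_, y) in zip(rows, train)) for j in range(p + 1)]
    w = solve(XtX, Xty)

    # Empirical MSPE of both predictors on the same fresh test draws.
    test = [draw() for _ in range(n_test)]
    err_blp = sum((y - vartheta - sum(b * xk for b, xk in zip(beta, x))) ** 2
                  for x, y in test) / n_test
    err_ols = sum((y - w[0] - sum(c * xk for c, xk in zip(w[1:], x))) ** 2
                  for x, y in test) / n_test
    return mspe_blp, err_blp, err_ols


mspe_blp, err_blp, err_ols = simulate()
print(f"BLP MSPE (theory) {mspe_blp:.3f}, BLP (test) {err_blp:.3f}, OLS (test) {err_ols:.3f}")
```

With p/n small (here 3/2000), the least-squares error should sit only slightly above the best linear predictor's error, roughly by a factor of 1 + (p + 1)/n under these assumptions. A pure-Python solver is used only to keep the sketch dependency-free; in practice a library least-squares routine would be the natural choice.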

Organisation(s)
Department of Statistics and Operations Research, Research Network Data Science
External organisation(s)
Albert-Ludwigs-Universität Freiburg
Journal
Annals of Statistics
Volume
47
Pages
1408-1442
No. of pages
35
ISSN
0090-5364
DOI
https://doi.org/10.1214/18-AOS1719
Publication date
06-2019
Peer reviewed
Yes
Austrian Fields of Science 2012
101029 Mathematical statistics
ASJC Scopus subject areas
Statistics and Probability, Statistics, Probability and Uncertainty
Portal url
https://ucris.univie.ac.at/portal/en/publications/prediction-when-fitting-simple-models-to-highdimensional-data(7b9c8cd5-daa6-443a-8d56-b2a0756ad229).html