Prediction when fitting simple models to high-dimensional data
- Author(s): Lukas Steinberger, Hannes Leeb
- Abstract
We study linear subset regression in the context of a high-dimensional linear model. Consider y=ϑ+θ′z+ϵ with univariate response y and a d-vector of random regressors z, and a submodel where y is regressed on a set of p explanatory variables that are given by x=M′z, for some d×p matrix M. Here, “high-dimensional” means that the number d of available explanatory variables in the overall model is much larger than the number p of variables in the submodel. In this paper, we present Pinsker-type results for prediction of y given x. In particular, we show that the mean squared prediction error of the best linear predictor of y given x is close to the mean squared prediction error of the corresponding Bayes predictor E[y|x], provided only that p/log d is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from n independent observations of (y,x) is close to that of the Bayes predictor, provided only that both p/log d and p/n are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables z.
- Organisation(s): Department of Statistics and Operations Research, Research Network Data Science
- External organisation(s): Albert-Ludwigs-Universität Freiburg
- Journal: Annals of Statistics
- Volume: 47
- Pages: 1408-1442
- No. of pages: 35
- ISSN: 0090-5364
- DOI: https://doi.org/10.1214/18-AOS1719
- Publication date: June 2019
- Peer reviewed: Yes
- Austrian Fields of Science 2012: 101029 Mathematical statistics
- ASJC Scopus subject areas: Statistics and Probability; Statistics, Probability and Uncertainty
- Portal URL: https://ucrisportal.univie.ac.at/en/publications/prediction-when-fitting-simple-models-to-highdimensional-data(7b9c8cd5-daa6-443a-8d56-b2a0756ad229).html
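As a rough numerical illustration of the least-squares part of the abstract, the sketch below simulates the model y = ϑ + θ′z + ϵ with x = M′z. It then compares two quantities: the out-of-sample mean squared prediction error of the least-squares predictor fitted on n observations, and the population mean squared prediction error of the best linear predictor of y given x. All concrete choices are hypothetical and are not taken from the paper. These include i.i.d. Rademacher entries for z (mean 0, covariance I_d), a Gaussian matrix M, ‖θ‖ = 1, ϑ = 0.5, the sizes d, p, n, and the function name `simulate_mspe`. The sketch does not compute the Bayes predictor E[y|x], which is generally not available in closed form for a non-Gaussian design. It therefore only suggests how least squares approaches the best linear predictor when p/n is small.

```python
import numpy as np


def simulate_mspe(d=200, p=5, n=2000, n_test=20000, sigma=1.0, seed=0):
    """Return (least-squares out-of-sample MSPE, best-linear-predictor MSPE).

    Hypothetical setup (not from the paper): z has i.i.d. Rademacher entries,
    so E[z] = 0 and Cov(z) = I_d; M is a Gaussian d x p matrix; ||theta|| = 1.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)      # normalise so that Var(theta'z) = 1
    intercept = 0.5                      # plays the role of vartheta
    M = rng.standard_normal((d, p))

    def draw(m):
        # Rademacher design: entries are -1 or +1 with equal probability.
        z = 2.0 * rng.integers(0, 2, size=(m, d)) - 1.0
        x = z @ M                        # row i is x_i' = z_i' M, i.e. x = M'z
        y = intercept + z @ theta + sigma * rng.standard_normal(m)
        return x, y

    # Population MSPE of the best linear predictor of y given x:
    # Var(y) - Cov(y, x) Cov(x)^{-1} Cov(x, y),
    # with Cov(x) = M'M and Cov(x, y) = M'theta because Cov(z) = I_d.
    Mt_theta = M.T @ theta
    blp_mspe = theta @ theta + sigma**2 - Mt_theta @ np.linalg.solve(M.T @ M, Mt_theta)

    # Feasible least-squares predictor (with intercept) from n observations.
    x, y = draw(n)
    X1 = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

    # Out-of-sample MSPE of the fitted predictor on fresh data.
    x_new, y_new = draw(n_test)
    pred = beta[0] + x_new @ beta[1:]
    ls_mspe = np.mean((y_new - pred) ** 2)
    return ls_mspe, blp_mspe


if __name__ == "__main__":
    ls, blp = simulate_mspe()
    print(f"least squares: {ls:.3f}, best linear predictor: {blp:.3f}")
```

Rerunning with p large relative to n should widen the gap between the two numbers, in line with the requirement that p/n be small.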