Assessing the accuracy of Bayesian Additive Regression Tree credible intervals
LE3 .A278 2016
Master of Science
Mathematics and Statistics
Mathematics & Statistics
A common supervised learning problem is to use training data to estimate a predictive model for a numeric response. Many supervised learning models, such as Bayesian Additive Regression Trees (BART), aim to model the data flexibly. BART is a Bayesian “sum of trees” model that uses an MCMC backfitting algorithm to simulate posterior samples, and it also provides credible intervals (CIs) for its predictions. This thesis studies the accuracy of BART credible intervals and analyzes how various factors affect it. These factors include the sample size, dimension, noise standard deviation, correlation among predictors, presence of junk variables, type of error distribution, and BART method. CI accuracy is computed by simulation, using a designed experiment that systematically varies the factors to estimate their effects. Analysis of the experimental results shows that BART CI accuracy depends considerably on sample size and error variance.
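The simulation idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's code and does not fit BART. It assumes that CI accuracy is measured as empirical coverage (the fraction of simulated data sets whose interval contains the true value), and it substitutes a toy normal-mean model with a flat prior and known noise standard deviation for the BART posterior. The function names `credible_interval` and `coverage`, the true mean of 1.0, and the grid of sample sizes and noise levels are all hypothetical choices made for illustration.

```python
import random
import statistics


def credible_interval(draws, level=0.95):
    """Equal-tailed interval taken from the order statistics of posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = len(ordered) - 1 - lo_idx
    return ordered[lo_idx], ordered[hi_idx]


def coverage(n, sigma, reps=400, n_draws=1000, level=0.95, seed=0):
    """Fraction of simulated data sets whose interval contains the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0  # hypothetical true value used to generate the data
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(true_mean, sigma) for _ in range(n)]
        ybar = statistics.fmean(y)
        # Stand-in posterior (assumption): flat prior and known sigma give
        # mean | y ~ Normal(ybar, sigma^2 / n). A BART fit would supply
        # MCMC draws here instead.
        post_sd = sigma / n ** 0.5
        draws = [rng.gauss(ybar, post_sd) for _ in range(n_draws)]
        lo, hi = credible_interval(draws, level)
        hits += lo <= true_mean <= hi
    return hits / reps


if __name__ == "__main__":
    # A tiny two-factor grid: sample size and noise standard deviation.
    for n in (20, 100):
        for sigma in (0.5, 2.0):
            print(f"n={n:4d} sigma={sigma:.1f} coverage={coverage(n, sigma):.3f}")
```

In this toy model the posterior is exact, so the coverage should sit near the nominal level for every grid cell. A study of the kind the abstract describes would compare such estimates across factor settings to see where coverage departs from the nominal level.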
The author grants permission to the University Librarian at Acadia University to reproduce, loan or distribute copies of the thesis in microform, paper or electronic formats on a non-profit basis. The author retains the copyright of the thesis.