Development of statistical tools to analyze young athletes' biomechanics during movement tasks
LE3 .A278 2020
Bachelor of Science
Mathematics and Statistics
Mathematics & Statistics
Every year in the United States alone, over 200 000 people tear their anterior cruciate ligament (ACL), a primary stabilizer of the knee. Of these injuries, 100 000 require reconstructive surgery, which costs 2 billion USD and places a significant financial burden on the healthcare system. Non-contact sports such as basketball, soccer and volleyball account for 70% of all ACL injuries. Female athletes aged 15 to 25 years are two to eight times more likely to tear their ACL than their male counterparts. The purpose of this study was to investigate fitting high-dimensional logistic regression models to identify features of athletes' biomechanical waveform data collected at the John MacIntyre motion Lab of Applied Biomechanics. Each waveform was summarized individually by regressing its measured values on time using B-spline basis functions. This reduced the dimensionality from 2424 columns of waveform data to 264. After the B-spline reduction, each column represented a tenth of its original waveform. The B-spline coefficients were then used as the predictors in a logistic regression model with a shrinkage penalty. The shrinkage method used in this thesis was the elastic net, a hybrid of ridge regression and lasso regression. The elastic net penalty was chosen because there were more predictor variables than observations and because predictors coming from the same waveform were highly pairwise correlated. The process was tested on three classification problems: one 2-class problem and two 3-class problems. The analysis held out 20% of the data as a test set. Within the remaining 80%, 5-fold cross-validation was used to select the elastic net penalty parameters.
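The B-spline reduction step described above can be illustrated with a minimal sketch. It is not the thesis code. It assumes each waveform has 101 samples and is summarized by 11 cubic B-spline coefficients on equally spaced knots; those numbers are consistent with the stated column counts (24 x 101 = 2424 and 24 x 11 = 264) but are not confirmed by the abstract. The function names `bspline_basis` and `reduce_waveform` are hypothetical.

```python
import numpy as np


def bspline_basis(t, n_basis=11, degree=3):
    """Evaluate n_basis clamped B-spline basis functions at points t in [0, 1].

    Assumption: equally spaced interior knots; the abstract does not say
    how the knots were placed.
    """
    t = np.asarray(t, dtype=float)
    n_interior = n_basis - degree - 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    n_knots = len(knots)

    # Degree-0 basis: indicator of each half-open knot interval.
    B = np.zeros((len(t), n_knots - 1))
    for i in range(n_knots - 1):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Assign the right endpoint t = 1 to the last non-empty interval.
    B[t == knots[-1], n_basis - 1] = 1.0

    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t), n_knots - 1 - d))
        for i in range(n_knots - 1 - d):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_new[:, i] += (t - knots[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (knots[i + d + 1] - t) / right * B[:, i + 1]
        B = B_new
    return B  # shape: (len(t), n_basis)


def reduce_waveform(y, n_basis=11, degree=3):
    """Least-squares regression of one waveform on the B-spline basis.

    Returns the n_basis coefficients that would replace the raw samples
    as predictors.
    """
    y = np.asarray(y, dtype=float)
    t = np.linspace(0.0, 1.0, len(y))  # time normalized to [0, 1]
    B = bspline_basis(t, n_basis, degree)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef


# Synthetic example waveform (not thesis data): 101 samples of one sine cycle.
t_demo = np.linspace(0.0, 1.0, 101)
y_demo = np.sin(2 * np.pi * t_demo)
coef_demo = reduce_waveform(y_demo)
y_fit = bspline_basis(t_demo) @ coef_demo
print(coef_demo.shape, float(np.max(np.abs(y_demo - y_fit))))
```

Applying `reduce_waveform` to every waveform of every participant and concatenating the results would give one row of coefficients per participant, which is the kind of predictor matrix the abstract describes.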
A logistic regression with elastic net penalty was used to predict participants' age class (young vs old), puberty class (pre vs during vs post-puberty) and a mix of puberty and gender (pre-puberty males and females vs post-puberty males vs post-puberty females). For classifying young vs old participants, the model achieved 89% test set accuracy. For classifying pre vs during vs post-puberty, the model achieved approximately 67% accuracy. When classifying pre-puberty males and females vs post-puberty males vs post-puberty females, the model achieved an accuracy of 95%. The elastic net logistic regression models were able to correctly identify areas of difference between classes' movement patterns. The models built using the elastic net penalty performed significantly better than models built using the ridge regression or lasso regression penalty. The models created were able to accurately separate classes and identify differences in movement patterns while maintaining high levels of interpretability.
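For reference, one common way to write the elastic net penalized logistic regression objective is given here. The parametrization and the symbols (N observations, coefficient vector \beta, intercept \beta_0, overall penalty strength \lambda, mixing parameter \alpha) are assumptions for illustration; the abstract does not state which form the thesis used.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N} \log \Pr\!\left(y_i \mid x_i;\, \beta_0, \beta\right)
  \;+\; \lambda\left[\,\alpha\,\lVert\beta\rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\,\right],
  \qquad \lambda \ge 0,\quad 0 \le \alpha \le 1 .
```

In this form, \alpha = 1 gives the lasso penalty and \alpha = 0 gives the ridge penalty, so the two comparison models are special cases. Under this reading, the penalty parameters selected by cross-validation would be \lambda and \alpha.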
The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.