Terpene classification using mass spectra data and machine learning
LE3 .A278 2022
Silver, Daniel L.
Master of Science
Traditionally, terpenes in food samples are separated and classified using two processes: Gas Chromatography (GC), which separates the terpenes from each other, and Mass Spectrometry (MS), which detects the separated terpenes. A chemist then matches each mass spectrum to known terpenes using an existing database of samples. We propose a machine learning solution that detects multiple terpenes from mass spectra without separating them. Synthetic data is generated from six common terpenes, cannabidiol (CBD), and tetrahydrocannabinol (THC) using an additive composition model with five percent resolution in each component. The six base terpenes included in the current study are myrcene, linalool, limonene, α-pinene, β-pinene, and trans-caryophyllene, referred to as target chemicals. These terpenes are found in many food products that we consume in our day-to-day lives. There are 50,388 possible mass spectra combinations of these terpenes. The research investigates the development of a machine learning approach that classifies and quantifies the target chemicals. The system reads the mass spectra, automatically detects which target chemicals are present, and determines their relative proportions. Various supervised and unsupervised classification and regression models are built and tested. Principal Component Analysis (PCA) of the eight target chemicals is performed to test how distinguishable the terpenes are. Classifying the major contributing target chemical with artificial neural networks (ANNs) and a deep learning algorithm yields an accuracy of 99.1% under a train-test split methodology. Regression with ANNs and a single-layered model yields a mean absolute error of 0.001222 for the target chemicals under the same train-test split methodology.
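As an illustration of the additive composition model described in the abstract, the following is a minimal Python sketch, not the thesis's own code. It assumes that all eight components (the six terpenes plus CBD and THC) are present in every mixture at a minimum of five percent; the abstract does not state this, but it is one reading that reproduces the reported count, since C(19, 7) = 50,388. The names `compositions` and `mix_spectra`, the 300-bin spectrum length, and the random placeholder base spectra are all hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np

N_COMPONENTS = 8            # six terpenes + CBD + THC (from the abstract)
STEP_PERCENT = 5            # five percent resolution (from the abstract)
UNITS = 100 // STEP_PERCENT # 20 steps of 5% make up 100%


def compositions(total, parts):
    """Yield every tuple of `parts` positive integers that sums to `total`.

    Assumption: every component is present (at least one 5% step).
    Uses stars and bars: pick parts-1 cut points among the total-1 gaps.
    """
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0, *cuts, total)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def mix_spectra(base_spectra, proportions):
    """Additive model: the mixture is the proportion-weighted sum of the base spectra."""
    weights = np.asarray(proportions, dtype=float)
    return weights @ base_spectra


all_mixtures = list(compositions(UNITS, N_COMPONENTS))
print(len(all_mixtures), comb(UNITS - 1, N_COMPONENTS - 1))

# Placeholder base spectra (random values, NOT real mass spectra):
# one row per target chemical, 300 hypothetical m/z bins.
rng = np.random.default_rng(0)
base_spectra = rng.random((N_COMPONENTS, 300))

# Convert step counts to fractions that sum to 1, then mix.
proportions = np.array(all_mixtures[0]) * STEP_PERCENT / 100
mixture = mix_spectra(base_spectra, proportions)
```

Under these assumptions, each synthetic sample pairs a mixed spectrum (the model input) with its proportion vector (the regression target) and its largest component (the classification target).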
The author grants permission to the University Librarian at Acadia University to reproduce, loan or distribute copies of the thesis in microform, paper or electronic formats on a non-profit basis. The author retains the copyright of the thesis.