Query classification based on a new query expansion approach
LE3 .A278 2009
2009
Chipman, Hugh; Ranjan, Pritam
Acadia University
Master of Science
Masters
Statistics
Mathematics & Statistics
Query classification is an important and yet challenging problem for the search engine industry and e-commerce companies. In this thesis, I develop a query classification system based on a novel query expansion approach and classification methods. The proposed methodology is used to classify queries based on a taxonomy (a database of words and their corresponding topic classification). The taxonomy used was obtained from GenieKnows, a vertical search engine company in Halifax, Canada. The query classification system can be divided into three phases: feature selection, query expansion, and query classification. The first phase uses a chi-square statistic to select a subset of "feature words" from the GenieKnows taxonomy; the second phase uses cosine similarity and Kullback-Leibler divergence to find "feature words" similar to the query for query expansion; and finally the third phase introduces three classification methods: naive Bayes multinomial model, naive Bayes Bernoulli model and Dirichlet/multinomial model to classify the expanded queries. Results from the KDD-Cup 2005 competition are used to test the performance of the proposed query classification system. The experiment shows that the performance of the query classification system is quite good.
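As a rough illustration of the three measures named in the abstract, the sketch below gives textbook forms of the chi-square statistic for a 2x2 word-by-category table, cosine similarity between sparse word vectors, and Kullback-Leibler divergence between two word distributions. It is not taken from the thesis: the function names, the dictionary representation, and the additive smoothing constant `eps` are all assumptions made for this example, and the thesis may define or combine these quantities differently.

```python
import math


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 word-by-category contingency table.

    n11: in-category entries containing the word
    n10: out-of-category entries containing the word
    n01: in-category entries without the word
    n00: out-of-category entries without the word
    (This cell layout is an assumption for the example.)
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the union of keys.

    eps is a hypothetical additive smoothing constant that keeps the
    logarithm finite when a word is missing from one distribution.
    """
    keys = set(p) | set(q)
    ps = {k: p.get(k, 0.0) + eps for k in keys}
    qs = {k: q.get(k, 0.0) + eps for k in keys}
    zp, zq = sum(ps.values()), sum(qs.values())
    return sum(
        (ps[k] / zp) * math.log((ps[k] / zp) / (qs[k] / zq)) for k in keys
    )
```

In a pipeline of the kind the abstract describes, a word with a high chi-square score for some category would be a candidate "feature word", and feature words with high cosine similarity (or low divergence) relative to a query would be candidates for expanding it.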
The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.
https://scholar.acadiau.ca/islandora/object/theses:138