A document exploring system on LDA topic model for Wikipedia articles

Tong, Zhou

Tong, Z. (2016). A document exploring system on LDA topic model for Wikipedia articles. https://scholar.acadiau.ca/islandora/object/theses:1376

Details:

Title: A document exploring system on LDA topic model for Wikipedia articles
Author: Tong, Zhou
Call Number: LE3 .A278 2016
Date: 2016
Supervisor: Zhang, Haiyi
Degree Grantor: Acadia University
Degree Name: Master of Science
Degree Level: Masters
Discipline: Computer Science
Affiliation: Computer Science
Abstract: Organizing and exploring millions of documents, papers, and other text sources has become a challenge for researchers and publishers. As machine learning techniques have developed rapidly and come into wide use, a text mining method called the topic model was proposed in 2003. The topic model is based on latent Dirichlet allocation (LDA) and has drawn much attention since it was introduced. The LDA topic model is a probabilistic model that processes text documents and reveals their hidden topics. Unlike document processing methods that work on content directly, the LDA topic model represents each document as a distribution over topics. The results are easier to understand, categorize, and compare. Most importantly, topics make more sense to humans than structured machine formats. In this thesis, we briefly introduce the background of the LDA topic model and its working principles. We then explain in detail how to apply the LDA topic model to a text corpus through experiments on Simple Wikipedia documents. The experiments cover all necessary steps: data retrieval, pre-processing, model fitting, and evaluation. The results show that the LDA topic model works effectively for document clustering and for finding similar documents. Based on the LDA topic model, we also propose a document exploring system that allows users to organize and explore documents by topic, making related documents easier to find and access.
Rights: The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.
Permanent Link: https://scholar.acadiau.ca/islandora/object/theses:1376
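
Illustration: the abstract describes representing each document as a topic distribution and using those distributions to find similar documents. The sketch below shows one way such a comparison could work. It is not taken from the thesis: the Hellinger distance, the function names `hellinger` and `most_similar`, and the four toy topic distributions are all assumptions chosen for illustration, and a real system would obtain the distributions from a fitted LDA model.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def most_similar(query_id, doc_topics, top_n=3):
    """Rank the other documents by distance to the query's topic distribution."""
    query = doc_topics[query_id]
    scored = [
        (hellinger(query, dist), doc_id)
        for doc_id, dist in doc_topics.items()
        if doc_id != query_id
    ]
    scored.sort()  # smallest distance first
    return [(doc_id, distance) for distance, doc_id in scored[:top_n]]


# Hypothetical per-document topic distributions over three topics.
# These numbers are made up for illustration, not model output.
doc_topics = {
    "Cat": [0.80, 0.15, 0.05],
    "Dog": [0.75, 0.20, 0.05],
    "Guitar": [0.05, 0.10, 0.85],
    "Piano": [0.10, 0.05, 0.85],
}

ranked = most_similar("Cat", doc_topics, top_n=2)
for doc_id, distance in ranked:
    print(f"{doc_id}: {distance:.3f}")
```

Because every document is reduced to a short vector of topic proportions, comparing two documents costs the same regardless of their length, which is part of what makes topic-based browsing practical for a large corpus.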
