An automatic news article filtering engine
LE3 .A278 1998
1998
Watters, Carolyn
Acadia University
Master of Science
Masters
Computer Science
With the rapidly growing amount of information on the internet, many readers face an increasingly serious problem: an overwhelming volume of incoming news data (text, photos, and video). The ability to filter out irrelevant information is becoming critical to a news delivery system. This thesis introduces a new methodology which binds a news representation object to a news document. When users perform a search to retrieve news documents, they can acquire similar ones from other resources. First, this thesis presents an algorithm for creating a representation of a news object. Based on the analysis of actual collections of newspaper articles, a news representation object was created which captures the regularities of the news documents. Four types of descriptive features can be extracted from news documents: Person, Event Data, Event Location, and Organization. Second, this thesis presents an algorithm to calculate the similarity between two news objects. We assume that two documents are related if their objects overlap. Results indicate that the representation of news objects can be used to quickly sift through news documents for meaningful information. The algorithm for calculating a relationship between two news objects can be used as a fast way to filter out irrelevant news documents.
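The abstract does not give the thesis's actual data structures or similarity formula, so the following is only a minimal sketch of the overlap idea under stated assumptions. It assumes each of the four feature types is held as a set of strings and that similarity is the average Jaccard overlap across the four features. The names NewsObject, jaccard, similarity, and is_related, and the threshold parameter, are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass

# Hypothetical representation: one set of strings per feature type
# named in the abstract (Person, Event Data, Event Location, Organization).
@dataclass(frozen=True)
class NewsObject:
    persons: frozenset = frozenset()
    event_data: frozenset = frozenset()
    locations: frozenset = frozenset()
    organizations: frozenset = frozenset()

FEATURES = ("persons", "event_data", "locations", "organizations")


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity(x: NewsObject, y: NewsObject) -> float:
    """Assumed scoring: unweighted mean of the per-feature Jaccard overlaps."""
    scores = [jaccard(getattr(x, f), getattr(y, f)) for f in FEATURES]
    return sum(scores) / len(scores)


def is_related(x: NewsObject, y: NewsObject, threshold: float = 0.0) -> bool:
    """Treat two documents as related when their objects overlap beyond the threshold."""
    return similarity(x, y) > threshold
```

With the default threshold of 0.0, any shared feature value marks two documents as related, which mirrors the abstract's assumption that documents are related if their objects overlap. A filter could raise the threshold to discard weak matches.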
The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.
https://scholar.acadiau.ca/islandora/object/theses:2832