Classification models for transactional graph data
LE3 .A278 2006
2006
Chipman, Hugh
Acadia University
Master of Science
Masters
Mathematics and Statistics
Mathematics & Statistics
Of primary interest in this thesis is the general supervised learning problem of studying communication patterns and classifying data on graph networks. Consider the nodes of a graph as communicators, and the edges of this graph as messages sent or received between distinct nodes. We describe these graph data using summary features computed from the graph for each node. We then consider 2-class classification, using these features to determine whether or not a node belongs to some class of interest. We consider several sub-problems, including feature selection and “unbalanced” 2-class classification, in which one rare class is of special interest. A concrete example using e-mails related to the Enron scandal is used to illustrate these techniques. Our results show that a handful of features is enough to produce adequate models, and that methods such as random forests and boosted trees are good classifiers, as measured by misclassification rate. A second dataset, corresponding to the unbalanced case, is also considered. Again, a small number of features suffices to produce accurate predictions, as measured by the area under a lift curve.
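As an illustration only, the per-node summary features and the two evaluation measures named above can be sketched in Python. Several things here are assumptions and are not taken from the thesis. The feature set (messages sent, messages received, number of distinct correspondents) and the function names are hypothetical. Each e-mail is assumed to be already split into one (sender, recipient) pair per recipient. "Area under a lift curve" is read as the area under the cumulative fraction of rare-class nodes captured, plotted against the fraction of nodes inspected in decreasing score order. The classifiers themselves (random forests, boosted trees) are not reproduced.

```python
from collections import defaultdict


def node_features(messages):
    """Hypothetical per-node summary features from (sender, recipient) pairs.

    Assumed features: messages sent, messages received, and the number of
    distinct correspondents (nodes the node sent to or received from).
    """
    sent = defaultdict(int)
    received = defaultdict(int)
    contacts = defaultdict(set)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # edges are messages between distinct nodes
        sent[sender] += 1
        received[recipient] += 1
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    nodes = sorted(set(sent) | set(received))
    return {
        n: {"sent": sent[n], "received": received[n], "contacts": len(contacts[n])}
        for n in nodes
    }


def misclassification_rate(y_true, y_pred):
    """Fraction of nodes whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return errors / len(y_true)


def lift_curve_area(y_true, scores):
    """Assumed lift-curve area for a rare class labelled 1.

    Nodes are ranked by decreasing score; after inspecting the top k of n
    nodes, the curve is at (k/n, fraction of all positives captured).
    The area is accumulated with the trapezoid rule from (0, 0).
    Tied scores keep their input order (an assumption).
    """
    n = len(y_true)
    total_pos = sum(y_true)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one positive (rare-class) node")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    area, captured, prev_y = 0.0, 0, 0.0
    for i in order:
        captured += y_true[i]
        y = captured / total_pos
        area += (prev_y + y) / 2 / n  # trapezoid of width 1/n
        prev_y = y
    return area


if __name__ == "__main__":
    msgs = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a"), ("d", "a")]
    print(node_features(msgs)["a"])
    print(misclassification_rate([1, 0, 0, 0], [1, 0, 1, 0]))
    print(lift_curve_area([0, 1, 0, 0], [0.2, 0.9, 0.1, 0.4]))
```

Under this reading, a ranking that places every rare-class node first gives an area close to 1, a random ranking gives about 0.5, and the worst possible ranking gives an area close to 0. This is why the measure suits the unbalanced case better than raw misclassification rate, which a model can keep low simply by never predicting the rare class.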
The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.
https://scholar.acadiau.ca/islandora/object/theses:115