Two new approaches for compressing XML
LE3 .A278 2005
Master of Science
The Extensible Markup Language (XML) has emerged as a popular format for associating semantics with data. While XML offers the advantages of flexibility and extensibility, its expressiveness comes at the cost of verbosity. It is not uncommon for an XML representation of a data set to be five to ten times larger than the same data encoded in an alternative format. The goal of XML compression is to reduce this verbosity without sacrificing the benefits of an XML representation. This thesis provides an overview of existing XML compressors before introducing two new approaches to XML-conscious compression. The first, called AXECHOP, uses a grammar-based strategy for compressing XML structure. It is intended for archiving applications, where reducing disk storage requirements is a top priority. The second, TREECHOP, compresses XML data online and supports querying the compressed data without first decompressing it. These features make it particularly well suited to XML messaging applications, where the key concerns are the efficient transfer and processing of XML-encoded messages between networked systems.
The author grants permission to the University Librarian at Acadia University to reproduce, loan or distribute copies of the thesis in microform, paper or electronic formats on a non-profit basis. The author retains the copyright of the thesis.