Parallelization of XML compressor: XSAQCT
LE3 .A278 2013
Bachelor of Computer Science
The Extensible Markup Language (XML) has become a de facto standard for semi-structured data, with many considering it to be the lingua franca of web services. XML was created with the intent of being a self-describing, extensible, and flexible language for the interchange of information. For all of its advantages, the major downside of any markup language, especially XML, is that marking up the data significantly increases the size of the original data, rendering it unusable for some applications. In other words, markup languages are verbose. Therefore, compressing XML documents for efficient storage and transmission is becoming an ever-increasing requirement. This thesis explores the theoretical and practical applications of parallel programming to XML compression, based on our novel XML-conscious, queryable, and updateable compressor, called XSAQCT. Using constructs from the parallel programming model, the theory behind parallel programming, the question of whether XSAQCT is inherently sequential was analyzed. It was determined that both compression and decompression can be efficiently parallelized using a standard parallel random access machine. To examine practical applications, several versions of parallel XSAQCT were developed to determine which version best takes advantage of parallel processing by reducing the compression and decompression time while preserving or improving the compression rate. Finally, we designed a new XML grammar, which focuses only on aspects of XML compression. Our work has shown that the original design of XSAQCT can be modified to create efficient and scalable parallel implementations that are significantly better than other known sequential XML compressors.
The author grants permission to the University Librarian at Acadia University to reproduce, loan or distribute copies of my thesis in microform, paper or electronic formats on a non-profit basis. The author retains the copyright of the thesis.