Design and application of an XML data compression algorithm based on Huffman coding
1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029; 2. China Nuclear Control System Engineering Co. Ltd, Beijing 100176, China
An XML data compression method based on Huffman coding is proposed to address the low access rate of a production process report system when a large data source must be served over limited bandwidth. The algorithm constructs a data processing class for XML documents that extracts high-frequency word units. These word units are encoded with Huffman coding, and the encoded document is then compressed with the LZMA compression algorithm. Unlike traditional XML data compression algorithms, the method does not need the assistance of a document type definition (DTD) or an XML parser, and it achieves a good compression effect. The resulting Huffman-LZMA compression algorithm was applied to the design of a production process report system. In experiments, the compression ratio of the report data reached about 88%. Bandwidth and storage space were saved effectively, and the report access rate was improved.
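The two-stage idea described in the abstract (Huffman-code the word units, then pass the result through LZMA) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expression used to split the document into word units, the choice to Huffman-code every token rather than only selected high-frequency ones, and the function names `tokenize`, `huffman_table`, `encode`, and `decode` are all assumptions made for illustration. Like the method in the abstract, the sketch treats the XML as plain text and uses neither a DTD nor an XML parser.

```python
import heapq
import lzma
import re
from collections import Counter
from itertools import count

# Assumed word-unit pattern: tag openers such as "<item" or "</item",
# runs of word characters, or any other single character.
TOKEN = re.compile(r"</?[A-Za-z_][\w.\-]*|\w+|.", re.S)


def tokenize(text):
    """Split text into word units; joining them reproduces the text."""
    return TOKEN.findall(text)


def huffman_table(freqs):
    """Build a {symbol: bit string} Huffman code from symbol counts."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text):
    """Huffman-code the word units, pack the bits, then apply LZMA."""
    tokens = tokenize(text)
    table = huffman_table(Counter(tokens))
    bits = "".join(table[t] for t in tokens)
    padded = bits + "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    packed = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return lzma.compress(packed), table, len(bits)


def decode(blob, table, nbits):
    """Invert encode(): LZMA-decompress, then walk the Huffman code."""
    packed = lzma.decompress(blob)
    bits = bin(int.from_bytes(packed, "big"))[2:].zfill(len(packed) * 8)[:nbits]
    inverse = {code: sym for sym, code in table.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # prefix-free codes make the first match correct
            out.append(inverse[cur])
            cur = ""
    return "".join(out)
```

In this sketch the code table and the bit count are returned separately from the compressed bytes; a real system would have to store or transmit them with the payload, which affects the achieved compression ratio.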
Design and application of an XML data compression algorithm based on Huffman coding[J]. Journal of Beijing University of Chemical Technology, 2013, 40(4): 120-127