...... continued from The Need For Compression

A Brief History of Data Compression

The late 40's were the early years of Information Theory, the idea of developing efficient new coding methods was just starting to be fleshed out. Ideas of entropy, information content and redundancy were explored. One popular notion held that if the probability of symbols in a message were known, there ought to be a way to code the symbols so that the message will take up less space.

The first well-known method for compressing digital signals is now known as Shannon-Fano coding. Shannon and Fano [~1948] simultaneously developed this algorithm which assigns binary codewords to unique symbols that appear within a given data file. While Shannon-Fano coding was a great leap forward, it had the unfortunate luck to be quickly superseded by an even more efficient coding system : Huffman Coding.

Huffman coding [1952] shares most characteristics of Shannon-Fano coding. Huffman coding could perform effective data compression by reducing the amount of redundancy in the coding of symbols. It has been proven to be the most efficient fixed-length coding method available.

In the last fifteen years, Huffman coding has been replaced by arithmetic coding. Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating-point output number. More bits are needed in the output number for longer, complex messages. 

Dictionary-based compression algorithms use a completely different method to compress data. They encode variable-length strings of symbols as single tokens. The token forms an index to a phrase dictionary. If the tokens are smaller than the phrases, they replace the phrases and compression occurs.

Two dictionary-based compression techniques called LZ77 and LZ78 have been developed. LZ77 is a "sliding window" technique in which the dictionary consists of a set of fixed-length phrases found in a "window" into the previously seen text. LZ78 takes a completely different approach to building a dictionary. Instead of using fixed-length phrases from a window into the text, LZ78 builds phrases up one symbol at a time, adding a new symbol to an existing phrase when a match occurs.


 Previous Page

 Home

 Next Section