| A Brief History
of Data Compression
The late 40's were the early years of Information
Theory, the idea of developing efficient new coding methods was just
starting to be fleshed out. Ideas of entropy, information content
and redundancy were explored. One popular notion held that if the
probability of symbols in a message were known, there ought to be a way
to code the symbols so that the message will take up less space.
The first well-known method for compressing digital
signals is now known as Shannon-Fano coding. Shannon and Fano
[~1948] simultaneously developed this algorithm which assigns binary codewords
to unique symbols that appear within a given data file. While Shannon-Fano
coding was a great leap forward, it had the unfortunate luck to be quickly
superseded by an even more efficient coding system : Huffman Coding.
Huffman coding
[1952] shares most characteristics of Shannon-Fano coding. Huffman coding
could perform effective data compression by reducing the amount of redundancy
in the coding of symbols. It has been proven to be the most efficient fixed-length
coding method available.
In the last fifteen years, Huffman coding
has been replaced by arithmetic coding. Arithmetic coding bypasses
the idea of replacing an input symbol with a specific code. It replaces
a stream of input symbols with a single floating-point output number. More
bits are needed in the output number for longer, complex messages.
Dictionary-based compression algorithms use a completely different method to
compress data. They encode variable-length strings of symbols as single tokens.
The token forms an index to a phrase dictionary. If the tokens
are smaller than the phrases, they replace the phrases and
compression occurs.
Two dictionary-based compression techniques
called LZ77 and LZ78 have been developed. LZ77 is a
"sliding window" technique in which the dictionary consists
of a set of fixed-length phrases found in a "window" into
the previously seen text. LZ78 takes a completely different approach
to building a dictionary. Instead of using fixed-length phrases from a window
into the text, LZ78 builds phrases up one symbol at a time,
adding a new symbol to an existing phrase when a match
occurs.
|