2.1.1.1 Huffman codes

A brief introduction to coding theory

A code over an alphabet $U$ is any set $C \subseteq U^+$ satisfying:
$$(\forall u_1, \dots, u_m, v_1, \dots, v_n \in C)\ (u_1 \dots u_m = v_1 \dots v_n \Rightarrow u_1 = v_1).$$

An encoding of an input source $s \in V^+$ is a bijective mapping:
$$code : V \to C.$$

Consider an alphabet $U$ and $C \subseteq U^+$. $C$ is called a prefix set if for any word $c \in C$ there is no other word $c' \in C$ such that $c$ is a proper prefix of $c'$.
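
As a quick check of this definition, here is a minimal Python sketch (the function name and the example words are ours, chosen for illustration):

```python
def is_prefix_set(codewords):
    """True iff no word in `codewords` is a proper prefix of another."""
    return not any(
        c != other and other.startswith(c)
        for c in codewords
        for other in codewords
    )

print(is_prefix_set(["0", "10", "11"]))  # True: a prefix set (hence a code)
print(is_prefix_set(["0", "01", "11"]))  # False: "0" is a prefix of "01"
```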

Remark 1. Let $U$ be an alphabet and $C \subseteq U^+$. If $C$ is a prefix set then $C$ is a code.
Indeed, suppose by contradiction that $C$ is a prefix set but not a code. Then:
$$(\exists u_1, \dots, u_m, v_1, \dots, v_n \in C)\ (u_1 \dots u_m = v_1 \dots v_n \wedge u_1 \neq v_1).$$

Since $u_1$ and $v_1$ are prefixes of the same word and $u_1 \neq v_1$, clearly $|u_1| \neq |v_1|$. Without loss of generality we may assume that $|u_1| < |v_1|$. But since $u_1$ and $v_1$ are both prefixes of the same word, the symbol in the first position of $u_1$ is the same as the symbol in the first position of $v_1$, ..., and the symbol in the $|u_1|$-th position of $u_1$ is the same as the symbol in the $|u_1|$-th position of $v_1$. Therefore $u_1$ is a proper prefix of $v_1$, contradicting the assumption that $C$ is a prefix set.

Prefix codes are also called instantaneous codes. Any instantaneous code $C \subseteq U^+$ satisfies the Kraft inequality:
$$\sum_{c \in C} |U|^{-|c|} \leq 1.$$
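
The inequality is easy to check numerically; a minimal sketch (function name ours), assuming binary codewords given as strings:

```python
def kraft_sum(codewords, alphabet_size=2):
    """Sum of |U|^(-|c|) over all codewords c; at most 1 for a prefix code."""
    return sum(alphabet_size ** -len(c) for c in codewords)

print(kraft_sum(["0", "10", "11"]))  # 0.5 + 0.25 + 0.25 = 1.0 <= 1
print(kraft_sum(["0", "1", "10"]))   # 1.25 > 1: cannot be a prefix code
```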

The average code-length of $C$ with respect to the probability distribution of the symbols from $V$ in the input source $s$ is:
$$L_C = \sum_{a \in V} p_a \cdot |code(a)|.$$

Shannon's fundamental noiseless source coding theorem says that the entropy defines a lower limit on the average number of bits needed to encode the source symbols. In other words, for any encoding of an input source, the average code-length is greater than or equal to the entropy $H = -\sum_{a \in V} p_a \log_2 p_a$, i.e. $L_C \geq H$.
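
To make the bound concrete, here is a small Python sketch (the distribution and the code below are hypothetical, chosen for illustration) computing both quantities:

```python
import math

def average_code_length(probs, code):
    """L_C = sum of p_a * |code(a)| over the symbols a in V."""
    return sum(p * len(code[a]) for a, p in probs.items())

def entropy(probs):
    """H = -sum of p_a * log2(p_a): the lower bound on L_C."""
    return -sum(p * math.log2(p) for p in probs.values())

# Hypothetical distribution and prefix code, for illustration only.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
code = {"a": "0", "b": "10", "c": "11"}
print(entropy(probs))                    # 1.5
print(average_code_length(probs, code))  # 1.5 -- this code meets the bound
```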

Huffman codes

The Huffman compression technique is probably one of the most famous compression methods. Since its development in 1952, this method has been the subject of intensive research in data compression theory, being used as a stand-alone method or combined with other techniques.
This is a statistical method that uses variable-size codes, assigning shorter codes to symbols or groups of symbols that appear more often in the input source.
There are two main problems when using variable-size codes:
  • use codes that can be unambiguously decoded, and
  • minimize the average size of the code.
The first problem is solved using prefix sets.
The second problem (minimizing the average size of the code) is more difficult.
Consider an alphabet $V$ and an input source $s \in V^+$. For any symbol $a \in V$, let $p_a$ be the probability of occurrence of this symbol in the input source. Then
$$\sum_{a \in V} p_a = 1.$$

Without loss of generality we may assume that $p_a > 0$ for every symbol $a \in V$. Let $U = \{0, 1\}$ be a binary alphabet, let $C \subseteq U^+$ be a code (prefix set), and consider the mapping:
$$code : V \to C.$$

The following restrictions are imposed on this mapping [1]:
  • for any two symbols $a, b \in V$, if $p_a \geq p_b$ then $|code(a)| \leq |code(b)|$.
  • if $L$ is the maximum length of a word in $C$, then there are two codewords in $C$ of length $L$ that are identical except for their last digit.
  • each possible sequence of $L$ digits must be used either as a codeword or must have one of its prefixes used as a codeword.
Following the restrictions above, we arrive at the code-building procedure below (a Python sketch follows the list):
  • if $|V| = 2$ then $V = \{a, b\}$, and $code(a) = 0$, $code(b) = 1$.
  • if $|V| > 2$ then select from $V$ two symbols with the smallest probabilities (call them $a$ and $b$). Then $code(a) = code(\#)0$ and $code(b) = code(\#)1$, where $\#$ is a new symbol with $p_{\#} = p_a + p_b$, and $code(\#)$ is computed by applying the same procedure to the alphabet $(V \setminus \{a, b\}) \cup \{\#\}$.
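
A minimal Python sketch of this procedure (the function name is ours; it assumes $|V| \geq 2$ and takes the probabilities as a dict):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from a symbol -> probability mapping.

    An iterative rendering of the recursive procedure above, using a
    min-heap: repeatedly merge the two least probable (super)symbols,
    prepending 0 to the codes of one half and 1 to the other.
    """
    tie = count()  # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(tie), {a: ""}) for a, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, half0 = heapq.heappop(heap)  # smallest probability
        p2, _, half1 = heapq.heappop(heap)  # second smallest
        merged = {a: "0" + c for a, c in half0.items()}
        merged.update({a: "1" + c for a, c in half1.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
# {'a': '0', 'b': '10', 'd': '110', 'c': '111'} -- one of several valid
# Huffman codes for this distribution (cf. Remark 2 below).
```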
In order to encode the input source $s$, each symbol is replaced with its associated Huffman code, and a word $B(s)$ is obtained.
This recursive procedure suggests the construction of a tree (in fact, any prefix code can be represented as a tree). The following example explains the construction of a Huffman code using such a tree (called a Huffman tree). The labels of the nodes are pairs (symbol, probability), and each edge descending from a node is labeled either 0 or 1. This tree is traversed to determine the codes of the symbols.

In order to obtain the original word $s$ from the word $B(s)$, the decoder must know the statistical model of the encoder (the probabilities of the symbols in the input source). Using these probabilities, the decoder can build the same Huffman tree. It then reads the word $B(s)$ bit by bit, traversing the Huffman tree from the root according to the current bit of $B(s)$. When a leaf is reached, the decoder outputs the symbol corresponding to that leaf and returns to the root of the Huffman tree. This process is repeated until all bits of $B(s)$ have been processed.
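
Thanks to the prefix property, the root-to-leaf walk can be mimicked directly on the code table; a minimal sketch (the names and the example code are ours):

```python
def decode(bits, code):
    """Read bits until the accumulated run matches a codeword, output the
    corresponding symbol, and start over -- the same root-to-leaf walk
    described above, performed on the code table instead of the tree."""
    inverse = {c: a for a, c in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # reached a "leaf"
            symbols.append(inverse[current])
            current = ""         # back to the "root"
    return "".join(symbols)

code = {"a": "0", "b": "10", "c": "11"}
print(decode("010110", code))  # -> "abca"
```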

Remark 2. A Huffman code is not unique relative to a given probability distribution of the symbols in an input source.

Theorem 1. Consider an input source $s \in V^+$ and a Huffman code $C \subseteq \{0, 1\}^+$ built as above. Then, for all prefix codes $C'$,
$$L_C \leq L_{C'}.$$
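
As an illustration of the theorem on hypothetical numbers (reusing `huffman_code` and `average_code_length` from the sketches above), the Huffman code's average length is no worse than that of a fixed-length prefix code:

```python
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
huff = huffman_code(probs)                          # Huffman code C
alt = {"a": "00", "b": "01", "c": "10", "d": "11"}  # a prefix code C'
print(average_code_length(probs, huff))  # 1.9
print(average_code_length(probs, alt))   # 2.0 >= 1.9, as Theorem 1 states
```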