2.1.2.1 Offline dictionary based compression

The main idea in dictionary compression is to use an indexed set of tokens(or phrases) of the input messages, and replace tokens in with corresponding indexes. Compression is obtained since a reference(pointer) to a phrase can be more compact than the phrase it self.

The basic idea of dictionary compression algorithms

There are two main direction in dictionary compression.
Online techniques build the dictionary incrementally while reading the input sources. Decompressor also build the dictionary incrementally in a similar way, and doing so allows the dictionary to be transmitted implicitly (the case of LZ-algorithms).

Another approach is working offline, using the entire input source to infer a complete dictionary as an initialization phase, and include an explicit representation of the dictionary as part of the compressed message. Having access to the entire the message, while building the dictionary, should optimize the chooice of phrases, maximizing this way the compression performance[1].

Let us present now formally the concept of dictionary compression.

Let V be an alphabet and w V+ an input source over the alphabet V.
Let D = ( di | i I ) be an I-indexed set, called dictionary, where di V+ , i I. The words di are also called phrases. Let v I+, v = v1v2...vk, be a word over I, where vj I, 1 j k, satisfying the property: w = dv1 dv2 ... dvk.

We define code(w) = (D,v) where D and v are as above.
Consider now V is a binary alphabet. Intuitively, w is compressed by (D,v) if:


Remark 1. We can build a Chomsky type 2 grammar G using D and v as above such that the language generated contains only the word w :
L(G) = {w}.
Indeed, let G=(T,N,s0,P) where:
  • T the set of terminals is the alphabet V as above;
  • N the set of nonterminals is the set I as above;
  • s0, the starting symbol, is a new symbol;
  • P is the set of roules:
    P = { s0 v} { i di | i I }.
This is a type 2 grammar and is easy to see that L(G) = {w}. Unfortunately, inferring an optimal dictionary is an NP-complete problem.