1.2. Algorithmic information theory - Kolmogorov complexity

         "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design."[1]
         Algoritmic information theory is interested in the minimum number of bits from which a particular sequence of known symbols can effectively be reconstructed; in other words the minimum computer program that generates exactlly that given sequence and halts.
         "Our definition of the quantity of information has the advantage that it refers to individual objects and not to objects treated as members of a set of objects with a probability distribution given on it. The probabilistic definition can be convincingly applied to the information contained, for example, in a stream of congratulatory telegrams. But it would not be clear how to apply it, for example, to an estimate of the quantity of information contained in a novel or in the translation of a novel into another language relative to the original. I think that the new definition is capable of introducing in similar applications of the theory at least clarity of principle."[2]
         Before trying to define formally the notion of Kolmogorov complexity we introduce first some extra definitions.
         A Turing machine(Fig. 1) is a kind of state machine. At any step the machine is in one of a finite number of states. Instructions for a Turing machine consist in specified conditions under which the machine will transition between one state and another. It has an infinite one-dimensional tape divided into cells. The tape has one end, at the left say, and stretches infinitely far to the right. Each cell is able to contain one symbol, either ‘0’ or ‘1’ (Using the alphabet {0,1} is just a convention but not a restriction). The machine has a read-write head, which at any time scanning a single cell on the tape. This read-write head can move left and right along the tape to scan successive cells.


Fig. 1. A Turing machine


         The action of a Turing machine is determined completely by:
  1. the current state of the machine;
  2. the symbol in the cell currently being scanned by the head
  3. a table of transition rules, which serve as the “program” for the machine. Transition rules are of the form: reading simbol s move machine head to left/right and change the machine state to any one of posibble machine's states.

         The machine halts when it reach some special states caled final states. (Here is a funny article about the famous Turing Machines haltin problem.)
         An universal Turing machine Uis a machine that taking as input the encoding of another machine M simulates the later (Uhalts exactly when M halts).
         Definition 1. The Kolmogorov complexity of a string s is the size in bites of the smallest Turing machine M ( as an input to an universal Turing machine U ) that generates exactly s and halts.

Infinite random sequence definition; random data are incompressible

Definition 2. An infinite random sequence is a sequence S over a finite alphabet V,such that, considering any Turing machine generating infinite sequence of indices i1 < i2 <. .. < in < ..., the distirbution probability of the sequence S[i1],S[i2],...,S[in], ... equals .
Intuitively, if we consider S is compressible, then there is a Turing Machine Mn such that Mn can generate the sequence S[0],...,S[n] and size of Mn is smaller than n. Since Mn is smaller than n, Mn should predict (compute) as least Mn - n symbols (not just output S[x] at each step x). This contradicts the above definition of randm sequences.