Experience indicates that CLs of zero are very common and tend to have long runs. (Recall that the codes in question are codes of literals/lengths and distances. Any given data file to be compressed may be missing many literals, lengths, and distances.) This is why runs of zeros are assigned the two special flags 17 and 18. A flag of 17 is followed by a 3-bit repetition factor that indicates 3–10 repetitions of CL 0. Flag 18 is followed by a 7-bit repetition factor that indicates 11–138 repetitions of CL 0. Thus, six consecutive zeros in a sequence of CLs are compressed to 17, 0112, and 12 consecutive zeros in an SQ are compressed to 18, 00000012.
   The sequence of CLs is compressed in this way to a shorter sequence (to be termed SSQ) of integers in the interval [0, 18]. The sequence of CLs is originally composed of 256 CLs corresponding to the literals, 1 CL for the end-of-block, 29 CLs corresponding to match lengths and finally 30 CLs for the match distances.

     Literals  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 5, 5, 3, 4, 4, 4, 4, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 7, 7, 7, 7, 0, 0, 0, 0, 0,
               0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 End-of-block  7,
      Lengths  5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    Distances  5, 2, 5, 5, 2, 4, 4, 3, 5, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.

The trailing zeros are removed from lengths and distances CLs lists, header parameters HLIT and HDIST are set accordingly. The final part of the CLs sequence is consequently reduced to:

 Lengths/Distances  5, 5, 5, 5, 5, 2, 5, 5, 2, 4, 4, 3, 5, 0, 4, 4.

The whole CLs sequence is compressed to the 37-number SSQ:

    (17, 1112), 7, (18, 01010112), 5, 5, 3, 4, (16, 002), 3, 7, (16, 112), (16, 002), (17, 0012), 7, (16, 002),
    (17, 0112), 6, (16, 112), (16, 002), (17, 1112), 7, (16, 002), (18, 11111002), 7, 5, (16, 012), 2, 5, 5, 2,
    4, 4, 3, 5, 0, 4, 4.


   Step 2. Prepare Huffman codes for the SSQ in order to compress further. Our example SSQ contains the following numbers (with their frequencies in parentheses): 0(1), 2(2), 3(3), 4(5), 5(6), 6(1), 7(5), 16(8), 17(4), 18(2). Its initial and standard Huffman trees are shown in Figure 6.42a,b. The standard tree can be represented by the sequence of ten lengths 5, 4, 4, 3, 3, 5, 3, 2, 3, and 4. These are the lengths of the Huffman codes assigned to the ten numbers 0, 2, 3, 4, 5, 6, 7, 16, 17, and 18, respectively.

   Step 3. This sequence of ten lengths is now extended to 19 numbers by inserting zeros in the positions that correspond to unused CCLs.

   Position:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
   CCL:         5   0   4   4   3   3   5   3   0   0   0   0   0   0   0   0   2   3   4

Next, the 19 CCLs are permuted according to

   Position:   16  17  18   0   8   7   9   6  10   5  11   4  12   3  13   2  14   1  15
   CCL:         2   3   4   5   0   3   0   5   0   3   0   3   0   4   0   4   0   0   0

The reason for the permutation is to end up with a sequence of 19 CCLs that's likely to have trailing zeros. The sequence of 19 CCLs minus the trailing zeros is written on the compressed stream, preceded by its actual length, which can be between 4 and 19. Each CCL is written as a 3-bit number. In our example, there are three trailing zeros, so the 16-number sequence 2, 3, 4, 5, 0, 3, 0, 5, 0, 3, 0, 3, 0, 4, 0, 4 is written on the compressed stream followed by the SSQ as the final, compressed code of both prefix-code tables.

Two Huffman Trees for Code Lengths

          Figure 6.42: Two Huffman Trees for Code Lengths.

   A reader finally reaching this point (sweating profusely with such deep concentration on so many details) may respond with the single word “insane.” This scheme of Phil Katz for compressing the two prefix-code tables per block is devilishly complex and hard to follow, but it works!
   The format of a block in mode 3 is as follows:
   1. The 3-bit header 010 or 110.
   2. A 5-bit parameter HLIT indicating the number of codes in the literal/length code table. This table has codes 0–255 for the literals, code 256 for end-of-block, and the 29 codes 257–285 for the lengths. Some of the 29 length codes may be missing, so this parameter indicates how many of the length codes actually exist in the table.
   3. A 5-bit parameter HDIST indicating the size of the code table for distances. There are 30 codes in this table, but some may be missing.
   4. A 4-bit parameter HCLEN indicating the number of CCLs (there may be between 4 and 19 CCLs).
   5. A sequence of HCLEN + 4 CCLs, each a 3-bit number.
   6. A SSQ of HLIT + 257 + HDIST + 1 CLs for the literal/length code table and the distance code table compressed as explained earlier, special flags 16, 17 and 18 are directly followed by their repetition factor written respectively as 2, 3, and 7-bit numbers.
   7. The compressed data, encoded with the two prefix-code tables.
   8. The end-of-block code (the prefix code of edoc 256).