Development: Z File Format

From WxWiki

Files with a .Z extension are compressed with the Unix compression utility compress.

Many versions of compress exist, but their compression methods differ only slightly.

Compress uses a variant of LZW popularly known as LZC, the C standing for compress. Because LZC is a derivative of LZW, it falls under certain patents.

Note that .Z files may lack the magic bytes, in which case they have no compression method field either. Files without magic bytes are usually compressed with the later block compression method.

Header

Offset   Bytes   Type     Description
0        2       wxByte   Magic bytes: 0x1F, 0x9D
2        1       wxByte   Compression method (see below)

Compression Methods

Where [value] is the byte at offset 2 of the .Z file:

[value] & 0x80   If nonzero, block compression is used
[value] & 0x1F   Maximum number of bits an LZC code can have
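
The header layout and the two bit masks above can be read with a few lines of C. This is only a sketch: the names z_header and z_parse_header are hypothetical and are not taken from compress, and files without magic bytes are simply rejected here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type and function names, for illustration only. */
typedef struct {
    int block_mode;   /* 1 if bit 0x80 of the method byte is set */
    int max_bits;     /* low 5 bits of the method byte */
} z_header;

/* Returns 0 on success, -1 if the buffer is shorter than 3 bytes
 * or does not start with the magic bytes 0x1F 0x9D. */
int z_parse_header(const uint8_t *buf, size_t len, z_header *out)
{
    if (len < 3 || buf[0] != 0x1F || buf[1] != 0x9D)
        return -1;

    out->block_mode = (buf[2] & 0x80) != 0;
    out->max_bits   = buf[2] & 0x1F;
    return 0;
}
```

For example, a method byte of 0x90 decodes to block compression with a maximum code size of 16 bits.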

Normal LZC Compression (compress 2.0)

LZC compression is basically LZW with a variable code size. The code size is the number of bits each code occupies in the compressed stream. Codes start at 9 bits; each time the table outgrows what the current code size can address, compress increases the code size by one bit, until the maximum is reached (almost always 16 in compress).
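
The growth of the code size can be modelled as a function of the next free table index. The sketch below is a simplified model, not code from compress: lzc_code_bits and LZC_MIN_BITS are hypothetical names, and a real implementation tracks the width incrementally, so the exact point at which it switches may differ from this model.

```c
#define LZC_MIN_BITS 9   /* starting code size */

/* Hypothetical helper: number of bits needed so that every table
 * index up to next_free fits, capped at max_bits. */
int lzc_code_bits(long next_free, int max_bits)
{
    int bits = LZC_MIN_BITS;

    /* (1L << bits) - 1 is the largest code that fits in 'bits' bits. */
    while (bits < max_bits && next_free > (1L << bits) - 1)
        bits++;
    return bits;
}
```

With a 16-bit maximum, indexes up to 511 use 9-bit codes, index 512 is the first to need 10 bits, and the width stops growing once it reaches 16.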

Note that later compress versions account for table clear messages (code 256) from non-block-compression LZC methods. However, it is almost unheard of for older LZC versions to send table clear messages - as they generally do not do table clears at all.

Block Compression

Used by compress 4.0 and later, block compression is essentially LZC with adaptive table clears. When the compressor wants to clear the table, it sends code 256 to the decompressor, telling it to clear its LZW table. After sending the clear code, the compressor clears its own table completely, starts again at index 257, and resets the code size to the minimum (9).
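
On the decompressor side, the reset described above amounts to restoring two pieces of state. The following is a minimal sketch under that reading; lzc_state, lzc_reset and lzc_handle_clear are hypothetical names, the dictionary itself is left out, and real decoders may adjust the first free index slightly to suit their own loop structure.

```c
#define LZC_CLEAR      256   /* clear code in block compression */
#define LZC_FIRST_FREE 257   /* first free table index in block compression */
#define LZC_MIN_BITS   9     /* minimum (starting) code size */

/* Hypothetical decoder state; the string table is omitted. */
typedef struct {
    long next_free;   /* next table index to be assigned */
    int  code_bits;   /* current code size in bits */
} lzc_state;

void lzc_reset(lzc_state *s)
{
    s->next_free = LZC_FIRST_FREE;
    s->code_bits = LZC_MIN_BITS;
}

/* Returns 1 if 'code' was a clear code and the state was reset,
 * 0 if the code should be decoded normally. */
int lzc_handle_clear(lzc_state *s, long code, int block_mode)
{
    if (block_mode && code == LZC_CLEAR) {
        lzc_reset(s);
        return 1;
    }
    return 0;
}
```

In this sketch code 256 is treated as a clear code only when the block compression flag from the header is set.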

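Compress sends the clear code only once the code size has reached its maximum, the table is full, and the compression ratio has stopped improving.

Once the table is full, it checks the compression ratio periodically: each time the following function runs, it schedules the next check CHECK_GAP (10000) input bytes ahead by updating checkpoint. The function returns TRUE when the table should be cleared: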

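 /* Globals maintained by compress (declared elsewhere in its source):
  *   in_count   - bytes of input read so far
  *   bytes_out  - bytes of compressed output written so far
  *   ratio      - best ratio seen since the last clear (8 fractional bits)
  *   checkpoint - value of in_count at which the next check is due
  */
 #define CHECK_GAP 10000L    /* input bytes between ratio checks */
 
 int cl_block(void)
 {
     long rat;   /* current in_count / bytes_out, 8 fractional bits */
 
     checkpoint = in_count + CHECK_GAP;
 #ifdef DEBUG
     if (debug) {
         fprintf(stderr, "count: %ld, ratio: ", in_count);
         prratio(stderr, in_count, bytes_out);
         fprintf(stderr, "\n");
     }
 #endif
 
     if (in_count > 0x007fffff) {    /* shift will overflow */
         rat = bytes_out >> 8;
         if (rat == 0)               /* don't divide by zero */
             rat = 0x7fffffff;
         else
             rat = in_count / rat;
     } else {
         rat = (in_count << 8) / bytes_out;  /* 8 fractional bits */
     }
 
     if (rat > ratio) {      /* still improving: remember it, keep the table */
         ratio = rat;
         return FALSE;
     }
 
     ratio = 0;              /* no longer improving: clear the table */
 #ifdef DEBUG
     if (debug)
         fprintf(stderr, "clear\n");
 #endif
     return TRUE;
 }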
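
The ratio in the function above is a fixed-point number with 8 fractional bits, so a value of 256 means the output is exactly as large as the input. The standalone sketch below isolates that arithmetic so it can be tried on its own. The name lzc_ratio_fixed is hypothetical, and the guard against a non-positive bytes_out is an addition for safety that the listing above does not have.

```c
/* Hypothetical helper: in_count / bytes_out with 8 fractional bits. */
long lzc_ratio_fixed(long in_count, long bytes_out)
{
    if (bytes_out <= 0)             /* added guard, not in the original */
        return 0x7fffffffL;

    if (in_count > 0x007fffffL) {   /* in_count << 8 would overflow 32 bits */
        long scaled = bytes_out >> 8;
        return scaled == 0 ? 0x7fffffffL : in_count / scaled;
    }
    return (in_count << 8) / bytes_out;
}
```

For instance, 20000 input bytes compressed to 8000 output bytes give (20000 << 8) / 8000 = 640, which is 640 / 256 = 2.5 as an ordinary ratio.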

Some versions of compress later than 4.0 actually remove this feature for speed purposes, and just clear the table when it becomes full.

Finally, whereas plain LZW normally starts at index 256, the block compression method of LZC always uses 257 as the first free index, both at the start and after clearing the table, because code 256 is reserved for the clear code.

References

  • Public domain compress source code (MS-DOS version - comp430s.zip)