NAME

lzx_init, lzx_compress_block, lzx_finish, lzx_reset - LZX compression

SYNOPSIS


#include <stdint.h>
#include <lzx_compress.h>

int lzx_init(lzx_data **lzxdp, int wsize_code,
             lzx_get_bytes_t get_bytes, void *get_bytes_arg,
             lzx_at_eof_t at_eof,
             lzx_put_bytes_t put_bytes, void *put_bytes_arg,
             lzx_mark_frame_t mark_frame, void *mark_frame_arg);

int lzx_compress_block(lzx_data *lzxd, int block_size, int subdivide);

int lzx_finish(lzx_data *lzxd, struct lzx_results *lzxr);

void lzx_reset(lzx_data *lzxd);

DESCRIPTION

The lzx_init(), lzx_compress_block(), and lzx_finish() functions comprise a compression engine for Microsoft's LZX compression format.

Initializing and releasing the LZX compressor

The lzx_init() function takes a wsize_code giving the base-2 logarithm of the compression window size: 15 means 32K, 16 means 64K, and so on up to 21, which means 2MB. It also takes the following callback functions and their associated arguments:

int get_bytes(void *get_bytes_arg, int n, void *buf)
The lzx_compress_block() routine calls this function when it needs more uncompressed input to process. The number of bytes requested is n, and the bytes should be placed in the buffer pointed to by buf. The get_bytes() function should return the number of bytes actually provided, which must not be greater than n and must not be 0 except at EOF.

int at_eof(void *get_bytes_arg)
Must return 0 if the end of the input data has not been reached, positive otherwise. Note that this function takes the same argument as get_bytes().

int put_bytes(void *put_bytes_arg, int n, void *buf)
The put_bytes() callback is called by lzx_compress_block() when compressed bytes need to be output. The number of bytes to be output is n, and the bytes are in the buffer pointed to by buf.

int mark_frame(void *mark_frame_arg, uint32_t uncomp, uint32_t comp)
The mark_frame() callback is called whenever LZX_FRAME_SIZE (0x8000) uncompressed bytes have been processed. The current locations in the uncompressed and compressed data streams (as of the last call to put_bytes()) are provided in uncomp and comp respectively. This is intended for .CHM (ITSS) and similar files, which require a "reset table" listing the frame locations. This callback is optional; if the mark_frame argument to lzx_init() is NULL, no function will be called at the end of each frame.
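
The callbacks above might be implemented over in-memory buffers as in the following sketch. It is illustrative only: the mem_source, mem_sink and frame_log structures, the function names, and the 64-entry frame limit are all hypothetical and not part of lzx_compress.h. The page does not specify the success return values of put_bytes() and mark_frame(), so returning n and 0 here is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory input; passed as get_bytes_arg. */
struct mem_source { const unsigned char *data; size_t len; size_t pos; };

/* Hypothetical in-memory output; passed as put_bytes_arg. */
struct mem_sink { unsigned char *data; size_t cap; size_t len; };

/* Hypothetical frame log; passed as mark_frame_arg. The limit of 64
 * entries is arbitrary and chosen only to keep the sketch short. */
struct frame_log { uint32_t comp[64]; int count; };

/* get_bytes: copy up to n bytes into buf; returns 0 only at EOF. */
int mem_get_bytes(void *arg, int n, void *buf)
{
    struct mem_source *src = arg;
    size_t left = src->len - src->pos;
    size_t take;

    if (n < 0)
        n = 0;
    take = (size_t)n < left ? (size_t)n : left;
    memcpy(buf, src->data + src->pos, take);
    src->pos += take;
    return (int)take;
}

/* at_eof: receives the same argument as get_bytes. */
int mem_at_eof(void *arg)
{
    const struct mem_source *src = arg;
    return src->pos >= src->len;
}

/* put_bytes: append n compressed bytes to the sink. The negative
 * return on overflow follows the intended convention, although the
 * compressor does not act on it yet. Returning n on success is an
 * assumption. */
int mem_put_bytes(void *arg, int n, void *buf)
{
    struct mem_sink *dst = arg;

    if (n < 0 || (size_t)n > dst->cap - dst->len)
        return -1;
    memcpy(dst->data + dst->len, buf, (size_t)n);
    dst->len += (size_t)n;
    return n;
}

/* mark_frame: record the compressed offset of each frame boundary,
 * as a reset table would need. Returning 0 on success is an
 * assumption. */
int log_mark_frame(void *arg, uint32_t uncomp, uint32_t comp)
{
    struct frame_log *frames = arg;

    (void)uncomp; /* frames are LZX_FRAME_SIZE apart, so this is implied */
    if (frames->count >= 64)
        return -1;
    frames->comp[frames->count++] = comp;
    return 0;
}
```

With these definitions, a pointer to a mem_source would be passed as get_bytes_arg, a pointer to a mem_sink as put_bytes_arg, and a pointer to a frame_log as mark_frame_arg.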

The lzx_init() function allocates an opaque structure, a pointer to which is returned in lzxdp. A pointer to this structure may be passed to the other LZX compression functions. The function returns negative on error, 0 otherwise.

The lzx_finish() function writes out any unflushed data, releases all memory held by the compressor (including the lzxd structure), and optionally fills in the lzx_results structure, a pointer to which is passed in as lzxr (NULL if results are not required).

Running the compressor

The lzx_compress_block() function takes the opaque pointer returned by lzx_init(), a block_size, and a flag which says whether or not to subdivide the block. If the subdivide flag is set, blocks may be subdivided to improve the compression ratio, based on the entropy of the data at a given point. Otherwise, just one block is created. The function returns negative on error, 0 otherwise.

Note:
The block size must not be larger than the window size. While the compressor will create apparently-valid LZX files if this restriction is violated, some decompressors will not handle them.
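
This restriction can be checked before calling lzx_compress_block(). The following is a minimal sketch; block_size_ok() is a hypothetical helper, not part of lzx_compress.h. It assumes the wsize_code range of 15 to 21 described for lzx_init(), and it additionally rejects non-positive block sizes, which is a choice made for this sketch rather than something the page states.

```c
/* Hypothetical pre-flight check, not part of lzx_compress.h.
 * Returns nonzero if block_size fits in the window selected by
 * wsize_code (window size is 2 to the power wsize_code bytes). */
int block_size_ok(int block_size, int wsize_code)
{
    if (wsize_code < 15 || wsize_code > 21)
        return 0;
    return block_size > 0 && (long)block_size <= (1L << wsize_code);
}
```

For example, a block_size of 0x8000 is acceptable with any documented wsize_code, since 15 already gives a 32K window.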

The lzx_reset() function may be called after any block in order to reset all compression state except the number of compressed and uncompressed bytes processed. This forces the one-bit Intel preprocessing header to be output again, the Lempel-Ziv window to be cleared, and the Huffman tables to be reset to zero length. It should only be called on a frame boundary; the results of calling it elsewhere or during a callback are undefined.
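
Since lzx_reset() should only be called on a frame boundary, a caller may want to track how many uncompressed bytes have been supplied and test that count first. The sketch below is one way to do so; on_frame_boundary() and the local FRAME_SIZE name are hypothetical, with the value 0x8000 taken from the description of LZX_FRAME_SIZE above.

```c
#include <stdint.h>

/* LZX_FRAME_SIZE is given as 0x8000 in this page; it is repeated under
 * a local name so the sketch compiles without lzx_compress.h. */
#define FRAME_SIZE 0x8000u

/* Hypothetical helper: nonzero if uncomp uncompressed bytes end exactly
 * on a frame boundary. */
int on_frame_boundary(uint32_t uncomp)
{
    return uncomp % FRAME_SIZE == 0;
}
```

If every block_size passed to lzx_compress_block() is a multiple of 0x8000, each full block should end on a frame boundary, so the check would only fail after a short final block.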

To compress data, simply call lzx_compress_block() and optionally lzx_reset() repeatedly, handling the various callbacks described above, until your data is exhausted.

ERRORS

The lzx_init(), lzx_compress_block(), and lzx_finish() functions return a negative number on error.

The callbacks are intended to return a negative result on error, but the compressor does not yet check for this.

BUGS

The compressor is currently unable to output an uncompressed block, so incompressible data may expand more than is necessary (though still not by more than the 6144 bytes permitted by the CAB standard).

There is no well-defined set of error codes.

There is no way for the callbacks to report an error and abort the compression.

The algorithm for splitting blocks is suboptimal.

AUTHOR

Matthew T. Russotto

REFERENCES

LZXFMT.DOC -- Microsoft LZX Data Compression Format (part of Microsoft Cabinet SDK)

Comments in cabextract.c, concerning errors in LZXFMT.DOC (part of cabextract, at http://www.kyz.uklinux.net/cabextract.php3)

CHM file format documentation (http://www.speakeasy.net/~russotto/chmformat.html)

SEE ALSO

cabextract(1)