Re: BIO_f_zlib() / gzip Format

Darryl Miles Wed, 26 May 2010 03:08:59 -0700

I think the FAQ point is trying to highlight that the GZIP format as-is was designed for single file compression (a "compress" replacement). So the extra tiny header at the start of the GZIP data that you find in *.gz files is not necessary for zlib and streaming compressors. Also, since a streaming compressor might not have an endpoint, and in many applications the use of a checksum is not required and would increase the data length (defeating the point of a streaming compressor), they decided to remove that from each chunk sent.



You can observe the changes to the filename in the header by:

gzip somefile

compared to:

cat somefile | gzip > somefile.gz

Compare the two resulting files; the differences are in the filename encoded into this GZIP header, but this header actually has nothing to do with the compression algorithm. It is like a small piece of data tacked onto the front of the compressed data, and it has a magic number in it to aid format detection.
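The same comparison can be sketched in Python instead of at the shell. This is only an illustration under my own assumptions: the gzip_bytes helper and the sample data are made up, and the timestamp is pinned to zero so that the filename is the only header field that differs.

```python
import gzip
import io


def gzip_bytes(data, filename):
    """Build an in-memory gzip member, recording `filename` in its header."""
    buf = io.BytesIO()
    # mtime=0 pins the timestamp field, so only the name can differ.
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf, mtime=0) as f:
        f.write(data)
    return buf.getvalue()


data = b"hello, gzip header\n" * 20
named = gzip_bytes(data, "somefile")  # roughly what `gzip somefile` stores
anonymous = gzip_bytes(data, "")      # roughly what `cat somefile | gzip` stores

FNAME = 0x08  # flag bit: a NUL-terminated file name follows the fixed header

print(named[:2].hex())                          # the magic number
print(bool(named[3] & FNAME), bool(anonymous[3] & FNAME))
print(named[10:19])                             # name bytes after the 10-byte fixed header
print(named[19:] == anonymous[10:])             # is everything after the header the same?
```

If the last comparison holds, the compressed payload is byte-for-byte the same and only the header changed.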

The important point to remember is that the common ground is the compression algorithm. zlib is the reuse of the mathematical algorithm used in GZIP but adapted for streaming compression use.
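To make the "common ground" concrete, here is a small sketch using Python's zlib module, which lets you pick the container through the wbits argument. The deflate helper and the sample data are my own. Note that, as far as I know, the zlib container (RFC 1950) is not completely bare: it still carries a 2-byte header and a 4-byte Adler-32 trailer, which is much smaller than the gzip framing (RFC 1952).

```python
import zlib


def deflate(data, wbits):
    """Compress data in one shot; wbits selects the container format."""
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()


data = b"the same input, three containers\n" * 50

raw = deflate(data, -15)          # raw DEFLATE: no header, no trailer
zlib_stream = deflate(data, 15)   # zlib container: 2-byte header, Adler-32 trailer
gzip_stream = deflate(data, 31)   # gzip container: 10-byte header, CRC-32 + length trailer

print(len(raw), len(zlib_stream), len(gzip_stream))
print(zlib_stream[2:-4] == raw)   # strip the zlib framing
print(gzip_stream[10:-8] == raw)  # strip the gzip framing
```

With the same level and window size, the bytes between the framing should be the identical DEFLATE stream in all three cases; only the wrapper differs.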

I have to now ask, how are you using the raw/original/verbatim GZIP single file compression algorithm with SSL? Who has somehow bolted that in without using zlib? You might consider zlib to be the de facto standard for how to apply the gzip algorithm to a stream.

There are other matters that zlib addresses, such as ensuring a way to force a symbol flush on any arbitrary bit boundary. That is, an LZ77/Huffman compressed stream (which is what DEFLATE is, rather than LZW) usually ends up as a bunch of odd-sized symbols (5 bit, 6 bit, 7 bit, etc., up to maybe 15 bit), i.e. not the nice multiples of 8 bits that computers need to send over the network. So any streaming compressor needs the ability to flush the data to the sender; often a special reserved symbol number is used, followed by zero or more bits of padding (to bring the output up to a whole number of bytes). This is the kind of thing zlib adds to the stream that is not catered for in the plain compression algorithm.

Since it is using a symbol to do it, it can actually be performed in a compatible way; you just reserve a symbol value for this purpose.
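A rough sketch of that flush behaviour, using Python's zlib module (the message contents and variable names are made up for illustration). As far as I understand it, zlib's Z_SYNC_FLUSH ends the current block and then emits an empty "stored" block, which pads the output to a byte boundary and leaves the bytes 00 00 FF FF at the end of each flushed piece.

```python
import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

messages = [b"first message", b"second message", b"third message"]
received = []
for msg in messages:
    # Z_SYNC_FLUSH forces out all pending bits and pads to a byte boundary,
    # so the receiver can decode this message without waiting for more data.
    wire = compressor.compress(msg) + compressor.flush(zlib.Z_SYNC_FLUSH)
    marker_seen = wire.endswith(b"\x00\x00\xff\xff")
    received.append((decompressor.decompress(wire), marker_seen))

for plain, marker_seen in received:
    print(plain, marker_seen)
```

The stream is never finished here, which matches the "no endpoint" case: each piece is decodable on arrival, and the compressor keeps its history so later messages can still refer back to earlier ones.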



Darryl
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org
