Modern "multimedia" vectorized hardware instructions can speed up deflate().
For higher-end x86* CPUs the speedup might be 2% to 3% of total CPU time.
On a slower CPU, or with a compiler plus instruction decoder that suffer
longer latency after a branch (such as gcc for some PowerPC chips),
then the im

My knee-jerk reaction is that (1) performance improvements like
this should be put into zlib, as opposed to gzip proper,
and that (2) gzip should be changed to use zlib instead of having
an independent version of the compression algorithm. Does this
sound feasible to you? I can look into (2) at so