On 15.02.2019 18:26, Tomas Vondra wrote:
On 2/15/19 3:03 PM, Konstantin Knizhnik wrote:
On 15.02.2019 15:42, Peter Eisentraut wrote:
On 2018-06-19 09:54, Konstantin Knizhnik wrote:
The main drawback of streaming compression is that you cannot
decompress a particular message without decompressing all previous
messages.
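To illustrate the dependency (a rough sketch using Python's zlib and made-up
message contents, not the actual patch): with one deflate stream per
connection, a chunk taken from the middle of the stream cannot be decoded by
a fresh decompressor, because it lacks the stream header and may reference
earlier history.

    import zlib

    messages = [b"message-%d " % i * 10 for i in range(3)]

    # one shared deflate stream, flushed after each message
    comp = zlib.compressobj()
    chunks = [comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH) for m in messages]

    # decoding only works when the chunks are fed in order from the start;
    # a fresh zlib.decompressobj() cannot make sense of chunks[-1] on its own
    decomp = zlib.decompressobj()
    for c in chunks:
        print(decomp.decompress(c))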
It seems this would have an adverse effect on protocol-aware connection
proxies: They would have to uncompress everything coming in and
recompress everything going out.
The alternative of compressing each packet individually would work much
better: A connection proxy could peek into the packet header and only
uncompress the (few, small) packets that it needs for state and routing.
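For instance (a hypothetical framing where only the message body is
compressed and the normal header stays in the clear; this is not what the
patch does), a proxy could skip decompression of everything it does not care
about:

    import struct

    ROUTING_RELEVANT = {b"Q", b"P", b"R", b"E"}  # types the proxy inspects (illustrative)

    def handle_message(buf: bytes, offset: int):
        # PostgreSQL wire framing: 1-byte type code, then an Int32 length
        # that counts itself but not the type byte
        msg_type = buf[offset:offset + 1]
        (length,) = struct.unpack_from("!I", buf, offset + 1)
        body = buf[offset + 5 : offset + 1 + length]
        if msg_type in ROUTING_RELEVANT:
            pass  # decompress and inspect only these few, small messages
        # everything else could be forwarded as-is, still compressed
        return offset + 1 + length  # start of the next message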
Compressing each message individually would defeat the whole point of
libpq compression. Messages are too small to compress efficiently on
their own, so using a streaming compression algorithm is absolutely
necessary here.
Hmmm, I see Peter was talking about "packets" while you're talking about
"messages". Are you talking about the same thing?
Sorry, but there are no "packets" in the libpq protocol, so I assumed that
packet = message.
In any case, a protocol-aware proxy has to process each message.
Anyway, I was going to write about the same thing - that per-message
compression would likely eliminate most of the benefits - but I'm
wondering if it's actually true. That is, how much will the compression
ratio drop if we compress individual messages?
Compression of small messages without a shared dictionary gives awful
results.
Assume that the average record, and therefore message, size is 100 bytes.
Just perform a very simple experiment: create a file with 100 identical
characters and try to compress it.
With zlib the result is 173 bytes, so after "compression" the file is 1.7
times larger.
This is why there is no way to efficiently compress libpq traffic other than
with streaming compression
(where the dictionary is shared and updated across all messages).
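For what it's worth, here is a rough sketch of that comparison (Python's
zlib, invented record contents): each small message compressed on its own
versus one deflate stream whose dictionary is shared across all messages.

    import zlib

    rows = [b"id=%06d|name=user%04d|city=Amsterdam|status=active|" % (i, i % 50)
            for i in range(1000)]   # ~50-byte, record-like "messages"

    # per message: a fresh dictionary plus zlib header/trailer for every message
    per_message = sum(len(zlib.compress(r)) for r in rows)

    # streaming: one shared dictionary, flushed after each message so every
    # message still ends on a decodable boundary
    comp = zlib.compressobj()
    streaming = sum(len(comp.compress(r)) + len(comp.flush(zlib.Z_SYNC_FLUSH))
                    for r in rows)

    print(per_message, streaming)

On data like this the per-message output barely shrinks at all, while the
shared stream reduces later rows to short back-references even though it is
flushed after every message.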
Obviously, if there are just tiny messages, it might easily eliminate
any benefits (and in fact it would add overhead). But I'd say we're way
more interested in transferring large data sets (result sets, data for
copy, etc.) and presumably those messages are much larger. So maybe we
could compress just those, somehow?
Please note that a COPY stream consists of an individual message for each
record.
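That is, each record travels in its own CopyData ('d') message, so bulk
transfer alone does not produce large messages. A minimal sketch of the wire
framing, with made-up rows:

    import struct

    def copy_data_message(row: bytes) -> bytes:
        # PostgreSQL wire format: 'd', Int32 length (including itself), row payload
        return b"d" + struct.pack("!I", 4 + len(row)) + row

    rows = [b"42\talice\tamsterdam\n", b"43\tbob\tberlin\n"]
    stream = b"".join(copy_data_message(r) for r in rows)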
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company