> On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> I still think this is excessively baroque and basically useless.
> Nobody wants to allow compression levels 1, 3, and 5 but disallow 2
> and 4. At the very most, somebody might want to set a maximum or
> minimum level. But even that I think is pretty pointless. Check out
> the "Decompression Time" and "Decompression Speed" sections from this
> link:
> 
> https://www.rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison/
> 
> This shows that decompression time and speed is basically independent
> of compression method for all three of these compressors; to the
> extent that there is a difference, higher compression levels are
> generally slightly faster to decompress. I don't really see the
> argument for letting either side be proscriptive here. Deciding which
> algorithms you're willing to accept is totally reasonable since
> different things may be supported, security concerns, etc. but
> deciding you're only willing to accept certain levels seems unuseful.
> It's also unenforceable, I think, since the receiving side has no way
> of knowing what the sender actually did.

I agree that decompression time and speed are basically the same across 
compression levels for most algorithms.
However, it seems that this may not be true for memory usage.

Check out these links: http://mattmahoney.net/dc/text.html and 
https://community.centminmod.com/threads/round-4-compression-comparison-benchmarks-zstd-vs-brotli-vs-pigz-vs-bzip2-vs-xz-etc.18669/

According to these sources, zstd uses significantly more memory while 
decompressing data that was compressed at high compression levels.

So I’ll test the different ZSTD compression levels with the current version of 
the patch and post the results later this week.


> On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> Good points. I guess you need to arrange to "flush" at the compression
> layer as well as the libpq layer so that you don't end up with data
> stuck in the compression buffers.

I think that “flushing” the libpq and compression buffers before switching to 
the new compression method will only solve the issue on the compressing 
(sender) side
but won't help much on the decompressing (receiver) side.

In the current version of the patch, the decompressor acts as a proxy between 
secure_read and PqRecvBuffer / conn->inBuffer. It is unaware of the Postgres 
protocol and 
does nothing but decompress the bytes received from the 
secure_read function and append them to the PqRecvBuffer.
So the problem is that we can’t tell the compressed bytes apart from the 
uncompressed ones (ZSTD can actually detect the end of a compressed block, but 
some other algorithms can’t).

We could introduce some hooks to control the decompressor behavior from the 
underlying levels after reading the SetCompressionMethod message
from PqRecvBuffer, but I don’t think that is the correct approach.

> On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> Another idea is that you could have a new message type that says "hey,
> the payload of this is 1 or more compressed messages." It uses the
> most-recently set compression method. This would make switching
> compression methods easier since the SetCompressionMethod message
> itself could always be sent uncompressed and/or not take effect until
> the next compressed message. It also allows for a prudential decision
> not to bother compressing messages that are short anyway, which might
> be useful. On the downside it adds a little bit of overhead. Andres
> was telling me on a call that he liked this approach; I'm not sure if
> it's actually best, but have you considered this sort of approach?

This may help to solve the above issue. For example, we may introduce the 
CompressedData message:

CompressedData (F & B) 

Byte1(‘m’) // I am not so sure about the ‘m’ identifier :)
Identifies the message as compressed data. 

Int32 
Length of message contents in bytes, including self. 

Byten
Data that forms part of a compressed data stream.

Basically, it wraps some chunk of compressed data (like the CopyData message).

On the sender side, the compressor will wrap all outgoing compressed chunks in 
CompressedData messages.
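As a rough sketch of the framing (the ‘m’ type byte and the helper name are 
just my assumptions, not something settled in the patch), the sender-side 
wrapping could look like this:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: frame a chunk of already-compressed bytes as a
 * CompressedData message: Byte1('m'), then Int32 length (including the
 * length field itself but not the type byte), then the payload.
 * Returns the total number of bytes written to dst, which must have
 * room for n + 5 bytes.
 */
static size_t
wrap_compressed_data(uint8_t *dst, const uint8_t *payload, uint32_t n)
{
    uint32_t len = n + 4;           /* length includes itself, per protocol */

    dst[0] = 'm';                   /* assumed CompressedData type byte */
    dst[1] = (uint8_t) (len >> 24); /* Int32 is big-endian on the wire */
    dst[2] = (uint8_t) (len >> 16);
    dst[3] = (uint8_t) (len >> 8);
    dst[4] = (uint8_t) len;
    memcpy(dst + 5, payload, n);
    return 5 + n;
}
```

The header layout mirrors CopyData, so existing message-parsing code paths 
should need no special casing to skip over it.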

On the receiver side, some intermediate component between the secure_read and 
the decompressor will do the following:
1. Read the next 5 bytes (message type and length) from the buffer.
2.1. If the message type is anything other than CompressedData, forward it 
straight to the PqRecvBuffer / conn->inBuffer.
2.2. If the message type is CompressedData, forward its contents to the 
current decompressor.
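The steps above could be sketched as a small routing function (again, the ‘m’ 
type byte and all names here are assumptions for illustration, not the patch’s 
actual code):

```c
#include <stdint.h>
#include <stddef.h>

/* Possible destinations for an incoming message body. */
typedef enum
{
    ROUTE_NEED_MORE,        /* fewer than 5 bytes available, wait */
    ROUTE_INBUFFER,         /* forward to PqRecvBuffer / conn->inBuffer */
    ROUTE_DECOMPRESSOR      /* feed body to the current decompressor */
} msg_route;

/*
 * Hypothetical sketch of the intermediate component between secure_read
 * and the decompressor: peek at the 5-byte header (type + Int32 length)
 * and decide where the message should go.  On success, *msg_total is set
 * to the full on-wire size of the message (type byte + length + body).
 */
static msg_route
route_message(const uint8_t *buf, size_t avail, uint32_t *msg_total)
{
    uint32_t len;

    if (avail < 5)
        return ROUTE_NEED_MORE; /* wait for a complete header */

    /* Int32 length is big-endian and includes itself, not the type byte. */
    len = ((uint32_t) buf[1] << 24) | ((uint32_t) buf[2] << 16) |
          ((uint32_t) buf[3] << 8) | (uint32_t) buf[4];
    *msg_total = 1 + len;

    /* 'm' is the assumed CompressedData type byte. */
    return (buf[0] == 'm') ? ROUTE_DECOMPRESSOR : ROUTE_INBUFFER;
}
```

One nice property of this split is that the router only ever needs to see 
message headers; it never has to understand the decompressed stream.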

What do you think of this approach?

—
Daniil Zakhlystov
