On Mon, Dec 14, 2020 at 12:53 PM Daniil Zakhlystov <usernam...@yandex-team.ru> wrote:
> > On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
> >
> > Good points. I guess you need to arrange to "flush" at the compression
> > layer as well as the libpq layer so that you don't end up with data
> > stuck in the compression buffers.
>
> I think that “flushing” the libpq and compression buffers before setting the
> new compression method will help to solve issues only at the compressing
> (sender) side but won't help much on the decompressing (receiver) side.
Hmm, I assumed that if the compression buffers were flushed on the sending
side, and if all the data produced on the sending side were transmitted to
the receiver, the receiving side would then return everything up to the
point of the flush. However, now that I think about it, there's no guarantee
that any particular compression library would actually behave that way. I
wonder what actually happens in practice with the libraries we care about?

> This may help to solve the above issue. For example, we may introduce the
> CompressedData message:
>
> CompressedData (F & B)
>
> Byte1(‘m’) // I am not so sure about the ‘m’ identifier :)
> Identifies the message as compressed data.
>
> Int32
> Length of message contents in bytes, including self.
>
> Byten
> Data that forms part of a compressed data stream.
>
> Basically, it wraps some chunk of compressed data (like the CopyData message).
>
> On the sender side, the compressor will wrap all outgoing message chunks into
> the CompressedData messages.
>
> On the receiver side, some intermediate component between the secure_read and
> the decompressor will do the following:
> 1. Read the next 5 bytes (type and length) from the buffer
> 2.1 If the message type is other than CompressedData, forward it straight to
> the PqRecvBuffer / conn->inBuffer.
> 2.2 If the message type is CompressedData, forward its contents to the
> current decompressor.
>
> What do you think of this approach?

I'm not sure about the details, but the general idea seems like it might be
worth considering. If we choose a compression method that is intended for
streaming compression and decompression and whose library handles
compression flushes sensibly, then we might not really need to go this way
to make it work. But, on the other hand, this method has a certain elegance
that just compressing everything lacks, and might allow some useful
flexibility.
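For what it's worth, zlib at least does seem to behave the way I assumed,
provided the sender flushes with Z_SYNC_FLUSH: the flush byte-aligns the
output so that everything transmitted so far is a decodable prefix of the
stream. A quick sketch (Python's zlib bindings, just as a stand-in for the
C library):

```python
import zlib

# Does a sender-side flush let the receiver recover everything produced
# so far? With zlib's Z_SYNC_FLUSH, yes: the flush emits an empty stored
# block that byte-aligns the output, so the bytes sent so far form a
# complete, decodable prefix of the compressed stream.
comp = zlib.compressobj()
decomp = zlib.decompressobj()

chunk1 = comp.compress(b"hello, ") + comp.flush(zlib.Z_SYNC_FLUSH)
# The receiver sees only the bytes transmitted so far...
assert decomp.decompress(chunk1) == b"hello, "

# ...and the stream continues afterwards; compressor state is not reset,
# so the compression history (and ratio) carries across the flush.
chunk2 = comp.compress(b"world") + comp.flush(zlib.Z_SYNC_FLUSH)
assert decomp.decompress(chunk2) == b"world"
```

Whether the other candidate libraries offer an equivalent of Z_SYNC_FLUSH
with the same guarantee is exactly the thing we'd need to check.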
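To make sure I understand the receiver-side routing you describe, here is a
toy sketch of it. Everything here is a placeholder: the 'm' type byte, the
function names, and the use of zlib for the "current decompressor" are all
illustrative, not a proposal for the actual C implementation.

```python
import struct
import zlib

COMPRESSED_DATA = ord('m')  # placeholder type byte from the proposal

def route_messages(raw: bytes, decomp, plain_buffer: bytearray) -> int:
    """Toy version of the intermediate component between secure_read and
    the decompressor: split the byte stream into protocol messages and
    route each one by its type byte. Returns the number of bytes consumed."""
    pos = 0
    while pos + 5 <= len(raw):
        # 1. Read the next 5 bytes: one type byte plus an Int32 length
        #    that includes itself but not the type byte.
        msg_type = raw[pos]
        (length,) = struct.unpack("!I", raw[pos + 1:pos + 5])
        end = pos + 1 + length
        if end > len(raw):
            break  # partial message; wait for more input
        payload = raw[pos + 5:end]
        if msg_type == COMPRESSED_DATA:
            # 2.2 CompressedData: feed its contents to the current decompressor.
            plain_buffer += decomp.decompress(payload)
        else:
            # 2.1 Anything else goes straight through to PqRecvBuffer /
            #     conn->inBuffer, header included (modeled as one buffer here).
            plain_buffer += raw[pos:end]
        pos = end
    return pos

# Tiny demo: one ordinary message followed by one CompressedData message.
comp = zlib.compressobj()
body = comp.compress(b"RowDescription...") + comp.flush(zlib.Z_SYNC_FLUSH)
stream = b"Z" + struct.pack("!I", 4)  # an uncompressed message, empty body
stream += bytes([COMPRESSED_DATA]) + struct.pack("!I", 4 + len(body)) + body

out = bytearray()
route_messages(stream, zlib.decompressobj(), out)
assert bytes(out) == b"Z\x00\x00\x00\x04" + b"RowDescription..."
```

One detail this surfaces: the router only needs message framing, not message
semantics, so it can sit below the normal protocol parser exactly as you
suggest.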
On the third hand, restarting compression for every new set of messages
might really hurt the compression ratio in some scenarios. I'm not sure
what is best.

--
Robert Haas
EDB: http://www.enterprisedb.com