On Mon, Dec 14, 2020 at 12:53 PM Daniil Zakhlystov <usernam...@yandex-team.ru> wrote:
> > On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
> >
> > Good points. I guess you need to arrange to "flush" at the compression
> > layer as well as the libpq layer so that you don't end up with data
> > stuck in the compression buffers.
>
> I think that “flushing” the libpq and compression buffers before setting the
> new compression method will help to solve issues only at the compressing
> (sender) side but won't help much on the decompressing (receiver) side.
Hmm, I assumed that if the compression buffers were flushed on the sending
side, and if all the data produced on the sending side were transmitted to
the receiver, the receiving side would then return everything up to the
point of the flush. However, now that I think about it, there's no guarantee
that any particular compression library would actually behave that way. I
wonder what actually happens in practice with the libraries we care about?

> This may help to solve the above issue. For example, we may introduce the
> CompressedData message:
>
> CompressedData (F & B)
>
> Byte1(‘m’) // I am not so sure about the ‘m’ identifier :)
> Identifies the message as compressed data.
>
> Int32
> Length of message contents in bytes, including self.
>
> Byten
> Data that forms part of a compressed data stream.
>
> Basically, it wraps some chunk of compressed data (like the CopyData message).
>
> On the sender side, the compressor will wrap all outgoing message chunks into
> the CompressedData messages.
>
> On the receiver side, some intermediate component between the secure_read and
> the decompressor will do the following:
> 1. Read the next 5 bytes (type and length) from the buffer
> 2.1 If the message type is other than CompressedData, forward it straight to
> the PqRecvBuffer / conn->inBuffer.
> 2.2 If the message type is CompressedData, forward its contents to the
> current decompressor.
>
> What do you think of this approach?

I'm not sure about the details, but the general idea seems like it might be
worth considering. If we choose a compression method that is intended for
streaming compression and decompression and whose library handles
compression flushes sensibly, then we might not really need to go this way
to make it work. But, on the other hand, this method has a certain elegance
that just compressing everything lacks, and might allow some useful
flexibility.
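For what it's worth, zlib at least does seem to behave the way I assumed,
provided the sender flushes with Z_SYNC_FLUSH: the flush byte-aligns the
output so that everything transmitted so far is a decodable prefix of the
stream. A quick sketch (Python's zlib bindings, just as a stand-in for the
C library):

```python
import zlib

# Does a sender-side flush let the receiver recover everything produced
# so far? With zlib's Z_SYNC_FLUSH, yes: the flush emits an empty stored
# block that byte-aligns the output, so the bytes sent so far form a
# complete, decodable prefix of the compressed stream.
comp = zlib.compressobj()
decomp = zlib.decompressobj()

chunk1 = comp.compress(b"hello, ") + comp.flush(zlib.Z_SYNC_FLUSH)
# The receiver sees only the bytes transmitted so far...
assert decomp.decompress(chunk1) == b"hello, "

# ...and the stream continues afterwards; compressor state is not reset,
# so the compression history (and ratio) carries across the flush.
chunk2 = comp.compress(b"world") + comp.flush(zlib.Z_SYNC_FLUSH)
assert decomp.decompress(chunk2) == b"world"
```

Whether the other candidate libraries offer an equivalent of Z_SYNC_FLUSH
with the same guarantee is exactly the thing we'd need to check.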
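To make sure I understand the receiver-side routing you describe, here is a
toy sketch of it. Everything here is a placeholder: the 'm' type byte, the
function names, and the use of zlib for the "current decompressor" are all
illustrative, not a proposal for the actual C implementation.

```python
import struct
import zlib

COMPRESSED_DATA = ord('m')  # placeholder type byte from the proposal

def route_messages(raw: bytes, decomp, plain_buffer: bytearray) -> int:
    """Toy version of the intermediate component between secure_read and
    the decompressor: split the byte stream into protocol messages and
    route each one by its type byte. Returns the number of bytes consumed."""
    pos = 0
    while pos + 5 <= len(raw):
        # 1. Read the next 5 bytes: one type byte plus an Int32 length
        #    that includes itself but not the type byte.
        msg_type = raw[pos]
        (length,) = struct.unpack("!I", raw[pos + 1:pos + 5])
        end = pos + 1 + length
        if end > len(raw):
            break  # partial message; wait for more input
        payload = raw[pos + 5:end]
        if msg_type == COMPRESSED_DATA:
            # 2.2 CompressedData: feed its contents to the current decompressor.
            plain_buffer += decomp.decompress(payload)
        else:
            # 2.1 Anything else goes straight through to PqRecvBuffer /
            #     conn->inBuffer, header included (modeled as one buffer here).
            plain_buffer += raw[pos:end]
        pos = end
    return pos

# Tiny demo: one ordinary message followed by one CompressedData message.
comp = zlib.compressobj()
body = comp.compress(b"RowDescription...") + comp.flush(zlib.Z_SYNC_FLUSH)
stream = b"Z" + struct.pack("!I", 4)  # an uncompressed message, empty body
stream += bytes([COMPRESSED_DATA]) + struct.pack("!I", 4 + len(body)) + body

out = bytearray()
route_messages(stream, zlib.decompressobj(), out)
assert bytes(out) == b"Z\x00\x00\x00\x04" + b"RowDescription..."
```

One detail this surfaces: the router only needs message framing, not message
semantics, so it can sit below the normal protocol parser exactly as you
suggest.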
On the third hand, restarting compression for every new set of messages
might really hurt the compression ratio in some scenarios. I'm not sure
what is best.

--
Robert Haas
EDB: http://www.enterprisedb.com