On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:


On Sep 30, 2019, at 20:56, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

I mean this:

  /*
   * Use int64 to prevent overflow during calculation.
   */
  compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

  #define PGLZ_MAX_OUTPUT(_dlen)        ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?


compressed_size accounts for a possible increase in size during
compression. pglz can consume up to 1 control byte for each 8 bytes of
data in the worst case.

OK, but does that actually translate into the formula? We essentially
need to count 8-byte chunks in raw data, and multiply that by 9. Which
gives us something like

  nchunks = ((rawsize + 7) / 8) * 9;

which is not quite what the patch does.
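
To make the difference concrete, here is a minimal sketch that puts the
two formulas side by side. The function names are hypothetical and are
not taken from the patch; the only assumption is the one stated in the
thread, i.e. at most 1 control byte per 8 bytes of data. Note that in C
a cast binds tighter than division, so the sketch parenthesizes the
division explicitly to keep the whole calculation in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Formula from the quoted patch line, evaluated entirely in 64 bits:
 * (rawsize * 9 + 8) / 8, i.e. rawsize + rawsize / 8 + 1.
 * Hypothetical name, for illustration only.
 */
static int64_t
bound_per_byte(int32_t rawsize)
{
    assert(rawsize >= 0);
    return ((int64_t) rawsize * 9 + 8) / 8;
}

/*
 * Formula from the reply: round up to whole 8-byte chunks and charge
 * 9 bytes for each of them. Hypothetical name, for illustration only.
 */
static int64_t
bound_per_chunk(int32_t rawsize)
{
    assert(rawsize >= 0);
    return (((int64_t) rawsize + 7) / 8) * 9;
}

/*
 * rawsize   bound_per_byte   bound_per_chunk
 *    1             2                9
 *    8            10                9
 *    9            11               18
 */
```

The per-chunk version charges a full 9 bytes for a partially filled last
chunk, while the patch formula grows by roughly one byte per input byte,
so the two can differ by several bytes near chunk boundaries.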

Even if the data as a whole compresses well, a prefix of it may compress
extremely poorly. Thus, if you are going to decompress rawsize bytes,
you need at most compressed_size bytes of compressed input.

OK, that explains why we can't use the PGLZ_MAX_OUTPUT macro.
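
The prefix argument can be sketched as follows. This is only an
illustration under the assumptions stated in this thread: the helper
name and its second parameter are hypothetical, the worst-case formula
is taken as given from the quoted patch line, and the clamp to the total
compressed size is an added assumption (reading more input than exists
is never necessary).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the function from the patch: an upper bound
 * on how many compressed bytes must be read in order to decompress the
 * first rawsize bytes of the original data.
 */
static int32_t
max_input_for_prefix(int32_t rawsize, int32_t total_compressed_size)
{
    int64_t bound;

    assert(rawsize >= 0 && total_compressed_size >= 0);

    /* Worst case from the thread: 1 control byte per 8 bytes of data. */
    bound = ((int64_t) rawsize * 9 + 8) / 8;

    /* Assumption: the compressed datum itself caps the amount to read. */
    if (bound > total_compressed_size)
        bound = total_compressed_size;

    /* Safe to narrow: bound <= total_compressed_size, which is int32. */
    return (int32_t) bound;
}
```

For example, asking for a 16-byte prefix of a datum whose compressed
form is 1000 bytes yields 19, even though the datum as a whole may have
compressed far better than 9/8 of its raw size.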

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

