On Thu, Feb 13, 2014 at 10:07 AM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Here one of the improvements which can be done is that after prefix-suffix
>> match, instead of going byte-by-byte copy as per LZ format we can directly
>> copy all the remaining part of tuple but I think that would require us to use
>> some different format than LZ which is also not too difficult to do, but the
>> question is do we really need such a change to handle the above kind of
>> worst case.
>
>
> Why use LZ at all?

We are only using the LZ *format* to represent the compressed string. Here
is some text copied from pg_lzcompress.c that explains exactly what we are
using:

"the first byte after the header tells what to do the next 8 times. We call
this the control byte. An unset bit in the control byte means, that one
uncompressed byte follows, which is copied from input to output. A set bit
in the control byte means, that a tag of 2-3 bytes follows. A tag contains
information to copy some bytes, that are already in the output buffer, to
the current location in the output."

> Why not *only* prefix/suffix?

To represent a prefix/suffix match, we at least need a way to record the
offset and length of the matched bytes, and the length of the unmatched
bytes we have copied. I agree that a simpler format could be devised if we
only want to do prefix-suffix matching, but that would require much more
testing during recovery to ensure everything is fine. The advantage of the
LZ format is that we don't need to worry about decoding: it will work
without much change to the LZ decode routine.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
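
For reference, the control-byte scheme quoted above can be sketched as a
small decoder. This is only an illustration, not the code in
pg_lzcompress.c: the function name sketch_decode is hypothetical, and the
tag layout used here (a 4-bit length stored as length minus 3, a 12-bit
offset, and an optional third byte that extends the length) is an
assumption that should be checked against the real source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a control-byte stream of the kind described in the quoted comment.
 * ASSUMPTION: the tag layout (low nibble = length - 3, 12-bit offset, extra
 * length byte when the nibble is 0x0f) is illustrative only.
 * Returns the number of bytes written to dst, or -1 on malformed input.
 */
static long
sketch_decode(const uint8_t *src, size_t srclen, uint8_t *dst, size_t dstcap)
{
    size_t sp = 0;              /* read position in src */
    size_t dp = 0;              /* write position in dst */

    while (sp < srclen)
    {
        uint8_t ctrl = src[sp++];   /* governs the next 8 items */

        for (int bit = 0; bit < 8 && sp < srclen; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* Set bit: a 2-3 byte tag copies bytes already in dst. */
                if (srclen - sp < 2)
                    return -1;
                size_t len = (size_t) (src[sp] & 0x0f) + 3;
                size_t off = ((size_t) (src[sp] & 0xf0) << 4) | src[sp + 1];
                sp += 2;
                if (len == 18)
                {
                    /* Length nibble was 0x0f: a third byte extends it. */
                    if (sp >= srclen)
                        return -1;
                    len += src[sp++];
                }
                if (off == 0 || off > dp || len > dstcap - dp)
                    return -1;
                /* Copy byte by byte so overlapping ranges repeat a pattern. */
                for (; len > 0; len--, dp++)
                    dst[dp] = dst[dp - off];
            }
            else
            {
                /* Unset bit: one literal byte is copied from input to output. */
                if (dp >= dstcap)
                    return -1;
                dst[dp++] = src[sp++];
            }
        }
    }
    return (long) dp;
}
```

With this layout, a matched run is expressed as one or more tags and the
unmatched bytes as literals, which is why a prefix/suffix encoder can emit
the same stream and leave the decode side essentially untouched.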