On Thu, Feb 6, 2014 at 5:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> Considering above change as correct, I have tried to see the worst
> case overhead for this patch by having new tuple such that after
> 25% or so of suffix/prefix match, there is a small change in tuple
> and kept rest of tuple same as old tuple and it shows overhead
> for this patch as well.
>
> Updated test script is attached.
>
> Unpatched
>
>              testname             | wal_generated |     duration
> ----------------------------------+---------------+------------------
>  ten long fields, 8 bytes changed |     348843824 | 5.56866788864136
>  ten long fields, 8 bytes changed |     348844800 | 5.84434294700623
>  ten long fields, 8 bytes changed |     350500000 | 5.92329406738281
> (3 rows)
>
> wal-update-prefix-suffix-encode-1.patch
>
>              testname             | wal_generated |     duration
> ----------------------------------+---------------+------------------
>  ten long fields, 8 bytes changed |     348845624 | 6.92243480682373
>  ten long fields, 8 bytes changed |     348847000 | 8.35828399658203
>  ten long fields, 8 bytes changed |     350204752 | 7.61826491355896
> (3 rows)
>
> One minor point, can we avoid having prefix tag if prefixlen is 0.
>
> +    /* output prefix as a tag */
> +    pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);
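On that minor point, something like the below would do it (just a rough
sketch against the patch; the surrounding context in pgrb_delta_encode()
might make it look slightly different):

    /* Emit the prefix tag only when there is actually a common prefix. */
    if (prefixlen > 0)
    {
        /* output prefix as a tag */
        pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);
    }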
I think generating the out tag for the suffix/prefix has one bug, i.e. it
doesn't consider the maximum match length of 273 bytes (PGLZ_MAX_MATCH),
which is mandatory for the LZ format (a rough sketch of how this could be
handled is in the P.S. below).

One more point about this patch: in the function pgrb_delta_encode(), is it
mandatory to return at the end in the below check?

    if (result_size > result_max)
        return false;

I mean to say that before starting to copy the literal bytes, we have
already ensured that the compression is more than 25%, so maybe we can
avoid this check. I have tried to take the data after removing this check
and found that it reduces the overhead and improves the WAL reduction as
well. The data is as below (compare this with the data in the above mail
for the unpatched version):

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300705552 | 6.51416897773743
 ten long fields, 8 bytes changed |     300703816 | 6.85267090797424
 ten long fields, 8 bytes changed |     300701840 | 7.15832996368408
(3 rows)

If we want to go with this approach, then I think apart from the above
points there is no major change required (maybe some comments, function
names, etc. can be improved).

> But if we really just want to do prefix/suffix compression, this is a
> crappy and expensive way to do it. We needn't force everything
> through the pglz tag format just because we elide a common prefix or
> suffix.

Here, are you bothered about the below code, where the patch does a
byte-by-byte copy after the prefix/suffix match?

    /* output bytes between prefix and suffix as literals */
    dp = &source[prefixlen];
    dend = &source[slen - suffixlen];

    while (dp < dend)
    {
        pgrb_out_literal(ctrlp, ctrlb, ctrl, bp, *dp);
        dp++;                   /* Do not do this ++ in the line above! */
    }

I think if we want to change the LZ format, it will be a bit more work, and
the verification of the decoding will have to be done much more
strenuously.

Note - During the performance tests, I have focussed mainly on the worst
case, because we already know that this idea is good for the best and
average cases. However, if we decide that this approach is better and good
to proceed with, I can take the data for the other cases as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
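P.S. On the PGLZ_MAX_MATCH point, here is a rough sketch of what I have in
mind (only illustrative and not tested; the pgrb_out_tag() call shape
follows the patch, and how the history offset advances for each chunk is my
assumption and would need to match the decoder):

    /*
     * Split a long prefix match into multiple tags so that no single tag
     * exceeds PGLZ_MAX_MATCH (273) bytes, which is the most one LZ tag
     * can encode for a single match.
     */
    while (prefixlen > 0)
    {
        int32   thislen = Min(prefixlen, PGLZ_MAX_MATCH);

        pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, thislen, hlen);
        hlen += thislen;        /* assumed: next chunk starts that much
                                 * further into the history */
        prefixlen -= thislen;
    }

The suffix tag would need the same treatment.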