Hi,

I've spent a bit more time on this, mostly running tests to get a better
idea of the practical benefits.

Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
for pglz_compress() to return -1. This can happen if the data is not
compressible, and would not fit into the output buffer. The code can't
just do elog(ERROR) in this case, it needs to handle that by storing the
raw data. The attached fixup patch makes this work for me - I'm not
claiming this is the best way to handle this, but it works.
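
To illustrate the fallback in isolation, here is a minimal standalone
sketch. It is not the PostgreSQL code: ItemHeader, ItemStrategy,
store_item() and toy_rle_compress() are hypothetical names made up for
this example, and the toy run-length encoder merely stands in for
pglz_compress(), sharing only the "return -1 when the output does not
fit" convention described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's header fields and strategies. */
typedef enum
{
    ITEM_UNCOMPRESSED = 0,
    ITEM_COMPRESSED = 1
} ItemStrategy;

typedef struct
{
    ItemStrategy strategy;      /* how the payload is stored */
    size_t      size;           /* header + payload bytes */
    size_t      raw_size;       /* payload size before compression */
} ItemHeader;

/*
 * Compressor callback (assumed contract): returns the compressed length,
 * or -1 when the result does not fit into dst_cap bytes.
 */
typedef int32_t (*compress_fn) (const char *src, int32_t src_len,
                                char *dst, int32_t dst_cap);

/* Toy run-length encoder, only here so the sketch has something to call. */
int32_t
toy_rle_compress(const char *src, int32_t src_len, char *dst, int32_t dst_cap)
{
    int32_t     in = 0;
    int32_t     out = 0;

    while (in < src_len)
    {
        char        c = src[in];
        int32_t     run = 1;

        while (in + run < src_len && src[in + run] == c && run < 127)
            run++;

        if (out + 2 > dst_cap)
            return -1;          /* does not fit: not compressible */

        dst[out++] = (char) run;
        dst[out++] = c;
        in += run;
    }
    return out;
}

/*
 * Write header + payload into "out", which must have room for
 * sizeof(ItemHeader) + src_len bytes; "scratch" must hold src_len bytes.
 * Falls back to the raw data instead of failing when compression
 * returns -1. Returns the number of bytes written.
 */
size_t
store_item(const char *src, int32_t src_len, compress_fn compress,
           char *scratch, char *out)
{
    ItemHeader  hdr;
    const char *payload;
    size_t      payload_len;
    int32_t     clen = compress(src, src_len, scratch, src_len);

    memset(&hdr, 0, sizeof(hdr));
    if (clen >= 0)              /* compressible */
    {
        hdr.strategy = ITEM_COMPRESSED;
        payload = scratch;
        payload_len = (size_t) clen;
    }
    else                        /* not compressible: keep the raw bytes */
    {
        hdr.strategy = ITEM_UNCOMPRESSED;
        payload = src;
        payload_len = (size_t) src_len;
    }
    hdr.size = sizeof(ItemHeader) + payload_len;
    hdr.raw_size = (size_t) src_len;

    memcpy(out, &hdr, sizeof(ItemHeader));
    memcpy(out + sizeof(ItemHeader), payload, payload_len);
    return hdr.size;
}
```

The reader side can then look at the strategy stored in the header to
decide whether the payload needs decompressing at all.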

FWIW I find it strange the tests included in the patch did not trigger
this. That probably means the tests are not quite sufficient.


Now, to the testing. Attached are two scripts, testing different cases:

test-columns.sh - Table with a variable number of 'float8' columns.

test-toast.sh - Table with a single text column.

Each script sets up a publication/subscription on two instances,
generates a certain amount of data (~1GB for columns, ~3.2GB for TOAST),
waits for it to be replicated to the replica, and measures how much data
was spilled to disk with each compression method (off, pglz and lz4).
There are a couple more metrics, but they're irrelevant here.

For the "column" test, it looks like this (this is in MB):

     rows    columns    distribution    off    pglz    lz4
  ========================================================
   100000       1000    compressible    778      20      9
                              random    778     778     16
  --------------------------------------------------------
  1000000        100    compressible    916     116     62
                              random    916     916     67

It's very clear that for the "compressible" data (which just copies the
same value into all columns), both pglz and lz4 can significantly reduce
the amount of data. For 1000 columns it's 778MB -> 20MB/9MB; for 100
columns it's a bit less efficient (916MB -> 116MB/62MB), but still good.

For the "random" data (where every column gets a random value, but rows
are copied), it's a very different story - pglz does not help at all,
while lz4 still massively reduces the amount of spilled data.

I think the explanation is very simple - for pglz, we compress each row
on its own, there's no concept of streaming/context. If a row is
compressible, it works fine, but when the row is random, pglz can't
compress it at all. For lz4, this does not matter, because in the
streaming mode it still sees that the rows are just repeated, and so can
compress them efficiently.
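
The effect of the context can be shown with a deliberately simplified
model. This is neither pglz nor lz4: the toy "compressor" below only
remembers the previous row and replaces an exact repeat with a short
back-reference, whereas real LZ-style compressors match within a sliding
window of recent bytes. ToyContext, spill_size() and the 4-byte
REF_COST are assumptions made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

#define REF_COST 4              /* assumed size of a back-reference token */

/* Toy context: remembers just the previously seen row. */
typedef struct
{
    const char *prev;
    size_t      prev_len;
} ToyContext;

void
toy_reset(ToyContext *ctx)
{
    ctx->prev = NULL;
    ctx->prev_len = 0;
}

/* Returns the number of output bytes this row would cost. */
size_t
toy_compress_row(ToyContext *ctx, const char *row, size_t len)
{
    size_t      cost = len;     /* incompressible on its own: stored as-is */

    if (ctx->prev != NULL && ctx->prev_len == len &&
        memcmp(ctx->prev, row, len) == 0)
        cost = REF_COST;        /* exact repeat of what the context holds */

    ctx->prev = row;
    ctx->prev_len = len;
    return cost;
}

/*
 * Total output size for nrows rows of row_len bytes each.
 * streaming = 0 resets the context before every row (per-row mode),
 * streaming = 1 keeps it across rows.
 */
size_t
spill_size(const char **rows, size_t nrows, size_t row_len, int streaming)
{
    ToyContext  ctx;
    size_t      total = 0;
    size_t      i;

    toy_reset(&ctx);
    for (i = 0; i < nrows; i++)
    {
        if (!streaming)
            toy_reset(&ctx);
        total += toy_compress_row(&ctx, rows[i], row_len);
    }
    return total;
}
```

With 10 identical 64-byte rows whose content is incompressible, the
per-row mode costs the full 640 bytes, while the streaming mode pays 64
bytes for the first row and one back-reference for each of the others.
If every row were unique, both modes would cost the same.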

For the TOAST test, the results look like this (again in MB):

  distribution    repeats    toast     off    pglz     lz4
  ========================================================
  compressible      10000    lz4        14       2       1
                             pglz       40       4       3
                     1000    lz4        32      16       9
                             pglz       54      17      10
  --------------------------------------------------------
        random      10000    lz4      3305    3305    3157
                             pglz     3305    3305    3157
                     1000    lz4      3166    3162    1580
                             pglz     3334    3326    1745
  --------------------------------------------------------
       random2      10000    lz4      3305    3305    3157
                             pglz     3305    3305    3158
                     1000    lz4      3160    3156    3010
                             pglz     3334    3326    3172

The "repeats" value means how long the string is - it's the number of
"md5" hashes added to the string. The number of rows is calculated to
keep the total amount of data the same. The "toast" column tracks what
compression was used for TOAST; I was wondering whether it matters.

This time there are three data distributions - "compressible" means that
each TOAST value is nicely compressible, "random" means each value is
random (not compressible), but the rows are just copies of the same value
(so on the whole there's a lot of redundancy). And "random2" means each
row is random and unique (so not compressible at all).

The table shows that with compressible TOAST values, compressing the
spill file is rather useless. The reason is that ReorderBufferCompress
is handling raw TOAST data, which is already compressed. Yes, it may
further reduce the amount of data, but it's negligible when compared to
the original amount of data.

For the random cases, the spill compression is rather pointless. Yes,
lz4 can reduce it to about 1/2 for the shorter strings in the "random"
case (3166MB -> 1580MB), but other than that it's not very useful.

For a while I was thinking this approach is flawed, because it only sees
and compresses changes one by one, and that seeing a batch of changes
would improve this (e.g. we'd see the copied rows). But I realized lz4
already does that (in the streaming mode at least), and yet it does not
help very much. Presumably that depends on how large the context is. If
the random string is long enough, it won't help.

So maybe this approach is fine, and doing the compression at a lower
layer (for the whole file), would not really improve this. Even then
we'd only see a limited amount of data.

Maybe the right answer to this is that compression does not help cases
where most of the replicated data is TOAST, and that it can help cases
with wide (and redundant) rows, or repeated rows. And that lz4 is a
clearly superior choice. (This also raises the question of whether we
want to support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into
this, but doesn't that behave more like pglz, i.e. no context?)

FWIW when doing these tests, it made me realize how useful it would be
to track both the "raw" and "spilled" amounts, i.e. the amounts before
and after compression. It'd make calculating the compression ratio much
easier.
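
As a rough sketch of what tracking both amounts could look like (the
SpillStats struct and the function names are hypothetical, not existing
PostgreSQL counters), the ratio then falls out of two running totals:

```c
#include <stdint.h>

/* Hypothetical pair of counters: bytes before and after compression. */
typedef struct
{
    uint64_t    raw_bytes;      /* size of the changes before compression */
    uint64_t    spilled_bytes;  /* size actually written to the spill file */
} SpillStats;

/* Account for one spilled change. */
void
spill_stats_add(SpillStats *stats, uint64_t raw, uint64_t spilled)
{
    stats->raw_bytes += raw;
    stats->spilled_bytes += spilled;
}

/* Compression ratio raw/spilled, or 0.0 if nothing was spilled yet. */
double
spill_ratio(const SpillStats *stats)
{
    if (stats->spilled_bytes == 0)
        return 0.0;
    return (double) stats->raw_bytes / (double) stats->spilled_bytes;
}

/* Space saved by compression, as a percentage of the raw amount. */
double
spill_savings_pct(const SpillStats *stats)
{
    if (stats->raw_bytes == 0)
        return 0.0;
    return 100.0 * (1.0 - (double) stats->spilled_bytes /
                    (double) stats->raw_bytes);
}
```

For example, two changes of 1000 raw bytes each that spill as 250 bytes
each give a ratio of 4.0 and savings of 75%.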


regards

-- 
Tomas Vondra

Attachment: test-toast.sh
Description: application/shellscript

Attachment: test-columns.sh
Description: application/shellscript

1727083455 1727083494 64kB off 100000 1000 random 39 816000000 2000000000 1926788895
1727083508 1727083555 64kB pglz 100000 1000 random 47 816000000 2000000000 912987792
1727083593 1727083630 64kB lz4 100000 1000 random 37 16555784 2000000000 14425752
1727083648 1727083685 64kB off 100000 1000 compressible 37 816000000 2000000000 1800588895
1727083699 1727083733 64kB pglz 100000 1000 compressible 34 20665177 2000000000 12547360
1727083745 1727083783 64kB lz4 100000 1000 compressible 38 9665751 2000000000 8318691
1727083801 1727083838 512kB off 100000 1000 random 37 816000000 2000000000 1926188895
1727083853 1727083897 512kB pglz 100000 1000 random 44 816000000 2000000000 331066026
1727083918 1727083955 512kB lz4 100000 1000 random 37 16548612 2000000000 14472439
1727083973 1727084005 512kB off 100000 1000 compressible 32 816000000 2000000000 1900588895
1727084019 1727084053 512kB pglz 100000 1000 compressible 34 20799851 2000000000 12544310
1727084064 1727084096 512kB lz4 100000 1000 compressible 32 9828681 2000000000 8318693
1727084114 1727084151 4MB off 100000 1000 random 37 816000000 2000000000 1927088895
1727084166 1727084211 4MB pglz 100000 1000 random 45 816000000 2000000000 332679062
1727084231 1727084268 4MB lz4 100000 1000 random 37 16720988 2000000000 14425897
1727084285 1727084318 4MB off 100000 1000 compressible 33 816000000 2000000000 1900588895
1727084332 1727084366 4MB pglz 100000 1000 compressible 34 20848682 2000000000 12540369
1727084377 1727084410 4MB lz4 100000 1000 compressible 33 9808910 2000000000 8318690
1727084427 1727084463 64kB off 1000000 100 random 36 960000000 2000000000 1939888896
1727084476 1727084519 64kB pglz 1000000 100 random 43 960000000 2000000000 12442367
1727084531 1727084566 64kB lz4 1000000 100 random 35 69792488 2000000000 12905812
1727084584 1727084617 64kB off 1000000 100 compressible 33 960000000 2000000000 1906888896
1727084629 1727084666 64kB pglz 1000000 100 compressible 37 124849430 2000000000 12385533
1727084679 1727084710 64kB lz4 1000000 100 compressible 31 65217839 2000000000 12336481
1727084727 1727084762 512kB off 1000000 100 random 35 960000000 2000000000 1934888896
1727084775 1727084818 512kB pglz 1000000 100 random 43 960000000 2000000000 12413439
1727084830 1727084865 512kB lz4 1000000 100 random 35 70166953 2000000000 12890598
1727084883 1727084914 512kB off 1000000 100 compressible 31 960000000 2000000000 1906888896
1727084926 1727084964 512kB pglz 1000000 100 compressible 38 124672339 2000000000 12261806
1727084975 1727085006 512kB lz4 1000000 100 compressible 31 65499058 2000000000 12337511
1727085023 1727085057 4MB off 1000000 100 random 34 960000000 2000000000 1935888896
1727085070 1727085113 4MB pglz 1000000 100 random 43 960000000 2000000000 12277513
1727085125 1727085159 4MB lz4 1000000 100 random 34 70063518 2000000000 12903008
1727085177 1727085210 4MB off 1000000 100 compressible 33 960000000 2000000000 1906888896
1727085222 1727085260 4MB pglz 1000000 100 compressible 38 121806968 2000000000 12385491
1727085273 1727085304 4MB lz4 1000000 100 compressible 31 65511687 2000000000 12336249
1727101381 1727101685 4MB pglz off 100000 1000 random2 304 3496202728 3200000000 3200688890
1727101698 1727101997 4MB pglz pglz 100000 1000 random2 299 3487810007 3200000000 1869238511
1727102065 1727102368 4MB pglz lz4 100000 1000 random2 303 3326043497 3200000000 3200691961
1727102385 1727102562 4MB pglz off 100000 1000 random 177 3496200000 3200000000 3200688895
1727102576 1727102755 4MB pglz pglz 100000 1000 random 179 3488055924 3200000000 583756849
1727102788 1727102965 4MB pglz lz4 100000 1000 random 177 1829510657 3200000000 40619616
1727102977 1727103025 4MB pglz off 100000 1000 compressible 48 56898772 3200000000 3200688895
1727103034 1727103082 4MB pglz pglz 100000 1000 compressible 48 17316243 3200000000 23016359
1727103097 1727103146 4MB pglz lz4 100000 1000 compressible 49 10356903 3200000000 16764329
1727103152 1727103394 4MB lz4 off 100000 1000 random2 242 3313932520 3200000000 3200688890
1727103408 1727103634 4MB lz4 pglz 100000 1000 random2 226 3309284822 3200000000 1869238511
1727103701 1727103930 4MB lz4 lz4 100000 1000 random2 229 3156562298 3200000000 3200691961
1727103949 1727104066 4MB lz4 off 100000 1000 random 117 3319600000 3200000000 3200688895
1727104079 1727104216 4MB lz4 pglz 100000 1000 random 137 3315267470 3200000000 583756849
1727104250 1727104363 4MB lz4 lz4 100000 1000 random 113 1656772933 3200000000 40619616
1727104375 1727104402 4MB lz4 off 100000 1000 compressible 27 33199647 3200000000 3200688895
1727104410 1727104439 4MB lz4 pglz 100000 1000 compressible 29 16596895 3200000000 23016359
1727104455 1727104483 4MB lz4 lz4 100000 1000 compressible 28 9704302 3200000000 16764329
1727104489 1727104790 4MB pglz off 10000 10000 random2 301 3465780000 3200000000 3200058890
1727104804 1727105098 4MB pglz pglz 10000 10000 random2 294 3465349151 3200000000 1868576351
1727105165 1727105472 4MB pglz lz4 10000 10000 random2 307 3310941656 3200000000 3200061957
1727105492 1727105684 4MB pglz off 10000 10000 random 192 3465780000 3200000000 3200058894
1727105698 1727105892 4MB pglz pglz 10000 10000 random 194 3465349035 3200000000 1869203336
1727105961 1727106151 4MB pglz lz4 10000 10000 random 190 3310342175 3200000000 3200061961
1727106170 1727106220 4MB pglz off 10000 10000 compressible 50 42069867 3200000000 3200058894
1727106231 1727106281 4MB pglz pglz 10000 10000 compressible 50 4676993 3200000000 20451166
1727106299 1727106349 4MB pglz lz4 10000 10000 compressible 50 2625791 3200000000 13004334
1727106358 1727106584 4MB lz4 off 10000 10000 random2 226 3465780000 3200000000 3200058890
1727106598 1727106818 4MB lz4 pglz 10000 10000 random2 220 3465339285 3200000000 1868576351
1727106887 1727107110 4MB lz4 lz4 10000 10000 random2 223 3310848431 3200000000 3200061957
1727107132 1727107256 4MB lz4 off 10000 10000 random 124 3465780000 3200000000 3200058894
1727107269 1727107438 4MB lz4 pglz 10000 10000 random 169 3465339057 3200000000 1869203336
1727107505 1727107619 4MB lz4 lz4 10000 10000 random 114 3310290984 3200000000 3200061961
1727107640 1727107674 4MB lz4 off 10000 10000 compressible 34 14609994 3200000000 3200058894
1727107685 1727107716 4MB lz4 pglz 10000 10000 compressible 31 1758403 3200000000 20451166
1727107734 1727107765 4MB lz4 lz4 10000 10000 compressible 31 1021255 3200000000 13004334
From c0633fa03e7eefdf4bc5ab6f6608fe51368272a6 Mon Sep 17 00:00:00 2001
From: tomas <tomas>
Date: Wed, 18 Sep 2024 19:39:52 +0200
Subject: [PATCH] compression fixup

---
 .../logical/reorderbuffer_compression.c       | 40 +++++++++++++------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 6d19c60b60..6301ccc932 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -781,24 +781,38 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 				dst = (char *) palloc0(max_size);
 				dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
 
-				if (dst_size < 0)
-					ereport(ERROR,
-							(errcode(ERRCODE_DATA_CORRUPTED),
-							 errmsg_internal("PGLZ compression failed")));
+				/*
+				 * If compression succeeded, build the proper compression header. If
+				 * compression fails, it means the data is not compressible. In that
+				 * case just build a no-compress item.
+				 */
+				if (dst_size > 0)		/* compressible */
+				{
+					ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
 
-				ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
+					hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+					hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+					hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
+					hdr->raw_size = (Size) src_size;
 
-				hdr = (ReorderBufferDiskHeader *) rb->outbuf;
-				hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
-				hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
-				hdr->raw_size = (Size) src_size;
+					/* Copy back compressed data into the ReorderBuffer */
+					memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+						   dst_size);
+				}
+				else					/* not compressible */
+				{
+					hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+					hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+					hdr->size = (Size) src_size + sizeof(ReorderBufferDiskHeader);
+					hdr->raw_size = (Size) src_size;
+
+					/* Copy the raw (uncompressed) data into the ReorderBuffer */
+					memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), src,
+						   src_size);
+				}
 
 				*header = hdr;
 
-				/* Copy back compressed data into the ReorderBuffer */
-				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
-					   dst_size);
-
 				pfree(dst);
 
 				break;
-- 
2.39.2
