On 20/08/2019 05:04, Justin Pryzby wrote:
    it looks like zedstore with lz4 gets ~4.6x for our largest customer's
    largest table.  zfs using compress=gzip-1 gives 6x compression across
    all their partitioned tables, and I'm surprised it beats zedstore.

I did a quick test with 10 million random IP addresses, stored in text format. I loaded them into a zedstore table ("create table ips (ip text) using zedstore") and poked around a little to see how the space is used.
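Something like this generates the test data; the exact method doesn't matter much, as long as it's 10 million random-looking IPv4 addresses in text form:

create table ips (ip text) using zedstore;
insert into ips
  select (random() * 255)::int || '.' || (random() * 255)::int || '.' ||
         (random() * 255)::int || '.' || (random() * 255)::int
  from generate_series(1, 10000000);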

postgres=# select lokey, nitems, ncompressed, totalsz, uncompressedsz, freespace from pg_zs_btree_pages('ips') where attno=1 and level=0 limit 10;
 lokey | nitems | ncompressed | totalsz | uncompressedsz | freespace
-------+--------+-------------+---------+----------------+-----------
     1 |      4 |           4 |    6785 |           7885 |      1320
   537 |      5 |           5 |    7608 |           8818 |       492
  1136 |      4 |           4 |    6762 |           7888 |      1344
  1673 |      5 |           5 |    7548 |           8776 |       540
  2269 |      4 |           4 |    6841 |           7895 |      1256
  2807 |      5 |           5 |    7555 |           8784 |       540
  3405 |      5 |           5 |    7567 |           8772 |       524
  4001 |      4 |           4 |    6791 |           7899 |      1320
  4538 |      5 |           5 |    7596 |           8776 |       500
  5136 |      4 |           4 |    6750 |           7875 |      1360
(10 rows)

There's on average about 10% of free space on the pages. We're losing quite a bit of ground to ZFS compression right there: I'm sure there's some free space on the heap pages as well, but ZFS compression will squeeze it out.
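For reference, that ~10% figure can be estimated from the same pg_zs_btree_pages() output, assuming the default 8k block size:

select round(avg(freespace) / 8192 * 100, 1) as avg_free_pct
from pg_zs_btree_pages('ips')
where attno = 1 and level = 0;   -- leaf pages of attribute 1, as above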

The compression ratio is indeed not very good. I think one reason is that zedstore compresses relatively small chunks with LZ4, while ZFS surely compresses large blocks in one go. Looking at the above, there are on average about 125 datums packed into each "item" (avg(hikey - lokey) / nitems). I did a quick test with the "lz4" command-line utility, compressing flat files containing random IP addresses.
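Flat files like these can be produced straight from the test table, for example:

copy (select ip from ips limit 125) to '/tmp/125-ips.txt';  -- sample files of
copy (select ip from ips limit 550) to '/tmp/550-ips.txt';  -- ~125/550/750 datums;
copy (select ip from ips limit 750) to '/tmp/750-ips.txt';  -- sizes vary a bit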

$ lz4 /tmp/125-ips.txt
Compressed filename will be : /tmp/125-ips.txt.lz4
Compressed 1808 bytes into 1519 bytes ==> 84.02%
$ lz4 /tmp/550-ips.txt
Compressed filename will be : /tmp/550-ips.txt.lz4
Compressed 7863 bytes into 6020 bytes ==> 76.56%
$ lz4 /tmp/750-ips.txt
Compressed filename will be : /tmp/750-ips.txt.lz4
Compressed 10646 bytes into 8035 bytes ==> 75.47%

The first case is roughly what we do in zedstore currently: we compress about 125 datums as one chunk. The second case is roughly what we would get if we collected 8k worth of datums (a random IP address is about 14 bytes in text form, so roughly 550 of them) and compressed them all as one chunk. The third case simulates allowing the input to be larger than 8k, so that the compressed chunk just fits on an 8k page. There's not much difference between the second and third case, but it's pretty clear that we're being hurt by splitting the input into such small chunks.

The downside of using a larger compression chunk size is that random access becomes more expensive. The on-disk format needs some more thought. That said, I don't actually feel too bad about the current compression ratio; perfect can be the enemy of good.

- Heikki

