> 0007 adds server-side compression; currently, it only supports
> server-side compression using gzip, but I hope that it won't be hard
> to generalize that to support LZ4 as well, and Andres told me he
> thinks we should aim to support zstd since that library has built-in
> parallel compression which is very appealing in this context.
Thanks, Robert, for laying the foundation here.

I gave the LZ4 streaming API a try for server-side compression. The LZ4 frame APIs are documented here [1].

With the attached WIP patch, I am now able to take a backup using lz4 compression. The patch applies on top of Robert's V3 patch-set [2].

I could take the backup using the command:

    pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Further, when I restored the backup `/tmp/data_lz4` and started the server, I could see the tables I had created, along with the data inserted on the original server.

When I looked at the binary difference between the original data directory and the backup `data_lz4` directory, this is how it looked:

    $ diff -qr data/ /tmp/data_lz4
    Only in /tmp/data_lz4: backup_label
    Only in /tmp/data_lz4: backup_manifest
    Only in data/base: pgsql_tmp
    Only in /tmp/data_lz4: base.tar
    Only in /tmp/data_lz4: base.tar.lz4
    Files data/global/pg_control and /tmp/data_lz4/global/pg_control differ
    Files data/logfile and /tmp/data_lz4/logfile differ
    Only in data/pg_stat: db_0.stat
    Only in data/pg_stat: global.stat
    Only in data/pg_subtrans: 0000
    Only in data/pg_wal: 000000010000000000000099.00000028.backup
    Only in data/pg_wal: 00000001000000000000009A
    Only in data/pg_wal: 00000001000000000000009B
    Only in data/pg_wal: 00000001000000000000009C
    Only in data/pg_wal: 00000001000000000000009D
    Only in data/pg_wal: 00000001000000000000009E
    Only in data/pg_wal/archive_status: 000000010000000000000099.00000028.backup.done
    Only in data/: postmaster.opts

For now, what concerns me is the `LZ4F_compressUpdate()` API, which does the core work of streaming compression:

    size_t LZ4F_compressUpdate(LZ4F_cctx* cctx,
                               void* dstBuffer, size_t dstCapacity,
                               const void* srcBuffer, size_t srcSize,
                               const LZ4F_compressOptions_t* cOptPtr);

Here, `dstCapacity` is provided by an earlier call to `LZ4F_compressBound()`, which returns the minimum `dstCapacity` required to guarantee success of `LZ4F_compressUpdate()` for a given `srcSize` and `preferences` in the worst-case scenario. `LZ4F_compressBound()` is:

    size_t LZ4F_compressBound(size_t srcSize, const LZ4F_preferences_t* prefsPtr);

Now, the hard lesson here is that the `dstCapacity` returned by `LZ4F_compressBound()`, even for a `srcSize` of a single byte, is about 256K (it seems to have something to do with the blockSize in the lz4 frame that we chose; the minimum we can have is 64K), though the actual length of the data compressed by `LZ4F_compressUpdate()` is much smaller. Whereas the destination buffer length for us, i.e. `mysink->base.bbs_next->bbs_buffer_length`, is only 32K.

If, in the call to `LZ4F_compressUpdate()`, I directly pass `mysink->base.bbs_next->bbs_buffer + bytes_written` as `dstBuffer` and the value returned by `LZ4F_compressBound()` as `dstCapacity`, that sounds very much incorrect to me, since the space actually remaining in the output buffer is much less than what `LZ4F_compressBound()` calculates for the worst case.

For now, I am creating a temporary buffer of the required size, passing it for compression, asserting that the actual compressed size is less than the length we have available, and then copying it to our output buffer.

To give an example, I put in some logging statements, and I can see in the log:

    "
    bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
    input size to be compressed: 512
    estimated size for compressed buffer by LZ4F_compressBound(): 262667
    actual compressed size: 16
    "

I would really appreciate any inputs, comments, or suggestions here.

Regards,
Jeevan Ladhe

[1] https://fossies.org/linux/lz4/doc/lz4frame_manual.html
[2] https://www.postgresql.org/message-id/CA+TgmoYgVN=-yoh71r3p9n7ekysd7_9b9s+1qfffcs3w7z-...@mail.gmail.com
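
The temporary-buffer workaround described in this mail can be sketched roughly as below. This is a standalone illustration, not code from the patch: `toy_sink`, `toy_compress_bound()`, `toy_compress_update()` and `sink_append_compressed()` are hypothetical names, the two `toy_compress_*` functions only stand in for `LZ4F_compressBound()` and `LZ4F_compressUpdate()` (the stand-in just copies bytes), and the 256K block size is an assumption taken from the numbers above. Where the WIP patch asserts, the sketch returns an error instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SINK_BUFFER_LENGTH 32768	/* stands in for bbs_buffer_length (32K) */
#define TOY_BLOCK_SIZE 262144		/* assumed 256K frame block size */

/* Simplified, hypothetical stand-in for the next sink's output buffer. */
typedef struct toy_sink
{
	char		buffer[SINK_BUFFER_LENGTH];
	size_t		bytes_written;
} toy_sink;

/* Hypothetical stand-in for LZ4F_compressBound(): a pessimistic bound. */
static size_t
toy_compress_bound(size_t src_size)
{
	return TOY_BLOCK_SIZE + src_size;
}

/*
 * Hypothetical stand-in for LZ4F_compressUpdate().  It insists on a
 * worst-case sized destination, like the real API's contract, but simply
 * copies the input.  Returns bytes produced, or (size_t) -1 on error.
 */
static size_t
toy_compress_update(void *dst, size_t dst_capacity,
					const void *src, size_t src_size)
{
	if (dst_capacity < toy_compress_bound(src_size))
		return (size_t) -1;
	memcpy(dst, src, src_size);
	return src_size;
}

/*
 * Compress into a scratch buffer of worst-case size, then copy only the
 * bytes actually produced into the fixed-size sink buffer.
 * Returns 0 on success, -1 if allocation fails, the compressor fails, or
 * the output does not fit in the space remaining in the sink.
 */
static int
sink_append_compressed(toy_sink *sink, const void *src, size_t src_size)
{
	size_t		bound = toy_compress_bound(src_size);
	size_t		avail = SINK_BUFFER_LENGTH - sink->bytes_written;
	char	   *scratch = (char *) malloc(bound);
	size_t		produced;

	if (scratch == NULL)
		return -1;

	produced = toy_compress_update(scratch, bound, src, src_size);
	if (produced == (size_t) -1 || produced > avail)
	{
		free(scratch);
		return -1;
	}

	memcpy(sink->buffer + sink->bytes_written, scratch, produced);
	sink->bytes_written += produced;
	free(scratch);
	return 0;
}
```

The point of the sketch is only that the worst-case bound sizes the scratch buffer, while the space remaining in the sink is checked against the bytes actually produced. Allocating the scratch buffer once and reusing it across calls would avoid the per-call malloc; that choice is left open here.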
lz4_compress_wip.patch
Description: Binary data