Hi,

When the content of a large transaction (exceeding logical_decoding_work_mem) and its sub-transactions has to be reordered during logical decoding, all the changes are written to disk in temporary files located in pg_replslot/<slot_name>. Decoding very large transactions through multiple replication slots can lead to disk space saturation and high I/O utilization.
When compiled with LZ4 support (--with-lz4), this patch enables compression and decompression of these temporary files. Each transaction change that must be written to disk (ReorderBufferDiskChange) is now compressed and encapsulated in a new structure. Three compression strategies are implemented (a minimal sketch of this dispatch is included at the end of this mail):

1. LZ4 streaming compression: the preferred strategy, which works efficiently for small individual changes.
2. LZ4 regular compression: used when a change is too large for the streaming API.
3. No compression: when compression fails, the change is stored uncompressed.

Without compression, the following case generates 1590MB of spill files:

CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);
INSERT INTO t
  SELECT i, 'Hello number n°'||i::TEXT
  FROM generate_series(1, 10000000) as i;

With LZ4 compression, it creates 653MB of spill files: 58.9% less disk space usage.

Open items:

1. The spill_bytes column of pg_stat_get_replication_slot() still reports the plain data size, not the compressed data size. Should we expose the compressed size when compression occurs?
2. Do we want a GUC to switch compression on/off?
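To make the three-way dispatch concrete, here is a minimal, compilable C sketch. All names in it (SpillChangeHeader, SpillCompressionMethod, spill_compress_change, STREAM_CHANGE_LIMIT) are hypothetical illustrations, not identifiers from the patch, which wraps ReorderBufferDiskChange inside the reorderbuffer code; only the LZ4 calls themselves (LZ4_compress_fast_continue, LZ4_compress_default, LZ4_compressBound) are the real library API.

/*
 * Hypothetical sketch of the per-change wrapping and the three
 * compression strategies described above.  Compile with: cc sketch.c -llz4
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

typedef enum
{
    SPILL_COMPRESSION_NONE = 0,     /* change stored uncompressed */
    SPILL_COMPRESSION_LZ4,          /* regular one-shot LZ4 */
    SPILL_COMPRESSION_LZ4_STREAM    /* LZ4 streaming compression */
} SpillCompressionMethod;

/* Hypothetical header written in front of each change in the spill file. */
typedef struct
{
    uint32_t raw_size;      /* size before compression */
    uint32_t stored_size;   /* size actually written after this header */
    uint8_t  method;        /* SpillCompressionMethod */
} SpillChangeHeader;

/* Assumed cutoff: past this we fall back from streaming to one-shot LZ4. */
#define STREAM_CHANGE_LIMIT (64 * 1024)

/*
 * Compress one change into dst (caller-provided, LZ4_compressBound(len)
 * bytes) and return the header to write before the payload.  Note: real
 * streaming usage must keep prior input buffers available per the LZ4
 * streaming rules, and would reset the stream on failure.
 */
static SpillChangeHeader
spill_compress_change(LZ4_stream_t *stream, const char *change, int len,
                      char *dst, int dst_cap)
{
    SpillChangeHeader hdr = { (uint32_t) len, 0, SPILL_COMPRESSION_NONE };
    int written = 0;

    if (len <= STREAM_CHANGE_LIMIT)
    {
        /* 1. preferred: streaming compression for small changes */
        written = LZ4_compress_fast_continue(stream, change, dst,
                                             len, dst_cap, 1);
        hdr.method = SPILL_COMPRESSION_LZ4_STREAM;
    }
    if (written <= 0)
    {
        /* 2. change too large (or streaming failed): one-shot LZ4 */
        written = LZ4_compress_default(change, dst, len, dst_cap);
        hdr.method = SPILL_COMPRESSION_LZ4;
    }
    if (written <= 0)
    {
        /* 3. compression failed: store the change uncompressed */
        memcpy(dst, change, len);
        written = len;
        hdr.method = SPILL_COMPRESSION_NONE;
    }
    hdr.stored_size = (uint32_t) written;
    return hdr;
}

int
main(void)
{
    const char   *change = "Hello number 42";
    int           len = (int) strlen(change);
    int           cap = LZ4_compressBound(len);
    char         *dst = malloc(cap);
    LZ4_stream_t *stream = LZ4_createStream();

    SpillChangeHeader hdr = spill_compress_change(stream, change, len,
                                                  dst, cap);
    printf("method=%u raw=%u stored=%u\n",
           (unsigned) hdr.method, (unsigned) hdr.raw_size,
           (unsigned) hdr.stored_size);

    LZ4_freeStream(stream);
    free(dst);
    return 0;
}

Storing a method byte per change lets the reader pick the matching decompression path for each change without any global state.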
Regards,
JT

v1-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch
Description: Binary data