On 6/6/24 16:24, Alvaro Herrera wrote:
> On 2024-Jun-06, Amit Kapila wrote:
>
>> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <jul...@gmail.com> wrote:
>>>
>>> When the content of a large transaction (size exceeding
>>> logical_decoding_work_mem) and its sub-transactions has to be
>>> reordered during logical decoding, all the changes are written
>>> to disk in temporary files located in pg_replslot/<slot_name>.
>>> Decoding very large transactions through multiple replication slots
>>> can lead to disk space saturation and high I/O utilization.
>
> I like the general idea of compressing the output of logical decoding.
> It's not so clear to me that we only want to do so for spilling to
> disk; for instance, if the two nodes communicate over a slow network,
> it may even be beneficial to compress when streaming, so to this
> question:
>
>> Why can't one use the 'streaming' option to send changes to the
>> client once it reaches the configured limit of
>> 'logical_decoding_work_mem'?
>
> I would say that streaming doesn't necessarily have to mean we don't
> want compression, because for some users it might be beneficial.
>
> I think a GUC would be a good idea. Also, what if for whatever reason
> you want a different compression algorithm or different compression
> parameters? Looking at the existing compression UI we offer in
> pg_basebackup, perhaps you could add something like this:
>
> compress_logical_decoding = none
> compress_logical_decoding = lz4:42
> compress_logical_decoding = spill-zstd:99
>
> "none" says to never use compression (perhaps this should be the
> default), "lz4:42" says to use lz4 with parameter 42 on both spilling
> and streaming, and "spill-zstd:99" says to use Zstd with parameter 99
> but only for spilling to disk.
>
> (I don't mean to say that you should implement Zstd compression with
> this patch, only that you should choose the implementation so that
> adding Zstd support (or whatever) later is just a matter of adding
> some branches here and there. With the current #ifdef you propose,
> it's hard to do that. Maybe separate the parts that depend on the
> specific algorithm from the algorithm-agnostic functions.)
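To illustrate that last point, here's a rough standalone sketch of how
such a GUC value could be parsed, with all algorithm-specific knowledge
confined to a single table so that adding Zstd (or anything else) later
is just one more entry. All the names here are invented for
illustration; nothing below comes from the actual patch:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Whether compression applies to spilling only, or to streaming too. */
typedef enum
{
    COMPRESS_SPILL_AND_STREAM,
    COMPRESS_SPILL_ONLY
} CompressScope;

/* One row per supported algorithm; adding Zstd support is one more row. */
typedef struct
{
    const char *name;
    int         default_level;
} CompressAlgorithm;

static const CompressAlgorithm algorithms[] = {
    {"none", 0},
    {"lz4", 1},
    {"zstd", 3},
};

/*
 * Parse a value like "none", "lz4:42" or "spill-zstd:99" into a scope,
 * an algorithm and a level.  Returns false on an unrecognized value.
 */
static bool
parse_compress_value(const char *value, CompressScope *scope,
                     const CompressAlgorithm **algo, int *level)
{
    const char *p = value;
    const char *colon;
    char        name[16];
    size_t      namelen;

    *scope = COMPRESS_SPILL_AND_STREAM;
    if (strncmp(p, "spill-", 6) == 0)
    {
        *scope = COMPRESS_SPILL_ONLY;
        p += 6;
    }

    colon = strchr(p, ':');
    namelen = colon ? (size_t) (colon - p) : strlen(p);
    if (namelen == 0 || namelen >= sizeof(name))
        return false;
    memcpy(name, p, namelen);
    name[namelen] = '\0';

    for (size_t i = 0; i < sizeof(algorithms) / sizeof(algorithms[0]); i++)
    {
        if (strcmp(name, algorithms[i].name) == 0)
        {
            *algo = &algorithms[i];
            *level = colon ? atoi(colon + 1) : algorithms[i].default_level;
            return true;
        }
    }
    return false;
}

int
main(void)
{
    const char *tests[] = {"none", "lz4:42", "spill-zstd:99", "bogus"};

    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++)
    {
        CompressScope scope;
        const CompressAlgorithm *algo;
        int         level;

        if (parse_compress_value(tests[i], &scope, &algo, &level))
            printf("%-14s -> algo=%s level=%d scope=%s\n", tests[i],
                   algo->name, level,
                   scope == COMPRESS_SPILL_ONLY ? "spill" : "spill+stream");
        else
            printf("%-14s -> invalid\n", tests[i]);
    }
    return 0;
}

The real thing would of course hook into the GUC machinery and dispatch
through per-algorithm compress/decompress callbacks, but the point is
that the spill and stream paths wouldn't need to know which algorithm
is in use.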
I haven't been following the "libpq compression" thread, but wouldn't
that also do compression for the streaming case? That was my
assumption, at least, and it seems like the right way - we probably
don't want to patch every place that sends data over the network
independently, right?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company