Hi hackers. Recently I came to an issue about logical replicating very big transactions. Since we already have logical_decoding_work_mem to keep the memory usage, there is no risk of OOM during decoding. However, the memory usage still goes out of control in 'Tuples' memory context of reorder buffer. It seems that when restoring the spilled transactions from disk, the memory usage is still limited by max_changes_in_memory which is hard coded to 4096 like what decoding does before v13.
For big transactions, we have already supported streaming mode since v14, which should solve this issue, but using streaming mode relies on the subscriptor's support. There are still a lot of PostgreSQL running v12/13 in production, or maybe v11 or older even though EOLed. Also, there are a lot of CDCs which logical-replicates PostgreSQL seem not support streaming either. Would it be possible to make max_changes_in_memory a GUC so it can be adjusted dynamically? Make the default value 4096 as what current is. When coming with big transactions on memory-constrained machine, at least we can adjust max_changes_in_memory to a lower value to make logical WAL sender passing through this transaction. Or WAL sender may get kill -9 and recovery is needed. After recovery, WAL sender needs to restart from a point before this transaction starts, and keep this loop without anything useful. It would never have a chance to pass through this transaction except adding more memory to the machine, which is usually not practical in reality. Sincerely, Jingtang