Hi,

On 2022-02-17 13:00:22 -0800, Nathan Bossart wrote:
> Okay. So IIUC the problem might already exist today, but offloading these
> tasks to a separate process could make it more likely.

Vastly more, yes. Before, checkpoints not happening would be a form of
backpressure (although not a great one). You can't cancel them without
triggering a crash-restart, whereas the custodian can be cancelled etc.

As I said before, I think this is tackling things from the wrong end.
Instead of moving the sometimes expensive task out of the way while
leaving it just as expensive, the focus should be on making the
expensive task cheaper.

As far as I understand, the primary concern is logical decoding
serialized snapshots, because a lot of them can accumulate if there is
e.g. an old unused / far behind slot. It should be easy to reduce the
number of those snapshots by e.g. eliding some redundant ones. Perhaps
we could also make backends in logical decoding occasionally do a bit of
cleanup themselves.

I've not seen reports of the number of mapping files being a real issue?

The improvements around deleting temporary files and serialized
snapshots afaict don't require a dedicated process - they're only
relevant during startup. We could use the approach of renaming the
directory out of the way as done in this patchset, but perform the
cleanup in the startup process after we're up.

Greetings,

Andres Freund