On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> As far as I understand, the primary concern are logical decoding serialized
> snapshots, because a lot of them can accumulate if there e.g. is an old unused
> / far behind slot. It should be easy to reduce the number of those snapshots
> by e.g. eliding some redundant ones. Perhaps we could also make backends in
> logical decoding occasionally do a bit of cleanup themselves.
>
> I've not seen reports of the number of mapping files to be an real issue?

I routinely see all four of these tasks impacting customers, but I'd say the
most common one is the temporary file cleanup. Besides eliminating some
redundant files and having backends perform some cleanup, what do you think
about skipping the logical decoding cleanup during end-of-recovery/shutdown
checkpoints? This was something that Bharath brought up a while back [0]. As
I noted in that thread, startup and shutdown could still take a while if
checkpoints are regularly delayed due to logical decoding cleanup, but that
might still help avoid a bit of downtime.

> The improvements around deleting temporary files and serialized snapshots
> afaict don't require a dedicated process - they're only relevant during
> startup. We could use the approach of renaming the directory out of the way as
> done in this patchset but perform the cleanup in the startup process after
> we're up.

Perhaps this is a good place to start. As I mentioned above, IME the
temporary file cleanup is the most common problem, so I think even getting
that one fixed would be a huge improvement.

[0] https://postgr.es/m/CALj2ACXkkSL8EBpR7m%3DMt%3DyRGBhevcCs3x4fsp3Bc-D13yyHOg%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com