On Wed, Dec 29, 2021 at 8:06 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost <sfr...@snowman.net> wrote: > > > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > > > > pg_archivecleanup currently takes a WAL file name as input to delete > > > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in > > > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to > > > > automatically detect the last checkpoint (from control file) LSN, > > > > calculate the lowest restart_lsn required by the replication slots, if > > > > any (by reading the replication slot info from pg_logical directory), > > > > archive the unneeded (an archive_command similar to that of the one > > > > provided in the server config can be provided as an input) WAL files > > > > before finally deleting them? Making pg_archivecleanup tool as an > > > > end-to-end solution will help greatly in disk full situations because > > > > of WAL files growth (inactive replication slots, archive command > > > > failures, infrequent checkpoint etc.). > > > > The overall idea of having a tool for this isn't a bad idea, but .. > > > > > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are > > > you > > > suggesting to use it for removing files from pg_wal directory too? No, > > > thanks. > > > > We definitely shouldn't have it be part of pg_archivecleanup for the > > simple reason that it'll be really confusing and almost certainly will > > be mis-used. > > +1 > > > > WAL files are a key component for backup and replication. Hence, you > > > cannot > > > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue > > > wouldn't occur if you have a monitoring system and alerts and someone to > > > keep > > > an eye on it. If the disk full situation was caused by a failed archive > > > command > > > or a disconnected standby, it is easy to figure out; the fix is simple. > > > > This is perhaps a bit far- PG does, in fact, remove WAL files from > > PGDATA. Having a tool which will do this safely when the server isn't > > able to be brought online due to lack of disk space would certainly be > > helpful rather frequently. I agree that monitoring and alerting are > > things that everyone should implement and pay attention to, but that > > doesn't happen and instead people end up just blowing away pg_wal and > > corrupting their database when, had a tool existed, they could have > > avoided that happening and brought the system back online in relatively > > short order without any data loss. > > Thanks. Yes, the end-to-end tool is helpful in rather eventual > situations and having it in the core is more helpful instead of every > postgres vendor developing their own solution and many times it's hard > to get it right. Also, I agree to not club this idea with > pg_archviecleanup. How about having a new tool like > pg_walcleanup/pg_xlogcleanup helping the developers/admins/users in > eventual situations?
Thanks for the comments. Here's a new tool called pg_walcleaner which basically deletes (optionally archiving before deletion) the unneeded WAL files. Please provide your thoughts and review the patches. Regards, Bharath Rupireddy.
v1-0001-pg_walcleaner.patch
Description: Binary data
v1-0002-pg_walcleaner-docs.patch
Description: Binary data