Greetings,

* Euler Taveira (eu...@eulerto.com) wrote:
> On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > pg_archivecleanup currently takes a WAL file name as input to delete
> > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > automatically detect the last checkpoint (from control file) LSN,
> > calculate the lowest restart_lsn required by the replication slots, if
> > any (by reading the replication slot info from pg_logical directory),
> > archive the unneeded (an archive_command similar to that of the one
> > provided in the server config can be provided as an input) WAL files
> > before finally deleting them? Making pg_archivecleanup tool as an
> > end-to-end solution will help greatly in disk full situations because
> > of WAL files growth (inactive replication slots, archive command
> > failures, infrequent checkpoint etc.).

The overall idea of having a tool for this isn't a bad idea, but ..

> pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> suggesting to use it for removing files from pg_wal directory too? No, thanks.

We definitely shouldn't have it be part of pg_archivecleanup for the
simple reason that it'll be really confusing and almost certainly will
be mis-used. For my 2c, we should just remove pg_archivecleanup
entirely.

> WAL files are a key component for backup and replication. Hence, you cannot
> deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> wouldn't occur if you have a monitoring system and alerts and someone to keep
> an eye on it. If the disk full situation was caused by a failed archive command
> or a disconnected standby, it is easy to figure out; the fix is simple.

This is perhaps a bit far- PG does, in fact, remove WAL files from
PGDATA. Having a tool which will do this safely when the server isn't
able to be brought online due to lack of disk space would certainly be
helpful rather frequently. I agree that monitoring and alerting are
things that everyone should implement and pay attention to, but that
doesn't happen and instead people end up just blowing away pg_wal and
corrupting their database when, had a tool existed, they could have
avoided that happening and brought the system back online in relatively
short order without any data loss.

Thanks,

Stephen