Hi all,

While playing with a standby as follows, I noticed that xlogtemp.* files generated in pg_wal may stay around when entering crash recovery. The test I was conducting is pretty simple:
- Use a primary and a standby.
- Run pgbench on the primary.
- Then restart the standby with -m immediate and force a WAL segment switch on the primary in a loop.

Depending on the timing, one can see that those xlogtemp files stay around. These files are used when creating a new segment from scratch, and their names append the PID of the process creating them. Any previous file with the same name is unlinked first, but after a crash the restarted processes have different PIDs, so files left behind by the old processes are not matched unless a PID happens to be reused.
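To make the failure mode concrete, here is a minimal standalone sketch of the create-then-rename pattern described above, written against plain POSIX calls rather than the backend APIs. The function name `create_segment_via_tempfile()` and its parameters are hypothetical; only the `xlogtemp.<pid>` naming and the unlink of a same-named file come from the description above, and this is not the actual backend code. It shows why a crash between creating the file and renaming it leaves a file that later processes, running with different PIDs, never clean up.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not PostgreSQL code: create the segment "segname"
 * in "dir" by first writing it as "xlogtemp.<pid>" and then renaming it.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
create_segment_via_tempfile(const char *dir, const char *segname,
							size_t segsize)
{
	char		tmppath[4096];
	char		path[4096];
	char		zbuf[8192];
	size_t		written = 0;
	int			fd;
	int			save_errno;

	snprintf(tmppath, sizeof(tmppath), "%s/xlogtemp.%d", dir, (int) getpid());
	snprintf(path, sizeof(path), "%s/%s", dir, segname);

	/*
	 * Only a stale file carrying our own PID is removed here.  Files left by
	 * processes that died in a crash carry other PIDs and are not touched.
	 */
	unlink(tmppath);

	fd = open(tmppath, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	/* Zero-fill the temporary file up to the requested size. */
	memset(zbuf, 0, sizeof(zbuf));
	while (written < segsize)
	{
		size_t		left = segsize - written;
		ssize_t		rc = write(fd, zbuf, left < sizeof(zbuf) ? left : sizeof(zbuf));

		if (rc < 0)
		{
			if (errno == EINTR)
				continue;
			goto fail;
		}
		written += (size_t) rc;
	}

	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0)
	{
		fd = -1;
		goto fail;
	}
	fd = -1;

	/* A crash at any point before this rename leaves "xlogtemp.<pid>" behind. */
	if (rename(tmppath, path) != 0)
		goto fail;
	return 0;

fail:
	save_errno = errno;
	if (fd >= 0)
		close(fd);
	unlink(tmppath);
	errno = save_errno;
	return -1;
}
```

For example, calling `create_segment_via_tempfile(dir, "000000010000000000000001", 16 * 1024 * 1024)` on a scratch directory that already contains an `xlogtemp.<other pid>` file creates the segment but leaves that other file untouched.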
The problem is that if an instance is unstable for one reason or another and periodically goes through crash recovery, these temporary files can accumulate. If pg_wal is on its own partition sized according to max_wal_size, there is a risk of running into ENOSPC and taking PostgreSQL down, as new WAL segments can no longer be created. Shouldn't those files be cleaned up at the beginning of crash recovery? Attached is a proposed patch doing so.

Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c633e11128..60f9f75aa9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -881,6 +881,7 @@ static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
+static void RemoveXLogTempFiles(void);
 static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
 static void RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
 static void UpdateLastRemovedPtr(char *filename);
@@ -3851,6 +3852,35 @@ UpdateLastRemovedPtr(char *filename)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+/*
+ * Remove all temporary log files in pg_wal
+ *
+ * This is called at the beginning of recovery after a previous crash,
+ * at a point where no other processes write fresh WAL data.
+ */
+static void
+RemoveXLogTempFiles(void)
+{
+	DIR		   *xldir;
+	struct dirent *xlde;
+
+	elog(DEBUG2, "removing all temporary WAL files");
+
+	xldir = AllocateDir(XLOGDIR);
+	while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
+	{
+		char		path[MAXPGPATH];
+
+		if (strncmp(xlde->d_name, "xlogtemp.", 9) != 0)
+			continue;
+
+		snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
+		elog(DEBUG2, "removed temporary WAL file \"%s\"", path);
+		unlink(path);
+	}
+	FreeDir(xldir);
+}
+
 /*
  * Recycle or remove all log files older or equal to passed segno.
  *
@@ -6352,16 +6382,24 @@ StartupXLOG(void)
 	ValidateXLOGDirectoryStructure();
 
 	/*
-	 * If we previously crashed, there might be data which we had written,
-	 * intending to fsync it, but which we had not actually fsync'd yet.
-	 * Therefore, a power failure in the near future might cause earlier
-	 * unflushed writes to be lost, even though more recent data written to
-	 * disk from here on would be persisted. To avoid that, fsync the entire
+	 * If we previously crashed, perform a couple of actions:
+	 * 1) The pg_wal directory may still include some temporary WAL
+	 * segments used when creating a new segment, so perform some
+	 * clean up to not bloat this path. This is done first as there
+	 * is no point to sync this temporary data.
+	 * 2) There might be data which we had written, intending to fsync
+	 * it, but which we had not actually fsync'd yet. Therefore, a
+	 * power failure in the near future might cause earlier unflushed
+	 * writes to be lost, even though more recent data written to disk
+	 * from here on would be persisted. To avoid that, fsync the entire
 	 * data directory.
 	 */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		RemoveXLogTempFiles();
 		SyncDataDirectory();
+	}
 
 	/*
 	 * Initialize on the assumption we want to recover to the latest timeline
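For anyone who wants to experiment outside the backend, the cleanup done by RemoveXLogTempFiles() can be sketched with plain POSIX calls. The name `remove_xlog_temp_files()` is hypothetical, and unlike the patch (which uses the backend's AllocateDir()/ReadDir() and ignores the result of unlink()), this sketch reports unlink failures other than `ENOENT`. Like the patch, it assumes nothing else is creating temporary WAL files while it runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Prefix of temporary WAL segment files, as used in the patch. */
#define XLOGTEMP_PREFIX "xlogtemp."

/*
 * Hypothetical standalone analogue of RemoveXLogTempFiles(): delete every
 * entry of "waldir" whose name starts with "xlogtemp.".  Returns the number
 * of files removed, or -1 with errno set if the directory cannot be opened
 * or an unlink() fails for a reason other than ENOENT.
 */
int
remove_xlog_temp_files(const char *waldir)
{
	DIR		   *dir;
	struct dirent *de;
	int			removed = 0;

	dir = opendir(waldir);
	if (dir == NULL)
		return -1;

	while ((de = readdir(dir)) != NULL)
	{
		char		path[4096];

		/* Anything not named xlogtemp.* (real segments, etc.) is left alone. */
		if (strncmp(de->d_name, XLOGTEMP_PREFIX, strlen(XLOGTEMP_PREFIX)) != 0)
			continue;

		snprintf(path, sizeof(path), "%s/%s", waldir, de->d_name);
		if (unlink(path) != 0)
		{
			int			save_errno = errno;

			if (save_errno == ENOENT)
				continue;		/* already gone; nothing to do */
			closedir(dir);
			errno = save_errno;
			return -1;
		}
		removed++;
	}
	closedir(dir);
	return removed;
}
```

In the patch, the equivalent call runs in StartupXLOG() only when the control file state is neither DB_SHUTDOWNED nor DB_SHUTDOWNED_IN_RECOVERY, and before SyncDataDirectory(), so the temporary files are removed rather than needlessly fsync'd.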