While testing an xidStopLimit corner case, I got this:

3656710 2019-01-05 00:05:13.910 GMT LOG: automatic aggressive vacuum to prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG: could not truncate directory "pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG: transaction ID wrap limit is 4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING: database "template1" must be vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
I think the WARNING was correct about having 481499 XIDs left before xidWrapLimit, and the spurious "apparent wraparound" arose from this rounding-down in SimpleLruTruncate():

	cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
	/*
	 * While we are holding the lock, make an important safety check: the
	 * planned cutoff point must be <= the current endpoint page. Otherwise we
	 * have already wrapped around, and proceeding with the truncation would
	 * risk removing the current segment.
	 */
	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
	{
		LWLockRelease(shared->ControlLock);
		ereport(LOG,
				(errmsg("could not truncate directory \"%s\": apparent wraparound",
						ctl->Dir)));

We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return false for the segment containing the cutoff page. CLOGPagePrecedes() (and most SLRU PagePrecedes methods) implements a circular address space. Hence, the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return true for the segment furthest in the future relative to the unrounded cutoffPage, if such a segment exists. That's bad. Such a segment rarely exists, because xidStopLimit protects 1000000 XIDs, while the rounding moves the truncation point by no more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) = 1048575 XIDs. Thus, I expect to see this problem at 4.9% of xidStopLimit values. I expect it is easier to see with multiStopLimit, which protects only 100 mxids.

The main consequence is the false alarm: a prudent DBA will want to react to true wraparound, but no such wraparound has occurred. We also temporarily waste disk space in pg_xact. This feels like a recipe for future bugs.

The fix I have in mind, attached, is to change instances of ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage). I'm inclined not to back-patch this; does anyone favor back-patching?

Thanks,
nm
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352..843486a 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1171,11 +1171,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	int			slotno;
 
 	/*
-	 * The cutoff point is the start of the segment containing cutoffPage.
-	 */
-	cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
-	/*
 	 * Scan shared memory and remove any pages preceding the cutoff page, to
 	 * ensure we won't rewrite them later.  (Since this is normally called in
 	 * or just after a checkpoint, any dirty pages should have been flushed
@@ -1320,11 +1315,10 @@ restart:
 bool
 SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
 {
+	int			seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
 	int			cutoffPage = *(int *) data;
 
-	cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
-	if (ctl->PagePrecedes(segpage, cutoffPage))
+	if (ctl->PagePrecedes(seg_last_page, cutoffPage))
 		return true;			/* found one; don't iterate any more */
 
 	return false;				/* keep going */
@@ -1337,9 +1331,10 @@ SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data
 static bool
 SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
 {
+	int			seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
 	int			cutoffPage = *(int *) data;
 
-	if (ctl->PagePrecedes(segpage, cutoffPage))
+	if (ctl->PagePrecedes(seg_last_page, cutoffPage))
 		SlruInternalDeleteSegment(ctl, filename);
 
 	return false;				/* keep going */