While testing an xidStopLimit corner case, I got this:

3656710 2019-01-05 00:05:13.910 GMT LOG:  automatic aggressive vacuum to 
prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG:  could not truncate directory 
"pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG:  transaction ID wrap limit is 
4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING:  database "template1" must be 
vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT:  To avoid a database shutdown, 
execute a database-wide VACUUM in that database.

I think the WARNING was correct about having 481499 XIDs left before
xidWrapLimit, and the spurious "apparent wraparound" arose from this
rounding-down in SimpleLruTruncate():

        cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
        /*
         * While we are holding the lock, make an important safety check: the
         * planned cutoff point must be <= the current endpoint page. Otherwise 
we
         * have already wrapped around, and proceeding with the truncation would
         * risk removing the current segment.
         */
        if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
        {
                LWLockRelease(shared->ControlLock);
                ereport(LOG,
                                (errmsg("could not truncate directory \"%s\": 
apparent wraparound",
                                                ctl->Dir)));

We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return
false for the segment containing the cutoff page.  CLOGPagePrecedes() (and
most SLRU PagePrecedes methods) implements a circular address space.  Hence,
the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return true
for the segment furthest in the future relative to the unrounded cutoffPage
(if it exists).  That's bad.  Such a segment rarely exists, because
xidStopLimit protects 1000000 XIDs, and the rounding moves truncation by no
more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) =
1048575 XIDs.  Thus, I expect to see this problem at 4.9% of xidStopLimit
values.  I expect this is easier to see with multiStopLimit, which protects
only 100 mxid.

The main consequence is the false alarm.  A prudent DBA will want to react to
true wraparound, but no such wraparound has occurred.  Also, we temporarily
waste disk space in pg_xact.  This feels like a recipe for future bugs.  The
fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage).  I'm inclined not to
back-patch this; does anyone favor back-patching?

Thanks,
nm
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352..843486a 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1171,11 +1171,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	int			slotno;
 
 	/*
-	 * The cutoff point is the start of the segment containing cutoffPage.
-	 */
-	cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
-	/*
 	 * Scan shared memory and remove any pages preceding the cutoff page, to
 	 * ensure we won't rewrite them later.  (Since this is normally called in
 	 * or just after a checkpoint, any dirty pages should have been flushed
@@ -1320,11 +1315,10 @@ restart:
 bool
 SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data)
 {
+	int			seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
 	int			cutoffPage = *(int *) data;
 
-	cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
-	if (ctl->PagePrecedes(segpage, cutoffPage))
+	if (ctl->PagePrecedes(seg_last_page, cutoffPage))
 		return true;			/* found one; don't iterate any more */
 
 	return false;				/* keep going */
@@ -1337,9 +1331,10 @@ SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void *data
 static bool
 SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, int segpage, void *data)
 {
+	int			seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
 	int			cutoffPage = *(int *) data;
 
-	if (ctl->PagePrecedes(segpage, cutoffPage))
+	if (ctl->PagePrecedes(seg_last_page, cutoffPage))
 		SlruInternalDeleteSegment(ctl, filename);
 
 	return false;				/* keep going */

Reply via email to