Re: False "pg_serial": apparent wraparound” in logs

Imseih (AWS), Sami Wed, 23 Aug 2023 13:49:49 -0700

Hi,

I dug a bit into this and what looks to be happening is the comparison
of the page containing the latest cutoff xid could falsely be reported
as in the future of the last page number because the latest
page number of the Serial slru is only set when the page is
initialized [1].


So under the correct conditions, such as in the repro, the serializable
XID has moved past the last page number, therefore to the next checkpoint
which triggers a CheckPointPredicate, it will appear that the slru
has wrapped around.

It seems what may be needed here is to advance the
latest_page_number during SerialSetActiveSerXmin and if
we are using the SLRU. See below:


diff --git a/src/backend/storage/lmgr/predicate.c 
b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..6946ed21b4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -992,6 +992,9 @@ SerialSetActiveSerXmin(TransactionId xid)

        serialControl->tailXid = xid;

+       if (serialControl->headPage > 0)
+               SerialSlruCtl->shared->latest_page_number = SerialPage(xid);
+
        LWLockRelease(SerialSLRULock);
}

[1] 
https://github.com/postgres/postgres/blob/master/src/backend/access/transam/slru.c#L306

Regards,

Sami

From: "Imseih (AWS), Sami" <[email protected]>
Date: Tuesday, August 22, 2023 at 7:56 PM
To: "[email protected]" <[email protected]>
Subject: False "pg_serial": apparent wraparound” in logs

Hi,

I Recently encountered a situation on the field in which the message
“could not truncate directory "pg_serial": apparent wraparound”
was logged even through there was no danger of wraparound. This
was on a brand new cluster and only took a few minutes to see
the message in the logs.

Reading on some history of this error message, it appears that there
was work done to improve SLRU truncation and associated wraparound
log messages [1]. The attached repro on master still shows that this message
can be logged incorrectly.

The repro runs updates with 90 threads in serializable mode and kicks
off a “long running” select on the same table in serializable mode.

As soon as the long running select commits, the next checkpoint fails
to truncate the SLRU and logs the error message.

Besides the confusing log message, there may also also be risk with
pg_serial getting unnecessarily bloated and depleting the disk space.

Is this a bug?

[1] 
https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com

Regards,

Sami Imseih
Amazon Web Services (AWS)

Re: False "pg_serial": apparent wraparound” in logs

Reply via email to