On 30/09/2023 02:16, Imseih (AWS), Sami wrote:
The initial idea was to advance the latest_page_number
during SerialSetActiveSerXmin, but the initial approach is
obviously wrong.

That approach at high level could work, a

When SerialSetActiveSerXmin is called for a new active
serializable xmin, and at that point we don't need to keep any
any earlier transactions, should SimpleLruZeroPage be called
to ensure there is a target page for the xid?

I tried something like below, which fixes my repro, by calling
SimpleLruZeroPage at the end of SerialSetActiveSerXmin.

@@ -953,6 +953,8 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
  static void
  SerialSetActiveSerXmin(TransactionId xid)
  {
+       int targetPage = SerialPage(xid);
+
         LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
/*
@@ -992,6 +994,9 @@ SerialSetActiveSerXmin(TransactionId xid)
serialControl->tailXid = xid; + if (serialControl->headPage != targetPage)
+               SimpleLruZeroPage(SerialSlruCtl, targetPage);
+
         LWLockRelease(SerialSLRULock);
  }

No, that's very wrong too. You are zeroing the page containing the oldest XID that's still needed. That page still contains important information. It might work if you zero the previous page, but I think you need to do a little more than that. (I wish we had tests that would catch that.)

The crux of the problem is that 'tailXid' can advance past 'headXid'. I was bit surprised by that, but I think it's by design. I wish it was called out explicitly in a comment though. The code mostly handles that fine, except that it confuses the "apparent wraparound" check.

'tailXid' is the oldest XID that we might still need to look up in the SLRU, based on the transactions that are still active, and 'headXid' is the newest XID that has been written out to the SLRU. But we only write an XID out to the SLRU and advance headXid if the shared memory data structure fills up. So it's possible that as old transactions age out, we advance 'tailXid' past 'headXid'.

SerialAdd() tolerates tailXid > headXid. It will zero out all the pages between the old headXid and tailXid, even though no lookups can occur on those pages. That's unnecessary but harmless.

I think the smallest fix here would be to change CheckPointPredicate() so that if tailPage > headPage, pass headPage to SimpleLruTruncate() instead of tailPage. Or perhaps it should go into the "The SLRU is no longer needed" codepath in that case. If tailPage > headPage, the SLRU isn't needed at the moment.

In addition to that, we could change SerialAdd() to not zero out the pages between old headXid and tailXid unnecessarily, but that's more of an optimization than bug fix.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to