Proposed wording attached.

The typical WAL rules are broken for setting PD_ALL_VISIBLE. I'm OK
with that -- rules are meant to be broken -- but it's confusing enough
that I think we should (internally) document it better. This doesn't
guarantee things won't change again in the future, but this behavior
has been stable for a while.

The thread here:

https://postgr.es/m/ee47ee24-2928-96e3-a2b1-97cbe07b2c7b%40garret.ru

also indicates that external projects and tools are relying on our
rules for the page LSNs.

Regards,
        Jeff Davis

From 6594a9698a9edde666f883fcb14ced392066103f Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 11 Nov 2022 14:02:52 -0800
Subject: [PATCH v1] Document WAL rules related to PD_ALL_VISIBLE in README.

Also improve comments.
---
 src/backend/access/heap/heapam.c        |  5 +++--
 src/backend/access/heap/visibilitymap.c | 17 +++++++++++------
 src/backend/access/transam/README       | 17 +++++++++++++++++
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5c8cdfb9b2..fc77fd85ed 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8193,8 +8193,9 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * corresponding visibility map block.  Both should have already been modified
  * and dirtied.
  *
- * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * If checksums or wal_log_hints are enabled, we also generate a full-page
+ * image of heap_buffer, if necessary. If not, we optimize away the FPI, which
+ * can substantially reduce the WAL volume in common cases.
  */
 XLogRecPtr
 log_heap_visible(RelFileLocator rlocator, Buffer heap_buffer, Buffer vm_buffer,
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d62761728b..4ed70275e2 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -223,13 +223,13 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  *	visibilitymap_set - set bit(s) on a previously pinned page
  *
  * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The page LSN is advanced to the
+ * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
  * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value.  cutoff_xid is the largest xmin on the page being
- * marked all-visible; it is needed for Hot Standby, and can be
- * InvalidTransactionId if the page contains no tuples.  It can also be set
- * to InvalidTransactionId when a page that is already all-visible is being
- * marked all-frozen.
+ * page LSN to that value (though the heap page's LSN may *not* be updated;
+ * see below).  cutoff_xid is the largest xmin on the page being marked
+ * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
+ * if the page contains no tuples.  It can also be set to InvalidTransactionId
+ * when a page that is already all-visible is being marked all-frozen.
  *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
  * this function. Except in recovery, caller should also pass the heap
@@ -289,6 +289,11 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				/*
 				 * If data checksums are enabled (or wal_log_hints=on), we
 				 * need to protect the heap page from being torn.
+				 *
+				 * If not, then we must *not* update the heap page's LSN. In
+				 * this case, the FPI for the heap page was omitted from the
+				 * WAL record inserted above, so it would be incorrect to
+				 * update the heap page's LSN.
 				 */
 				if (XLogHintBitIsNeeded())
 				{
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 72af656060..e771640b57 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -645,6 +645,23 @@ If you do decide to optimise away a WAL record, then any calls to
 MarkBufferDirty() must be replaced by MarkBufferDirtyHint(),
 otherwise you will expose the risk of partial page writes.
 
+The all-visible hint in a heap page (PD_ALL_VISIBLE) is a special
+case, because it is treated like a durable change in some respects and
+a hint in other respects. It must satisfy the invariant that, if a
+heap page's associated visibilitymap (VM) bit is set, then
+PD_ALL_VISIBLE is set on the heap page itself. Clearing of
+PD_ALL_VISIBLE is always treated like a fully-durable change to
+maintain this invariant. Additionally, if checksums or wal_log_hints
+are enabled, setting PD_ALL_VISIBLE is also treated like a
+fully-durable change to protect against torn pages.
+
+But, if neither checksums or wal_log_hints is enabled, torn pages are
+of no consequence if the only change is to PD_ALL_VISIBLE; so no full
+heap page image is taken, and the heap page's LSN is not updated. NB:
+it would be incorrect to update the heap page's LSN when applying this
+optimization, even though there is an associated WAL record, because
+subsequent modifiers (e.g. an unrelated UPDATE) of the page may
+falsely believe that a full page image is not required.
 
 Write-Ahead Logging for Filesystem Actions
 ------------------------------------------
-- 
2.34.1

Reply via email to