On Wed, Feb  3, 2021 at 08:07:16PM -0500, Bruce Momjian wrote:
> > > I can try to add a hint-bit-page-write page counter, but that might
> > > overflow, and then we will need a way to change the LSN anyway.
> > 
> > That's just a question of width...
> 
> Yeah, the hint bit counter is just delaying the inevitable, plus it
> changes the page format, which I am trying to avoid.  Also, I need this
> dummy record only if the page is marked clean, meaning a write
> to the file system already happened in the current checkpoint --- should
> not be to bad.

In looking your comments on Sawada-san's POC patch for buffer
encryption:

        
https://www.postgresql.org/message-id/20210112193431.2edcz776qjen7kao%40alap3.anarazel.de

I see that he put a similar function call in exactly the same place I
did, but you pointed out that he was inserting into WAL while holding a
buffer lock.

I restructured my patch to not make that same mistake, and modified it
for non-permanent buffers --- attached.

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

>From d535b08d264281348ab9e8eac645a5382ec35a8a Mon Sep 17 00:00:00 2001
From: Bruce Momjian <br...@momjian.us>
Date: Thu, 4 Feb 2021 13:43:56 -0500
Subject: [PATCH] hint squash commit

---
 src/backend/access/rmgrdesc/xlogdesc.c   |  6 +-
 src/backend/access/transam/xlog.c        |  4 ++
 src/backend/access/transam/xloginsert.c  | 16 +++++
 src/backend/replication/logical/decode.c |  1 +
 src/backend/storage/buffer/bufmgr.c      | 89 ++++++++++++++++--------
 src/include/access/xloginsert.h          |  2 +
 src/include/catalog/pg_control.h         |  1 +
 7 files changed, 90 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..16a967bb71 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -80,7 +80,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 
 		appendStringInfoString(buf, xlrec->rp_name);
 	}
-	else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
+	else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
+			 info == XLOG_HINT_LSN)
 	{
 		/* no further information to print */
 	}
@@ -185,6 +186,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_HINT_LSN:
+			id = "HINT_LSN";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f03bd473e2..f67316ee07 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10260,6 +10260,10 @@ xlog_redo(XLogReaderState *record)
 			UnlockReleaseBuffer(buffer);
 		}
 	}
+	else if (info == XLOG_HINT_LSN)
+	{
+		/* nothing to do here */
+	}
 	else if (info == XLOG_BACKUP_END)
 	{
 		XLogRecPtr	startpoint;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 7052dc245e..7635a3d44d 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -980,6 +980,22 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 	return recptr;
 }
 
+/*
+ * On the first hint bit change during a checkpoint, XLogSaveBufferForHint()
+ * writes a full page image to the WAL and returns a new LSN which can be
+ * assigned to the page.  Cluster file encryption needs a new LSN for
+ * every hint bit page write because the LSN is used in the nonce, and
+ * the nonce must be unique.  Therefore, this routine just advances the LSN
+ * so the page can be assigned a new LSN.
+ */
+XLogRecPtr
+XLogLSNForHint(void)
+{
+	XLogBeginInsert();
+
+	return XLogInsert(RM_XLOG_ID, XLOG_HINT_LSN);
+}
+
 /*
  * Write a WAL record containing a full image of a page. Caller is responsible
  * for writing the page to disk after calling this routine.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..2ada89c6b1 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_HINT_LSN:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..b9f3f76798 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3810,13 +3810,14 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 		 * If we need to protect hint bit updates from torn writes, WAL-log a
 		 * full page image of the page. This full page image is only necessary
 		 * if the hint bit update is the first change to the page since the
-		 * last checkpoint.
+		 * last checkpoint.  If cluster file encryption is enabled, we also
+		 * need to generate new page LSNs for non-first-page writes during
+		 * a checkpoint.
 		 *
 		 * We don't check full_page_writes here because that logic is included
 		 * when we call XLogInsert() since the value changes dynamically.
 		 */
-		if (XLogHintBitIsNeeded() &&
-			(pg_atomic_read_u32(&bufHdr->state) & BM_PERMANENT))
+		if (XLogHintBitIsNeeded())
 		{
 			/*
 			 * If we must not write WAL, due to a relfilenode-specific
@@ -3826,35 +3827,67 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 			 *
 			 * See src/backend/storage/page/README for longer discussion.
 			 */
-			if (RecoveryInProgress() ||
-				RelFileNodeSkippingWAL(bufHdr->tag.rnode))
+			if (((pg_atomic_read_u32(&bufHdr->state) & BM_PERMANENT)
+				 /* XXX || file_encryption_method != 0 */) &&
+				(RecoveryInProgress() ||
+				 RelFileNodeSkippingWAL(bufHdr->tag.rnode)))
 				return;
 
+			if (pg_atomic_read_u32(&bufHdr->state) & BM_PERMANENT)
+			{
+				/*
+				 * If the block is already dirty because we either made a change
+				 * or set a hint already, then we don't need to write a full page
+				 * image.  Note that aggressive cleaning of blocks dirtied by hint
+				 * bit setting would increase the call rate. Bulk setting of hint
+				 * bits would reduce the call rate...
+				 *
+				 * We must issue the WAL record before we mark the buffer dirty.
+				 * Otherwise we might write the page before we write the WAL. That
+				 * causes a race condition, since a checkpoint might occur between
+				 * writing the WAL record and marking the buffer dirty. We solve
+				 * that with a kluge, but one that is already in use during
+				 * transaction commit to prevent race conditions. Basically, we
+				 * simply prevent the checkpoint WAL record from being written
+				 * until we have marked the buffer dirty. We don't start the
+				 * checkpoint flush until we have marked dirty, so our checkpoint
+				 * must flush the change to disk successfully or the checkpoint
+				 * never gets written, so crash recovery will fix.
+				 *
+				 * It's possible we may enter here without an xid, so it is
+				 * essential that CreateCheckpoint waits for virtual transactions
+				 * rather than full transactionids.
+				 */
+				MyProc->delayChkpt = delayChkpt = true;
+				lsn = XLogSaveBufferForHint(buffer, buffer_std);
+			}
+
 			/*
-			 * If the block is already dirty because we either made a change
-			 * or set a hint already, then we don't need to write a full page
-			 * image.  Note that aggressive cleaning of blocks dirtied by hint
-			 * bit setting would increase the call rate. Bulk setting of hint
-			 * bits would reduce the call rate...
-			 *
-			 * We must issue the WAL record before we mark the buffer dirty.
-			 * Otherwise we might write the page before we write the WAL. That
-			 * causes a race condition, since a checkpoint might occur between
-			 * writing the WAL record and marking the buffer dirty. We solve
-			 * that with a kluge, but one that is already in use during
-			 * transaction commit to prevent race conditions. Basically, we
-			 * simply prevent the checkpoint WAL record from being written
-			 * until we have marked the buffer dirty. We don't start the
-			 * checkpoint flush until we have marked dirty, so our checkpoint
-			 * must flush the change to disk successfully or the checkpoint
-			 * never gets written, so crash recovery will fix.
-			 *
-			 * It's possible we may enter here without an xid, so it is
-			 * essential that CreateCheckpoint waits for virtual transactions
-			 * rather than full transactionids.
+			 * For cluster file encryption, we generate a new page LSN
+			 * for hint-bit non-permanent and non-first page writes.
 			 */
-			MyProc->delayChkpt = delayChkpt = true;
-			lsn = XLogSaveBufferForHint(buffer, buffer_std);
+			if (/* XXX file_encryption_method != 0 && */
+				XLogRecPtrIsInvalid(lsn) &&
+				/* XXX can we check BM_DIRTY without a page lock? */
+				!(pg_atomic_read_u32(&bufHdr->state) & BM_DIRTY))
+			{
+				/*
+				 * If the buffer is clean, and lsn is valid, it must be the
+				 * first hint bit change of the checkpoint (the old page lsn
+				 * is earlier than the RedoRecPtr).  If the page is clean and
+				 * the lsn is invalid, the page must have been already written
+				 * during this checkpoint, and this is a new hint bit change.
+				 * Such page changes usually don't create new page lsns, but we
+				 * need one for cluster file encryption because the lsn is used as
+				 * part of the nonce, and we don't want to reencrypt the page
+				 * with the hint bit changes using the same nonce as the previous
+				 * writes during this checkpoint.  (It would expose the hint bit
+				 * change locations.)  Therefore we create a simple WAL record
+				 * to advance the lsn, which can can then be assigned to the
+				 * page below.
+				 */
+				lsn = XLogLSNForHint();
+			}
 		}
 
 		buf_state = LockBufHdr(bufHdr);
diff --git a/src/include/access/xloginsert.h b/src/include/access/xloginsert.h
index f1d8c39edf..a066f65229 100644
--- a/src/include/access/xloginsert.h
+++ b/src/include/access/xloginsert.h
@@ -61,6 +61,8 @@ extern void log_newpage_range(Relation rel, ForkNumber forkNum,
 							  BlockNumber startblk, BlockNumber endblk, bool page_std);
 extern XLogRecPtr XLogSaveBufferForHint(Buffer buffer, bool buffer_std);
 
+extern XLogRecPtr XLogLSNForHint(void);
+
 extern void InitXLogInsert(void);
 
 #endif							/* XLOGINSERT_H */
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..36ac7dfbb8 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_HINT_LSN					0xC0
 
 
 /*
-- 
2.20.1

Reply via email to