Hello!

Thank you very much for your feedback and valuable remarks.

On 07.09.2022 10:39, Kyotaro Horiguchi wrote:

> It lets XLogPageRead run the same check as what CreateRestartPoint
> does, so it basically works (it is forgetting a lock, though. The
> reason for omitting the lock in CreateRestartPoint is that it knows
> the checkpointer is the only updater of the shared variable.). I'm not
> sure I like that because of the code duplication.
>
> I'm not sure we need to fix that, but if we do, I would implement
> IsNewCheckPointWALRecs() using XLogCtl->RedoRecPtr and
> XLogCtl->lastCheckPoint.redo instead, since they are protected by the
> same lock and work in a more correct way, that is, they can avoid
> restartpoint requests while the last checkpoint is running.  And I
> would rename it to RestartPointAvailable() or something like that.

A corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch).
The access to ControlFile was removed, so the LWLock no longer seems necessary.
Some logic duplication is still present, and I'm not quite sure whether it is
possible to get rid of it. I would be glad to hear any suggestions.

> Or I might want to add XLogRestartpointNeeded(readSegNo) to reduce the
> number of info_lck acquisitions by reading the XLogCtl members at once.

If we place this check inside XLogCheckpointNeeded(), info_lck will be taken
twice in XLogPageRead() whenever a restartpoint request is being formed.
With the current placement, info_lck is taken less often.
It seems I probably did not understand your idea; please clarify it for me.
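My best guess at what you meant is a combined helper along the lines of the
rough, untested sketch below. It would have to live in xlog.c, since XLogCtl
is static there; the local variable names are mine:

/*
 * Combined test: enough WAL has been read since the last restartpoint
 * AND a newer checkpoint record has been received, so that requesting a
 * restartpoint has a chance to succeed.  Reads XLogCtl under a single
 * info_lck acquisition.
 */
bool
XLogRestartpointNeeded(XLogSegNo new_segno)
{
	XLogRecPtr	redo_ptr;
	XLogRecPtr	last_ckpt_rec;
	XLogRecPtr	last_ckpt_redo;
	XLogSegNo	old_segno;

	SpinLockAcquire(&XLogCtl->info_lck);
	redo_ptr = XLogCtl->RedoRecPtr;
	last_ckpt_rec = XLogCtl->lastCheckPointRecPtr;
	last_ckpt_redo = XLogCtl->lastCheckPoint.redo;
	SpinLockRelease(&XLogCtl->info_lck);

	/* The same distance test as in XLogCheckpointNeeded() */
	XLByteToSeg(redo_ptr, old_segno, wal_segment_size);
	if (new_segno < old_segno + (uint64) (CheckPointSegments - 1))
		return false;

	/* The same test as in RestartPointAvailable() from the attached patch */
	return !XLogRecPtrIsInvalid(last_ckpt_rec) &&
		last_ckpt_redo > redo_ptr;
}

If that is what you meant, XLogPageRead() would take info_lck only once per
call here, instead of once in GetRedoRecPtr() and once more in
RestartPointAvailable() when the first test fires.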

> Depends on how we see the counter value. I think this can be annoying
> but not a bug. CreateRestartPoint already handles that case.

Yes! It is in fact annoying, because the docs say that checkpoints_req counts
"the number of requested checkpoints that have been performed".
But in reality checkpoints_req counts both checkpoint requests and restartpoint
requests, and the latter may not be performed at all, so no resources are spent on them.
The second frightening factor is that the checkpoints_timed counter grows several
times faster on the replica than on the primary, because after a failed attempt
to create a restartpoint the next one is scheduled only about 15 seconds later.
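
To illustrate the scale, here is a standalone toy model in plain C (not
PostgreSQL code), assuming the default checkpoint_timeout of 300 seconds and
the roughly 15-second retry the checkpointer schedules after a skipped
restartpoint; it models the worst case where every restartpoint is skipped:

#include <stdio.h>

int
main(void)
{
	const int	checkpoint_timeout = 300;	/* seconds, default checkpoint_timeout */
	int			last_checkpoint_time = 0;
	int			timed_requests = 0;

	/* One simulated hour on a replica where every restartpoint is skipped. */
	for (int now = 0; now < 3600; now++)
	{
		if (now - last_checkpoint_time >= checkpoint_timeout)
		{
			timed_requests++;	/* currently counted as checkpoints_timed */
			/* skipped restartpoint: retry in about 15 seconds */
			last_checkpoint_time = now - checkpoint_timeout + 15;
		}
	}

	/* Prints 220 here, versus 3600 / 300 = 12 timed checkpoints on a primary. */
	printf("timed requests in one hour: %d\n", timed_requests);
	return 0;
}

In practice some restartpoints do succeed, so the real ratio is smaller, as
the numbers below show.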

Here is a patch that leaves all the logic as is but adds statistics about
restartpoints (v1-0001-Add-restartpoint-stats.patch).

For instance, here is the output for the same period on the primary with this patch:
# SELECT CURRENT_TIME; select * from pg_stat_bgwriter \gx
    current_time
--------------------
 00:19:15.794561+03
(1 row)

-[ RECORD 1 ]---------+-----------------------------
checkpoints_timed     | 4
checkpoints_req       | 10
restartpoints_timed   | 0
restartpoints_req     | 0
restartpoints_done    | 0

On the replica:
# SELECT CURRENT_TIME; select * from pg_stat_bgwriter \gx
    current_time
--------------------
 00:19:11.363009+03
(1 row)

-[ RECORD 1 ]---------+------------------------------
checkpoints_timed     | 0
checkpoints_req       | 0
restartpoints_timed   | 42
restartpoints_req     | 67
restartpoints_done    | 10

Only the checkpoints_timed, checkpoints_req and restartpoints_done counters
indicate resource-intensive operations.
Without this patch, the user would see something like this on the replica:

checkpoints_timed     | 42
checkpoints_req       | 109


With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
commit d5a58d8585692be0d24db9414859088e3e7f7dad
Author: Anton A. Melnikov <a.melni...@postgrespro.ru>
Date:   Tue Sep 6 12:18:56 2022 +0300

    Remove burst growth of the checkpoint_req on replica.
    
    with corrections according to the comment:
     https://www.postgresql.org/message-id/20220907.163946.2214338139097492905.horikyota.ntt%40gmail.com

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 81d339d57d..3ed1a87943 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9054,3 +9054,20 @@ SetWalWriterSleeping(bool sleeping)
 	XLogCtl->WalWriterSleeping = sleeping;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
+
+/*
+ * Check whether a new checkpoint WAL record has been received since the
+ * last restartpoint, so that creating a new restartpoint can succeed.
+ */
+bool RestartPointAvailable(void)
+{
+	bool result = false;
+
+	SpinLockAcquire(&XLogCtl->info_lck);
+	if (!XLogRecPtrIsInvalid(XLogCtl->lastCheckPointRecPtr)
+		&& XLogCtl->lastCheckPoint.redo > XLogCtl->RedoRecPtr)
+		result = true;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	return result;
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index b41e682664..7236e0f0a4 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3199,7 +3199,15 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 			{
 				(void) GetRedoRecPtr();
 				if (XLogCheckpointNeeded(readSegNo))
-					RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+				{
+					/*
+					 * If there are no new checkpoint WAL records since the
+					 * last restartpoint, creating a new one will certainly
+					 * fail, so skip the request.
+					 */
+					if (RestartPointAvailable())
+						RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+				}
 			}
 		}
 
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index 0aa85d90e8..a248f2251e 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -82,6 +82,7 @@ extern void XLogRecoveryShmemInit(void);
 
 extern void InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdownPtr, bool *haveBackupLabel, bool *haveTblspcMap);
 extern void PerformWalRecovery(void);
+extern bool RestartPointAvailable(void);
 
 /*
  * FinishWalRecovery() returns this.  It contains information about the point
commit 9a155dd54634aa83dbab0852849e0a69288b0b9f
Author: Anton A. Melnikov <a.melni...@postgrespro.ru>
Date:   Thu Sep 15 01:53:17 2022 +0300

    Add statistics about restartpoints to pg_stat_bgwriter

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f7ec79e0..5ef2ce0c89 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1104,6 +1104,9 @@ CREATE VIEW pg_stat_bgwriter AS
     SELECT
         pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
         pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+        pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+        pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+        pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
         pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
         pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
         pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fc076fc14..2296701ddf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -350,6 +350,8 @@ CheckpointerMain(void)
 		pg_time_t	now;
 		int			elapsed_secs;
 		int			cur_timeout;
+		bool		chkpt_or_rstpt_requested = false;
+		bool		chkpt_or_rstpt_timed = false;
 
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
@@ -368,7 +370,7 @@ CheckpointerMain(void)
 		if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
 		{
 			do_checkpoint = true;
-			PendingCheckpointerStats.requested_checkpoints++;
+			chkpt_or_rstpt_requested = true;
 		}
 
 		/*
@@ -382,7 +384,7 @@ CheckpointerMain(void)
 		if (elapsed_secs >= CheckPointTimeout)
 		{
 			if (!do_checkpoint)
-				PendingCheckpointerStats.timed_checkpoints++;
+				chkpt_or_rstpt_timed = true;
 			do_checkpoint = true;
 			flags |= CHECKPOINT_CAUSE_TIME;
 		}
@@ -418,6 +420,24 @@ CheckpointerMain(void)
 			if (flags & CHECKPOINT_END_OF_RECOVERY)
 				do_restartpoint = false;
 
+			if (chkpt_or_rstpt_requested)
+			{
+				chkpt_or_rstpt_requested = false;
+				if (do_restartpoint)
+					PendingCheckpointerStats.requested_restartpoints++;
+				else
+					PendingCheckpointerStats.requested_checkpoints++;
+			}
+
+			if (chkpt_or_rstpt_timed)
+			{
+				chkpt_or_rstpt_timed = false;
+				if (do_restartpoint)
+					PendingCheckpointerStats.timed_restartpoints++;
+				else
+					PendingCheckpointerStats.timed_checkpoints++;
+			}
+
 			/*
 			 * We will warn if (a) too soon since last checkpoint (whatever
 			 * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -481,6 +501,9 @@ CheckpointerMain(void)
 				 * checkpoints happen at a predictable spacing.
 				 */
 				last_checkpoint_time = now;
+
+				if (do_restartpoint)
+					PendingCheckpointerStats.performed_restartpoints++;
 			}
 			else
 			{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index af8d513e7b..9de99a2cf1 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
 #define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
 	CHECKPOINTER_ACC(timed_checkpoints);
 	CHECKPOINTER_ACC(requested_checkpoints);
+	CHECKPOINTER_ACC(timed_restartpoints);
+	CHECKPOINTER_ACC(requested_restartpoints);
+	CHECKPOINTER_ACC(performed_restartpoints);
 	CHECKPOINTER_ACC(checkpoint_write_time);
 	CHECKPOINTER_ACC(checkpoint_sync_time);
 	CHECKPOINTER_ACC(buf_written_checkpoints);
@@ -112,6 +115,9 @@ pgstat_checkpointer_snapshot_cb(void)
 #define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
 	CHECKPOINTER_COMP(timed_checkpoints);
 	CHECKPOINTER_COMP(requested_checkpoints);
+	CHECKPOINTER_COMP(timed_restartpoints);
+	CHECKPOINTER_COMP(requested_restartpoints);
+	CHECKPOINTER_COMP(performed_restartpoints);
 	CHECKPOINTER_COMP(checkpoint_write_time);
 	CHECKPOINTER_COMP(checkpoint_sync_time);
 	CHECKPOINTER_COMP(buf_written_checkpoints);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index be15b4b2e5..f76448eba6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1668,6 +1668,24 @@ pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints);
 }
 
+Datum
+pg_stat_get_bgwriter_timed_restartpoints(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->timed_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_requested_restartpoints(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_performed_restartpoints(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->performed_restartpoints);
+}
+
 Datum
 pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a07e737a33..a1d1fe6796 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5604,6 +5604,21 @@
   proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => '',
   prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' },
+{ oid => '9938',
+  descr => 'statistics: number of timed restartpoints started by the bgwriter',
+  proname => 'pg_stat_get_bgwriter_timed_restartpoints', provolatile => 's',
+  proparallel => 'r', prorettype => 'int8', proargtypes => '',
+  prosrc => 'pg_stat_get_bgwriter_timed_restartpoints' },
+{ oid => '9939',
+  descr => 'statistics: number of backend requested restartpoints started by the bgwriter',
+  proname => 'pg_stat_get_bgwriter_requested_restartpoints', provolatile => 's',
+  proparallel => 'r', prorettype => 'int8', proargtypes => '',
+  prosrc => 'pg_stat_get_bgwriter_requested_restartpoints' },
+{ oid => '9940',
+  descr => 'statistics: number of restartpoints that have actually been performed',
+  proname => 'pg_stat_get_bgwriter_performed_restartpoints', provolatile => 's',
+  proparallel => 'r', prorettype => 'int8', proargtypes => '',
+  prosrc => 'pg_stat_get_bgwriter_performed_restartpoints' },
 { oid => '2771',
   descr => 'statistics: number of buffers written by the bgwriter during checkpoints',
   proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ac28f813b4..680ff6f39b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -269,6 +269,9 @@ typedef struct PgStat_CheckpointerStats
 {
 	PgStat_Counter timed_checkpoints;
 	PgStat_Counter requested_checkpoints;
+	PgStat_Counter timed_restartpoints;
+	PgStat_Counter requested_restartpoints;
+	PgStat_Counter performed_restartpoints;
 	PgStat_Counter checkpoint_write_time;	/* times in milliseconds */
 	PgStat_Counter checkpoint_sync_time;
 	PgStat_Counter buf_written_checkpoints;
