Hello! Thank you very much for your feedback and essential remarks.
On 07.09.2022 10:39, Kyotaro Horiguchi wrote:
It lets XLogPageRead run the same check with what CreateRestartPoint does, so it basically works (it is forgetting a lock, though. The reason for omitting the lock in CreateRestartPoint is that it knows checkopinter is the only updator of the shared variable.). I'm not sure I like that for the code duplication. I'm not sure we need to fix that but if we do that, I would impletent IsNewCheckPointWALRecs() using XLogCtl->RedoRecPtr and XLogCtl->lastCheckPoint.redo instead since they are protected by the same lock, and they work more correct way, that is, that can avoid restartpoint requests while the last checkpoint is running. And I would rename it as RestartPointAvailable() or something like that.
Corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch). The access to Controlfile was removed so lwlock seems to be not needed. Some logic duplication is still present and and i'm not quite sure if it's possible to get rid of it. Would be glad to any suggestions.
Or I might want to add XLogRestartpointNeeded(readSegNo) to reduce the required number of info_lck by reading XLogCtl members at once.
If we place this check into the XLogCheckpointNeeded() this will lead to a double take of info_lck in XLogPageRead() when the restartpoint request is forming. As it's done now taking of info_lck will be more rarely. It seems i probably didn't understand your idea, please clarify it for me.
Depends on how we see the counter value. I think this can be annoyingbut not a bug. CreateRestartPoint already handles that case.
Yes! It is in fact annoying as docs says that checkpoint_req counts "the number of requested checkpoints that have been performed". But really checkpoints_req counts both the number of checkpoints requests and restartpoint ones which may not be performed and resources are not spent. The second frightening factor is the several times faster growth of the checkpoints_timed counter on the replica vs primary due to scheduling replays in 15 second if an attempt to create the restartpoint failed. Here is a patch that leaves all logic as is, but adds a stats about restartpoints. (v1-0001-Add-restartpoint-stats.patch) . For instance, for the same period on primary with this patch: # SELECT CURRENT_TIME; select * from pg_stat_bgwriter \gx current_time -------------------- 00:19:15.794561+03 (1 row) -[ RECORD 1 ]---------+----------------------------- checkpoints_timed | 4 checkpoints_req | 10 restartpoints_timed | 0 restartpoints_req | 0 restartpoints_done | 0 On replica: # SELECT CURRENT_TIME; select * from pg_stat_bgwriter \gx current_time -------------------- 00:19:11.363009+03 (1 row) -[ RECORD 1 ]---------+------------------------------ checkpoints_timed | 0 checkpoints_req | 0 restartpoints_timed | 42 restartpoints_req | 67 restartpoints_done | 10 Only the counters checkpoints_timed, checkpoints_req and restartpoints_done give the indication of resource-intensive operations. Without this patch, the user would see on the replica something like this: checkpoints_timed | 42 checkpoints_req | 109 With best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
commit d5a58d8585692be0d24db9414859088e3e7f7dad Author: Anton A. Melnikov <a.melni...@postgrespro.ru> Date: Tue Sep 6 12:18:56 2022 +0300 Remove burst growth of the checkpoint_req on replica. with correcttions according to comment: https://www.postgresql.org/message-id/20220907.163946.2214338139097492905.horikyota.ntt%40gmail.com diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 81d339d57d..3ed1a87943 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9054,3 +9054,20 @@ SetWalWriterSleeping(bool sleeping) XLogCtl->WalWriterSleeping = sleeping; SpinLockRelease(&XLogCtl->info_lck); } + +/* + * Check if a new checkpoint WAL record has been received since the + * last restartpoint. So it is possible to create a new one. + */ +bool RestartPointAvailable(void) +{ + bool result = false; + + SpinLockAcquire(&XLogCtl->info_lck); + if (!XLogRecPtrIsInvalid(XLogCtl->lastCheckPointRecPtr) + && XLogCtl->lastCheckPoint.redo > XLogCtl->RedoRecPtr) + result = true; + SpinLockRelease(&XLogCtl->info_lck); + + return result; +} diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index b41e682664..7236e0f0a4 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -3199,7 +3199,15 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, { (void) GetRedoRecPtr(); if (XLogCheckpointNeeded(readSegNo)) - RequestCheckpoint(CHECKPOINT_CAUSE_XLOG); + { + /* + * If there is no new checkpoint WAL records since the + * last restartpoint the creation of new one + * will certainly fail, so skip it. + */ + if (RestartPointAvailable()) + RequestCheckpoint(CHECKPOINT_CAUSE_XLOG); + } } } diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h index 0aa85d90e8..a248f2251e 100644 --- a/src/include/access/xlogrecovery.h +++ b/src/include/access/xlogrecovery.h @@ -82,6 +82,7 @@ extern void XLogRecoveryShmemInit(void); extern void InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdownPtr, bool *haveBackupLabel, bool *haveTblspcMap); extern void PerformWalRecovery(void); +extern bool RestartPointAvailable(void); /* * FinishWalRecovery() returns this. It contains information about the point
commit 9a155dd54634aa83dbab0852849e0a69288b0b9f Author: Anton A. Melnikov <a.melni...@postgrespro.ru> Date: Thu Sep 15 01:53:17 2022 +0300 Add statistic about restartpoint into pg_stat_bgwriter diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 55f7ec79e0..5ef2ce0c89 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1104,6 +1104,9 @@ CREATE VIEW pg_stat_bgwriter AS SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, + pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed, + pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req, + pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done, pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 5fc076fc14..2296701ddf 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -350,6 +350,8 @@ CheckpointerMain(void) pg_time_t now; int elapsed_secs; int cur_timeout; + bool chkpt_or_rstpt_requested = false; + bool chkpt_or_rstpt_timed = false; /* Clear any already-pending wakeups */ ResetLatch(MyLatch); @@ -368,7 +370,7 @@ CheckpointerMain(void) if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags) { do_checkpoint = true; - PendingCheckpointerStats.requested_checkpoints++; + chkpt_or_rstpt_requested = true; } /* @@ -382,7 +384,7 @@ CheckpointerMain(void) if (elapsed_secs >= CheckPointTimeout) { if (!do_checkpoint) - PendingCheckpointerStats.timed_checkpoints++; + chkpt_or_rstpt_timed = true; do_checkpoint = true; flags |= CHECKPOINT_CAUSE_TIME; } @@ -418,6 +420,24 @@ CheckpointerMain(void) if (flags & CHECKPOINT_END_OF_RECOVERY) do_restartpoint = false; + if (chkpt_or_rstpt_requested) + { + chkpt_or_rstpt_requested = false; + if (do_restartpoint) + PendingCheckpointerStats.requested_restartpoints++; + else + PendingCheckpointerStats.requested_checkpoints++; + } + + if (chkpt_or_rstpt_timed) + { + chkpt_or_rstpt_timed = false; + if (do_restartpoint) + PendingCheckpointerStats.timed_restartpoints++; + else + PendingCheckpointerStats.timed_checkpoints++; + } + /* * We will warn if (a) too soon since last checkpoint (whatever * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag @@ -481,6 +501,9 @@ CheckpointerMain(void) * checkpoints happen at a predictable spacing. */ last_checkpoint_time = now; + + if (do_restartpoint) + PendingCheckpointerStats.performed_restartpoints++; } else { diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c index af8d513e7b..9de99a2cf1 100644 --- a/src/backend/utils/activity/pgstat_checkpointer.c +++ b/src/backend/utils/activity/pgstat_checkpointer.c @@ -49,6 +49,9 @@ pgstat_report_checkpointer(void) #define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld CHECKPOINTER_ACC(timed_checkpoints); CHECKPOINTER_ACC(requested_checkpoints); + CHECKPOINTER_ACC(timed_restartpoints); + CHECKPOINTER_ACC(requested_restartpoints); + CHECKPOINTER_ACC(performed_restartpoints); CHECKPOINTER_ACC(checkpoint_write_time); CHECKPOINTER_ACC(checkpoint_sync_time); CHECKPOINTER_ACC(buf_written_checkpoints); @@ -112,6 +115,9 @@ pgstat_checkpointer_snapshot_cb(void) #define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld; CHECKPOINTER_COMP(timed_checkpoints); CHECKPOINTER_COMP(requested_checkpoints); + CHECKPOINTER_COMP(timed_restartpoints); + CHECKPOINTER_COMP(requested_restartpoints); + CHECKPOINTER_COMP(performed_restartpoints); CHECKPOINTER_COMP(checkpoint_write_time); CHECKPOINTER_COMP(checkpoint_sync_time); CHECKPOINTER_COMP(buf_written_checkpoints); diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index be15b4b2e5..f76448eba6 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1668,6 +1668,24 @@ pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS) PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints); } +Datum +pg_stat_get_bgwriter_timed_restartpoints(PG_FUNCTION_ARGS) +{ + PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->timed_restartpoints); +} + +Datum +pg_stat_get_bgwriter_requested_restartpoints(PG_FUNCTION_ARGS) +{ + PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_restartpoints); +} + +Datum +pg_stat_get_bgwriter_performed_restartpoints(PG_FUNCTION_ARGS) +{ + PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->performed_restartpoints); +} + Datum pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS) { diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index a07e737a33..a1d1fe6796 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5604,6 +5604,21 @@ proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' }, +{ oid => '9938', + descr => 'statistics: number of timed restartpoints started by the bgwriter', + proname => 'pg_stat_get_bgwriter_timed_restartpoints', provolatile => 's', + proparallel => 'r', prorettype => 'int8', proargtypes => '', + prosrc => 'pg_stat_get_bgwriter_timed_restartpoints' }, +{ oid => '9939', + descr => 'statistics: number of backend requested restartpoints started by the bgwriter', + proname => 'pg_stat_get_bgwriter_requested_restartpoints', provolatile => 's', + proparallel => 'r', prorettype => 'int8', proargtypes => '', + prosrc => 'pg_stat_get_bgwriter_requested_restartpoints' }, +{ oid => '9940', + descr => 'statistics: number of backend performed restartpoints started by the bgwriter', + proname => 'pg_stat_get_bgwriter_performed_restartpoints', provolatile => 's', + proparallel => 'r', prorettype => 'int8', proargtypes => '', + prosrc => 'pg_stat_get_bgwriter_performed_restartpoints' }, { oid => '2771', descr => 'statistics: number of buffers written by the bgwriter during checkpoints', proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index ac28f813b4..680ff6f39b 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -269,6 +269,9 @@ typedef struct PgStat_CheckpointerStats { PgStat_Counter timed_checkpoints; PgStat_Counter requested_checkpoints; + PgStat_Counter timed_restartpoints; + PgStat_Counter requested_restartpoints; + PgStat_Counter performed_restartpoints; PgStat_Counter checkpoint_write_time; /* times in milliseconds */ PgStat_Counter checkpoint_sync_time; PgStat_Counter buf_written_checkpoints;