Re: please update ps display for recovery checkpoint

Justin Pryzby Sat, 19 Sep 2020 09:00:59 -0700

On Thu, Sep 10, 2020 at 01:37:10PM +0900, Michael Paquier wrote:
> On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote:
> > What would you want the checkpointer's ps to say ?
> > 
> > Normally it just says:
> > postgres  3468  3151  0 Aug27 ?        00:20:57 postgres: checkpointer      
> >                           
> 
> Note that CreateCheckPoint() can also be called from the startup
> process if the bgwriter has not been launched once recovery finishes.
> 
> > Or do you mean do the same thing as now, but one layer lower, like:
> >
> > @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
> > +       if (flags & CHECKPOINT_END_OF_RECOVERY)
> > +               set_ps_display("recovery checkpoint");
> 
> For the use-case discussed here, that would be fine.  Now the
> difficult point is how much information we can actually display
> without bloating ps while still have something meaningful.  Showing
> all the information from LogCheckpointStart() would bloat the output a
> lot for example.  So, thinking about that, my take would be to have ps
> display the following at the beginning of CreateCheckpoint() and
> CreateRestartPoint():
> - restartpoint or checkpoint
> - shutdown
> - end-of-recovery
> 
> The output also needs to be cleared once the routines finish or if
> there is a skip, of course.


In my inital request, I *only* care about the startup process' recovery
checkpoint.  AFAIK, this exits when it's done, so there may be no need to
"revert" to the previous "ps".  However, one could argue that it's currently a
bug that the "recovering NNN" portion isn't updated after finishing the WAL
files.

StartupXLOG -> xlogreader -> XLogPageRead -> WaitForWALToBecomeAvailable -> 
XLogFileReadAnyTLI -> XLogFileRead
                                          -> CreateCheckPoint

Maybe it's a bad idea if the checkpointer is continuously changing its display.
I don't see the utility in it, since log_checkpoints does more than ps could
ever do.  I'm concerned that would break things for someone using something
like pgrep.
|$ ps -wwf `pgrep -f 'checkpointer *$'`
|UID        PID  PPID  C STIME TTY      STAT   TIME CMD
|postgres  9434  9418  0 Aug20 ?        Ss   214:25 postgres: checkpointer   

|pryzbyj  23010 23007  0 10:33 ?        00:00:00 postgres: checkpointer 
checkpoint

I think this one is by far the most common, but somewhat confusing, since it's
only one word.  This led me to put parenthesis around it:

|pryzbyj  26810 26809 82 10:53 ?        00:00:12 postgres: startup 
(end-of-recovery checkpoint)

Related: I have always thought that this message meant "recovery will complete
Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

|2020-09-19 10:53:26.345 CDT [26810] LOG:  checkpoint starting: end-of-recovery 
immediate

-- 
Justin

>From d1d473beb4b59a93ceb7859f4776da47d9381aee Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 19 Sep 2020 10:12:53 -0500
Subject: [PATCH v2 1/4] Clear PS display after recovery

...so it doesn't continue to say 'recovering NNNN...' during checkpoint
---
 src/backend/access/transam/xlog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..0183cae9a9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7396,6 +7396,7 @@ StartupXLOG(void)
 			/*
 			 * end of main redo apply loop
 			 */
+			set_ps_display("");
 
 			if (reachedRecoveryTarget)
 			{
-- 
2.17.0

>From 0e0bbeb6501f5b1a7a5f0e4109dbf8a8152de249 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 8 Feb 2020 09:16:14 -0600
Subject: [PATCH v2 2/4] Update PS display following replay of last xlog..

..otherwise it shows "recovering <file>" for the duration of the recovery
checkpoint.
---
 src/backend/access/transam/xlog.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0183cae9a9..3d8220ba78 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8892,6 +8892,13 @@ CreateCheckPoint(int flags)
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
 
+	if (flags & CHECKPOINT_END_OF_RECOVERY)
+		set_ps_display("(end-of-recovery checkpoint)");
+	else if (flags & CHECKPOINT_IS_SHUTDOWN)
+		set_ps_display("(shutdown checkpoint)");
+	else
+		set_ps_display("(checkpoint)");
+
 	TRACE_POSTGRESQL_CHECKPOINT_START(flags);
 
 	/*
@@ -9106,6 +9113,7 @@ CreateCheckPoint(int flags)
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
+	set_ps_display("");
 
 	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
 									 NBuffers,
@@ -9349,6 +9357,11 @@ CreateRestartPoint(int flags)
 	if (log_checkpoints)
 		LogCheckpointStart(flags, true);
 
+	if (flags & CHECKPOINT_IS_SHUTDOWN)
+		set_ps_display("(shutdown restartpoint)");
+	else
+		set_ps_display("(restartpoint)");
+
 	CheckPointGuts(lastCheckPoint.redo, flags);
 
 	/*
@@ -9466,6 +9479,7 @@ CreateRestartPoint(int flags)
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
+	set_ps_display("");
 
 	xtime = GetLatestXTime();
 	ereport((log_checkpoints ? LOG : DEBUG2),
-- 
2.17.0

>From e8a29a55affbc7018738ad471dfef72145c2d430 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 18 Jan 2020 13:44:33 -0600
Subject: [PATCH v2 3/4] Avoid ambiguous "that"

---
 doc/src/sgml/config.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2c75876e32..ed445a70aa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2246,7 +2246,7 @@ include_dir 'conf.d'
          dirty data in the kernel's page cache, reducing the likelihood of
          stalls when an <function>fsync</function> is issued at the end of a checkpoint, or when
          the OS writes data back in larger batches in the background.  Often
-         that will result in greatly reduced transaction latency, but there
+         this feature will result in greatly reduced transaction latency, but there
          also are some cases, especially with workloads that are bigger than
          <xref linkend="guc-shared-buffers"/>, but smaller than the OS's page
          cache, where performance might degrade.  This setting may have no
@@ -2488,7 +2488,7 @@ include_dir 'conf.d'
          amount of dirty data in the kernel's page cache, reducing the
          likelihood of stalls when an <function>fsync</function> is issued at the end of a
          checkpoint, or when the OS writes data back in larger batches in the
-         background.  Often that will result in greatly reduced transaction
+         background.  Often this feature will result in greatly reduced transaction
          latency, but there also are some cases, especially with workloads
          that are bigger than <xref linkend="guc-shared-buffers"/>, but smaller
          than the OS's page cache, where performance might degrade.  This
@@ -3172,7 +3172,7 @@ include_dir 'conf.d'
         limit the amount of dirty data in the kernel's page cache, reducing
         the likelihood of stalls when an <function>fsync</function> is issued at the end of the
         checkpoint, or when the OS writes data back in larger batches in the
-        background.  Often that will result in greatly reduced transaction
+        background.  This feature will often result in greatly reduced transaction
         latency, but there also are some cases, especially with workloads
         that are bigger than <xref linkend="guc-shared-buffers"/>, but smaller
         than the OS's page cache, where performance might degrade.  This
-- 
2.17.0

>From f5bbd1f9c84869718b540bb23690898db5da9e10 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 18 Jan 2020 13:44:08 -0600
Subject: [PATCH v2 4/4] Document that checkpoint_flush_after applies to
 end-of-recovery checkpoint

---
 doc/src/sgml/config.sgml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ed445a70aa..7477824546 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3177,6 +3177,8 @@ include_dir 'conf.d'
         that are bigger than <xref linkend="guc-shared-buffers"/>, but smaller
         than the OS's page cache, where performance might degrade.  This
         setting may have no effect on some platforms.
+        This setting also applies to the checkpoint written at the end of crash
+        recovery.
         If this value is specified without units, it is taken as blocks,
         that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
         The valid range is
-- 
2.17.0

Re: please update ps display for recovery checkpoint

Reply via email to