Re: shared-memory based stats collector

Fujii Masao Tue, 09 Mar 2021 22:21:05 -0800



On 2021/03/10 12:10, Kyotaro Horiguchi wrote:

At Tue, 9 Mar 2021 23:24:10 +0900, Fujii Masao <masao.fu...@oss.nttdata.com> 
wrote in



On 2021/03/09 16:51, Kyotaro Horiguchi wrote:

At Sat, 6 Mar 2021 00:32:07 +0900, Fujii Masao <masao.fu...@oss.nttdata.com>
wrote in

I don't think that we should treat non-zero exit condition as a crash,
as before. Otherwise when archive_command fails on a signal,
archiver emits FATAL error and which leads the server restart.

Sounds reasonable. Now archiver is treated the same way to wal
receiver.  Specifically exit(1) doesn't cause server restart.


Thanks!

-                       if (PgArchStartupAllowed())
-                               PgArchPID = pgarch_start();

In the latest patch, why did you remove the code to restart new archiver
in reaper()? When archiver dies, I think new archiver should be restarted like
the current reaper() does. Otherwise, the restart of archiver can be
delayed until the next cycle of ServerLoop, which may take time.


Agreed. The code moved to the original place and added the crash
handling code. And I added a phrase to the comment.

+                * Was it the archiver?  If exit status is zero (normal) or one 
(FATAL
+                * exit), we assume everything is all right just like normal 
backends
+                * and just try to restart a new one so that we immediately 
retry
                                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+                * archiving of remaining files. (If fail, we'll try again in 
future
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~



"of" of "archiving of remaining" should be replaced with "the", or removed?


Just for record. Previously LogChildExit() was called and the following LOG
message was output when the archiver reported FATAL error. OTOH the patch
prevents that and the following LOG message is not output at FATAL exit of
archiver. But I don't think that the following message is required in that case
because FATAL message indicating the similar thing is already output.
Therefore, I'm ok with the patch.

LOG:  archiver process (PID 46418) exited with exit code 1

I read v50_003 patch.

When archiver dies, ProcGlobal->archiverLatch should be reset to NULL,
like walreceiver does the similar thing in WalRcvDie()?


Differently from walwriter and checkpointer, archiver as well as
walreceiver may die while server is running. Leaving the latch pointer
alone may lead to nudging a wrong process that took over the same
procarray slot. Added pgarch_die() to do that.


Thanks!

+       if (IsUnderPostmaster && ProcGlobal->archiverLatch)
+               SetLatch(ProcGlobal->archiverLatch);

The latch can be reset to NULL in pgarch_die() between the if-condition and
SetLatch(), and which would be problematic. Probably we should protect
the access to the latch by using spinlock, like we do for walreceiver's latch?


(I moved the archiverLatch to just after checkpointerLatch in this version.)

In pgarch.c, #include "postmaster/fork_process.h" seems no longer necessary.


Right. That's not due to this patch, postmater.h, dsm.h and pg_shmem.h
are not used. (fd.h is not necessary but pgarch.c uses AllocateDir().)

+       if (strcmp(argv[1], "--forkarch") == 0)
+       {

Why is this necessary? I was thinking that "--forkboot" handles archiver
in SubPostmasterMain().


Yeah, the correspondent code is removed in the same patch at the same
time.

The attached is v51 patchset.


Thanks a lot!

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: shared-memory based stats collector

Reply via email to