On 07/05/14 22:32, Robert Haas wrote:
On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek <p...@2ndquadrant.com> wrote:
I've committed the portion of your patch which does this, with pretty
extensive changes. I moved the function which resets the crash times
to bgworker.c, renamed it, and made it just reset all of the crash
times unconditionally; there didn't seem to be any point in skipping
the irrelevant ones, so it seemed best to keep things simple. I also
moved the call from reaper() where you had it to
PostmasterStateMachine(), because the old placement did not work for
me when I carried out the following steps:
1. Kill a background worker with a 60-second restart time using
SIGTERM, so that it does exit(1).
2. Before it restarts, kill a regular backend with SIGQUIT.
Let me know if you think I've got it wrong.
No I think it's fine, I didn't try that combination and just wanted to
put it as deep in the call as possible.
(2) If a shmem-connected backend fails to release the deadman switch
or exits with an exit code other than 0 or 1, we crash-and-restart. A
non-shmem-connected backend never causes a crash-and-restart.
+1
I did this as a separate commit,
e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for
ReleasePostmasterChildSlot inside the if statement. Then I realized
that was bogus, so I just pushed
eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's
OK now.
The fixed one looks ok to me.
(3) When a background worker exits without triggering a
crash-and-restart, an exit code of precisely 0 causes the worker to be
unregistered; any other exit code has no special effect, so
bgw_restart_time controls.
+1
This isn't done yet.
Unless I am missing something this change was included in every patch I
sent - setting rw->rw_terminate = true; in CleanupBackgroundWorker for
zero exit code + comment changes. Or do you have objections to this
approach?
Anyway missing parts attached.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index fd32d6c..5f86100 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -166,10 +166,12 @@ typedef struct BackgroundWorker
</para>
<para>
- Background workers are expected to be continuously running; if they exit
- cleanly, <command>postgres</> will restart them immediately. Consider doing
- interruptible sleep when they have nothing to do; this can be achieved by
- calling <function>WaitLatch()</function>. Make sure the
+ Background workers are normally expected to be running continuously;
+ they get restarted automaticaly in case of crash (see
+ <structfield>bgw_restart_time</structfield> documentation for details),
+ but if they exit cleanly, <command>postgres</> will not restart them.
+ Consider doing interruptible sleep when they have nothing to do; this can be
+ achieved by calling <function>WaitLatch()</function>. Make sure the
<literal>WL_POSTMASTER_DEATH</> flag is set when calling that function, and
verify the return code for a prompt exit in the emergency case that
<command>postgres</> itself has terminated.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 64c9722..b642f37 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -884,8 +884,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
* running but is no longer.
*
* In the latter case, the worker may be stopped temporarily (if it is
- * configured for automatic restart, or if it exited with code 0) or gone
- * for good (if it is configured not to restart and exited with code 1).
+ * configured for automatic restart and exited with code 1) or gone for
+ * good (if it exited with code other than 1 or if it is configured not
+ * to restart).
*/
BgwHandleStatus
GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 79d1c50..a0331e6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2849,7 +2849,11 @@ CleanupBackgroundWorker(int pid,
if (!EXIT_STATUS_0(exitstatus))
rw->rw_crashed_at = GetCurrentTimestamp();
else
+ {
+ /* Zero exit status means terminate */
rw->rw_crashed_at = 0;
+ rw->rw_terminate = true;
+ }
/*
* Additionally, for shared-memory-connected workers, just like a
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index c9550cc..d3dd1f2 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -16,10 +16,11 @@
* that the failure can only be transient (fork failure due to high load,
* memory pressure, too many processes, etc); more permanent problems, like
* failure to connect to a database, are detected later in the worker and dealt
- * with just by having the worker exit normally. A worker which exits with a
- * return code of 0 will be immediately restarted by the postmaster. A worker
- * which exits with a return code of 1 will be restarted after the configured
- * restart interval, or never if that interval is set to BGW_NEVER_RESTART.
+ * with just by having the worker exit normally. A worker which exits with
+ * a return code of 0 will never be restarted and will be removed from worker
+ * list. A worker which exits with a return code of 1 will be restarted after
+ * the configured restart interval, or never if that interval is set to
+ * BGW_NEVER_RESTART.
* The TerminateBackgroundWorker() function can be used to terminate a
* dynamically registered background worker; the worker will be sent a SIGTERM
* and will not be restarted after it exits. Whenever the postmaster knows
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers