As I mentioned in another thread, I came across a reproducible situation in which a memory clobber in a child backend crashes the postmaster too, at least on FreeBSD/arm64. Needless to say, this is Not Cool. I've now traced down what is happening, and it's this:
1. Careless coding in aset.c causes it to decide to wipe_mem the universe. (I'll have more to say about that separately; the point of this thread is keeping the postmaster alive afterwards.) Apparently, there's not any non-live memory space between process-local memory and shared memory on this platform, so the failing backend manages to trash shared memory too before it finally hits SIGSEGV. 2. Most of the background processes die on something like TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File: "latch.c", Line: 686, PID: 5916) or they encounter what seems to be a stuck spinlock. The postmaster, however, SIGSEGVs. It's not supposed to do that; it is supposed to be sufficiently arms-length from shared memory that it can recover despite a backend trashing shared memory contents. 3. The cause of the SIGSEGV is that AssignPostmasterChildSlot naively believes that it can trust PMSignalState->next_child_flag to be a valid array index, so after that's been clobbered with something like 0x7f7f7f7f we index off the end of memory. I see no good reason for that state variable to be in shared memory at all, so the attached patch just moves it to postmaster static data. We also need a less-exposed copy of the array size variable. 4. That's enough to stop the SIGSEGV crash, but the postmaster still fails to recover, because then it hits elog(FATAL, "no free slots in PMChildFlags array"); since all of the array entries have been clobbered as well. In the attached patch, I fixed this by treating the case similarly to failure to fork a new child process. This seems to be enough to let the postmaster survive, and recover after it starts noticing crashing children. 5. It's possible that we should take some proactive steps to get out of the "no free slots" situation, rather than just wait for some child to crash. I'm inclined not to, however. It'd be hard-to-test corner-case code, and given the lack of field reports like this, the situation must be awfully rare. Thoughts? regards, tom lane
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 383bc4776e..4a50b24b57 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4173,7 +4173,17 @@ BackendStartup(Port *port) * Unless it's a dead_end child, assign it a child slot number */ if (!bn->dead_end) + { bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot(); + if (bn->child_slot == 0) + { + /* No child slots ... treat like fork failure */ + free(bn); + /* error is already logged, but send something to the client */ + report_fork_failure_to_client(port, ENOMEM); + return STATUS_ERROR; + } + } else bn->child_slot = 0; @@ -5509,23 +5519,27 @@ StartAutovacuumWorker(void) bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot(); bn->bgworker_notify = false; - bn->pid = StartAutoVacWorker(); - if (bn->pid > 0) + /* if we fail to get a child slot, treat it like fork failure */ + if (bn->child_slot > 0) { - bn->bkend_type = BACKEND_TYPE_AUTOVAC; - dlist_push_head(&BackendList, &bn->elem); + bn->pid = StartAutoVacWorker(); + if (bn->pid > 0) + { + bn->bkend_type = BACKEND_TYPE_AUTOVAC; + dlist_push_head(&BackendList, &bn->elem); #ifdef EXEC_BACKEND - ShmemBackendArrayAdd(bn); + ShmemBackendArrayAdd(bn); #endif - /* all OK */ - return; - } + /* all OK */ + return; + } - /* - * fork failed, fall through to report -- actual error message was - * logged by StartAutoVacWorker - */ - (void) ReleasePostmasterChildSlot(bn->child_slot); + /* + * fork failed, fall through to report -- actual error message + * was logged by StartAutoVacWorker + */ + (void) ReleasePostmasterChildSlot(bn->child_slot); + } free(bn); } else @@ -5908,6 +5922,12 @@ assign_backendlist_entry(RegisteredBgWorker *rw) bn->cancel_key = MyCancelKey; bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot(); + if (bn->child_slot == 0) + { + /* couldn't get a slot; failure is already logged */ + free(bn); + return false; + } bn->bkend_type = BACKEND_TYPE_BGWORKER; bn->dead_end = false; bn->bgworker_notify = false; diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c index 3f0ec5e6b8..51776779f4 100644 --- a/src/backend/storage/ipc/pmsignal.c +++ b/src/backend/storage/ipc/pmsignal.c @@ -75,12 +75,21 @@ struct PMSignalData QuitSignalReason sigquit_reason; /* why SIGQUIT was sent */ /* per-child-process flags */ int num_child_flags; /* # of entries in PMChildFlags[] */ - int next_child_flag; /* next slot to try to assign */ sig_atomic_t PMChildFlags[FLEXIBLE_ARRAY_MEMBER]; }; +/* PMSignalState pointer is valid in both postmaster and child processes */ NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL; +/* + * These static variables are valid only in the postmaster. The postmaster + * should always use num_child_flags, not the copy in shared memory, so + * that it will not go insane if some failing child clobbers the copy + * in shared memory. + */ +static int num_child_flags; /* # of entries in PMChildFlags[] */ +static int next_child_flag; /* next slot to try to assign */ + /* * Signal handler to be notified if postmaster dies. */ @@ -142,7 +151,8 @@ PMSignalShmemInit(void) { /* initialize all flags to zeroes */ MemSet(unvolatize(PMSignalData *, PMSignalState), 0, PMSignalShmemSize()); - PMSignalState->num_child_flags = MaxLivePostmasterChildren(); + num_child_flags = MaxLivePostmasterChildren(); + PMSignalState->num_child_flags = num_child_flags; } } @@ -210,7 +220,7 @@ GetQuitSignalReason(void) /* * AssignPostmasterChildSlot - select an unused slot for a new postmaster * child process, and set its state to ASSIGNED. Returns a slot number - * (one to N). + * (one to N). On failure, log a message and return zero. * * Only the postmaster is allowed to execute this routine, so we need no * special locking. @@ -218,28 +228,31 @@ GetQuitSignalReason(void) int AssignPostmasterChildSlot(void) { - int slot = PMSignalState->next_child_flag; + int slot = next_child_flag; int n; /* * Scan for a free slot. We track the last slot assigned so as not to * waste time repeatedly rescanning low-numbered slots. */ - for (n = PMSignalState->num_child_flags; n > 0; n--) + for (n = num_child_flags; n > 0; n--) { if (--slot < 0) - slot = PMSignalState->num_child_flags - 1; + slot = num_child_flags - 1; if (PMSignalState->PMChildFlags[slot] == PM_CHILD_UNUSED) { PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED; - PMSignalState->next_child_flag = slot; + next_child_flag = slot; return slot + 1; } } - /* Out of slots ... should never happen, else postmaster.c messed up */ - elog(FATAL, "no free slots in PMChildFlags array"); - return 0; /* keep compiler quiet */ + /* + * Out of slots. Should never happen if postmaster.c counts children + * accurately, but conceivably a failing child has clobbered the array. + */ + elog(LOG, "no free slots in PMChildFlags array"); + return 0; } /* @@ -254,7 +267,7 @@ ReleasePostmasterChildSlot(int slot) { bool result; - Assert(slot > 0 && slot <= PMSignalState->num_child_flags); + Assert(slot > 0 && slot <= num_child_flags); slot--; /* @@ -269,12 +282,12 @@ ReleasePostmasterChildSlot(int slot) /* * IsPostmasterChildWalSender - check if given slot is in use by a - * walsender process. + * walsender process. This is called only by the postmaster. */ bool IsPostmasterChildWalSender(int slot) { - Assert(slot > 0 && slot <= PMSignalState->num_child_flags); + Assert(slot > 0 && slot <= num_child_flags); slot--; if (PMSignalState->PMChildFlags[slot] == PM_CHILD_WALSENDER)