On Tue, Dec 6, 2022 at 7:09 AM Andres Freund <and...@anarazel.de> wrote: > On 2022-12-05 22:45:57 +1300, Thomas Munro wrote: > > The reason for the existing sleep-based approach was that we didn't > > want to accept any more connections (or spin furiously because the > > listen queue was non-empty). So in this version I invented a way to > > suppress socket events temporarily with WL_SOCKET_IGNORE, and then > > reactivate them after crash reinit. > > That seems like an odd special flag. Why do we need it? Is it just because we > want to have assertions ensuring that something is being queried?
Yeah. Perhaps 0 would be a less clumsy way to say "no events please". I removed the assertions and did it that way in this next iteration. I realised that the previous approach didn't actually suppress POLLHUP and POLLERR in the poll and epoll implementations (even though our code seems to think it needs to ask for those events, it's not necessary, you get them anyway), and, being level-triggered, if those were ever reported we'd finish up pegging the CPU to 100% until the children exited. Unlikely to happen with a server socket, but wrong on principle, and maybe a problem for other potential users of this temporary event suppression mode. One way to fix that for the epoll version is to EPOLL_CTL_DEL and EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. Tried like that in this version. Another approach would be to (finally) write DeleteWaitEvent() to do the same thing at a higher level... seems overkill for this. The kqueue version was already doing that because of the way it was implemented, and the poll and Windows versions needed only a small adjustment. I'm not too sure about the Windows change; my two ideas are passing the 0 through as shown in this version (not sure if it really works the way I want, but it makes some sense and the WSAEventSelect() call doesn't fail...), or sticking a dummy unsignaled event in the array passed to WaitForMultipleObjects(). To make sure this code is exercised, I made the state machine code eager about silencing the socket events during PM_WAIT_DEAD_END, so crash TAP tests go through the cycle. Regular non-crash shutdown also runs EPOLL_CTL_DEL/EV_DELETE, which stands out if you trace the postmaster. > > * WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix, > > this is just another name for WL_SOCKET_READABLE, but Window has a > > different underlying event; this mirrors WL_SOCKET_CONNECTED on the > > other end of a connection) > > Perhaps worth committing separately and soon? Seems pretty uncontroversial > from here. Alright, I split this into a separate patch. > > +/* > > + * Object representing the state of a postmaster. > > + * > > + * XXX Lots of global variables could move in here. > > + */ > > +typedef struct > > +{ > > + WaitEventSet *wes; > > +} Postmaster; > > + > > Seems weird to introduce this but then basically have it be unused. I'd say > either have a preceding patch move at least a few members into it, or just > omit it for now. Alright, I'll just have to make a global variable wait_set for now to keep things simple. > > + /* This may configure SIGURG, depending on platform. */ > > + InitializeLatchSupport(); > > + InitLocalLatch(); > > I'm mildly preferring InitProcessLocalLatch(), but not sure why - there's not > really a conflicting meaning of "local" here. Done. > > +/* > > + * Initialize the WaitEventSet we'll use in our main event loop. > > + */ > > +static void > > +InitializeWaitSet(Postmaster *postmaster) > > +{ > > + /* Set up a WaitEventSet for our latch and listening sockets. */ > > + postmaster->wes = CreateWaitEventSet(CurrentMemoryContext, 1 + > > MAXLISTEN); > > + AddWaitEventToSet(postmaster->wes, WL_LATCH_SET, PGINVALID_SOCKET, > > MyLatch, NULL); > > + for (int i = 0; i < MAXLISTEN; i++) > > + { > > + int fd = ListenSocket[i]; > > + > > + if (fd == PGINVALID_SOCKET) > > + break; > > + AddWaitEventToSet(postmaster->wes, WL_SOCKET_ACCEPT, fd, > > NULL, NULL); > > + } > > +} > > The naming seems like it could be confused with latch.h > infrastructure. InitPostmasterWaitSet()? OK. > > +/* > > + * Activate or deactivate the server socket events. > > + */ > > +static void > > +AdjustServerSocketEvents(Postmaster *postmaster, bool active) > > +{ > > + for (int pos = 1; pos < GetNumRegisteredWaitEvents(postmaster->wes); > > ++pos) > > + ModifyWaitEvent(postmaster->wes, > > + pos, active ? > > WL_SOCKET_ACCEPT : WL_SOCKET_IGNORE, > > + NULL); > > +} > > This seems to hardcode the specific wait events we're waiting for based on > latch.c infrastructure. Not really convinced that's a good idea. What are you objecting to? The assumption that the first socket is at position 1? The use of GetNumRegisteredWaitEvents()? > > + /* Process work scheduled by signal handlers. > > */ > > + if (pending_action_request) > > + process_action_request(postmaster); > > + if (pending_child_exit) > > + process_child_exit(postmaster); > > + if (pending_reload_request) > > + process_reload_request(); > > + if (pending_shutdown_request) > > + process_shutdown_request(postmaster); > Is the order of operations here quite right? Shouldn't we process a shutdown > request before the others? And a child exit before the request to start an > autovac worker etc? > > ISTM it should be 1) shutdown request 2) child exit 3) config reload 4) action > request. OK, reordered like that. > > - ereport(DEBUG2, > > - (errmsg_internal("postmaster received signal %d", > > - > > postgres_signal_arg))); > Hm, not having the "postmaster received signal" message anymore seems like a > loss when debugging things. I think process_shutdown_request() should emit > something like it. I added some of these. > I wonder if we should have a elog_sighand() that's written to be signal > safe. I've written versions of that numerous times for debugging, and it's a > bit silly to do that over and over again. Right, I was being dogmatic about kicking everything that doesn't have a great big neon "async-signal-safe" sign on it out of the handlers. > > + > > + /* start accepting server socket connection events again */ > > + reenable_server_socket_events = true; > > } > > } > > I don't think reenable_server_socket_events does anything as the patch stands > - I don't see it being checked anywhere? And in the path above, you're using > AdjustServerSocketEvents() directly. Sorry, that was a left over unused variable from an earlier attempt, which I only noticed after clicking send. Removed. > > @@ -4094,6 +4130,7 @@ BackendStartup(Port *port) > > /* Hasn't asked to be notified about any bgworkers yet */ > > bn->bgworker_notify = false; > > > > + PG_SETMASK(&BlockSig); > > #ifdef EXEC_BACKEND > > pid = backend_forkexec(port); > > #else /* > > !EXEC_BACKEND */ > > There are other calls to fork_process() - why don't they need the same > treatment? > > Perhaps we should add an assertion to fork_process() ensuring that signals are > masked? If we're going to put an assertion in there, we might as well consider setting and restoring the mask in that wrapper. Tried like that in this version. > > diff --git a/src/backend/storage/ipc/latch.c > > b/src/backend/storage/ipc/latch.c > > index eb3a569aae..3bfef592eb 100644 > > --- a/src/backend/storage/ipc/latch.c > > +++ b/src/backend/storage/ipc/latch.c > > @@ -283,6 +283,17 @@ InitializeLatchSupport(void) > > #ifdef WAIT_USE_SIGNALFD > > sigset_t signalfd_mask; > > > > + if (IsUnderPostmaster) > > + { > > + if (signal_fd != -1) > > + { > > + /* Release postmaster's signal FD; ignore any error */ > > + (void) close(signal_fd); > > + signal_fd = -1; > > + ReleaseExternalFD(); > > + } > > + } > > + > > Hm - arguably it's a bug that we don't do this right now, correct? Yes, I would say it's a non-live bug. A signalfd descriptor inherited by a child process isn't dangerous (it doesn't see the parent's signals, it sees the child's signals), but it's a waste because we'd leak it. I guess we could re-use it instead but that seems a little weird. I've put this into a separate commit in case someone wants to argue for back-patching, but it's a pretty hypothetical concern since the postmaster never initialised latch support before... One thing that does seem a bit odd to me, though, is why we're cleaning up inherited descriptors in a function called InitializeLatchSupport(). I wonder if we should move it into FreeLatchSupportAfterFork()? We should also close the postmaster's epoll fd, so I invented FreeWaitEventSetAfterFork(). I found that ClosePostmasterPorts() was a good place to call that, though it doesn't really fit the name of that function too well... > > @@ -1201,6 +1214,7 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent > > *event, int old_events) > > event->events == WL_POSTMASTER_DEATH || > > (event->events & (WL_SOCKET_READABLE | > > WL_SOCKET_WRITEABLE | > > + WL_SOCKET_IGNORE | > > WL_SOCKET_CLOSED))); > > > > if (event->events == WL_POSTMASTER_DEATH) > > @@ -1312,6 +1326,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent > > *event) > > flags |= FD_WRITE; > > if (event->events & WL_SOCKET_CONNECTED) > > flags |= FD_CONNECT; > > + if (event->events & WL_SOCKET_ACCEPT) > > + flags |= FD_ACCEPT; > > > > if (*handle == WSA_INVALID_EVENT) > > { > > I wonder if the code would end up easier to understand if we handled > WL_SOCKET_CONNECTED, WL_SOCKET_ACCEPT explicitly in the !WIN32 cases, rather > than redefining it to WL_SOCKET_READABLE. Yeah maybe we could try that separately. > > diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c > > index 3082093d1e..655e881688 100644 > > --- a/src/backend/tcop/postgres.c > > +++ b/src/backend/tcop/postgres.c > > @@ -24,7 +24,6 @@ > > #include <signal.h> > > #include <unistd.h> > > #include <sys/resource.h> > > -#include <sys/select.h> > > #include <sys/socket.h> > > #include <sys/time.h> > > Do you know why this include even existed? That turned out to be a fun question to answer: apparently there used to be an optional 'multiplexed backend' mode, removed by commit d5bbe2aca5 in 1998. A single backend could be connected to multiple frontends.
From 3da9af2a9250ef052ba25be434f5bc01d4e36520 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Tue, 6 Dec 2022 15:21:11 +1300 Subject: [PATCH v4 1/5] Add WL_SOCKET_ACCEPT event to WaitEventSet API. To be able to handle incoming connections on a server socket with the WaitEventSet API, we'll need a new kind of event to indicate that the the socket is ready to accept a connection. On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there is a different kernel event that we need to map our abstraction to. A future commit will use this. Reviewed-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 13 ++++++++++++- src/include/storage/latch.h | 7 +++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index eb3a569aae..7ced8264f0 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set) * - WL_SOCKET_CONNECTED: Wait for socket connection to be established, * can be combined with other WL_SOCKET_* events (on non-Windows * platforms, this is the same as WL_SOCKET_WRITEABLE) + * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket, + * can be combined with other WL_SOCKET_* events (on non-Windows + * platforms, this is the same as WL_SOCKET_READABLE) * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer. * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies * @@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set) * i.e. it must be a process-local latch initialized with InitLatch, or a * shared latch associated with the current process by calling OwnLatch. * - * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error + * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error * conditions cause the socket to be reported as readable/writable/connected, * so that the caller can deal with the condition. * @@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event) flags |= FD_WRITE; if (event->events & WL_SOCKET_CONNECTED) flags |= FD_CONNECT; + if (event->events & WL_SOCKET_ACCEPT) + flags |= FD_ACCEPT; if (*handle == WSA_INVALID_EVENT) { @@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout, /* connected */ occurred_events->events |= WL_SOCKET_CONNECTED; } + if ((cur_event->events & WL_SOCKET_ACCEPT) && + (resEvents.lNetworkEvents & FD_ACCEPT)) + { + /* incoming connection could be accepted */ + occurred_events->events |= WL_SOCKET_ACCEPT; + } if (resEvents.lNetworkEvents & FD_CLOSE) { /* EOF/error, so signal all caller-requested socket flags */ diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 68ab740f16..c55838db60 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -135,9 +135,16 @@ typedef struct Latch #define WL_SOCKET_CONNECTED WL_SOCKET_WRITEABLE #endif #define WL_SOCKET_CLOSED (1 << 7) +#ifdef WIN32 +#define WL_SOCKET_ACCEPT (1 << 8) +#else +/* avoid having to deal with case on platforms not requiring it */ +#define WL_SOCKET_ACCEPT WL_SOCKET_READABLE +#endif #define WL_SOCKET_MASK (WL_SOCKET_READABLE | \ WL_SOCKET_WRITEABLE | \ WL_SOCKET_CONNECTED | \ + WL_SOCKET_ACCEPT | \ WL_SOCKET_CLOSED) typedef struct WaitEvent -- 2.30.2
From 61480441f67ca7fac96ca4bcfe85f27013a47aa8 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Tue, 6 Dec 2022 16:13:36 +1300 Subject: [PATCH v4 2/5] Don't leak a signalfd when using latches in the postmaster. At the time of commit 6a2a70a02 we didn't use latch infrastructure in the postmaster. We're planning to start doing that, so we'd better make sure that the signalfd inherited from a postmaster is not duplicated and then leaked in the child. Reviewed-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index 7ced8264f0..c4b9153690 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -283,6 +283,22 @@ InitializeLatchSupport(void) #ifdef WAIT_USE_SIGNALFD sigset_t signalfd_mask; + if (IsUnderPostmaster) + { + /* + * It would probably be safe to re-use the inherited signalfd since + * signalfds only see the current processes pending signals, but it + * seems less surprising to close it and create our own. + */ + if (signal_fd != -1) + { + /* Release postmaster's signal FD; ignore any error */ + (void) close(signal_fd); + signal_fd = -1; + ReleaseExternalFD(); + } + } + /* Block SIGURG, because we'll receive it through a signalfd. */ sigaddset(&UnBlockSig, SIGURG); -- 2.30.2
From e3a0156228684520c17f3aaa8b1ef60c5f15b350 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Tue, 6 Dec 2022 16:24:05 +1300 Subject: [PATCH v4 3/5] Allow parent's WaitEventSets to be freed after fork(). An epoll fd belonging to the parent should be closed in the child. A kqueue fd is automatically closed, but we should adjust our counter. For poll and Windows systems, nothing special is required. On all systems we free the memory. Reviewed-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 17 +++++++++++++++++ src/include/storage/latch.h | 1 + 2 files changed, 18 insertions(+) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index c4b9153690..51c239eefa 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -869,6 +869,23 @@ FreeWaitEventSet(WaitEventSet *set) pfree(set); } +/* + * Free a previously created WaitEventSet in a child process after a fork(). + */ +void +FreeWaitEventSetAfterFork(WaitEventSet *set) +{ +#if defined(WAIT_USE_EPOLL) + close(set->epoll_fd); + ReleaseExternalFD(); +#elif defined(WAIT_USE_KQUEUE) + /* kqueues are not normally inherited by child processes */ + ReleaseExternalFD(); +#endif + + pfree(set); +} + /* --- * Add an event to the set. Possible events are: * - WL_LATCH_SET: Wait for the latch to be set diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index c55838db60..63a1fc440c 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -175,6 +175,7 @@ extern void ShutdownLatchSupport(void); extern WaitEventSet *CreateWaitEventSet(MemoryContext context, int nevents); extern void FreeWaitEventSet(WaitEventSet *set); +extern void FreeWaitEventSetAfterFork(WaitEventSet *set); extern int AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch, void *user_data); extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch); -- 2.30.2
From 7ceb93c7602ffb80ae32bbd87705dc638640c38a Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Tue, 6 Dec 2022 16:40:48 +1300 Subject: [PATCH v4 4/5] Allow socket WaitEvents to be temporarily blocked. Allow ModifyWaitEvent(.., ..., 0, ...) as a way to suppress events from a socket that we are not interested in. This also blocks event errors from being reported. Another call to ModifyWaitEvent() can be used to reenable events. For kqueue, no change other than removing an assertion, because the code already deletes all events for 0 (that falls out of having separate events for read and write). For epoll, teach ModifyWaitEvent() calls to delete and re-add to transition between non-zero and zero event masks. For poll, use fd -1 (allowed by POSIX for an empty entry) to make sure we don't get any events. For Windows, suppress the request for FD_CLOSE, which conveys errors. Reviewed-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 36 ++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index 51c239eefa..1d6fe69ef7 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -1008,14 +1008,14 @@ void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch) { WaitEvent *event; -#if defined(WAIT_USE_KQUEUE) +#if defined(WAIT_USE_KQUEUE) || defined(WAIT_USE_EPOLL) int old_events; #endif Assert(pos < set->nevents); event = &set->events[pos]; -#if defined(WAIT_USE_KQUEUE) +#if defined(WAIT_USE_KQUEUE) || defined(WAIT_USE_EPOLL) old_events = event->events; #endif @@ -1065,7 +1065,12 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch) } #if defined(WAIT_USE_EPOLL) - WaitEventAdjustEpoll(set, event, EPOLL_CTL_MOD); + if (events == 0 && old_events != 0) + WaitEventAdjustEpoll(set, event, EPOLL_CTL_DEL); + else if (events != 0 && old_events == 0) + WaitEventAdjustEpoll(set, event, EPOLL_CTL_ADD); + else + WaitEventAdjustEpoll(set, event, EPOLL_CTL_MOD); #elif defined(WAIT_USE_KQUEUE) WaitEventAdjustKqueue(set, event, old_events); #elif defined(WAIT_USE_POLL) @@ -1103,9 +1108,6 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action) else { Assert(event->fd != PGINVALID_SOCKET); - Assert(event->events & (WL_SOCKET_READABLE | - WL_SOCKET_WRITEABLE | - WL_SOCKET_CLOSED)); if (event->events & WL_SOCKET_READABLE) epoll_ev.events |= EPOLLIN; @@ -1149,6 +1151,14 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event) { pollfd->events = POLLIN; } + else if (event->events == 0) + { + /* + * If we're suppressing all socket events, remove the file descriptor + * so poll() ignores this entry. + */ + pollfd->fd = -1; + } else { Assert(event->events & (WL_SOCKET_READABLE | @@ -1233,11 +1243,6 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events) return; Assert(event->events != WL_LATCH_SET || set->latch != NULL); - Assert(event->events == WL_LATCH_SET || - event->events == WL_POSTMASTER_DEATH || - (event->events & (WL_SOCKET_READABLE | - WL_SOCKET_WRITEABLE | - WL_SOCKET_CLOSED))); if (event->events == WL_POSTMASTER_DEATH) { @@ -1340,7 +1345,14 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event) } else { - int flags = FD_CLOSE; /* always check for errors/EOF */ + int flags = 0; + + /* + * Check for errors/EOF unless we're completely suppressing all events + * for this socket. + */ + if (event->events != 0) + flags = FD_CLOSE; if (event->events & WL_SOCKET_READABLE) flags |= FD_READ; -- 2.30.2
From 435a15c4f017ce2f058abebc35b4d22c04e9f48a Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Wed, 9 Nov 2022 22:59:58 +1300 Subject: [PATCH v4 5/5] Give the postmaster a WaitEventSet and a latch. Traditionally, the postmaster's architecture was quite unusual. It did a lot of work inside signal handlers, which were only unblocked while waiting in select() to make that safe. Switch to a more typical architecture, where signal handlers just set flags and use a latch to close races. Now the postmaster looks like all other PostgreSQL processes, multiplexing its event processing in epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the OS. Changes: * Allow the postmaster to set up its own local latch. For now we don't want other backends setting the postmaster's latch directly (that would require latches robust against arbitrary corruption of shared memory). * The existing signal handlers are cut in two: a handle_XXX part that sets a pending_XXX variable and sets the local latch, and a process_XXX part. * Signal handlers are now installed with the regular pqsignal() function rather then the special pqsignal_pm() function; the concerns about the portability of SA_RESTART vs select() are no longer relevant: SUSv2 left it implementation-defined whether select() restarts, but didn't add that qualification for poll(), and it doesn't matter anyway because we call SetLatch() creating a new reason to wake up. Reviewed-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/libpq/pqsignal.c | 40 --- src/backend/postmaster/fork_process.c | 12 +- src/backend/postmaster/postmaster.c | 377 ++++++++++++++------------ src/backend/tcop/postgres.c | 1 - src/backend/utils/init/miscinit.c | 13 +- src/include/libpq/pqsignal.h | 3 - src/include/miscadmin.h | 1 + 7 files changed, 223 insertions(+), 224 deletions(-) diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c index 1ab34c5214..718043a39d 100644 --- a/src/backend/libpq/pqsignal.c +++ b/src/backend/libpq/pqsignal.c @@ -97,43 +97,3 @@ pqinitmask(void) sigdelset(&StartupBlockSig, SIGALRM); #endif } - -/* - * Set up a postmaster signal handler for signal "signo" - * - * Returns the previous handler. - * - * This is used only in the postmaster, which has its own odd approach to - * signal handling. For signals with handlers, we block all signals for the - * duration of signal handler execution. We also do not set the SA_RESTART - * flag; this should be safe given the tiny range of code in which the - * postmaster ever unblocks signals. - * - * pqinitmask() must have been invoked previously. - */ -pqsigfunc -pqsignal_pm(int signo, pqsigfunc func) -{ - struct sigaction act, - oact; - - act.sa_handler = func; - if (func == SIG_IGN || func == SIG_DFL) - { - /* in these cases, act the same as pqsignal() */ - sigemptyset(&act.sa_mask); - act.sa_flags = SA_RESTART; - } - else - { - act.sa_mask = BlockSig; - act.sa_flags = 0; - } -#ifdef SA_NOCLDSTOP - if (signo == SIGCHLD) - act.sa_flags |= SA_NOCLDSTOP; -#endif - if (sigaction(signo, &act, &oact) < 0) - return SIG_ERR; - return oact.sa_handler; -} diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c index ec67761487..e1e7d91c52 100644 --- a/src/backend/postmaster/fork_process.c +++ b/src/backend/postmaster/fork_process.c @@ -12,24 +12,28 @@ #include "postgres.h" #include <fcntl.h> +#include <signal.h> #include <time.h> #include <sys/stat.h> #include <sys/time.h> #include <unistd.h> +#include "libpq/pqsignal.h" #include "postmaster/fork_process.h" #ifndef WIN32 /* * Wrapper for fork(). Return values are the same as those for fork(): * -1 if the fork failed, 0 in the child process, and the PID of the - * child in the parent process. + * child in the parent process. Signals are blocked while forking, so + * the child must unblock. */ pid_t fork_process(void) { pid_t result; const char *oomfilename; + sigset_t save_mask; #ifdef LINUX_PROFILE struct itimerval prof_itimer; @@ -51,6 +55,7 @@ fork_process(void) getitimer(ITIMER_PROF, &prof_itimer); #endif + sigprocmask(SIG_SETMASK, &BlockSig, &save_mask); result = fork(); if (result == 0) { @@ -103,6 +108,11 @@ fork_process(void) /* do post-fork initialization for random number generation */ pg_strong_random_init(); } + else + { + /* in parent, restore signal mask */ + sigprocmask(SIG_SETMASK, &save_mask, NULL); + } return result; } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a8a246921f..08dc80fcea 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -70,7 +70,6 @@ #include <time.h> #include <sys/wait.h> #include <ctype.h> -#include <sys/select.h> #include <sys/stat.h> #include <sys/socket.h> #include <fcntl.h> @@ -362,6 +361,15 @@ static volatile sig_atomic_t WalReceiverRequested = false; static volatile bool StartWorkerNeeded = true; static volatile bool HaveCrashedWorker = false; +/* set when signals arrive */ +static volatile sig_atomic_t pending_action_request; +static volatile sig_atomic_t pending_child_exit; +static volatile sig_atomic_t pending_reload_request; +static volatile sig_atomic_t pending_shutdown_request; + +/* I/O multiplexing event */ +static WaitEventSet *wait_set; + #ifdef USE_SSL /* Set when and if SSL has been initialized properly */ static bool LoadedSSL = false; @@ -380,10 +388,14 @@ static void getInstallationPaths(const char *argv0); static void checkControlFile(void); static Port *ConnCreate(int serverFd); static void ConnFree(Port *port); -static void SIGHUP_handler(SIGNAL_ARGS); -static void pmdie(SIGNAL_ARGS); -static void reaper(SIGNAL_ARGS); -static void sigusr1_handler(SIGNAL_ARGS); +static void handle_action_request_signal(SIGNAL_ARGS); +static void handle_child_exit_signal(SIGNAL_ARGS); +static void handle_reload_request_signal(SIGNAL_ARGS); +static void handle_shutdown_request_signal(SIGNAL_ARGS); +static void process_action_request(void); +static void process_child_exit(void); +static void process_reload_request(void); +static void process_shutdown_request(void); static void process_startup_packet_die(SIGNAL_ARGS); static void dummy_handler(SIGNAL_ARGS); static void StartupPacketTimeoutHandler(void); @@ -401,7 +413,6 @@ static int BackendStartup(Port *port); static int ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done); static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options); static void processCancelRequest(Port *port, void *pkt); -static int initMasks(fd_set *rmask); static void report_fork_failure_to_client(Port *port, int errnum); static CAC_state canAcceptConnections(int backend_type); static bool RandomCancelKey(int32 *cancel_key); @@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[]) /* * Set up signal handlers for the postmaster process. * - * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which - * is used by all child processes and client processes). That has a - * couple of special behaviors: - * - * 1. We tell sigaction() to block all signals for the duration of the - * signal handler. This is faster than our old approach of - * blocking/unblocking explicitly in the signal handler, and it should also - * prevent excessive stack consumption if signals arrive quickly. - * - * 2. We do not set the SA_RESTART flag. This is because signals will be - * blocked at all times except when ServerLoop is waiting for something to - * happen, and during that window, we want signals to exit the select(2) - * wait so that ServerLoop can respond if anything interesting happened. - * On some platforms, signals marked SA_RESTART would not cause the - * select() wait to end. - * - * Child processes will generally want SA_RESTART, so pqsignal() sets that - * flag. We expect children to set up their own handlers before - * unblocking signals. - * * CAUTION: when changing this list, check for side-effects on the signal * handling setup of child processes. See tcop/postgres.c, * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c, @@ -638,26 +629,21 @@ PostmasterMain(int argc, char *argv[]) pqinitmask(); PG_SETMASK(&BlockSig); - pqsignal_pm(SIGHUP, SIGHUP_handler); /* reread config file and have - * children do same */ - pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */ - pqsignal_pm(SIGQUIT, pmdie); /* send SIGQUIT and die */ - pqsignal_pm(SIGTERM, pmdie); /* wait for children and shut down */ - pqsignal_pm(SIGALRM, SIG_IGN); /* ignored */ - pqsignal_pm(SIGPIPE, SIG_IGN); /* ignored */ - pqsignal_pm(SIGUSR1, sigusr1_handler); /* message from child process */ - pqsignal_pm(SIGUSR2, dummy_handler); /* unused, reserve for children */ - pqsignal_pm(SIGCHLD, reaper); /* handle child termination */ + pqsignal(SIGHUP, handle_reload_request_signal); + pqsignal(SIGINT, handle_shutdown_request_signal); + pqsignal(SIGQUIT, handle_shutdown_request_signal); + pqsignal(SIGTERM, handle_shutdown_request_signal); + pqsignal(SIGALRM, SIG_IGN); /* ignored */ + pqsignal(SIGPIPE, SIG_IGN); /* ignored */ + pqsignal(SIGUSR1, handle_action_request_signal); + pqsignal(SIGUSR2, dummy_handler); /* unused, reserve for children */ + pqsignal(SIGCHLD, handle_child_exit_signal); -#ifdef SIGURG + /* This may configure SIGURG, depending on platform. */ + InitializeLatchSupport(); + InitProcessLocalLatch(); - /* - * Ignore SIGURG for now. Child processes may change this (see - * InitializeLatchSupport), but they will not receive any such signals - * until they wait on a latch. - */ - pqsignal_pm(SIGURG, SIG_IGN); /* ignored */ -#endif + PG_SETMASK(&UnBlockSig); /* * No other place in Postgres should touch SIGTTIN/SIGTTOU handling. We @@ -667,15 +653,15 @@ PostmasterMain(int argc, char *argv[]) * child processes should just allow the inherited settings to stand. */ #ifdef SIGTTIN - pqsignal_pm(SIGTTIN, SIG_IGN); /* ignored */ + pqsignal(SIGTTIN, SIG_IGN); /* ignored */ #endif #ifdef SIGTTOU - pqsignal_pm(SIGTTOU, SIG_IGN); /* ignored */ + pqsignal(SIGTTOU, SIG_IGN); /* ignored */ #endif /* ignore SIGXFSZ, so that ulimit violations work like disk full */ #ifdef SIGXFSZ - pqsignal_pm(SIGXFSZ, SIG_IGN); /* ignored */ + pqsignal(SIGXFSZ, SIG_IGN); /* ignored */ #endif /* @@ -1698,6 +1684,35 @@ DetermineSleepTime(struct timeval *timeout) } } +/* + * Initialize the WaitEventSet we'll use in our main event loop. + */ +static void +InitializePostmasterWaitSet(void) +{ + /* Set up a WaitEventSet for our latch and listening sockets. */ + wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN); + AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL); + for (int i = 0; i < MAXLISTEN; i++) + { + int fd = ListenSocket[i]; + + if (fd == PGINVALID_SOCKET) + break; + AddWaitEventToSet(wait_set, WL_SOCKET_ACCEPT, fd, NULL, NULL); + } +} + +/* + * Activate or deactivate the server socket events. + */ +static void +AdjustServerSocketEvents(bool active) +{ + for (int pos = 1; pos < GetNumRegisteredWaitEvents(wait_set); ++pos) + ModifyWaitEvent(wait_set, pos, active ? WL_SOCKET_ACCEPT : 0, NULL); +} + /* * Main idle loop of postmaster * @@ -1706,97 +1721,62 @@ DetermineSleepTime(struct timeval *timeout) static int ServerLoop(void) { - fd_set readmask; - int nSockets; time_t last_lockfile_recheck_time, last_touch_time; + WaitEvent events[MAXLISTEN]; + int nevents; + InitializePostmasterWaitSet(); last_lockfile_recheck_time = last_touch_time = time(NULL); - nSockets = initMasks(&readmask); - for (;;) { - fd_set rmask; - int selres; time_t now; + struct timeval timeout; - /* - * Wait for a connection request to arrive. - * - * We block all signals except while sleeping. That makes it safe for - * signal handlers, which again block all signals while executing, to - * do nontrivial work. - * - * If we are in PM_WAIT_DEAD_END state, then we don't want to accept - * any new connections, so we don't call select(), and just sleep. - */ - memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set)); - - if (pmState == PM_WAIT_DEAD_END) - { - PG_SETMASK(&UnBlockSig); - - pg_usleep(100000L); /* 100 msec seems reasonable */ - selres = 0; - - PG_SETMASK(&BlockSig); - } - else - { - /* must set timeout each time; some OSes change it! */ - struct timeval timeout; - - /* Needs to run with blocked signals! */ - DetermineSleepTime(&timeout); - - PG_SETMASK(&UnBlockSig); - - selres = select(nSockets, &rmask, NULL, NULL, &timeout); - - PG_SETMASK(&BlockSig); - } + DetermineSleepTime(&timeout); - /* Now check the select() result */ - if (selres < 0) - { - if (errno != EINTR && errno != EWOULDBLOCK) - { - ereport(LOG, - (errcode_for_socket_access(), - errmsg("select() failed in postmaster: %m"))); - return STATUS_ERROR; - } - } + nevents = WaitEventSetWait(wait_set, + timeout.tv_sec * 1000 + timeout.tv_usec / 1000, + events, + lengthof(events), + 0 /* postmaster posts no wait_events */); /* - * New connection pending on any of our sockets? If so, fork a child - * process to deal with it. + * Latch set by signal handler, or new connection pending on any of our + * sockets? If the latter, fork a child process to deal with it. */ - if (selres > 0) + for (int i = 0; i < nevents; i++) { - int i; - - for (i = 0; i < MAXLISTEN; i++) + if (events[i].events & WL_LATCH_SET) { - if (ListenSocket[i] == PGINVALID_SOCKET) - break; - if (FD_ISSET(ListenSocket[i], &rmask)) + ResetLatch(MyLatch); + + /* Process work scheduled by signal handlers. */ + if (pending_shutdown_request) + process_shutdown_request(); + if (pending_child_exit) + process_child_exit(); + if (pending_reload_request) + process_reload_request(); + if (pending_action_request) + process_action_request(); + } + else if (events[i].events & WL_SOCKET_ACCEPT) + { + Port *port; + + port = ConnCreate(events[i].fd); + if (port) { - Port *port; + BackendStartup(port); - port = ConnCreate(ListenSocket[i]); - if (port) - { - BackendStartup(port); - - /* - * We no longer need the open socket or port structure - * in this process - */ - StreamClose(port->sock); - ConnFree(port); - } + /* + * We no longer need the open socket or port structure + * in this process + */ + StreamClose(port->sock); + ConnFree(port); } } } @@ -1939,34 +1919,6 @@ ServerLoop(void) } } -/* - * Initialise the masks for select() for the ports we are listening on. - * Return the number of sockets to listen on. - */ -static int -initMasks(fd_set *rmask) -{ - int maxsock = -1; - int i; - - FD_ZERO(rmask); - - for (i = 0; i < MAXLISTEN; i++) - { - int fd = ListenSocket[i]; - - if (fd == PGINVALID_SOCKET) - break; - FD_SET(fd, rmask); - - if (fd > maxsock) - maxsock = fd; - } - - return maxsock + 1; -} - - /* * Read a client's startup packet and do something according to it. * @@ -2609,6 +2561,10 @@ ClosePostmasterPorts(bool am_syslogger) { int i; + /* Release resources held by the postmaster's WaitEventSet. */ + if (wait_set) + FreeWaitEventSetAfterFork(wait_set); + #ifndef WIN32 /* @@ -2707,14 +2663,45 @@ InitProcessGlobals(void) #endif } +/* + * Child processes use SIGUSR1 to for pmsignals. pg_ctl uses SIGUSR1 to ask + * postmaster to check for logrotate and promote files. + */ +static void +handle_action_request_signal(SIGNAL_ARGS) +{ + int save_errno = errno; + + pending_action_request = true; + SetLatch(MyLatch); + + errno = save_errno; +} /* - * SIGHUP -- reread config files, and tell children to do same + * pg_ctl uses SIGHUP to request a reload of the configuration files. */ static void -SIGHUP_handler(SIGNAL_ARGS) +handle_reload_request_signal(SIGNAL_ARGS) { - int save_errno = errno; + int save_errno = errno; + + pending_reload_request = true; + SetLatch(MyLatch); + + errno = save_errno; +} + +/* + * Re-read config files, and tell children to do same. + */ +static void +process_reload_request(void) +{ + pending_reload_request = false; + + ereport(DEBUG2, + (errmsg_internal("postmaster received reload request signal"))); if (Shutdown <= SmartShutdown) { @@ -2771,27 +2758,50 @@ SIGHUP_handler(SIGNAL_ARGS) write_nondefault_variables(PGC_SIGHUP); #endif } +} + +/* + * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of + * shutdown. + */ +static void +handle_shutdown_request_signal(SIGNAL_ARGS) +{ + int save_errno = errno; + + switch (postgres_signal_arg) + { + case SIGTERM: + pending_shutdown_request = SmartShutdown; + break; + case SIGINT: + pending_shutdown_request = FastShutdown; + break; + case SIGQUIT: + pending_shutdown_request = ImmediateShutdown; + break; + } + SetLatch(MyLatch); errno = save_errno; } - /* - * pmdie -- signal handler for processing various postmaster signals. + * Process shutdown request. */ static void -pmdie(SIGNAL_ARGS) +process_shutdown_request(void) { - int save_errno = errno; + int mode = pending_shutdown_request; ereport(DEBUG2, - (errmsg_internal("postmaster received signal %d", - postgres_signal_arg))); + (errmsg_internal("postmaster received shutdown request signal"))); - switch (postgres_signal_arg) - { - case SIGTERM: + pending_shutdown_request = NoShutdown; + switch (mode) + { + case SmartShutdown: /* * Smart Shutdown: * @@ -2830,7 +2840,7 @@ pmdie(SIGNAL_ARGS) PostmasterStateMachine(); break; - case SIGINT: + case FastShutdown: /* * Fast Shutdown: @@ -2871,7 +2881,7 @@ pmdie(SIGNAL_ARGS) PostmasterStateMachine(); break; - case SIGQUIT: + case ImmediateShutdown: /* * Immediate Shutdown: @@ -2908,20 +2918,30 @@ pmdie(SIGNAL_ARGS) PostmasterStateMachine(); break; } +} + +static void +handle_child_exit_signal(SIGNAL_ARGS) +{ + int save_errno = errno; + + pending_child_exit = true; + SetLatch(MyLatch); errno = save_errno; } /* - * Reaper -- signal handler to cleanup after a child process dies. + * Cleanup after a child process dies. */ static void -reaper(SIGNAL_ARGS) +process_child_exit(void) { - int save_errno = errno; int pid; /* process id of dead child process */ int exitstatus; /* its exit status */ + pending_child_exit = false; + ereport(DEBUG4, (errmsg_internal("reaping dead processes"))); @@ -3213,8 +3233,6 @@ reaper(SIGNAL_ARGS) * or actions to make. */ PostmasterStateMachine(); - - errno = save_errno; } /* @@ -3642,8 +3660,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus) /* * Advance the postmaster's state machine and take actions as appropriate * - * This is common code for pmdie(), reaper() and sigusr1_handler(), which - * receive the signals that might mean we need to change state. + * This is common code for process_shutdown_request(), process_child_exit() and + * process_action_request(), which process the signals that might mean we need + * to change state. */ static void PostmasterStateMachine(void) @@ -3796,6 +3815,9 @@ PostmasterStateMachine(void) if (pmState == PM_WAIT_DEAD_END) { + /* Don't allow any new socket connection events. */ + AdjustServerSocketEvents(false); + /* * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty * (ie, no dead_end children remain), and the archiver is gone too. @@ -3905,6 +3927,9 @@ PostmasterStateMachine(void) pmState = PM_STARTUP; /* crash recovery started, reset SIGKILL flag */ AbortStartTime = 0; + + /* start accepting server socket connection events again */ + AdjustServerSocketEvents(true); } } @@ -5013,12 +5038,16 @@ ExitPostmaster(int status) } /* - * sigusr1_handler - handle signal conditions from child processes + * Handle pmsignal conditions representing requests from backends, + * and check for promote and logrotate requests from pg_ctl. */ static void -sigusr1_handler(SIGNAL_ARGS) +process_action_request(void) { - int save_errno = errno; + pending_action_request = false; + + ereport(DEBUG2, + (errmsg_internal("postmaster received action request signal"))); /* * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in @@ -5159,8 +5188,6 @@ sigusr1_handler(SIGNAL_ARGS) */ signal_child(StartupPID, SIGUSR2); } - - errno = save_errno; } /* diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 3082093d1e..655e881688 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -24,7 +24,6 @@ #include <signal.h> #include <unistd.h> #include <sys/resource.h> -#include <sys/select.h> #include <sys/socket.h> #include <sys/time.h> diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index eb1046450b..1a8885b73e 100644 --- a/src/backend/utils/init/miscinit.c +++ b/src/backend/utils/init/miscinit.c @@ -135,8 +135,7 @@ InitPostmasterChild(void) /* Initialize process-local latch support */ InitializeLatchSupport(); - MyLatch = &LocalLatchData; - InitLatch(MyLatch); + InitProcessLocalLatch(); InitializeLatchWaitSet(); /* @@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0) /* Initialize process-local latch support */ InitializeLatchSupport(); - MyLatch = &LocalLatchData; - InitLatch(MyLatch); + InitProcessLocalLatch(); InitializeLatchWaitSet(); /* @@ -232,6 +230,13 @@ SwitchToSharedLatch(void) SetLatch(MyLatch); } +void +InitProcessLocalLatch(void) +{ + MyLatch = &LocalLatchData; + InitLatch(MyLatch); +} + void SwitchBackToLocalLatch(void) { diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h index 7890b426a8..76eb380a4f 100644 --- a/src/include/libpq/pqsignal.h +++ b/src/include/libpq/pqsignal.h @@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig; extern void pqinitmask(void); -/* pqsigfunc is declared in src/include/port.h */ -extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func); - #endif /* PQSIGNAL_H */ diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 795182fa51..f64f81cf00 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath; /* now in utils/init/miscinit.c */ extern void InitPostmasterChild(void); extern void InitStandaloneProcess(const char *argv0); +extern void InitProcessLocalLatch(void); extern void SwitchToSharedLatch(void); extern void SwitchBackToLocalLatch(void); -- 2.30.2