Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund writes: > On 2017-04-27 16:35:29 -0400, Tom Lane wrote: >> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC" >> in latch.c, rather than bothering with a full-blown configure check. > Yea, that sounds worth trying. Wonder if we need to care about kernels > not suppor

Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Andres Freund
On 2017-04-27 16:35:29 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-26 17:05:39 -0400, Tom Lane wrote: > >> I went ahead and changed the call to epoll_create into epoll_create1. > >> I'm not too concerned about loss of portability there --- it seems > >> unlikely that many people

Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund writes: > On 2017-04-26 17:05:39 -0400, Tom Lane wrote: >> I went ahead and changed the call to epoll_create into epoll_create1. >> I'm not too concerned about loss of portability there --- it seems >> unlikely that many people are still using ten-year-old glibc, and >> even less lik

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
Robert Haas writes: > On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund wrote: >>> 3. Go ahead with converting the postmaster to use WaitEventSet, a la >>> the draft patch I posted earlier. I'd be happy to do this if we were >>> at the start of a devel cycle, but right now seems a bit late --- not

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Andres Freund
On 2017-04-26 17:05:39 -0400, Tom Lane wrote: > Here's an updated version of that, which makes use of our previous > conclusion that F_SETFD/FD_CLOEXEC are available everywhere except > Windows, and fixes some sloppy thinking about the EXEC_BACKEND case. > > I went ahead and changed the call to ep

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Robert Haas
On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund wrote: >> 3. Go ahead with converting the postmaster to use WaitEventSet, a la >> the draft patch I posted earlier. I'd be happy to do this if we were >> at the start of a devel cycle, but right now seems a bit late --- not >> to mention that we real

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Andres Freund
On 2017-04-26 11:42:38 -0400, Tom Lane wrote: > 1. Let HEAD stand as it is. We have a problem with slow response to > bgworker start requests that arrive while ServerLoop is active, but that's > a pretty tight window usually (although I believe I've seen it hit at > least once in testing). > > 2.

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
Andres Freund writes: > I'd still like to get something like your CLOEXEC patch applied > independently however. Here's an updated version of that, which makes use of our previous conclusion that F_SETFD/FD_CLOEXEC are available everywhere except Windows, and fixes some sloppy thinking about the

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
I wrote: > =?utf-8?Q?R=C3=A9mi_Zara?= writes: >> coypu was not stuck (no buildfarm related process running), but failed to >> clean-up shared memory and semaphores. >> I’ve done the clean-up. > Huh, that's even more interesting. I installed NetBSD 5.1.5 on an old Mac G4; I believe this is a rea

Re: [HACKERS] Unportable implementation of background worker start

2017-04-25 Thread Tom Lane
=?utf-8?Q?R=C3=A9mi_Zara?= writes: >> Le 25 avr. 2017 à 01:47, Tom Lane a écrit : >> It looks like coypu is going to need manual intervention (ie, kill -9 >> on the leftover postmaster) to get unwedged :-(. That's particularly >> disturbing because it implies that ServerLoop isn't iterating at a

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Rémi Zara
> Le 25 avr. 2017 à 01:47, Tom Lane a écrit : > > I wrote: >> What I'm inclined to do is to revert the pselect change but not the other, >> to see if that fixes these two animals. If it does, we could look into >> blacklisting these particular platforms when choosing pselect. > > It looks like

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
I wrote: > What I'm inclined to do is to revert the pselect change but not the other, > to see if that fixes these two animals. If it does, we could look into > blacklisting these particular platforms when choosing pselect. It looks like coypu is going to need manual intervention (ie, kill -9 on

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund writes: > On 2017-04-24 18:14:41 -0400, Tom Lane wrote: >> A bit of googling establishes that NetBSD 5.1 has a broken pselect >> implementation: >> >> http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=43625 > Yikes. Do I understand correctly that they effectively just mapp

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 18:14:41 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-24 17:33:39 -0400, Tom Lane wrote: > >> coypu's problem is unrelated: > > > Note I was linking the 9.6 report form coypu, not HEAD. Afaics the 9.6 > > failure is the same as gharial's mode of failure. > > [ look

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund writes: > On 2017-04-24 17:33:39 -0400, Tom Lane wrote: >> coypu's problem is unrelated: > Note I was linking the 9.6 report form coypu, not HEAD. Afaics the 9.6 > failure is the same as gharial's mode of failure. [ looks closer... ] Oh: the 9.6 run occurred first, and the failure

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 17:33:39 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-24 13:16:44 -0700, Andres Freund wrote: > >> Unclear if related, but > >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42 > >> has a suspicious timing of failing in a w

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund writes: > On 2017-04-24 13:16:44 -0700, Andres Freund wrote: >> Unclear if related, but >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42 >> has a suspicious timing of failing in a weird way. > Given that gharial is also failing on 9.6 (

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 13:16:44 -0700, Andres Freund wrote: > Unclear if related, but > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42 > has a suspicious timing of failing in a weird way. Given that gharial is also failing on 9.6 (same set of commits) and coypu

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote: > I wrote: > > Attached is a lightly-tested draft patch that converts the postmaster to > > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > > about whether this is the direction to proceed, though. > > Attached are a couple of

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote: > Attached are a couple of patches that represent a plausible Plan B. > The first one changes the postmaster to run its signal handlers without > specifying SA_RESTART. I've confirmed that that seems to fix the > select_parallel-test-takes-a-long-time

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote: > Attached is a lightly-tested draft patch that converts the postmaster to > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > about whether this is the direction to proceed, though. Attached are a couple of patches that represent a plausible Plan B. The first one c

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> It also appears to me that do_start_bgworker's treatment of fork >> failure is completely brain dead. Did anyone really think about >> that case? > Hmm, I probably modelled it on autovacuum without giving that case much > additional consideration. Att

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Tom Lane writes: > Andres Freund writes: >> On 2017-04-21 14:08:21 -0400, Tom Lane wrote: >>> but I see that SUSv2 >>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own >>> coding rules it ought to be okay to assume they're there. I'm tempted to >>> rip out the quoted bit,

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund writes: > On 2017-04-21 14:08:21 -0400, Tom Lane wrote: >> but I see that SUSv2 >> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own >> coding rules it ought to be okay to assume they're there. I'm tempted to >> rip out the quoted bit, as well as the #ifdef F_

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Tom, On 2017-04-21 13:49:27 -0400, Tom Lane wrote: > >> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept > >> - * any new connections, so we don't call select(), and just > >> sleep. > >> + * any new connections, so we don't call WaitEventSetWait(), > >> an

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 14:08:21 -0400, Tom Lane wrote: > but I see that SUSv2 > mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own > coding rules it ought to be okay to assume they're there. I'm tempted to > rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see > i

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2017-04-21 12:50:04 -0400, Tom Lane wrote: >>> +#ifndef FD_CLOEXEC >>> +#define FD_CLOEXEC 1 >>> +#endif >> Hm? Are we sure this is portable? Is there really cases that have >> F_SETFD, but not CLOEXEC? > Copied-and-pasted from our only existing use of FD_

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund writes: > On 2017-04-21 12:50:04 -0400, Tom Lane wrote: >> Attached is a lightly-tested draft patch that converts the postmaster to >> use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions >> about whether this is the direction to proceed, though. It adds at least >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> I'm coming back to the idea that at least in the back branches, the >> thing to do is allow maybe_start_bgworker to start multiple workers. >> >> Is there any actual evidence for the claim that that might have >> bad side effects? > Well, I ran tests w

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Hi, On 2017-04-21 12:50:04 -0400, Tom Lane wrote: > Attached is a lightly-tested draft patch that converts the postmaster to > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > about whether this is the direction to proceed, though. It adds at least > a couple of kernel cal

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Attached is a lightly-tested draft patch that converts the postmaster to use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions about whether this is the direction to proceed, though. It adds at least a couple of kernel calls per postmaster signal delivery, and probably to every po

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Alvaro Herrera
Tom Lane wrote: > After sleeping and thinking more, I've realized that the > slow-bgworker-start issue actually exists on *every* platform, it's just > harder to hit when select() is interruptable. But consider the case > where multiple bgworker-start requests arrive while ServerLoop is > activel

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 00:50:13 -0400, Tom Lane wrote: >> My first reaction was that that sounded like a lot more work than removing >> two lines from maybe_start_bgworker and adjusting some comments. But on >> closer inspection, the slow-bgworker-start issue isn't the only problem

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 20:10:41 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-20 20:05:02 -0400, Tom Lane wrote: > >> Also, if it's not there we'd fall back to using plain poll(), which is > >> not so awful that we need to work hard to avoid it. I'd just as soon > >> keep the number of com

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 20:05:02 -0400, Tom Lane wrote: >> Also, if it's not there we'd fall back to using plain poll(), which is >> not so awful that we need to work hard to avoid it. I'd just as soon >> keep the number of combinations down. > Just using fcntl(SET, CLOEXEC) wound'

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 20:05:02 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-20 19:53:02 -0400, Tom Lane wrote: > >> So ... what would you say to replacing epoll_create() with > >> epoll_create1(EPOLL_CLOEXEC) ? Then a WaitEventSet would not > >> represent inheritable-across-exec resource

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 19:53:02 -0400, Tom Lane wrote: >> So ... what would you say to replacing epoll_create() with >> epoll_create1(EPOLL_CLOEXEC) ? Then a WaitEventSet would not >> represent inheritable-across-exec resources on any platform, >> making it a lot easier to deal wit

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 19:53:02 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote: > >>> or are the HANDLEs in a Windows WaitEventSet not inheritable > >>> resources? > > >> So that kind of sounds like it should be doable. > > > Ah, good. I'll

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote: >>> or are the HANDLEs in a Windows WaitEventSet not inheritable >>> resources? >> So that kind of sounds like it should be doable. > Ah, good. I'll add a comment about that and press on. So ... what would you sa

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 00:50:13 -0400, Tom Lane wrote: > I wrote: > > Alvaro Herrera writes: > >> Tom Lane wrote: > >>> Hm. Do you have a more-portable alternative? > > >> I was thinking in a WaitEventSet from latch.c. > > My first reaction was that that sounded like a lot more work than removing > two

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 19:23:28 -0400, Tom Lane wrote: >> or are the HANDLEs in a Windows WaitEventSet not inheritable >> resources? > I think we have control over that. According to > https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx > CreateProcess()

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 19:23:28 -0400, Tom Lane wrote: > or are the HANDLEs in a Windows WaitEventSet not inheritable > resources? I think we have control over that. According to https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx CreateProcess() has to be called with bInheritHa

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
I wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Hm. Do you have a more-portable alternative? >> I was thinking in a WaitEventSet from latch.c. > My first reaction was that that sounded like a lot more work than removing > two lines from maybe_start_bgworker and adjusting some comments.

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
I wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Hm. Do you have a more-portable alternative? >> I was thinking in a WaitEventSet from latch.c. My first reaction was that that sounded like a lot more work than removing two lines from maybe_start_bgworker and adjusting some comments. Bu

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Andres Freund writes: > FWIW, I'd wished before that we used something a bit more modern than > select() if available... It's nice to be able to listen to a larger > number of sockets without repeated O(sockets) overhead. [ raised eyebrow... ] Is anyone really running postmasters with enough lis

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Andres Freund
On 2017-04-19 18:56:26 -0400, Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Hm. Do you have a more-portable alternative? > > > I was thinking in a WaitEventSet from latch.c. > > Yeah, some googling turns up the suggestion that a self-pipe is a portable > way to get consisten

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Hm. Do you have a more-portable alternative? > I was thinking in a WaitEventSet from latch.c. Yeah, some googling turns up the suggestion that a self-pipe is a portable way to get consistent semantics from select(); latch.c has already done that. I s

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> So I'm wondering what the design rationale was for only starting one > >> bgworker per invocation. > > > The rationale was that there may be other tasks waiting for postmaster > > attention, and if there are many bgworkers needing

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So I'm wondering what the design rationale was for only starting one >> bgworker per invocation. > The rationale was that there may be other tasks waiting for postmaster > attention, and if there are many bgworkers needing to be started, the > other wor

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote: > While I haven't yet tested it, it seems like a fix might be as simple > as deleting these lines in maybe_start_bgworker: > > /* > * Have ServerLoop call us again. Note that there might not > * actually *be* another runnable worker, but we d