Andres Freund writes:
> On 2017-04-27 16:35:29 -0400, Tom Lane wrote:
>> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
>> in latch.c, rather than bothering with a full-blown configure check.
> Yea, that sounds worth trying. Wonder if we need to care about kernels
> not suppor
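A minimal sketch of the #ifdef fallback being discussed, with a hypothetical helper create_epoll_fd(); illustrative only, not the actual latch.c code:

#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create an epoll instance with close-on-exec set.  Where epoll_create1()
 * and EPOLL_CLOEXEC are available this is atomic; otherwise fall back to
 * epoll_create() plus fcntl(F_SETFD, FD_CLOEXEC).
 */
static int
create_epoll_fd(void)
{
    int     fd;

#ifdef EPOLL_CLOEXEC
    fd = epoll_create1(EPOLL_CLOEXEC);
#else
    fd = epoll_create(1);       /* size hint, ignored by modern kernels */
    if (fd >= 0 && fcntl(fd, F_SETFD, FD_CLOEXEC) == -1)
    {
        close(fd);
        fd = -1;
    }
#endif
    return fd;
}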
On 2017-04-27 16:35:29 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
> >> I went ahead and changed the call to epoll_create into epoll_create1.
> >> I'm not too concerned about loss of portability there --- it seems
> >> unlikely that many people
Andres Freund writes:
> On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
>> I went ahead and changed the call to epoll_create into epoll_create1.
>> I'm not too concerned about loss of portability there --- it seems
>> unlikely that many people are still using ten-year-old glibc, and
>> even less lik
Robert Haas writes:
> On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund wrote:
>>> 3. Go ahead with converting the postmaster to use WaitEventSet, a la
>>> the draft patch I posted earlier. I'd be happy to do this if we were
>>> at the start of a devel cycle, but right now seems a bit late --- not
On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
> Here's an updated version of that, which makes use of our previous
> conclusion that F_SETFD/FD_CLOEXEC are available everywhere except
> Windows, and fixes some sloppy thinking about the EXEC_BACKEND case.
>
> I went ahead and changed the call to ep
On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund wrote:
>> 3. Go ahead with converting the postmaster to use WaitEventSet, a la
>> the draft patch I posted earlier. I'd be happy to do this if we were
>> at the start of a devel cycle, but right now seems a bit late --- not
>> to mention that we real
On 2017-04-26 11:42:38 -0400, Tom Lane wrote:
> 1. Let HEAD stand as it is. We have a problem with slow response to
> bgworker start requests that arrive while ServerLoop is active, but that's
> a pretty tight window usually (although I believe I've seen it hit at
> least once in testing).
>
> 2.
Andres Freund writes:
> I'd still like to get something like your CLOEXEC patch applied
> independently however.
Here's an updated version of that, which makes use of our previous
conclusion that F_SETFD/FD_CLOEXEC are available everywhere except
Windows, and fixes some sloppy thinking about the
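For reference, the non-Windows idiom the patch relies on is just fcntl(); here is a hedged sketch with a hypothetical helper set_cloexec(), not the patch itself:

#include <fcntl.h>

/*
 * Mark an already-open descriptor close-on-exec.  SUSv2 requires fcntl.h
 * to provide F_SETFD and FD_CLOEXEC, so no configure test is needed on
 * non-Windows platforms.
 */
static int
set_cloexec(int fd)
{
    int     flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}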
I wrote:
> Rémi Zara writes:
>> coypu was not stuck (no buildfarm related process running), but failed to
>> clean-up shared memory and semaphores.
>> I’ve done the clean-up.
> Huh, that's even more interesting.
I installed NetBSD 5.1.5 on an old Mac G4; I believe this is a rea
Rémi Zara writes:
>> On 25 Apr 2017, at 01:47, Tom Lane wrote:
>> It looks like coypu is going to need manual intervention (ie, kill -9
>> on the leftover postmaster) to get unwedged :-(. That's particularly
>> disturbing because it implies that ServerLoop isn't iterating at a
> On 25 Apr 2017, at 01:47, Tom Lane wrote:
>
> I wrote:
>> What I'm inclined to do is to revert the pselect change but not the other,
>> to see if that fixes these two animals. If it does, we could look into
>> blacklisting these particular platforms when choosing pselect.
>
> It looks like
I wrote:
> What I'm inclined to do is to revert the pselect change but not the other,
> to see if that fixes these two animals. If it does, we could look into
> blacklisting these particular platforms when choosing pselect.
It looks like coypu is going to need manual intervention (ie, kill -9
on
Andres Freund writes:
> On 2017-04-24 18:14:41 -0400, Tom Lane wrote:
>> A bit of googling establishes that NetBSD 5.1 has a broken pselect
>> implementation:
>>
>> http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=43625
> Yikes. Do I understand correctly that they effectively just mapp
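For readers unfamiliar with the API: pselect() exists to apply a signal mask atomically for the duration of the wait, closing the race where a signal arrives between unblocking it and blocking in select(). A sketch of the intended usage (illustrative, not PostgreSQL code):

#include <signal.h>
#include <sys/select.h>

static void
wait_for_activity(int listen_fd, const sigset_t *unblock_mask)
{
    fd_set      rfds;

    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);

    /*
     * unblock_mask is supposed to apply only while blocked here.  If
     * pselect() is really sigprocmask() + select() under the hood, a
     * signal arriving between those two calls is missed until something
     * else wakes the loop.
     */
    (void) pselect(listen_fd + 1, &rfds, NULL, NULL, NULL, unblock_mask);
}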
On 2017-04-24 18:14:41 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-04-24 17:33:39 -0400, Tom Lane wrote:
> >> coypu's problem is unrelated:
>
> > Note I was linking the 9.6 report from coypu, not HEAD. Afaics the 9.6
> > failure is the same as gharial's mode of failure.
>
> [ look
Andres Freund writes:
> On 2017-04-24 17:33:39 -0400, Tom Lane wrote:
>> coypu's problem is unrelated:
> Note I was linking the 9.6 report from coypu, not HEAD. Afaics the 9.6
> failure is the same as gharial's mode of failure.
[ looks closer... ] Oh: the 9.6 run occurred first, and the failure
On 2017-04-24 17:33:39 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-04-24 13:16:44 -0700, Andres Freund wrote:
> >> Unclear if related, but
> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
> >> has a suspicious timing of failing in a w
Andres Freund writes:
> On 2017-04-24 13:16:44 -0700, Andres Freund wrote:
>> Unclear if related, but
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
>> has a suspicious timing of failing in a weird way.
> Given that gharial is also failing on 9.6 (
On 2017-04-24 13:16:44 -0700, Andres Freund wrote:
> Unclear if related, but
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
> has a suspicious timing of failing in a weird way.
Given that gharial is also failing on 9.6 (same set of commits) and
coypu
On 2017-04-21 23:50:41 -0400, Tom Lane wrote:
> I wrote:
> > Attached is a lightly-tested draft patch that converts the postmaster to
> > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions
> > about whether this is the direction to proceed, though.
>
> Attached are a couple of
On 2017-04-21 23:50:41 -0400, Tom Lane wrote:
> Attached are a couple of patches that represent a plausible Plan B.
> The first one changes the postmaster to run its signal handlers without
> specifying SA_RESTART. I've confirmed that that seems to fix the
> select_parallel-test-takes-a-long-time
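A minimal sketch of what dropping SA_RESTART amounts to (hypothetical helper, not the actual patch): a handler installed without that flag makes a signal interrupt select() with EINTR, so ServerLoop regains control promptly instead of the kernel silently restarting the wait.

#include <signal.h>

static void
install_nonrestarting_handler(int signo, void (*handler) (int))
{
    struct sigaction act;

    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;           /* deliberately no SA_RESTART */
    (void) sigaction(signo, &act, NULL);
}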
I wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions
> about whether this is the direction to proceed, though.
Attached are a couple of patches that represent a plausible Plan B.
The first one c
Alvaro Herrera writes:
> Tom Lane wrote:
>> It also appears to me that do_start_bgworker's treatment of fork
>> failure is completely brain dead. Did anyone really think about
>> that case?
> Hmm, I probably modelled it on autovacuum without giving that case much
> additional consideration.
Att
Tom Lane writes:
> Andres Freund writes:
>> On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
>>> but I see that SUSv2
>>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
>>> coding rules it ought to be okay to assume they're there. I'm tempted to
>>> rip out the quoted bit,
Andres Freund writes:
> On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
>> but I see that SUSv2
>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
>> coding rules it ought to be okay to assume they're there. I'm tempted to
>> rip out the quoted bit, as well as the #ifdef F_
Tom,
On 2017-04-21 13:49:27 -0400, Tom Lane wrote:
> >> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
> >> - * any new connections, so we don't call select(), and just sleep.
> >> + * any new connections, so we don't call WaitEventSetWait(), an
On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
> but I see that SUSv2
> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
> coding rules it ought to be okay to assume they're there. I'm tempted to
> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
> i
I wrote:
> Andres Freund writes:
>> On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
>>> +#ifndef FD_CLOEXEC
>>> +#define FD_CLOEXEC 1
>>> +#endif
>> Hm? Are we sure this is portable? Are there really cases that have
>> F_SETFD, but not CLOEXEC?
> Copied-and-pasted from our only existing use of FD_
Andres Freund writes:
> On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
>> Attached is a lightly-tested draft patch that converts the postmaster to
>> use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions
>> about whether this is the direction to proceed, though. It adds at least
>
Alvaro Herrera writes:
> Tom Lane wrote:
>> I'm coming back to the idea that at least in the back branches, the
>> thing to do is allow maybe_start_bgworker to start multiple workers.
>>
>> Is there any actual evidence for the claim that that might have
>> bad side effects?
> Well, I ran tests w
Hi,
On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions
> about whether this is the direction to proceed, though. It adds at least
> a couple of kernel cal
Attached is a lightly-tested draft patch that converts the postmaster to
use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions
about whether this is the direction to proceed, though. It adds at least
a couple of kernel calls per postmaster signal delivery, and probably to
every po
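Roughly, the shape of such a loop built on the latch.c API would be as follows; the entry points named here are real, but the argument lists are from memory and this is only a sketch, not the draft patch:

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"

static void
server_loop_sketch(pgsocket listen_sock)
{
    WaitEventSet *wes;
    WaitEvent   event;

    wes = CreateWaitEventSet(CurrentMemoryContext, 2);
    AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
    AddWaitEventToSet(wes, WL_SOCKET_READABLE, listen_sock, NULL, NULL);

    for (;;)
    {
        int     nevents = WaitEventSetWait(wes, -1L, &event, 1, 0);

        if (nevents > 0 && (event.events & WL_LATCH_SET))
            ResetLatch(MyLatch);    /* then service whatever the signal asked for */

        if (nevents > 0 && (event.events & WL_SOCKET_READABLE))
        {
            /* accept() a new connection on event.fd */
        }
    }
}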
Tom Lane wrote:
> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptable. But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> activel
Andres Freund writes:
> On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
>> My first reaction was that that sounded like a lot more work than removing
>> two lines from maybe_start_bgworker and adjusting some comments. But on
>> closer inspection, the slow-bgworker-start issue isn't the only problem
On 2017-04-20 20:10:41 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
> >> Also, if it's not there we'd fall back to using plain poll(), which is
> >> not so awful that we need to work hard to avoid it. I'd just as soon
> >> keep the number of com
Andres Freund writes:
> On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
>> Also, if it's not there we'd fall back to using plain poll(), which is
>> not so awful that we need to work hard to avoid it. I'd just as soon
>> keep the number of combinations down.
> Just using fcntl(SET, CLOEXEC) wound'
On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
> >> So ... what would you say to replacing epoll_create() with
> >> epoll_create1(EPOLL_CLOEXEC) ? Then a WaitEventSet would not
> >> represent inheritable-across-exec resource
Andres Freund writes:
> On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
>> So ... what would you say to replacing epoll_create() with
>> epoll_create1(EPOLL_CLOEXEC) ? Then a WaitEventSet would not
>> represent inheritable-across-exec resources on any platform,
>> making it a lot easier to deal wit
On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund writes:
> >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
> >>> or are the HANDLEs in a Windows WaitEventSet not inheritable
> >>> resources?
>
> >> So that kind of sounds like it should be doable.
>
> > Ah, good. I'll
I wrote:
> Andres Freund writes:
>> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
>>> or are the HANDLEs in a Windows WaitEventSet not inheritable
>>> resources?
>> So that kind of sounds like it should be doable.
> Ah, good. I'll add a comment about that and press on.
So ... what would you sa
On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
> I wrote:
> > Alvaro Herrera writes:
> >> Tom Lane wrote:
> >>> Hm. Do you have a more-portable alternative?
>
> >> I was thinking in a WaitEventSet from latch.c.
>
> My first reaction was that that sounded like a lot more work than removing
> two
Andres Freund writes:
> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
>> or are the HANDLEs in a Windows WaitEventSet not inheritable
>> resources?
> I think we have control over that. According to
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx
> CreateProcess()
On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
> or are the HANDLEs in a Windows WaitEventSet not inheritable
> resources?
I think we have control over that. According to
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx
CreateProcess() has to be called with bInheritHa
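The MSDN page boils down to a single argument; a hedged sketch with a hypothetical wrapper, not PostgreSQL's actual Windows code:

#include <windows.h>

static BOOL
spawn_child_no_inherit(char *cmdline)
{
    STARTUPINFO si;
    PROCESS_INFORMATION pi;

    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));

    /*
     * Passing FALSE for bInheritHandles keeps the parent's open HANDLEs,
     * including any belonging to a WaitEventSet, out of the child.
     */
    return CreateProcess(NULL, cmdline, NULL, NULL,
                         FALSE,     /* bInheritHandles */
                         0, NULL, NULL, &si, &pi);
}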
I wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> Hm. Do you have a more-portable alternative?
>> I was thinking in a WaitEventSet from latch.c.
> My first reaction was that that sounded like a lot more work than removing
> two lines from maybe_start_bgworker and adjusting some comments.
I wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> Hm. Do you have a more-portable alternative?
>> I was thinking in a WaitEventSet from latch.c.
My first reaction was that that sounded like a lot more work than removing
two lines from maybe_start_bgworker and adjusting some comments. Bu
Andres Freund writes:
> FWIW, I'd wished before that we used something a bit more modern than
> select() if available... It's nice to be able to listen to a larger
> number of sockets without repeated O(sockets) overhead.
[ raised eyebrow... ] Is anyone really running postmasters with enough
lis
On 2017-04-19 18:56:26 -0400, Tom Lane wrote:
> Alvaro Herrera writes:
> > Tom Lane wrote:
> >> Hm. Do you have a more-portable alternative?
>
> > I was thinking in a WaitEventSet from latch.c.
>
> Yeah, some googling turns up the suggestion that a self-pipe is a portable
> way to get consisten
Alvaro Herrera writes:
> Tom Lane wrote:
>> Hm. Do you have a more-portable alternative?
> I was thinking in a WaitEventSet from latch.c.
Yeah, some googling turns up the suggestion that a self-pipe is a portable
way to get consistent semantics from select(); latch.c has already done
that. I s
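For context, the self-pipe trick in latch.c amounts to roughly this (simplified sketch; the real code also drains the pipe on wakeup and reports errors):

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

static int  selfpipe_fds[2];

static void
init_self_pipe(void)
{
    if (pipe(selfpipe_fds) < 0)
        return;                 /* real code would report the error */
    fcntl(selfpipe_fds[0], F_SETFL, O_NONBLOCK);
    fcntl(selfpipe_fds[1], F_SETFL, O_NONBLOCK);
}

/*
 * Called from signal handlers: write() is async-signal-safe, and the byte
 * makes the read end (which is part of every select()/poll() wait) become
 * readable, so a signal arriving at any time produces a wakeup.
 */
static void
wake_waiters(void)
{
    int     save_errno = errno;

    (void) write(selfpipe_fds[1], "x", 1);
    errno = save_errno;
}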
Tom Lane wrote:
> Alvaro Herrera writes:
> > Tom Lane wrote:
> >> So I'm wondering what the design rationale was for only starting one
> >> bgworker per invocation.
>
> > The rationale was that there may be other tasks waiting for postmaster
> > attention, and if there are many bgworkers needing
Alvaro Herrera writes:
> Tom Lane wrote:
>> So I'm wondering what the design rationale was for only starting one
>> bgworker per invocation.
> The rationale was that there may be other tasks waiting for postmaster
> attention, and if there are many bgworkers needing to be started, the
> other wor
Tom Lane wrote:
> While I haven't yet tested it, it seems like a fix might be as simple
> as deleting these lines in maybe_start_bgworker:
>
> /*
> * Have ServerLoop call us again. Note that there might not
> * actually *be* another runnable worker, but we d
I chanced to notice that on gaur/pademelon, the "select_parallel"
regression test sometimes takes a great deal longer than normal,
for no obvious reason. It does eventually terminate, but sometimes
only after 10 to 20 minutes rather than the couple dozen seconds
that are typical on that slow machi