Re: backends stuck in "startup"

2017-11-25 Thread Justin Pryzby
On Sat, Nov 25, 2017 at 05:45:59PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > We never had any issue during the ~2 years running PG96 on this VM, until
> > upgrading Monday to PG10.1, and we've now hit it 5+ times.
>
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DM

Re: backends stuck in "startup"

2017-11-25 Thread Tom Lane
Justin Pryzby writes:
> We never had any issue during the ~2 years running PG96 on this VM, until
> upgrading Monday to PG10.1, and we've now hit it 5+ times.
> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platfo

Re: backends stuck in "startup"

2017-11-22 Thread Tom Lane
Justin Pryzby writes:
> On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
>> My hypothesis about a missed memory barrier would imply that there's (at
>> least) one process that's waiting but is not in the lock's wait queue and
> Do I have to also check the wait queue to verify? Give a hi

Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > For starters, I found that PID 27427 has:
>
> > (gdb) p proc->lwWaiting
> > $1 = 0 '\000'
> > (gdb) p proc->lwWaitMode
> > $2 = 1 '\001'
>
> To confirm, this is LWLockAcquire's "proc", equal to MyProc?
> If so,

Re: backends stuck in "startup"

2017-11-22 Thread Tom Lane
Justin Pryzby writes:
> For starters, I found that PID 27427 has:
> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'
To confirm, this is LWLockAcquire's "proc", equal to MyProc? If so, and if LWLockAcquire is blocked at PGSemaphoreLock, that sure seems like a sm

Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> >> Could you try stracing next time?
>
> > I straced all the "startup" PIDs, which were all in futex, without
> > exception:
>
> If you've got d

Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> [ in an earlier post: ]
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
> > BIOS 6.00 06/22/2012
>
> Hmm. Can't a

Re: backends stuck in "startup"

2017-11-22 Thread Tom Lane
Justin Pryzby writes:
> On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
>> Could you try stracing next time?
> I straced all the "startup" PIDs, which were all in futex, without exception:
If you've got debug symbols installed, could you investigate the states of the LWLocks the p

Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> >
> > As $subject: ba

Re: backends stuck in "startup"

2017-11-21 Thread Justin Pryzby
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> >
> > As $subject: ba

Re: backends stuck in "startup"

2017-11-21 Thread Justin Pryzby
On Tue, Nov 21, 2017 at 03:45:58PM -0800, Andres Freund wrote:
> On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> > Justin Pryzby writes:
> > > As $subject: backends are stuck in startup for minutes at a time. I
> > > didn't
> > > strace this time, but I believe last time I saw one was waiting in

Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
Rakesh Kumar writes:
> Why is it that I did not receive the first 4 emails on this topic?
Perhaps you need to adjust your mail filters.
> I see that only the old email address "pgsql-gene...@postgresql.org" is
> mentioned. Could that be the reason?
> ps: I am adding the new lists address.
Pleas

Re: backends stuck in "startup"

2017-11-21 Thread Rakesh Kumar
Why is it that I did not receive the first 4 emails on this topic? I see that only the old email address "pgsql-gene...@postgresql.org" is mentioned. Could that be the reason?
ps: I am adding the new lists address.
On 2017-11-21 19:02:01 -0500, Tom Lane wrote:
> and...@anarazel.de (An

Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
I wrote:
> ... Maybe we need
> to take a closer look at where LWLocks devolve to blocking on the process
> semaphore and see if there's any implicit assumptions about barriers there.
Like, say, here:
    for (;;)
    {
        PGSemaphoreLock(proc->sem);
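For readers following the barrier hypothesis, here is a minimal, self-contained sketch of the general pattern under discussion: a waiter that sleeps on a semaphore and rechecks a "waiting" flag after each wakeup. It is not PostgreSQL source. `DemoProc`, `demo_waker`, `demo_wait`, and `demo_run` are hypothetical names, and POSIX semaphores plus C11 atomics are assumed stand-ins for PGSemaphoreLock and the proc fields. The explicit release/acquire pair only marks where the ordering between clearing the flag and waking the waiter matters.

```c
/* Illustrative sketch only; NOT PostgreSQL source. All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoProc
{
    sem_t       sem;      /* per-process wakeup semaphore */
    atomic_bool waiting;  /* true while the waiter still needs its wakeup */
} DemoProc;

/* Waker: clear the flag first, then post the semaphore. The release store
 * orders the flag update before the wakeup that follows it. */
static void *
demo_waker(void *arg)
{
    DemoProc *proc = arg;

    atomic_store_explicit(&proc->waiting, false, memory_order_release);
    sem_post(&proc->sem);
    return NULL;
}

/* Waiter: sleep until the flag has been cleared. Wakeups that arrive while
 * the flag is still set are counted and handed back afterwards, so the
 * semaphore count is not lost for whoever they were meant for. */
static int
demo_wait(DemoProc *proc)
{
    int extra_waits = 0;

    assert(proc != NULL);
    for (;;)
    {
        /* Retry if a signal interrupts the wait. */
        while (sem_wait(&proc->sem) == -1 && errno == EINTR)
            ;
        if (!atomic_load_explicit(&proc->waiting, memory_order_acquire))
            break;
        extra_waits++;
    }
    for (int i = 0; i < extra_waits; i++)
        sem_post(&proc->sem);
    return extra_waits;
}

/* Driver: one unrelated wakeup plus one real wakeup. Returns the final
 * semaphore count, or -1 if setup fails. */
int
demo_run(void)
{
    DemoProc  proc;
    pthread_t waker;
    int       value = -1;

    if (sem_init(&proc.sem, 0, 0) != 0)
        return -1;
    atomic_init(&proc.waiting, true);
    sem_post(&proc.sem);            /* unrelated wakeup, sent before the real one */
    if (pthread_create(&waker, NULL, demo_waker, &proc) != 0)
    {
        sem_destroy(&proc.sem);
        return -1;
    }
    (void) demo_wait(&proc);
    pthread_join(waker, NULL);
    sem_getvalue(&proc.sem, &value);
    sem_destroy(&proc.sem);
    return value;
}
```

Calling `demo_run()` should leave the semaphore count at 1 on either interleaving: the unrelated wakeup is either never consumed or consumed and handed back. On older toolchains this may need to be built with `cc -pthread`.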

Re: backends stuck in "startup"

2017-11-21 Thread Andres Freund
On 2017-11-21 19:02:01 -0500, Tom Lane wrote:
> and...@anarazel.de (Andres Freund) writes:
> > On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> >> (If Justin saw that while still on 9.6, then it'd be worth looking
> >> closer.)
>
> > Right. I took this to be referring to something before the curren

Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
and...@anarazel.de (Andres Freund) writes:
> On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
>> (If Justin saw that while still on 9.6, then it'd be worth looking
>> closer.)
> Right. I took this to be referring to something before the current
> migration, but I might have overinterpreted things.
Th

Re: backends stuck in "startup"

2017-11-21 Thread Andres Freund
On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> >> Justin Pryzby writes:
> >>> As $subject: backends are stuck in startup for minutes at a time. I
> >>> didn't
> >>> strace this time, but I believe last time I saw one was

Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
Andres Freund writes:
> On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
>> Justin Pryzby writes:
>>> As $subject: backends are stuck in startup for minutes at a time. I didn't
>>> strace this time, but I believe last time I saw one was waiting in a futex.
> A futex? Hm, that was stock postgres?

Re: backends stuck in "startup"

2017-11-21 Thread Andres Freund
On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > As $subject: backends are stuck in startup for minutes at a time. I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
>
> Hm...
A futex? Hm, that was stock postgres?
> > I saved ~40

Re: backends stuck in "startup"

2017-11-21 Thread Andres Freund
Hi,
On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> I'm sorry to report this previously reported problem is happening again,
> starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
>
> As $subject: backends are stuck in startup for minutes at a time. I didn't
> strace this ti

Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
Justin Pryzby writes:
> As $subject: backends are stuck in startup for minutes at a time. I didn't
> strace this time, but I believe last time I saw one was waiting in a futex.
Hm...
> I saved ~40 cores from backends from the most recent incident, which are all
> essentially identical:
This on

backends stuck in "startup"

2017-11-21 Thread Justin Pryzby
I'm sorry to report this previously reported problem is happening again, starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5. As $subject: backends are stuck in startup for minutes at a time. I didn't strace this time, but I believe last time I saw one was waiting in a futex. pos