On Sat, Nov 25, 2017 at 05:45:59PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > We never had any issue during the ~2 years running PG96 on this VM, until
> > upgrading Monday to PG10.1, and we've now hit it 5+ times.
>
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DM
Justin Pryzby writes:
> We never had any issue during the ~2 years running PG96 on this VM, until
> upgrading Monday to PG10.1, and we've now hit it 5+ times.
> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platfo
Justin Pryzby writes:
> On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
>> My hypothesis about a missed memory barrier would imply that there's (at
>> least) one process that's waiting but is not in the lock's wait queue and
> Do I have to also check the wait queue to verify? Give a hi
On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > For starters, I found that PID 27427 has:
>
> > (gdb) p proc->lwWaiting
> > $1 = 0 '\000'
> > (gdb) p proc->lwWaitMode
> > $2 = 1 '\001'
>
> To confirm, this is LWLockAcquire's "proc", equal to MyProc?
> If so,
Justin Pryzby writes:
> For starters, I found that PID 27427 has:
> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'
To confirm, this is LWLockAcquire's "proc", equal to MyProc?
If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
that sure seems like a sm
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> >> Could you try stracing next time?
>
> > I straced all the "startup" PIDs, which were all in futex, without
> > exception:
>
> If you've got d
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> [ in an earlier post: ]
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
> > BIOS 6.00 06/22/2012
>
> Hmm. Can't a
Justin Pryzby writes:
> On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
>> Could you try stracing next time?
> I straced all the "startup" PIDs, which were all in futex, without exception:
If you've got debug symbols installed, could you investigate the states
of the LWLocks the p
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> >
> > As $subject: ba
On Tue, Nov 21, 2017 at 03:45:58PM -0800, Andres Freund wrote:
> On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> > Justin Pryzby writes:
> > > As $subject: backends are stuck in startup for minutes at a time. I didn't
> > > strace this time, but I believe last time I saw one was waiting in
Rakesh Kumar writes:
> Why is it that I did not receive the first 4 emails on this topic?
Perhaps you need to adjust your mail filters.
> I see that only the old email address "pgsql-gene...@postgresql.org" is
> mentioned. Could that be the reason?
> PS: I am adding the new list's address.
Pleas
Why is it that I did not receive the first 4 emails on this topic? I see that
only the old email address "pgsql-gene...@postgresql.org" is mentioned. Could
that be the reason?
PS: I am adding the new list's address.
On 2017-11-21 19:02:01 -0500, Tom Lane wrote:
> and...@anarazel.de (An
I wrote:
> ... Maybe we need
> to take a closer look at where LWLocks devolve to blocking on the process
> semaphore and see if there's any implicit assumptions about barriers there.
Like, say, here:
    for (;;)
    {
        PGSemaphoreLock(proc->sem);
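
As a rough illustration of the pattern being discussed here (sleep on a semaphore, then recheck a "still waiting" flag), the following is a minimal sketch and not PostgreSQL's code. The names Waiter, waker_main and run_handoff_demo are hypothetical, and the sketch assumes POSIX unnamed semaphores (as on Linux) plus C11 atomics. The waker clears the flag with release ordering before posting, and the waiter rechecks it with acquire ordering after every wakeup, so a wakeup that arrives while the flag is still set just sends the waiter back to sleep.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-process wait record; not PostgreSQL's PGPROC. */
typedef struct Waiter
{
    sem_t       sem;      /* semaphore the waiter sleeps on */
    atomic_bool waiting;  /* true while the waiter must keep sleeping */
    int         payload;  /* data the waker publishes before the real wakeup */
} Waiter;

/* Waker side: one unrelated wakeup, then publish, clear the flag, and post. */
static void *
waker_main(void *arg)
{
    Waiter *w = arg;

    /* Unrelated wakeup: the flag may still be set when the waiter sees this. */
    sem_post(&w->sem);

    w->payload = 42;
    /* Release store: the payload write above is visible once the flag reads false. */
    atomic_store_explicit(&w->waiting, false, memory_order_release);
    sem_post(&w->sem);
    return NULL;
}

/* Waiter side: returns the published payload, or -1 if setup fails. */
int
run_handoff_demo(void)
{
    Waiter    w;
    pthread_t waker;

    if (sem_init(&w.sem, 0, 0) != 0)
        return -1;
    atomic_init(&w.waiting, true);
    w.payload = 0;

    if (pthread_create(&waker, NULL, waker_main, &w) != 0)
    {
        sem_destroy(&w.sem);
        return -1;
    }

    for (;;)
    {
        /* Sleep; retry if sem_wait is interrupted by a signal. */
        while (sem_wait(&w.sem) != 0)
            ;
        /* Acquire load pairs with the waker's release store. */
        if (!atomic_load_explicit(&w.waiting, memory_order_acquire))
            break;
        /* Flag still set: this wakeup was not ours, so go back to sleep. */
    }

    pthread_join(waker, NULL);
    sem_destroy(&w.sem);
    return w.payload;
}
```

Calling run_handoff_demo() from a small main() and building with -pthread should be enough to try it. The sketch only shows why the flag must be rechecked after each wakeup; it does not reproduce the hang reported in this thread.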
On 2017-11-21 19:02:01 -0500, Tom Lane wrote:
> and...@anarazel.de (Andres Freund) writes:
> > On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> >> (If Justin saw that while still on 9.6, then it'd be worth looking
> >> closer.)
>
> > Right. I took this to be referring to something before the curren
and...@anarazel.de (Andres Freund) writes:
> On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
>> (If Justin saw that while still on 9.6, then it'd be worth looking
>> closer.)
> Right. I took this to be referring to something before the current
> migration, but I might have overinterpreted things. Th
On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> >> Justin Pryzby writes:
> >>> As $subject: backends are stuck in startup for minutes at a time. I didn't
> >>> strace this time, but I believe last time I saw one was
Andres Freund writes:
> On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
>> Justin Pryzby writes:
>>> As $subject: backends are stuck in startup for minutes at a time. I didn't
>>> strace this time, but I believe last time I saw one was waiting in a futex.
> A futex? Hm, that was stock postgres?
On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > As $subject: backends are stuck in startup for minutes at a time. I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
>
> Hm...
A futex? Hm, that was stock postgres?
> > I saved ~40
Hi,
On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> I'm sorry to report this previously reported problem is happening again,
> starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
>
> As $subject: backends are stuck in startup for minutes at a time. I didn't
> strace this ti
Justin Pryzby writes:
> As $subject: backends are stuck in startup for minutes at a time. I didn't
> strace this time, but I believe last time I saw one was waiting in a futex.
Hm...
> I saved ~40 cores from backends from the most recent incident, which are all
> essentially identical:
This on
I'm sorry to report this previously reported problem is happening again,
starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
As $subject: backends are stuck in startup for minutes at a time. I didn't
strace this time, but I believe last time I saw one was waiting in a futex.
pos