Re: Condition variable live lock

2018-01-08 Thread Robert Haas
On Mon, Jan 8, 2018 at 7:02 PM, Tom Lane wrote: > I'm not real sure BTW why we have some callers that ereport and some > that just exit(1). Seems like it would be better to be consistent, > though I'm not entirely sure which behavior to standardize on. I think at one point we had an idea that re

Re: Condition variable live lock

2018-01-08 Thread Tom Lane
Thomas Munro writes: > On Sun, Jan 7, 2018 at 10:00 AM, Tom Lane wrote: >> Another loose end that I'm seeing here is that while a process waiting on >> a condition variable will respond to a cancel or die interrupt, it will >> not notice postmaster death. This seems unwise to me. I think we sho

Re: Condition variable live lock

2018-01-08 Thread Tom Lane
Thomas Munro writes: > On Sun, Jan 7, 2018 at 10:00 AM, Tom Lane wrote: >> I began with the intention of making no non-cosmetic changes, but then >> I started to wonder why ConditionVariablePrepareToSleep bothers with a >> !proclist_contains test, when the calling process surely ought not be >> i

Re: Condition variable live lock

2018-01-08 Thread Tom Lane
Robert Haas writes: > I guess my aversion to converting the existing Assert-test into an > elog test was really a concern that we'd be countenancing the use of > CVs in any coding pattern more complicated than a very simple > test-and-wait loop. Suppose someone were to propose adding runtime > ch

Re: Condition variable live lock

2018-01-08 Thread Robert Haas
On Sun, Jan 7, 2018 at 6:38 PM, Tom Lane wrote: >> Actually ... perhaps a better design would be to have >> ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for >> a different condition variable, analogously to what we just did in >> ConditionVariableBroadcast, on the same theory t

Re: Condition variable live lock

2018-01-08 Thread Tom Lane
Thomas Munro writes: > One very small thing after another look: > - Assert(cv_sleep_target == NULL); > + if (cv_sleep_target != NULL) > + ConditionVariableCancelSleep(); > The test for cv_sleep_target != NULL is redundant since > ConditionVariableCancelSleep() would ret

Re: Condition variable live lock

2018-01-08 Thread Thomas Munro
On Mon, Jan 8, 2018 at 5:25 PM, Thomas Munro wrote: > On Sun, Jan 7, 2018 at 10:43 AM, Tom Lane wrote: >> Actually ... perhaps a better design would be to have >> ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for >> a different condition variable, analogously to what we just di

Re: Condition variable live lock

2018-01-07 Thread Thomas Munro
On Sun, Jan 7, 2018 at 10:00 AM, Tom Lane wrote: > I began with the intention of making no non-cosmetic changes, but then > I started to wonder why ConditionVariablePrepareToSleep bothers with a > !proclist_contains test, when the calling process surely ought not be > in the list -- or any other l

Re: Condition variable live lock

2018-01-07 Thread Tom Lane
Thomas Munro writes: > On Mon, Jan 8, 2018 at 12:38 PM, Tom Lane wrote: >> Concretely, as per attached. > +1 for the idea. Haven't looked at the code yet but I'll review this > and the proclist patch shortly. Thanks. BTW, I realized that there is a second (and perhaps more important) reason w

Re: Condition variable live lock

2018-01-07 Thread Thomas Munro
On Mon, Jan 8, 2018 at 12:38 PM, Tom Lane wrote: > I wrote: >> Actually ... perhaps a better design would be to have >> ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for >> a different condition variable, analogously to what we just did in >> ConditionVariableBroadcast, on the s

Re: Condition variable live lock

2018-01-07 Thread Tom Lane
I wrote: > Actually ... perhaps a better design would be to have > ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for > a different condition variable, analogously to what we just did in > ConditionVariableBroadcast, on the same theory that whenever control > returns to the other

Re: Condition variable live lock

2018-01-06 Thread Tom Lane
I wrote: > I still think that we ought to change the Asserts on cv_sleep_target in > ConditionVariablePrepareToSleep and ConditionVariableSleep to be full > test-and-elog tests. Those are checking a global correctness property > ("global" meaning "interactions between completely unrelated modules

Re: Condition variable live lock

2018-01-06 Thread Tom Lane
I wrote: > Robert Haas writes: >> No opinion without seeing what you propose to change. > OK, will put out a proposal. I began with the intention of making no non-cosmetic changes, but then I started to wonder why ConditionVariablePrepareToSleep bothers with a !proclist_contains test, when the c

Re: Condition variable live lock

2018-01-05 Thread Thomas Munro
On Sat, Jan 6, 2018 at 6:33 AM, Tom Lane wrote: > As I feared, the existing regression tests are not really adequate for > this: gcov testing shows that the sentinel-inserting code path is > never entered, meaning ConditionVariableBroadcast never sees more > than one waiter. What's more, it's now

Re: Condition variable live lock

2018-01-05 Thread Tom Lane
Robert Haas writes: > On Fri, Jan 5, 2018 at 2:38 PM, Tom Lane wrote: >> * I think the Asserts in ConditionVariablePrepareToSleep and >> ConditionVariableSleep ought to be replaced by full-fledged test and >> elog(ERROR), so that they are enforced even in non-assert builds. >> I don't have a lot

Re: Condition variable live lock

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 2:38 PM, Tom Lane wrote: > * I think the Asserts in ConditionVariablePrepareToSleep and > ConditionVariableSleep ought to be replaced by full-fledged test and > elog(ERROR), so that they are enforced even in non-assert builds. > I don't have a lot of confidence that corner c

Re: Condition variable live lock

2018-01-05 Thread Tom Lane
I wrote: > As I feared, the existing regression tests are not really adequate for > this: gcov testing shows that the sentinel-inserting code path is > never entered, meaning ConditionVariableBroadcast never sees more > than one waiter. Hmm ... adding tracing log printouts in BarrierArriveAndWait

Re: Condition variable live lock

2018-01-05 Thread Tom Lane
I wrote: > It's a question of how complicated you're willing to make this > logic, and whether you trust that we'll be able to test all the > code paths. Attached is a patch incorporating all the ideas mentioned in this thread, except that I think in HEAD we should change both ConditionVariableSig

Re: Condition variable live lock

2018-01-05 Thread Tom Lane
Thomas Munro writes: > Could we install the sentinel and pop the first entry at the same > time, so that we're not adding an extra spinlock acquire/release? Hm, maybe. Other ideas in that space: * if queue is empty when we first acquire the spinlock, we don't have to do anything at all. * if q

Re: Condition variable live lock

2018-01-05 Thread Alvaro Herrera
Tom Lane wrote: > Obviously, this trades a risk of loss of wakeup for a risk > of spurious wakeup, but presumably the latter is something we can > cope with. I wonder if it'd be useful to have a test mode for condition variables that spurious wakups happen randomly, to verify that every user is p

Re: Condition variable live lock

2018-01-05 Thread Thomas Munro
On Fri, Jan 5, 2018 at 7:42 PM, Tom Lane wrote: > Indeed, it looks like no other caller is paying attention to the result. > We could live with the uncertainty in the back branches, and redefine > ConditionVariableSignal as returning void in master. +1 Could we install the sentinel and pop the f

Re: Condition variable live lock

2018-01-04 Thread Tom Lane
Thomas Munro writes: > On Fri, Jan 5, 2018 at 7:10 PM, Tom Lane wrote: >> Should we rejigger the logic so that it awakens one additional waiter >> (if there is one) after detecting that someone else has removed the >> sentinel? Obviously, this trades a risk of loss of wakeup for a risk >> of spu

Re: Condition variable live lock

2018-01-04 Thread Thomas Munro
On Fri, Jan 5, 2018 at 7:10 PM, Tom Lane wrote: > I thought of another possible issue, though. In the situation where > someone else has removed our sentinel (presumably, by issuing > ConditionVariableSignal just before we were about to remove the > sentinel), my patch assumes we can just do noth

Re: Condition variable live lock

2018-01-04 Thread Tom Lane
Thomas Munro writes: > On Fri, Jan 5, 2018 at 5:27 PM, Tom Lane wrote: >> Now, the limitation with this is that we can't be waiting for any *other* >> condition variable, because then we'd be trashing our state about that >> variable. As coded, we can't be waiting for the target CV either, but >

Re: Condition variable live lock

2018-01-04 Thread Thomas Munro
On Fri, Jan 5, 2018 at 5:27 PM, Tom Lane wrote: > I share Robert's discomfort with that solution, but it seems to me there > might be a better way. The attached patch uses our own cvWaitLink as a > sentinel to detect when we've woken everybody who was on the wait list > before we arrived. That g

Re: Condition variable live lock

2018-01-04 Thread Tom Lane
Andres Freund writes: > On 2018-01-04 12:39:47 -0500, Robert Haas wrote: >>> Given that the proclist_contains() checks in condition_variable.c are >>> already racy, I think it might be feasible to collect all procnos to >>> signal while holding the spinlock, and then signal all of them in one >>>

Re: Condition variable live lock

2018-01-04 Thread Andres Freund
On 2018-01-04 12:39:47 -0500, Robert Haas wrote: > > Given that the proclist_contains() checks in condition_variable.c are > > already racy, I think it might be feasible to collect all procnos to > > signal while holding the spinlock, and then signal all of them in one > > go. > > That doesn't see

Re: Condition variable live lock

2018-01-04 Thread Robert Haas
On Fri, Dec 29, 2017 at 2:38 PM, Andres Freund wrote: > Hm, I'm not quite convinced by this approach. Partially because of the > backpatch issue you mention, partially because using the list length as > a limit doesn't seem quite nice. Seems OK to me. Certainly better than your competing proposa

Re: Condition variable live lock

2017-12-29 Thread Andres Freund
On 2017-12-29 12:16:20 +1300, Thomas Munro wrote: > Here is one way to fix it: track the wait queue size and use that > number to limit the wakeup loop. See attached. > > That's unbackpatchable though, because it changes the size of struct > ConditionVariable, potentially breaking extensions compi

Re: Condition variable live lock

2017-12-28 Thread Thomas Munro
On Fri, Dec 22, 2017 at 4:46 PM, Thomas Munro wrote: > while (ConditionVariableSignal(cv)) > ++nwoken; > > The problem is that another backend can be woken up, determine that it > would like to wait for the condition variable again, and then get > itself added to the back o

Condition variable live lock

2017-12-21 Thread Thomas Munro
Hi hackers, While debugging a build farm assertion failure after commit 18042840, and with the assumption that the problem is timing/scheduling sensitive, I tried hammering the problem workload on a few different machines and noticed that my slow 2-core test machine fairly regularly got into a liv