On Fri, Jul 12, 2019 at 6:08 PM Shawn Debnath <s...@amazon.com> wrote: > On Tue, Jul 09, 2019 at 11:03:18PM +1200, Thomas Munro wrote: > > On Sun, Jul 7, 2019 at 3:09 PM Thomas Munro <thomas.mu...@gmail.com> wrote: > > > + /* Timed out */ > > > + if (rc == 0) > > > + return true; > > > > Here's a version without that bit, because I don't think we need it. > > This works. Agree that letting it fall through covers the first gap.
Pushed, like that (with the now unused rc variable also removed). Thanks for the patch! > > > That still leaves the danger that the CV can be signalled some time > > > after ConditionVariableTimedSleep() returns. So now I'm wondering if > > > ConditionVariableCancelSleep() should signal the CV if it discovers > > > that this process is not in the proclist, on the basis that that must > > > indicate that we've been signalled even though we're not interested in > > > the message anymore, and yet some other process else might be > > > interested, and that might have been the only signal that is ever > > > going to be delivered by ConditionVariableSignal(). > > > > And a separate patch to do that. Thoughts? > > I like it. This covers the gap all the way till cancel is invoked and it > manipulates the list to remove itself or realizes that it needs to > forward the signal to some other process. I pushed this too. It's a separate commit, because I think there is at least a theoretical argument that it should be back-patched. I'm not going to do that today though, because I doubt anyone is relying on ConditionVariableSignal() working that reliably yet, and it's really with timeouts that it becomes a likely problem. I thought about this edge case because I have long wanted to propose a pair of functions that provide a simplified payloadless blocking alternative to NOTIFY, that would allow for just the right number of waiting sessions to wake up to handle SKIP LOCKED-style job queues. Otherwise you sometimes get thundering herds of wakeups fighting over crumbs. That made me think about the case where a worker session decides to time out and shut down due to being idle for too long, but eats a wakeup on its way out. Another question that comes up in that use case is whether CV wakeup queues should be LIFO or FIFO. I think the answer is LIFO, to support class worker pool designs that stabilise at the right size using a simple idle timeout rule. They're currently FIFO (proclist_pop_head_node() to wake up, but proclist_push_tail() to sleep). I understand why Robert didn't care about that last time I mentioned it: all our uses of CVs today are "broadcast" wakeups. But a productised version of the "poke" hack I showed earlier that supports poking just one waiter would care about the thing this patch fixed, and also the wakeup queue order. -- Thomas Munro https://enterprisedb.com