On Tue, Dec 24, 2019 at 3:10 PM Thomas Munro <thomas.mu...@gmail.com> wrote: > On Sat, Dec 21, 2019 at 2:10 PM Shawn Debnath <s...@amazon.com> wrote: > > On Fri, Dec 20, 2019 at 12:05:34PM +1300, Thomas Munro wrote: > > > I think we should probably just remove the unusual ResetLatch() call, > > > rather than adding a CFI(). See attached. Thoughts? > > > > I agree: removing the ResetLatch() and having the wait event code deal > > with it is a better way to go. I wonder why the reset was done in the > > first place?
I have pushed this on master only. > Thanks for the review. I was preparing to commit this today, but I > think I've spotted another latch protocol problem in the stable > branches only and I'd to get some more eyeballs on it. I bet it's > really hard to hit, but ConditionVariableSleep()'s return path (ie > when the CV has been signalled) forgets that the the latch is > multiplexed and latches are merged: just because it was set by > ConditionVariableSignal() doesn't mean it wasn't *also* set by die() > or some other interrupt, and yet at the point we return, we've reset > the latch and not run CFI(), and there's nothing to say the caller > won't then immediately wait on the latch in some other code path, and > sleep like a log despite the earlier delivery of (say) SIGTERM. If > I'm right about that, I think the solution is to move that CFI down in > the stable branches (which you already did for master in commit > 1321509f). I'm not going to back-patch this (ie the back-branches version from my previous email) without more discussion, but I still think it's subtly broken.