On 2020-08-24 13:42:35 -0700, Andres Freund wrote: > Hi, > > On 2020-08-23 22:53:18 -0400, Tom Lane wrote: > > We've seen repeated failures in the tests added by commit 43e084197: > > ... > > I dug into this a bit today, and found that I can reproduce the failure > > reliably by adding a short delay in the right place, as attached. > > > > However, after studying the test awhile I have to admit that I do not > > understand why all these failures look the same, because it seems to > > me that this test is a house of cards. It *repeatedly* expects that > > issuing a command to session X will result in session Y reporting > > some notice before X's command terminates, even though X's command will > > never meet the conditions for isolationtester to think it's blocked.
> > AFAICS that is nothing but wishful thinking. Why is it that only one of > > those places has failed so far? > > Are there really that many places expecting that? I've not gone through > this again exhaustively by any means, but most places seem to print > something only before waiting for a lock. ISTM the issue at hand isn't so much that X expects something to be printed by Y before it terminates, but that it expects the next step to not be executed before Y unlocks. If I understand the wrong output correctly, what happens is that "controller_print_speculative_locks" is executed, even though s1 hasn't yet acquired the next lock. Note how the +s1: NOTICE: blurt_and_lock_123() called for k1 in session 1 +s1: NOTICE: acquiring advisory lock on 2 is *after* "step controller_print_speculative_locks", not just after "step s2_upsert: <... completed>" To be clear, there'd still be an issue about whether the NOTICE is printed before/after the "s2_upsert: <... completed>", but it looks to me the issue is bigger than that. It's easy enough to add another wait in s2_upsert, but that doesn't help if the entire scheduling just continues regardless of there not really being a waiter. Am I missing something here? Greetings, Andres Freund