Excerpts from Noah Misch's message of Sat Jul 16 13:11:49 -0400 2011:

> In any event, I have attached a patch that fixes the problems I have described
> here. To ignore autovacuum, it only recognizes a wait when one of the
> backends under test holds a conflicting lock. (It occurs to me that perhaps
> we should expose a pg_lock_conflicts(lockmode_held text, lockmode_req text)
> function to simplify this query -- this is a fairly common monitoring need.)

Applied it. I agree that having such a utility function is worthwhile,
particularly if we're working on making pg_locks more usable as a whole.
(I wasn't able to reproduce Rémi's hangups here, so I wasn't able to
reproduce the other bits either.)

> With that change in place, my setup survived through about fifty suite runs at
> a time. The streak would end when session 2 would unexpectedly detect a
> deadlock that session 1 should have detected. The session 1 deadlock_timeout
> I chose, 20ms, is too aggressive. When session 2 is to issue the command that
> completes the deadlock, it must do so before session 1 runs the deadlock
> detector. Since we burn 10ms just noticing that the previous statement has
> blocked, that left only 10ms to issue the next statement. This patch bumps
> the figure from 20ms to 100ms; hopefully that will be enough for even a
> decently-loaded virtual host.

Committed this too.

> With this patch in its final form, I have completed 180+ suite runs without a
> failure. In the absence of better theories on the cause for the buildfarm
> failures, we should give the buildfarm a whirl with this patch.

Great. If there is some other failure mechanism, we'll find out ...

> I apologize for the quantity of errata this change is entailing.

No need to apologize. I might as well apologize myself because I didn't
detect these problems on review. But we don't do that -- we just fix the
problems and move on. It's great that you were able to come up with a
fix quickly.

And this is precisely why I committed this way ahead of the patch that
it was written to help: we're now not fixing problems in both
simultaneously. By the time we get that other patch in, this test
harness will be fully robust.

Thanks for all your effort in this.

-- 
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
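
For reference, the pg_lock_conflicts(lockmode_held, lockmode_req) function floated in the quoted message does not exist; it is only a proposal in this thread. A minimal sketch of what such a check might compute, written here in Python rather than as a server-side function, could look like the following. It assumes the mode names as they appear in the pg_locks.mode column and the table-level lock conflict rules documented in the PostgreSQL manual; the names lock_conflicts, MODES and _CONFLICTS are made up for illustration.

```python
# Hypothetical sketch of the proposed pg_lock_conflicts() check.
# Mode names follow the pg_locks.mode column; the helper names are invented.

MODES = [
    "AccessShareLock",
    "RowShareLock",
    "RowExclusiveLock",
    "ShareUpdateExclusiveLock",
    "ShareLock",
    "ShareRowExclusiveLock",
    "ExclusiveLock",
    "AccessExclusiveLock",
]

# For each held mode, the set of requested modes that must wait for it.
_CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowShareLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "RowExclusiveLock": {
        "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareLock": {
        "RowExclusiveLock", "ShareUpdateExclusiveLock",
        "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareRowExclusiveLock": {
        "RowExclusiveLock", "ShareUpdateExclusiveLock", "ShareLock",
        "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
    },
    "ExclusiveLock": {
        "RowShareLock", "RowExclusiveLock", "ShareUpdateExclusiveLock",
        "ShareLock", "ShareRowExclusiveLock", "ExclusiveLock",
        "AccessExclusiveLock",
    },
    "AccessExclusiveLock": set(MODES),
}


def lock_conflicts(lockmode_held, lockmode_req):
    """Return True if a request for lockmode_req would block on lockmode_held."""
    if lockmode_held not in _CONFLICTS or lockmode_req not in _CONFLICTS:
        raise ValueError("unknown lock mode")
    return lockmode_req in _CONFLICTS[lockmode_held]
```

A monitoring query or test harness could use a check like this to count a waiting backend as blocked only when some other backend of interest holds a conflicting mode on the same object, which is the filtering the quoted patch describes for ignoring autovacuum.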