On Tue, Feb 16, 2016 at 2:59 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@anarazel.de> writes:
>> I wonder if we shouldn't just expose a 'which pid is process X waiting
>> for' API, implemented serverside. That's generally really useful, and
>> looks like it's actually going to be less complicated than that
>> query... And it's surely going to be faster.
>
> Attached is a draft patch for a new function that reports the set of PIDs
> directly blocking a given PID (that is, holding or awaiting conflicting
> locks on the lockable object it's waiting for).
>
> I replaced isolationtester's pg_locks query with this, and found that
> it's about 9X faster in a normal build, and 3X faster with
> CLOBBER_CACHE_ALWAYS turned on. That would give us some nice headroom
> for the isolation tests with CLOBBER_CACHE_ALWAYS animals. (Note that
> in view of
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2016-02-14%2007%3A38%3A37
> we still need to do *something* about the speed of the new deadlock-hard
> test; this patch could avoid the need to dumb down or slow down that test
> even further.)
>
> Not to be neglected also is that (I believe) this gives the right answer,
> whereas isolationtester's existing query is currently completely broken by
> parallel queries, and it doesn't understand non-conflicting lock modes
> either. (It did, at least partially, before commit 38f8bdcac4982215;
> I am not sure that taking out the mode checks was a good idea. But
> putting them back would make the query slower yet.)
The reason I took that out is that it breaks the deadlock-soft test.
It's possible to have a situation where no granted lock conflicts with
an awaited lock. If that happens, the old query wrongly concluded that
the waiting process was not in fact waiting. (Consider: A holds
AccessShareLock, B awaits AccessExclusiveLock, C now requests
AccessShareLock and *waits*. C's request conflicts with no granted
lock, yet C is stuck in the queue behind B.)

As for the patch itself, I'm having trouble grokking what it's trying
to do. I think it might be worth having a comment defining precisely
what we mean by "A blocks B". I would define "A blocks B" in general
as either A holds a lock which conflicts with one sought by B
(hard-blocked) or A awaits a lock which conflicts with one sought by B
and precedes it in the wait queue (soft-blocked).

I have wondered before if we shouldn't modify pg_locks to expose the
wait-queue ordering; without that, you can't reliably determine in
general whether A soft-blocks B, which means every view anyone has ever
written over pg_locks that purports to say who blocks whom is
necessarily buggy.

For parallel queries, there's a further relevant distinction when we
say "A blocks B". We might mean either that (1) process B cannot
resume execution until the lock conflict is resolved or (2) the group
leader for process B cannot complete the current parallel operation
until the lock conflict is resolved. If you're trying to figure out
why one particular member of a parallel group is stuck, you want to
answer question #1. If you're trying to figure out everything that
needs to get out of the way for the query to finish, you want to answer
question #2. I think this function is aiming to answer question #2,
not question #1, but I'm less clear on the reason behind that
definitional choice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
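
To make the hard-blocked / soft-blocked definition of "A blocks B"
concrete, here is a rough Python model of a single lockable object.
This is only an illustrative sketch, not code from the patch or from
the server: LockState, blockers, and the PIDs are hypothetical names,
the conflict table covers only the two lock modes used in the A/B/C
example, and parallel lock groups are ignored entirely.

```python
from dataclasses import dataclass, field

# Assumption: a two-mode subset of the table-level lock conflict matrix,
# just enough to replay the A/B/C example.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "AccessExclusiveLock"},
}


@dataclass
class LockState:
    """Hypothetical model of one lockable object."""
    granted: list = field(default_factory=list)  # (pid, mode) pairs
    waiting: list = field(default_factory=list)  # (pid, mode), in queue order


def blockers(state, pid):
    """Return (hard, soft) sets of PIDs that block `pid` on this object."""
    position = next(
        (i for i, (p, _) in enumerate(state.waiting) if p == pid), None
    )
    if position is None:
        return set(), set()  # pid is not waiting on this object
    wanted = state.waiting[position][1]
    # Hard-blocked: someone holds a lock that conflicts with the request.
    hard = {
        p for p, mode in state.granted
        if p != pid and mode in CONFLICTS[wanted]
    }
    # Soft-blocked: someone ahead in the wait queue awaits a conflicting lock.
    soft = {
        p for p, mode in state.waiting[:position]
        if p != pid and mode in CONFLICTS[wanted]
    }
    return hard, soft


# A = 100 holds AccessShareLock; B = 200 and C = 300 wait, in that order.
state = LockState(
    granted=[(100, "AccessShareLock")],
    waiting=[(200, "AccessExclusiveLock"), (300, "AccessShareLock")],
)
print(blockers(state, 200))  # B is hard-blocked by A
print(blockers(state, 300))  # C is soft-blocked by B, and by nobody else
```

Note that C has no hard blockers at all, which is exactly the case a
query that looks only at granted locks gets wrong, and that the soft
set cannot be computed without knowing the wait-queue order.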