On Fri, 2010-04-30 at 08:47 -0500, Kevin Grittner wrote:
> Peter Eisentraut <pete...@gmx.net> wrote:
>
> > This is not a deadlock.
>
> Agreed.
>
> > It's just two locks on the same table
>
> Not as I read it.
>
> > (and maybe a bit more readable formatting).
>
> Eliminating null columns and mangling column headers for length, I
> get this:
>
> >>    locktype    |  tranid  | virtualx |  pid  |     mode      | gr
> >>  transactionid | 39773877 | 63/15761 | 11157 | ShareLock     | f
> >>  transactionid | 39773877 | 4/10902  |  6421 | ExclusiveLock | t
>
> So it looks like two locks on the same transaction ID by different
> transactions.  How does that happen?
Exactly, I found that odd too. And you're right that looking at the locks alone doesn't suggest a deadlock. But the fact that pid 6421 stops making any progress whatsoever makes me wonder about an unregistered lock, perhaps a mutex somewhere that isn't tracked by pg_locks.

When the problem doesn't happen, the related queries take at most a few minutes to update ~30k rows. When it does happen, the same number of rows (the same query, in fact) remained "working" for a day and a half with no progress, and every other update on the same page was blocked behind that big update. The server's load drops, suggesting the big update isn't doing much other than waiting.

If I manually cancel the small query, the big update starts consuming resources again (suggesting progress), but it quickly deadlocks again, because my application retries the query I just cancelled. If I manually cancel the big query, of course, everything goes back to normal (after the little while it takes to roll back).

I can't reproduce it in a test environment because it's tied to concurrency patterns I haven't managed to recreate yet. I will keep trying, though.
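For what it's worth, this is roughly the query I've been running to correlate the pg_locks rows with what the two backends are actually doing (a sketch only; the column names assume 8.x-era catalogs, i.e. procpid / current_query / waiting in pg_stat_activity, and the pids are the ones from Kevin's excerpt):

    -- show each lock held or awaited by the two backends,
    -- alongside the backend's waiting flag and current query
    SELECT l.locktype, l.transactionid, l.virtualtransaction,
           l.pid, l.mode, l.granted,
           a.waiting, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE l.pid IN (6421, 11157)
    ORDER BY l.pid, l.granted;

In the output I've captured so far, the small query shows waiting = true on the big update's transaction ID, while the big update shows waiting = false yet consumes no CPU or I/O, which is what makes me suspect it is stuck on something pg_locks can't see.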