On Fri, Oct 31, 2014 at 2:46 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 31 October 2014 18:29, Robert Haas <robertmh...@gmail.com> wrote:
>> Suppose somebody fires off a parallel sort on a text column, or a
>> parallel sequential scan involving a qual of the form textcol = 'zog'.
>> We launch a bunch of workers to do comparisons; they do lookups
>> against pg_collation. After some but not all of them have loaded the
>> collation information they need into their local cache, the DBA types
>> "cluster pg_collate". It queues up for an AccessExclusiveLock. The
>> remaining parallel workers join the wait queue waiting for their
>> AccessShareLock. The system is now deadlocked; the only solution is to
>> move the parallel workers ahead of the AccessExclusiveLock request,
>> but the deadlock detector won't do that unless it knows about the
>> relationship among the parallel workers.
>
> It's an obscure case and its not the only solution either.

I don't think that's an obscure situation at all. Do you really think
a patch that could cause an attempt to VACUUM FULL a system catalog to
suffer an undetected deadlock meets this community's quality
standards? Because that's what we're talking about.

You are right that it isn't the only solution. I have said the same
thing myself, multiple times, on this thread.

> I'm really surprised that having workers do their own locking doesn't
> scare you. Personally, it scares me.

Frankly, the reverse decision scares me a heck of a lot more. It
superficially seems like a good idea to have the master take locks on
behalf of the workers, but when you start to really think about how
many low-level parts of the code take locks, it quickly becomes
evident, at least to me, that trying to make the resulting system
robust will be a nightmare.

Don't forget that there are not only relation locks but also page
locks, tuple locks, relation extension locks, XID and VXID locks, and
locks on arbitrary database or shared objects. The deadlock detector
handles them all. If avoiding deadlock outside the lock manager is so
easy, why do we have a deadlock detector in the first place? Why not
just change all of our other code to avoid them instead?

> It's not like we're the first do parallel query. Honestly, how many
> parallel databases do you think do this?

All of them. I might be wrong, of course, but I see no reason at all
to believe that Oracle or SQL Server ignore deadlock detection when
parallel query is in use.

> I can't see this being a practical, workable solution for running a
> parallel query across a cluster of many nodes. Shared, distributed
> lock table?

I am not proposing a distributed lock manager that can run across many
nodes. The considerations for such a piece of software are largely
orthogonal to the problem that I'm actually trying to solve here, and
at least an order of magnitude more complex.
If you make it sound like that's the project that I'm doing, then of
course it's going to sound frighteningly complicated. But I'm not.

I'm sort of baffled by your concern on this point. Sure, the lock
manager is a complex and performance-sensitive piece of code, and we
have to be careful not to break it or make it slower. But we're not
talking about launching a rocket to Mars here; we're just talking
about making the lock manager interface look the same way to two
cooperating backends issuing lock requests in parallel that it looks
to a single backend making those requests serially. I don't think
it's noticeably harder than getting Hot Standby conflict resolution
working, or getting historical snapshots working for logical decoding,
or making the visibility map crash-safe - in fact, I'd bet on it being
substantially easier than the first two of those, because all the
logic is inside a single, relatively simple backend subsystem. Why
we'd want to minimize complexity there at the cost of spreading it
across half the backend is mysterious to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers