Jimmy Yih <j...@vmware.com> writes: > When dropping a partitioned index, the locking order starts with the > partitioned table, then its partitioned index, then the partition > indexes dependent on that partitioned index, and finally the dependent > partition indexes' parent tables. This order allows a deadlock > scenario to happen if for example an ANALYZE happens on one of the > partition tables which locks the partition table and then blocks when > it attempts to lock its index (the DROP INDEX has the index lock but > cannot get the partition table lock).
I agree this is a bug, and I think that you may have the right general idea about how to fix it: acquire the necessary locks in RangeVarCallbackForDropRelation. However, this patch still needs a fair amount of work. 1. RangeVarCallbackForDropRelation can get called more than once during a lookup (in case of concurrent rename and suchlike). If so it needs to be prepared to drop the lock(s) it got last time. You have not implemented any such logic. This doesn't seem hard to fix, just store the OID list into struct DropRelationCallbackState. 2. I'm not sure you have fully thought through the interaction with the subsequent "is_partition" stanza. If the target is an intermediate partitioned index, don't we need to acquire lock on its parent index before starting to lock children? (It may be that that stanza is already in the wrong place relative to the table-locking stanza it currently follows, but not sure.) 3. Calling find_all_inheritors with lockmode NoLock, and then locking those relation OIDs later, is both unsafe and code-wasteful. Just tell find_all_inheritors to get the locks you want. 4. This code will invoke find_all_inheritors on InvalidOid if IndexGetRelation fails. It needs to be within the if-test just above. 5. Reading classform again at this point in the function is not merely inefficient, but outright wrong, because we already released the syscache entry. Use the local variable. 6. You've not paid enough attention to updating existing comments, particularly the header comment for RangeVarCallbackForDropRelation. Actually though, maybe you *don't* want to do this in RangeVarCallbackForDropRelation. Because of point 2, it might be better to run find_all_inheritors after we've successfully identified and locked the direct target relation, ie do it back in RemoveRelations. I've not thought hard about that, but it's attractive if only because it'd mean you don't have to fix point 1. regards, tom lane