Michael Paquier <mich...@paquier.xyz> writes: > On Fri, Jun 07, 2024 at 11:32:14AM +0800, jian he wrote: >> in deleteObjectsInList, under certain conditions trying to sort the to >> be deleted object list >> by just using sort_object_addresses seems to work, >> but it looks like a hack. >> maybe the proper fix would be in findDependentObjects.
> I have not studied the patch in details, but this looks > overcomplicated to me. I dunno about overcomplicated, but it's fundamentally the wrong thing: it won't do much except move the problem from this example to other example(s). The difficulty is precisely that we cannot simply delete objects in reverse OID order and expect that to be OK. It appears to work in simple cases because reverse OID order usually means deleting newest objects first, and that usually means dropping depender objects before dependees --- but dependencies added as a result of later ALTER commands may not be honored correctly. Not to mention that you can lose if an OID wraparound happened during the sequence of object creations. In the case at hand, the reason we're having trouble with g_intbig_options() is that the sequence of extension scripts run by CREATE EXTENSION creates the gist__intbig_ops opfamily first, and then creates g_intbig_options() and attaches it to the opfamily later (in intarray--1.2--1.3.sql). So g_intbig_options() has a larger OID than the opclass that the index depends on. In DROP EXTENSION, the first level of findDependentObjects() finds all the direct dependencies (members) of the extension, and then sorts them by OID descending, concluding that g_intbig_options() can be dropped before the opclass. Subsequent recursive levels will find the index and recognize that it must be dropped before the opclass --- but this fails to account for the fact that we'd better drop the opclass before any of the functions it depends on. At some point along the line, we will come across the dependency that says so; but we don't do anything in response, because if findDependentObjects() sees that the current object is already in targetObjects then it thinks it's done. I think when I wrote this code I was assuming that the dependency- order traversal performed by findDependentObjects() was sufficient to guarantee producing a safe deletion order, but it's now obvious that that's not so. At minimum, when findDependentObjects() finds that a dependency exists on an object that's already in targetObjects, it'd need to do something about moving that object to after the one it's working on. But I doubt we can fix it with just that, because that won't be enough to handle indirect dependencies. It looks to me that the only real fix will require performing a topological sort, similar to what pg_dump does, to produce a safe deletion order that honors all the direct and indirect dependencies found by findDependentObjects(). An open question is whether we will need dependency-loop-breaking logic, or whether the hackery done in findDependentObjects() is sufficient to ensure that we can assume there are no loops in the dependencies it chooses to output. It might be a better idea to get rid of that logic and instead have explicit loop-breaking like the way pg_dump does it. It's also tempting to wonder if we can share code for this with pg_dump. The TopoSort function alone is not that big, but if we have to duplicate the loop-breaking logic it'd get annoying. Anyway, this is a very long-standing problem and I don't think we should try to solve it in a rush. regards, tom lane