On Tue, Aug 20, 2024 at 11:53 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Aug 20, 2024 at 9:00 AM Amit Langote <amitlangot...@gmail.com> wrote:
> > I think we'd modify plancache.c to postpone the locking of only
> > prunable relations (i.e., partitions), so we're looking at only a
> > handful of concurrent modifications that are going to cause execution
> > errors.  That's because we disallow many DDL modifications of
> > partitions unless they are done via recursion from the parent, so the
> > space of errors in practice would be smaller compared to if we were to
> > postpone *all* cached plan locks to ExecInitNode() time.  DROP INDEX
> > a_partition_only_index comes to mind as something that might cause an
> > error.  I've not tested if other partition-only constraints can cause
> > unsafe behaviors.
>
> This seems like a valid point to some extent, but in other contexts
> we've had discussions about how we don't actually guarantee all that
> much uniformity between a partitioned table and its partitions, and
> it's been questioned whether we made the right decisions there.  So I'm
> not entirely sure that the surface area for problems here will be as
> narrow as you're hoping -- I think we'd need to go through all of the
> ALTER TABLE variants and think it through.  But maybe the problems
> aren't that bad.
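To make the DROP INDEX hazard concrete, here's a rough sketch of the
sequence I have in mind.  The table, partition, and index names are made
up, and this assumes the proposed design where partition locks are
postponed to executor startup, so it's not reproducible on master as-is:

```sql
-- Session 1: get a generic plan cached that scans a partition-only index.
CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (1000);
CREATE INDEX p1_a_idx ON p1 (a);   -- exists on the partition only
PREPARE q(int) AS SELECT * FROM p WHERE a = $1;
EXECUTE q(1);  -- repeat until a generic plan with an Index Scan on p1_a_idx is cached

-- Session 2: drop the partition-only index; only p1 needs to be locked,
-- so this doesn't conflict with a lock on just the parent p.
DROP INDEX p1_a_idx;

-- Session 1: with the partition's lock deferred, plancache.c wouldn't
-- get a chance to invalidate and replan before execution begins, so
-- executor startup would fail trying to open the now-missing index.
EXECUTE q(1);
```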
Many changeable properties that are reflected in a partition's
RelationData after its lock is taken seem to cause no issues, as long as
the executor code only looks at RelationData.  That is true for most
Scan nodes, and it also seems true for ModifyTable, which consults
RelationData for the relation properties relevant to inserts and
deletes.  The two cases that don't cope are:

* Index Scan nodes, with a concurrent DROP INDEX of a partition-only
  index.

* Concurrent DROP CONSTRAINT of partition-only CHECK and NOT NULL
  constraints, which can lead to incorrect results, as I describe below.

> It does seem like constraints can change the plan.  Imagine the
> partition had a CHECK(false) constraint before and now doesn't, or
> something.

Yeah, if the CHECK constraint is dropped concurrently, any rows added
afterward will not be returned when a stale cached plan is executed,
because the plan was created on the assumption that such rows cannot
exist due to the CHECK constraint.  We currently don't explicitly verify
before executing a plan that the constraints that were used during
planning still exist.

Overall, I'm becoming less enthusiastic about the idea of throwing an
error in the executor, given the known and unknown hazards of trying to
execute a stale plan.  Even if we documented such hazards, any users who
run into these rare errors are likely to head to -bugs or -hackers
anyway.  Tom said we should perhaps look at the hazards caused by
intra-session locking, but AFAICS we'd still be left with the hazards of
missing indexes and constraints due to DROPs from other sessions.

So, the options:

* The replanning aspect of the lock-in-the-executor design would be
  simpler if a CachedPlan contained the plan for a single query rather
  than a list of queries, as previously mentioned, particularly because
  of the requirements of the PORTAL_MULTI_QUERY case.  However, this
  option might be impractical.
* Polish the patch for the old design of doing the initial pruning
  before AcquireExecutorLocks() and focus on hashing out any bugs and
  issues of that design.

--
Thanks, Amit Langote