On Thu, Jun 20, 2024 at 2:09 AM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> I hope we can get this new executor code in 18.
Thanks for doing the benchmark, Alvaro, and sorry for the late reply. Yes, I'm hoping to get *some* version of this into v18.

I've been thinking about how to move this forward, and I'm starting to think that we should go back to, or at least consider as an option, the old approach of having the plancache do the initial runtime pruning, instead of having the executor take the locks, which is the design that the latest patch set tries to implement.

Here are the challenges facing the implementation of the current design:

1. I went through many iterations of the changes to ExecInitNode() to return a partially initialized PlanState tree when it detects that the CachedPlan was invalidated after locking a child table, and to ExecEndNode() to account for the PlanState tree sometimes being partially initialized, but the result still seems fragile and bug-prone to me. That may be because this approach is fundamentally hard to get right, or because I haven't invested enough effort in becoming confident in its robustness.

2. The refactoring needed for the ExecutorStart() API change, especially the part pertaining to portals, does not seem airtight. I'm particularly worried about moving the ExecutorStart() call for the PORTAL_MULTI_QUERY case from where it is now to PortalStart(). That requires additional bookkeeping in PortalData, and I am not totally sure that the snapshot handling changes after that move are entirely correct.

3. The need to add *back* the fields to store the RT indexes of relations that the ExecInitNode() traversal does not look at, such as root partitioned tables and non-leaf partitions.

I'm worried about #2 the most. One complaint about the previous design was that the interface changes to capture the result of doing initial pruning in plancache.c and pass it to the executor did not look great. However, after having tried #2, the changes to pass the pruning result into the executor and to reuse it in ExecInit[Merge]Append() seem a tad simpler than the refactoring and adjustments needed to handle failed ExecutorStart() calls at multiple code sites.

About #1, I tend to agree with David that adding complexity around PlanState tree construction may not be a good idea, because we might want to rethink the Plan initialization code and data structures in the not-too-distant future. One idea I thought of is to take the remaining locks (to wit, those on inheritance children when running a cached plan) at the beginning of InitPlan(), that is, before ExecInitNode(), much like we handle the permission checking, so that we never have to worry about returning a partially initialized PlanState tree; I've included a rough sketch of that below. However, that still leaves the tall task of implementing #2 in a way that doesn't break anything.

Another concern about the old design was the unnecessary overhead of initializing the bitmapset fields in PlannedStmt that exist only for the locking algorithm in AcquireExecutorLocks(). Andres suggested an idea offlist: either piggyback on the cursorOptions argument of pg_plan_queries() or add a new boolean parameter to let the planner know whether the plan is one that might get cached and thus have AcquireExecutorLocks() called on it. Another idea David and I discussed offlist is inventing an RTELockInfo (cf. RTEPermissionInfo), creating one only for each RT entry that is un-prunable, and doing away with PlannedStmt.rtable; a sketch of what such a node might look like is also included below.
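To make the InitPlan() idea a bit more concrete, here is roughly the shape I have in mind, to go right after the existing ExecCheckPermissions() call in InitPlan(). The PlannedStmt field name "unprunedChildRTIs" is made up, standing in for however we end up recording the RT indexes of the inheritance children that AcquireExecutorLocks() skipped, and I'm glossing over how the executor learns that it is running a cached plan at all; this is only a sketch, not tested code:

    /*
     * Sketch only: take the locks on inheritance children here, next to the
     * permission checks and before any ExecInitNode() work, so we never have
     * to back out of a partially built PlanState tree.  unprunedChildRTIs is
     * a made-up PlannedStmt field, and this would only run when executing a
     * cached plan.
     */
    int         rti = -1;

    while ((rti = bms_next_member(plannedstmt->unprunedChildRTIs, rti)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, rangeTable);

        LockRelationOid(rte->relid, rte->rellockmode);
    }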
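And for the RTELockInfo idea, something along these lines is what I'm picturing. The field names, and whether the pruning info is referenced by pointer or by some index, are just my guesses at this point:

    /*
     * Sketch of a possible RTELockInfo node (cf. RTEPermissionInfo); the
     * exact fields are not settled.
     */
    typedef struct RTELockInfo
    {
        NodeTag     type;

        Index       rtindex;        /* RT index of the un-prunable relation */
        Oid         relid;          /* its OID, so AcquireExecutorLocks() can
                                     * lock it without consulting an rtable */
        int         rellockmode;    /* lock mode to take, as in RangeTblEntry */

        /*
         * For a partitioned table subject to initial pruning: the
         * PartitionPruneInfo whose initial pruning steps tell us which
         * partitions still need to be locked; NULL if there is no initial
         * pruning to do.
         */
        struct PartitionPruneInfo *pruneinfo;
    } RTELockInfo;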
For partitioned tables, the RTELockInfo entry will point to the PartitionPruneInfo, which will contain the RT indexes of the partitions (or maybe just their OIDs) mapped from the subplan indexes returned by the pruning code. So AcquireExecutorLocks() will lock all un-prunable relations by referring to their RTELockInfo entries and, for each entry that points to a PartitionPruneInfo with initial pruning steps, will lock only the partitions that survive the pruning.

I am planning to polish that old patch set and post it after playing with those new ideas.

--
Thanks, Amit Langote