On Fri, Dec 31, 2021 at 7:56 AM Amit Langote <amitlangot...@gmail.com> wrote: > > On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat....@gmail.com> > wrote: >> >> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangot...@gmail.com> wrote: >> > >> > Executing generic plans involving partitions is known to become slower >> > as partition count grows due to a number of bottlenecks, with >> > AcquireExecutorLocks() showing at the top in profiles. >> > >> > Previous attempt at solving that problem was by David Rowley [1], >> > where he proposed delaying locking of *all* partitions appearing under >> > an Append/MergeAppend until "initial" pruning is done during the >> > executor initialization phase. A problem with that approach that he >> > has described in [2] is that leaving partitions unlocked can lead to >> > race conditions where the Plan node belonging to a partition can be >> > invalidated when a concurrent session successfully alters the >> > partition between AcquireExecutorLocks() saying the plan is okay to >> > execute and then actually executing it. >> > >> > However, using an idea that Robert suggested to me off-list a little >> > while back, it seems possible to determine the set of partitions that >> > we can safely skip locking. The idea is to look at the "initial" or >> > "pre-execution" pruning instructions contained in a given Append or >> > MergeAppend node when AcquireExecutorLocks() is collecting the >> > relations to lock and consider relations from only those sub-nodes >> > that survive performing those instructions. I've attempted >> > implementing that idea in the attached patch. >> > >> >> In which cases, we will have "pre-execution" pruning instructions that >> can be used to skip locking partitions? Can you please give a few >> examples where this approach will be useful? > > > This is mainly to be useful for prepared queries, so something like: > > prepare q as select * from partitioned_table where key = $1; > > And that too when execute q(…) uses a generic plan. Generic plans are > problematic because it must contain nodes for all partitions (without any > plan time pruning), which means CheckCachedPlan() has to spend time > proportional to the number of partitions to determine that the plan is still > usable / has not been invalidated; most of that is AcquireExecutorLocks(). > > Other bottlenecks, not addressed in this patch, pertain to some executor > startup/shutdown subroutines that process the range table of a PlannedStmt in > its entirety, whose length is also proportional to the number of partitions > when the plan is generic. > >> The benchmark is showing good results, indeed. > Indeed.
Here are few comments for v1 patch: + /* Caller error if we get here without contains_init_steps */ + Assert(pruneinfo->contains_init_steps); - prunedata = prunestate->partprunedata[i]; - pprune = &prunedata->partrelprunedata[0]; - /* Perform pruning without using PARAM_EXEC Params */ - find_matching_subplans_recurse(prunedata, pprune, true, &result); + if (parentrelids) + *parentrelids = NULL; You got two blank lines after Assert. -- + /* Set up EState if not in the executor proper. */ + if (estate == NULL) + { + estate = CreateExecutorState(); + estate->es_param_list_info = params; + free_estate = true; } ... [Skip] + if (free_estate) + { + FreeExecutorState(estate); + estate = NULL; } I think this work should be left to the caller. -- /* * Stuff that follows matches exactly what ExecCreatePartitionPruneState() * does, except we don't need a PartitionPruneState here, so don't call * that function. * * XXX some refactoring might be good. */ +1, while doing it would be nice if foreach_current_index() is used instead of the i & j sequence in the respective foreach() block, IMO. -- + while ((i = bms_next_member(validsubplans, i)) >= 0) + { + Plan *subplan = list_nth(subplans, i); + + context->relations = + bms_add_members(context->relations, + get_plan_scanrelids(subplan)); + } I think instead of get_plan_scanrelids() the GetLockableRelations_worker() can be used; if so, then no need to add get_plan_scanrelids() function. -- /* Nodes containing prunable subnodes. */ + case T_MergeAppend: + { + PlannedStmt *plannedstmt = context->plannedstmt; + List *rtable = plannedstmt->rtable; + ParamListInfo params = context->params; + PartitionPruneInfo *pruneinfo; + Bitmapset *validsubplans; + Bitmapset *parentrelids; ... if (pruneinfo && pruneinfo->contains_init_steps) { int i; ... return false; } } break; Most of the declarations need to be moved inside the if-block. Also, initially, I was a bit concerned regarding this code block inside GetLockableRelations_worker(), what if (pruneinfo && pruneinfo->contains_init_steps) evaluated to false? After debugging I realized that plan_tree_walker() will do the needful -- a bit of comment would have helped. -- + case T_CustomScan: + foreach(lc, ((CustomScan *) plan)->custom_plans) + { + if (walker((Plan *) lfirst(lc), context)) + return true; + } + break; Why not plan_walk_members() call like other nodes? Regards, Amul