On Fri, Dec 31, 2021 at 7:56 AM Amit Langote <amitlangot...@gmail.com> wrote:
>
> On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat....@gmail.com> 
> wrote:
>>
>> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangot...@gmail.com> wrote:
>> >
>> > Executing generic plans involving partitions is known to become slower
>> > as partition count grows due to a number of bottlenecks, with
>> > AcquireExecutorLocks() showing at the top in profiles.
>> >
>> > Previous attempt at solving that problem was by David Rowley [1],
>> > where he proposed delaying locking of *all* partitions appearing under
>> > an Append/MergeAppend until "initial" pruning is done during the
>> > executor initialization phase.  A problem with that approach that he
>> > has described in [2] is that leaving partitions unlocked can lead to
>> > race conditions where the Plan node belonging to a partition can be
>> > invalidated when a concurrent session successfully alters the
>> > partition between AcquireExecutorLocks() saying the plan is okay to
>> > execute and then actually executing it.
>> >
>> > However, using an idea that Robert suggested to me off-list a little
>> > while back, it seems possible to determine the set of partitions that
>> > we can safely skip locking.  The idea is to look at the "initial" or
>> > "pre-execution" pruning instructions contained in a given Append or
>> > MergeAppend node when AcquireExecutorLocks() is collecting the
>> > relations to lock and consider relations from only those sub-nodes
>> > that survive performing those instructions.   I've attempted
>> > implementing that idea in the attached patch.
>> >
>>
>> In which cases will we have "pre-execution" pruning instructions that
>> can be used to skip locking partitions? Can you please give a few
>> examples where this approach will be useful?
>
>
> This is mainly useful for prepared queries, so something like:
>
> prepare q as select * from partitioned_table where key = $1;
>
> And even then, only when execute q(…) uses a generic plan. Generic plans are
> problematic because they must contain nodes for all partitions (without any
> plan-time pruning), which means CheckCachedPlan() has to spend time
> proportional to the number of partitions to determine that the plan is still
> usable / has not been invalidated; most of that is AcquireExecutorLocks().
>
> Other bottlenecks, not addressed in this patch, pertain to some executor 
> startup/shutdown subroutines that process the range table of a PlannedStmt in 
> its entirety, whose length is also proportional to the number of partitions 
> when the plan is generic.
>
>> The benchmark is showing good results, indeed.
>
Indeed.
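
For anyone wanting to reproduce this, a generic plan can be forced explicitly
(a minimal sketch; the plan_cache_mode setting exists in v12 and later, and
partitioned_table is from Amit's example above):

    set plan_cache_mode = force_generic_plan;
    prepare q as select * from partitioned_table where key = $1;
    execute q(1);  -- generic plan, so the plan tree covers all partitions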

Here are a few comments for the v1 patch:

+   /* Caller error if we get here without contains_init_steps */
+   Assert(pruneinfo->contains_init_steps);

-       prunedata = prunestate->partprunedata[i];
-       pprune = &prunedata->partrelprunedata[0];

-       /* Perform pruning without using PARAM_EXEC Params */
-       find_matching_subplans_recurse(prunedata, pprune, true, &result);
+   if (parentrelids)
+       *parentrelids = NULL;

This leaves two blank lines after the Assert.
--

+   /* Set up EState if not in the executor proper. */
+   if (estate == NULL)
+   {
+       estate = CreateExecutorState();
+       estate->es_param_list_info = params;
+       free_estate = true;
    }

... [Skip]

+   if (free_estate)
+   {
+       FreeExecutorState(estate);
+       estate = NULL;
    }

I think this EState setup and teardown should be left to the caller.
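
Call sites could instead do something like this (a rough sketch; the function
and field names are from the hunk above, the rest is assumed):

    EState     *estate = CreateExecutorState();

    estate->es_param_list_info = params;

    /* ... do the initial pruning using this EState ... */

    FreeExecutorState(estate);

That would also remove the need for the free_estate flag.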
--

    /*
     * Stuff that follows matches exactly what ExecCreatePartitionPruneState()
     * does, except we don't need a PartitionPruneState here, so don't call
     * that function.
     *
     * XXX some refactoring might be good.
     */

+1. While doing that, it would be nice to use foreach_current_index()
instead of the manually maintained i and j counters in the respective
foreach() blocks, IMO.
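
That is, something like this (just a sketch; the list and element names are
assumed):

    ListCell   *lc;

    foreach(lc, pinfolist)
    {
        PartitionedRelPruneInfo *pinfo = lfirst(lc);
        int         i = foreach_current_index(lc);  /* replaces "i++" */

        /* ... index the parallel arrays with i as before ... */
    }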
--

+                   while ((i = bms_next_member(validsubplans, i)) >= 0)
+                   {
+                       Plan   *subplan = list_nth(subplans, i);
+
+                       context->relations =
+                           bms_add_members(context->relations,
+                                           get_plan_scanrelids(subplan));
+                   }

I think GetLockableRelations_worker() could be called here instead of
get_plan_scanrelids(); if so, there would be no need to add the
get_plan_scanrelids() function at all.
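
For example (a sketch, assuming the worker can simply recurse into each
surviving subplan with the same context):

    while ((i = bms_next_member(validsubplans, i)) >= 0)
    {
        Plan   *subplan = list_nth(subplans, i);

        /* Recurse with the same walker instead of a one-off helper. */
        GetLockableRelations_worker(subplan, context);
    }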
--

     /* Nodes containing prunable subnodes. */
+       case T_MergeAppend:
+           {
+               PlannedStmt *plannedstmt = context->plannedstmt;
+               List       *rtable = plannedstmt->rtable;
+               ParamListInfo params = context->params;
+               PartitionPruneInfo *pruneinfo;
+               Bitmapset  *validsubplans;
+               Bitmapset  *parentrelids;

...
                if (pruneinfo && pruneinfo->contains_init_steps)
                {
                    int     i;
...
                   return false;
                }
            }
            break;

Most of the declarations need to be moved inside the if-block.

Also, I was initially a bit concerned about this code block inside
GetLockableRelations_worker(): what happens when (pruneinfo &&
pruneinfo->contains_init_steps) evaluates to false? After some debugging I
realized that plan_tree_walker() does the needful in that case -- a comment
saying so would have helped.
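
For example, something like this just before the break (wording is only a
suggestion):

    /*
     * No initial pruning steps, so fall through and let plan_tree_walker()
     * visit all the child subnodes; their relations are then collected and
     * locked as before.
     */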
--

+       case T_CustomScan:
+           foreach(lc, ((CustomScan *) plan)->custom_plans)
+           {
+               if (walker((Plan *) lfirst(lc), context))
+                   return true;
+           }
+           break;

Why not use a plan_walk_members() call here, like the other nodes do?
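
I.e. something like this (a sketch; I'm assuming plan_walk_members() takes
the List of child plans plus the walker and context, as the other cases
presumably do):

    case T_CustomScan:
        if (plan_walk_members(((CustomScan *) plan)->custom_plans,
                              walker, context))
            return true;
        break;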

Regards,
Amul

