On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangot...@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangot...@gmail.com>
> > wrote:
> > >
> > > > The code check that you have identified above ensures that the INSERT
> > > > has an underlying SELECT, because the planner won't (and shouldn't
> > > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > > point doing any parallel-safety checks in this case.
> > > > It just so happens that the problem test case uses INSERT...VALUES -
> > > > and it shouldn't have triggered the parallel-safety checks for
> > > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > > (and shouldn't) be parallelized.
> > >
> > > AFAICS, max_parallel_hazard() path never bails from doing further
> > > safety checks based on anything other than finding a query component
> > > whose hazard level crosses context->max_interesting.
> >
> > It's parallel UNSAFE because it's not safe or even possible to have a
> > parallel plan for this.
> > (as UNSAFE is the max hazard level, no point in referencing
> > context->max_interesting).
> > And there are existing cases of bailing out and not doing further
> > safety checks (even your v15_delta.diff patch placed code - for
> > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > existing case in max_parallel_hazard_walker()):
> >
> >     else if (IsA(node, Query))
> >     {
> >         Query      *query = (Query *) node;
> >
> >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> >         if (query->rowMarks != NULL)
> >         {
> >             context->max_hazard = PROPARALLEL_UNSAFE;
> >             return true;
> >         }
>
> In my understanding, the max_parallel_hazard() query tree walk is to
> find constructs that are parallel unsafe in that they call code that
> can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> traditional heap AM tuples calls AssignTransactionId() which doesn't
> support being called in parallel mode.  Likewise ON CONFLICT ... DO
> UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> aware of any such hazards downstream of ExecValuesScan().
>
> > > You're trying to
> > > add something that bails based on second-guessing that a parallel path
> > > would not be chosen, which I find somewhat objectionable.
> > >
> > > If the main goal of bailing out is to avoid doing the potentially
> > > expensive modification safety check on the target relation, maybe we
> > > should try to somehow make the check less expensive.  I remember
> > > reading somewhere in the thread about caching the result of this check
> > > in relcache, but haven't closely studied the feasibility of doing so.
> > >
> >
> > There's no "second-guessing" involved here.
> > There is no underlying way of dividing up the VALUES data of
> > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > (apart from the fact that spawning off parallel workers to insert that
> > data would almost always result in worse performance than a
> > non-parallel plan...)
> > The division of work for parallel workers is part of the table AM
> > (scan) implementation, which is not invoked for "INSERT...VALUES".
>
> I don't disagree that the planner would not normally assign a parallel
> path simply to pull values out of a VALUES list mentioned in the
> INSERT command, but deciding something based on the certainty of it in
> an earlier planning phase seems odd to me.  Maybe that's just me
> though.
>
I think it is more a case where there is neither a need for parallelism
nor do we want to support parallelism for it.  The other possibility for
such a check could be at some earlier phase, say in standard_planner()
[1], where we are already doing checks for other constructs for which we
don't want parallelism (I think the check for 'parse->hasModifyingCTE'
is quite similar).  If you look at that check, there as well we just
assume the other operations to be in the category of parallel-unsafe.
I think we should rule out such cases earlier rather than later (see [2]
for a rough sketch of what such an earlier check could look like).  Do
you have better ideas than what Greg has done to avoid parallelism for
such cases?

[1] -
standard_planner()
{
..
    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        parse->commandType == CMD_SELECT &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }

--
With Regards,
Amit Kapila.
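[2] - A rough sketch only of the kind of earlier check meant above; the
helper name insert_has_underlying_select() is made up here for
illustration and is not taken from Greg's patch or from planner.c:

/*
 * Cheap helper to rule out plain INSERT ... VALUES before paying for
 * the max_parallel_hazard() tree walk: an INSERT is only worth
 * considering for parallelism if it has an underlying SELECT, i.e. at
 * least one subquery RTE in its range table.
 *
 * (Needs nodes/parsenodes.h and nodes/pg_list.h.)
 */
static bool
insert_has_underlying_select(Query *parse)
{
    ListCell   *lc;

    Assert(parse->commandType == CMD_INSERT);

    foreach(lc, parse->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_SUBQUERY)
            return true;        /* e.g. INSERT ... SELECT */
    }

    return false;               /* plain INSERT ... VALUES */
}

The cheap tests in standard_planner() could then be extended roughly as
below, so that INSERT ... VALUES stays in the "just assume it's unsafe"
branch, much like the !parse->hasModifyingCTE test keeps other cases
out today:

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        (parse->commandType == CMD_SELECT ||
         (parse->commandType == CMD_INSERT &&
          insert_has_underlying_select(parse))) &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }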