On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangot...@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangot...@gmail.com>
> > wrote:
> > >
> > > > The code check that you have identified above ensures that the INSERT
> > > > has an underlying SELECT, because the planner won't (and shouldn't
> > > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > > point doing any parallel-safety checks in this case.
> > > > It just so happens that the problem test case uses INSERT...VALUES -
> > > > and it shouldn't have triggered the parallel-safety checks for
> > > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > > (and shouldn't) be parallelized.
> > >
> > > AFAICS, max_parallel_hazard() path never bails from doing further
> > > safety checks based on anything other than finding a query component
> > > whose hazard level crosses context->max_interesting.
> >
> > It's parallel UNSAFE because it's not safe or even possible to have a
> > parallel plan for this.
> > (as UNSAFE is the max hazard level, no point in referencing
> > context->max_interesting).
> > And there are existing cases of bailing out and not doing further
> > safety checks (even your v15_delta.diff patch placed code - for
> > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > existing case in max_parallel_hazard_walker()):
> >
> >     else if (IsA(node, Query))
> >     {
> >         Query      *query = (Query *) node;
> >
> >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> >         if (query->rowMarks != NULL)
> >         {
> >             context->max_hazard = PROPARALLEL_UNSAFE;
> >             return true;
> >         }
>
> In my understanding, the max_parallel_hazard() query tree walk is to
> find constructs that are parallel unsafe in that they call code that
> can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> traditional heap AM tuples calls AssignTransactionId() which doesn't
> support being called in parallel mode.  Likewise ON CONFLICT ... DO
> UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> aware of any such hazards downstream of ExecValuesScan().
>
> > > You're trying to
> > > add something that bails based on second-guessing that a parallel path
> > > would not be chosen, which I find somewhat objectionable.
> > >
> > > If the main goal of bailing out is to avoid doing the potentially
> > > expensive modification safety check on the target relation, maybe we
> > > should try to somehow make the check less expensive.  I remember
> > > reading somewhere in the thread about caching the result of this check
> > > in relcache, but haven't closely studied the feasibility of doing so.
> > >
> >
> > There's no "second-guessing" involved here.
> > There is no underlying way of dividing up the VALUES data of
> > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > (apart from the fact that spawning off parallel workers to insert that
> > data would almost always result in worse performance than a
> > non-parallel plan...)
> > The division of work for parallel workers is part of the table AM
> > (scan) implementation, which is not invoked for "INSERT...VALUES".
>
> I don't disagree that the planner would not normally assign a parallel
> path simply to pull values out of a VALUES list mentioned in the
> INSERT command, but deciding something based on the certainty of it in
> an earlier planning phase seems odd to me.  Maybe that's just me
> though.
>
I think it is more a case where there is neither a need for parallelism
nor do we want to support parallelism for it.  The other possibility for
such a check could be at some earlier phase, say in standard_planner()
[1], where we are already doing checks for other constructs for which we
don't want parallelism (I think the check for 'parse->hasModifyingCTE'
is quite similar).  If you look at that check, there as well we just
assume the other operations to be in the category of parallel-unsafe.
I think we should rule out such cases earlier rather than later (see [2]
for a rough sketch of what such an earlier check could look like).  Do
you have better ideas than what Greg has done to avoid parallelism for
such cases?

[1] -
standard_planner()
{
..
    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        parse->commandType == CMD_SELECT &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }

--
With Regards,
Amit Kapila.
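[2] - A rough sketch only of the kind of earlier check meant above; the
helper name insert_has_underlying_select() is made up here for
illustration and is not taken from Greg's patch or from planner.c:

/*
 * Cheap helper to rule out plain INSERT ... VALUES before paying for
 * the max_parallel_hazard() tree walk: an INSERT is only worth
 * considering for parallelism if it has an underlying SELECT, i.e. at
 * least one subquery RTE in its range table.
 *
 * (Needs nodes/parsenodes.h and nodes/pg_list.h.)
 */
static bool
insert_has_underlying_select(Query *parse)
{
    ListCell   *lc;

    Assert(parse->commandType == CMD_INSERT);

    foreach(lc, parse->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_SUBQUERY)
            return true;        /* e.g. INSERT ... SELECT */
    }

    return false;               /* plain INSERT ... VALUES */
}

The cheap tests in standard_planner() could then be extended roughly as
below, so that INSERT ... VALUES stays in the "just assume it's unsafe"
branch, much like the !parse->hasModifyingCTE test keeps other cases
out today:

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        (parse->commandType == CMD_SELECT ||
         (parse->commandType == CMD_INSERT &&
          insert_has_underlying_select(parse))) &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }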