Thanks for the clarification. w.r.t. the commit message, maybe I was momentarily sidetracked by the phrase: With this change. It seems the scenarios you listed are known parallel safety constraints.
Probably rephrase that sentence a little bit to make this clearer. Cheers On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <z...@yugabyte.com> wrote: > > > > Hi, > > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch : > > > > workers to Gather node to 0. With this change, there are chances > > that the planner may choose the parallel plan. > > > > It would be nice if the scenarios where a parallel plan is not chosen > are listed. > > There are many reasons, the planner may not choose a parallel plan for > the select part, for instance if there are temporary tables, parallel > unsafe functions, or the parallelism GUCs are not set properly, > foreign tables and so on. see > https://www.postgresql.org/docs/devel/parallel-safety.html. I don't > think so, we will add all the scenarios into the commit message. > > Having said that, we have extensive comments in the code(especially in > the function SetParallelInsertState) about when parallel inserts are > chosen. > > + * Parallel insertions are possible only if the upper node is Gather. > */ > + if (!IsA(gstate, GatherState)) > return; > > + * Parallelize inserts only when the upper Gather node has no > projections. > */ > + if (!gstate->ps.ps_ProjInfo) > + { > + /* Okay to parallelize inserts, so mark it. */ > + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > + ((DR_intorel *) dest)->is_parallel = true; > + > + /* > + * For parallelizing inserts, we must send some information so > that the > + * workers can build their own dest receivers. For CTAS, this > info is > + * into clause, object id (to open the created table). > + * > + * Since the required information is available in the dest > receiver, > + * store a reference to it in the Gather state so that it will be > used > + * in ExecInitParallelPlan to pick the information. > + */ > + gstate->dest = dest; > + } > + else > + { > + /* > + * Upper Gather node has projections, so parallel insertions are > not > + * allowed. > + */ > > > + if ((root->parse->parallelInsCmdTupleCostOpt & > > + PARALLEL_INSERT_SELECT_QUERY) && > > + (root->parse->parallelInsCmdTupleCostOpt & > > + PARALLEL_INSERT_CAN_IGN_TUP_COST)) > > + { > > + /* We are ignoring the parallel tuple cost, so mark it. */ > > + root->parse->parallelInsCmdTupleCostOpt |= > > + > PARALLEL_INSERT_TUP_COST_IGNORED; > > > > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and > PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED > is implied. > > > > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the > setting (1) of the first two bits should suffice. > > The way these flags work is as follows: before planning in CTAS, we > set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper > gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we > actually ignored the tuple cost in cost_gather we set > PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set > PARALLEL_INSERT_CAN_IGN_TUP_COST before calling > generate_useful_gather_paths, and the function > generate_useful_gather_paths can return before reaching cost_gather, > see below snippets. So, we need the 3 flags. > > void > generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool > override_rows) > { > ListCell *lc; > double rows; > double *rowsp = NULL; > List *useful_pathkeys_list = NIL; > Path *cheapest_partial_path = NULL; > > /* If there are no partial paths, there's nothing to do here. */ > if (rel->partial_pathlist == NIL) > return; > > /* Should we override the rel's rowcount estimate? */ > if (override_rows) > rowsp = &rows; > > /* generate the regular gather (merge) paths */ > generate_gather_paths(root, rel, override_rows); > > void > generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool > override_rows) > { > Path *cheapest_partial_path; > Path *simple_gather_path; > ListCell *lc; > double rows; > double *rowsp = NULL; > > /* If there are no partial paths, there's nothing to do here. */ > if (rel->partial_pathlist == NIL) > return; > > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com >