Re: Parallel Inserts in CREATE TABLE AS

Zhihong Yu Wed, 06 Jan 2021 20:22:18 -0800

Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the
phrase: With this change.
It seems the scenarios you listed are known parallel safety constraints.


Probably rephrase that sentence a little bit to make this clearer.

Cheers

On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <z...@yugabyte.com> wrote:
> >
> > Hi,
> > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
> >
> > workers to Gather node to 0. With this change, there are chances
> > that the planner may choose the parallel plan.
> >
> > It would be nice if the scenarios where a parallel plan is not chosen
> are listed.
>
> There are many reasons, the planner may not choose a parallel plan for
> the select part, for instance if there are temporary tables, parallel
> unsafe functions, or the parallelism GUCs are not set properly,
> foreign tables and so on. see
> https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
> think so, we will add all the scenarios into the commit message.
>
> Having said that, we have extensive comments in the code(especially in
> the function SetParallelInsertState) about when parallel inserts are
> chosen.
>
> +     * Parallel insertions are possible only if the upper node is Gather.
>       */
> +    if (!IsA(gstate, GatherState))
>          return;
>
> +     * Parallelize inserts only when the upper Gather node has no
> projections.
>       */
> +    if (!gstate->ps.ps_ProjInfo)
> +    {
> +        /* Okay to parallelize inserts, so mark it. */
> +        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +            ((DR_intorel *) dest)->is_parallel = true;
> +
> +        /*
> +         * For parallelizing inserts, we must send some information so
> that the
> +         * workers can build their own dest receivers. For CTAS, this
> info is
> +         * into clause, object id (to open the created table).
> +         *
> +         * Since the required information is available in the dest
> receiver,
> +         * store a reference to it in the Gather state so that it will be
> used
> +         * in ExecInitParallelPlan to pick the information.
> +         */
> +        gstate->dest = dest;
> +    }
> +    else
> +    {
> +        /*
> +         * Upper Gather node has projections, so parallel insertions are
> not
> +         * allowed.
> +         */
>
> > +   if ((root->parse->parallelInsCmdTupleCostOpt &
> > +        PARALLEL_INSERT_SELECT_QUERY) &&
> > +       (root->parse->parallelInsCmdTupleCostOpt &
> > +        PARALLEL_INSERT_CAN_IGN_TUP_COST))
> > +   {
> > +       /* We are ignoring the parallel tuple cost, so mark it. */
> > +       root->parse->parallelInsCmdTupleCostOpt |=
> > +
>  PARALLEL_INSERT_TUP_COST_IGNORED;
> >
> > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED
> is implied.
> >
> > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the
> setting (1) of the first two bits should suffice.
>
> The way these flags work is as follows: before planning in CTAS, we
> set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
> gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
> actually ignored the tuple cost in cost_gather we set
> PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
> PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
> generate_useful_gather_paths, and the function
> generate_useful_gather_paths can return before reaching cost_gather,
> see below snippets. So, we need the 3 flags.
>
> void
> generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
>     ListCell   *lc;
>     double        rows;
>     double       *rowsp = NULL;
>     List       *useful_pathkeys_list = NIL;
>     Path       *cheapest_partial_path = NULL;
>
>     /* If there are no partial paths, there's nothing to do here. */
>     if (rel->partial_pathlist == NIL)
>         return;
>
>     /* Should we override the rel's rowcount estimate? */
>     if (override_rows)
>         rowsp = &rows;
>
>     /* generate the regular gather (merge) paths */
>     generate_gather_paths(root, rel, override_rows);
>
> void
> generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
>     Path       *cheapest_partial_path;
>     Path       *simple_gather_path;
>     ListCell   *lc;
>     double        rows;
>     double       *rowsp = NULL;
>
>     /* If there are no partial paths, there's nothing to do here. */
>     if (rel->partial_pathlist == NIL)
>         return;
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Re: Parallel Inserts in CREATE TABLE AS

Reply via email to