On Mon, Dec 7, 2020 at 2:55 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.f...@cn.fujitsu.com> wrote:
> >
> > Hi
> >
> > +       /*
> > +        * Flag to let the planner know that the SELECT query is for CTAS. 
> > This is
> > +        * used to calculate the tuple transfer cost from workers to gather 
> > node(in
> > +        * case parallelism kicks in for the SELECT part of the CTAS), to 
> > zero as
> > +        * each worker will insert its share of tuples in parallel.
> > +        */
> > +       if (IsParallelInsertInCTASAllowed(into, NULL))
> > +               query->isForCTAS = true;
> >
> >
> > +       /*
> > +        * We do not compute the parallel_tuple_cost for CTAS because the 
> > number of
> > +        * tuples that are transferred from workers to the gather node is 
> > zero as
> > +        * each worker, in parallel, inserts the tuples that are resulted 
> > from its
> > +        * chunk of plan execution. This change may make the parallel plan 
> > cheap
> > +        * among all other plans, and influence the planner to consider this
> > +        * parallel plan.
> > +        */
> > +       if (!(root->parse->isForCTAS &&
> > +               root->query_level == 1))
> > +               run_cost += parallel_tuple_cost * path->path.rows;
> >
> > I noticed that the parallel_tuple_cost will still be ignored,
> > When Gather is not the top node.
> >
> > Example:
> >         Create table test(i int);
> >         insert into test values(generate_series(1,10000000,1));
> >         explain create table ntest3 as select * from test where i < 200 
> > limit 10000;
> >
> >                                   QUERY PLAN
> > -------------------------------------------------------------------------------
> >  Limit  (cost=1000.00..97331.33 rows=1000 width=4)
> >    ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
> >          Workers Planned: 2
> >          ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 
> > width=4)
> >                Filter: (i < 200)
> >
> >
> > The isForCTAS will be true because [create table as], the
> > query_level is always 1 because there is no subquery.
> > So even if gather is not the top node, parallel cost will still be ignored.
> >
> > Is that works as expected ?
> >
>
> I don't think that is expected and is not the case without this patch.
> The cost shouldn't be changed for existing cases where the write is
> not pushed to workers.
>

Thanks for pointing that out. Yes it should not change for the cases
where parallel inserts will not be picked later.

Any better suggestions on how to make the planner consider that the
CTAS might choose parallel inserts later at the same time avoiding the
above issue in case it doesn't?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply via email to