Re: Parallel Inserts in CREATE TABLE AS

Dilip Kumar Thu, 24 Dec 2020 21:06:51 -0800

On Fri, Dec 25, 2020 at 10:04 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Dec 25, 2020 at 9:54 AM Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
> >
> > On Fri, Dec 25, 2020 at 7:12 AM vignesh C <vignes...@gmail.com> wrote:
> > > On Thu, Dec 24, 2020 at 11:29 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignes...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
> > > > > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> > > > > > Attaching v14 patch set that has above changes. Please consider this
> > > > > > for further review.
> > > > > >
> > > > >
> > > > > Few comments:
> > > > > In the below case, should create be above Gather?
> > > > > postgres=# explain  create table t7 as select * from t6;
> > > > >                             QUERY PLAN
> > > > > -------------------------------------------------------------------
> > > > >  Gather  (cost=0.00..9.17 rows=0 width=4)
> > > > >    Workers Planned: 2
> > > > >  ->  Create t7
> > > > >    ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> > > > > (4 rows)
> > > > >
> > > > > Can we change it to something like:
> > > > > -------------------------------------------------------------------
> > > > > Create t7
> > > > >  -> Gather  (cost=0.00..9.17 rows=0 width=4)
> > > > >   Workers Planned: 2
> > > > >   ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> > > > > (4 rows)
> > > > >
> > > >
> > > > I think it is better to keep it the way it is in the current patch
> > > > because that reflects that we are performing the insert/create below
> > > > Gather, which is the purpose of this patch. I think this is similar to
> > > > what the Parallel Insert patch [1] has for a similar plan.
> > > >
> > > >
> > > > [1] - https://commitfest.postgresql.org/31/2844/
> > > >
> > >
> > > Also, another thing I felt was that the Gather node will actually do
> > > the insert operation; the Create table will have been done earlier.
> > > Should we change Create table to Insert table, something like below:
> > >                              QUERY PLAN
> > > -------------------------------------------------------------------
> > >  Gather  (cost=0.00..9.17 rows=0 width=4)
> > >    Workers Planned: 2
> > >  ->  Insert table2 (instead of Create table2)
> > >    ->  Parallel Seq Scan on table1  (cost=0.00..9.17 rows=417 width=4)
> >
> > IMO, showing Insert under Gather makes sense if the query is INSERT
> > INTO SELECT, as it is in the other patch [1]. Since this is a CTAS
> > query, having Create under Gather looks fine to me. This way we can
> > also distinguish the EXPLAINs of parallel inserts in INSERT INTO
> > SELECT and CTAS.
> >
>
> Right, IIRC, we have done it the way it is in the patch for convenience,
> to move forward with it, and we can come back to it later once all other
> parts of the patch are good.
>
> > Also, some might think that Create under Gather means that each
> > parallel worker is creating the table. It's actually not the creation
> > of the table that's parallelized but the insertion. If required, we
> > can clarify it in the CTAS docs with a sample EXPLAIN. I have not yet
> > added docs related to allowing parallel inserts in CTAS. Shall I add a
> > para saying when parallel inserts can be picked and how the sample
> > EXPLAIN looks? Thoughts?
> >
>
> Yeah, I don't see any problem with it, and maybe we can move the Explain
> related code to a separate patch. The reason is we don't display the DDL
> part without parallelism and this might need a separate discussion.
>


This makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
