Hi , > On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.f...@cn.fujitsu.com> > wrote: > > > > Hi, > > > > I have an issue about the following code: > > > > econtext = node->ps.ps_ExprContext; > > ResetExprContext(econtext); > > > > + if (ISCTAS(node->ps.intoclause)) > > + { > > + ExecParallelInsertInCTAS(node); > > + return NULL; > > + } > > > > /* If no projection is required, we're done. */ > > if (node->ps.ps_ProjInfo == NULL) > > return slot; > > > > /* > > * Form the result tuple using ExecProject(), and return it. > > */ > > econtext->ecxt_outertuple = slot; > > return ExecProject(node->ps.ps_ProjInfo); > > > > It seems the projection will be skipped. > > Is this because projection is not required in this case ? > > (I'm not very familiar with where the projection will be.) > > > > For parallel inserts in CTAS, I don't think we need to project the tuples > being returned from the underlying plan nodes, and also we have nothing > to project from the Gather node further up. The required projection will > happen while the tuples are being returned from the underlying nodes and > the projected tuples are being directly fed to CTAS's dest receiver > intorel_receive(), from there into the created table. We don't need > ExecProject again in ExecParallelInsertInCTAS(). > > For instance, projection will always be done when the tuple is being returned > from an underlying sequential scan node(see ExecScan() --> > ExecProject() and this is true for both leader and workers. In both leader > and workers, we are just calling CTAS's dest receiver intorel_receive(). > > Thoughts?
I took a deep look at the projection logic. In most cases, you are right that Gather node does not need projection. In some rare cases, such as Subplan (or initplan I guess). The projection will happen in Gather node. The example: Create table test(i int); Create table test2(a int, b int); insert into test values(generate_series(1,10000000,1)); insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1)); postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000; QUERY PLAN ---------------------------------------- Gather Output: test.i, (SubPlan 1) Workers Planned: 2 -> Parallel Seq Scan on public.test Output: test.i Filter: (test.i < 2000) SubPlan 1 -> Limit Output: (test.i) -> Seq Scan on public.test2 Output: test.i In this case, projection is necessary, because the subplan will be executed in projection. If skipped, the table created will loss some data. Best regards, houzj