Re: fix cost subqueryscan wrong parallel cost

Tom Lane Fri, 29 Apr 2022 08:53:33 -0700

I wrote:
> -ENOCAFFEINE, sorry about that.

As penance for that blunder, I spent a little time looking into this
issue (responding to Robert's upthread complaint that it wasn't
explained clearly).  See the test case in parallel_subquery.sql,
attached, which is adapted from the test in incremental_sort.sql.
It produces these plans:


explain select * from t union select * from t;

 Unique  (cost=29344.85..30544.85 rows=120000 width=12)
   ->  Sort  (cost=29344.85..29644.85 rows=120000 width=12)
         Sort Key: t.a, t.b, t.c
         ->  Append  (cost=0.00..2950.00 rows=120000 width=12)
               ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 
width=12)
               ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 
rows=25000 width=12)

explain select * from t union all select * from t;

 Gather  (cost=0.00..1400.00 rows=120000 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12)
         ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
         ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

I take no position on whether the second plan's costs are correct;
but if they are, then the Gather-atop-Append structure is clearly
cheaper than the Append-atop-Gathers structure, so the planner made
the wrong choice in the first case.

I then modified setrefs.c to not remove SubqueryScan nodes
(just make trivial_subqueryscan() return constant false)
and got this output from the UNION case:

 Unique  (cost=29344.85..30544.85 rows=120000 width=12)
   ->  Sort  (cost=29344.85..29644.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..2950.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1175.00 
rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t  (cost=0.00..575.00 
rows=25000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1175.00 
rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 
rows=25000 width=12)

The UNION ALL plan doesn't change, because no SubqueryScans are ever
created in that case (we flattened the UNION ALL to an appendrel
early on).  Removing the test case's settings to allow a parallel
plan, the non-parallelized plan for the UNION case looks like

 Unique  (cost=30044.85..31244.85 rows=120000 width=12)
   ->  Sort  (cost=30044.85..30344.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..3650.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1525.00 
rows=60000 width=12)
                     ->  Seq Scan on t  (cost=0.00..925.00 rows=60000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1525.00 
rows=60000 width=12)
                     ->  Seq Scan on t t_1  (cost=0.00..925.00 rows=60000 
width=12)

So: the row counts for SubqueryScan are correct, thus the upthread
proposal to change them is not.  The cost estimates, however, seem
pretty bogus.  What we're seeing here is that we're charging 600
cost units to pass the data through SubqueryScan, and that's
consistent across the parallel and non-parallel cases, so it's
correct on its own terms.  But actually it's totally wrong because
we're going to elide that node later.

So I think the actual problem here is that we leave the decision
to elide no-op SubqueryScan nodes till setrefs.c.  We should detect
that earlier, and when it applies, label the SubqueryScanPath with
the exact cost of its child.

(I think the current implementation might've been all right when
it was devised, on the grounds that we had no real planning
flexibility for UNION operations anyway.  But now we do, and the
bogus cost charge is causing visible problems.)

                        regards, tom lane

drop table if exists t;
create table t (a int, b int, c int);
insert into t select mod(i,10),mod(i,10),i from generate_series(1,60000) s(i);
create index on t (a);
analyze t;

set min_parallel_table_scan_size = '1kB';
set min_parallel_index_scan_size = '1kB';
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set max_parallel_workers_per_gather = 2;

set enable_hashagg to off;

explain select * from t union select * from t;
explain select * from t union all select * from t;

Re: fix cost subqueryscan wrong parallel cost

Reply via email to