Hi,

I've been re-running the TPC-H benchmark, to remind myself the common
issues with OLAP workloads, and one of the most annoying problems seems
to be the misestimates in Q2. The query is not particularly complex,
although it does have a correlated subquery with an aggregate, but it's
one of the queries prone to a cascade of nested loops running forever.
I wonder if there's something we could do to handle this better.

A raw Q2 looks like this:

    select
        s_acctbal,
        s_name,
        n_name,
        p_partkey,
        p_mfgr,
        s_address,
        s_phone,
        s_comment
    from
        part,
        supplier,
        partsupp,
        nation,
        region
    where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 16
        and p_type like '%NICKEL'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'AMERICA'
        and ps_supplycost = (
            select
                min(ps_supplycost)
            from
                partsupp,
                supplier,
                nation,
                region
            where
                p_partkey = ps_partkey
                and s_suppkey = ps_suppkey
                and s_nationkey = n_nationkey
                and n_regionkey = r_regionkey
                and r_name = 'AMERICA'
        )
    order by
        s_acctbal desc,
        n_name,
        s_name,
        p_partkey;

and the full query plan is attached (q2-original-plan.txt).

The relevant part of the plan is the final join, which also considers
the subplan result (all estimates are for scale 10):

   ->  Merge Join  (cost=638655.36..1901120.61 rows=1 width=192)
                   (actual time=7299.121..10993.517 rows=4737 loops=1)
         Merge Cond: (part.p_partkey = partsupp.ps_partkey)
         Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
         Rows Removed by Join Filter: 1661

Yeah, this is estimated as 1 row but actually returns 4737 rows. All
the other nodes are estimated very accurately, it's just this final join
that is entirely wrong.

If you tweak the costs a bit (e.g. reducing random_page_cost etc.) the
plan can easily switch to nested loops, with this join much deeper in
the plan. See the attached q2-nested-loops.txt for an example (I had to
disable merge/hash joins to trigger this on scale 10, on larger scales
it can happen much easier).

Now, the query seems a bit complex, but we can easily simplify it by
creating an extra table and reducing the number of joins:

    create table foo as select
      *
    from
        partsupp,
        supplier,
        nation,
        region
    where
        s_suppkey = ps_suppkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'AMERICA';

    reate index on t (ps_partkey);
which allows us to rewrite Q2 like this (this also ditches the ORDER BY
and LIMIT clauses):

    select
        1
    from
        part,
        t
    where
        p_partkey = ps_partkey
        and p_size = 16
        and p_type like '%NICKEL'
        and ps_supplycost = (
            select
                min(ps_supplycost)
            from t
            where
                p_partkey = ps_partkey
        );

in fact, we can ditch even the conditions on p_size/p_type which makes
the issue even more severe:

    select
        1
    from
        part,
        t
    where
        p_partkey = ps_partkey
        and ps_supplycost = (
            select
                min(ps_supplycost)
            from t
            where
                p_partkey = ps_partkey
        );

with the join estimated like this:

   Hash Join  (cost=89761.10..1239195.66 rows=17 width=4)
              (actual time=15379.356..29315.436 rows=1182889 loops=1)
     Hash Cond: ((t.ps_partkey = part.p_partkey) AND
                 (t.ps_supplycost = (SubPlan 1)))

Yeah, that's underestimated by a factor of 70000 :-(

An interesting observation is that if you remove the condition on supply
cost (with the correlated subquery), the estimates get perfect atain. So
this seems to be about this particular condition, or how we combine the
selectivities ...

I'm not sure I've figured all the details yet, but this seems to be due
to a dependency between the ps_partkey and ps_supplycost columns.

When estimating the second condition, we end up calling eqjoinsel()
with SubPlan and Var arguments. We clearly won't have ndistinct of MCVs
for the SubPlan, so we use

    nd1 = 200;   /* default */
    nd2 = 94005; /* n_distinct for t.ps_supplycost */

and end up (thanks to eqjoinsel_inner and no NULLs in data) with

    selec_inner = 0.00001 = Min(1/nd1, 1/nd2)

But that's entirely bogus, because while there are ~100k distinct values
in t.ps_supplycost, those are for *all* ps_partkey values combined. But
each ps_partkey value has only about ~1.4 distinct ps_supplycost values
on average:

    select avg(x) from (select count(distinct ps_supplycost) as x
                          from t group by ps_partkey) foo;

avg --------------------
     1.3560712631162481
    (1 row)

Which I think is the root cause here ...

The fact that we're using the same table "t" in both the main query and
the correlated subquery seems rather irrelevant here, because we might
also create

    create table s as select
        ps_partkey,
        min(ps_supplycost) as min_ps_supplycost
    from t group by ps_partkey;

and use that instead, and we'd still have the same issue. It's just the
fact for a given ps_partkey value there's only a couple ps_supplycost
values, not the 100k we have for a table.

I wonder if we could use the ndistinct coefficients to improve this,
somehow. I suppose eqjoinsel/eqjoinsel_inner could look at

    ndistinct(ps_partkey, ps_supplycost) / ndistinct(ps_partkey)

when estimating the (SubPlan = Var) condition, and tweak selec_inner
accordingly.

I do see two challenges with this, though:

1) This probably requires considering all join clauses at once (just
like we do for regular clauses), but eqjoinsel and friends seem rather
heavily designed for inspecting the clauses one by one.

2) I'm not sure what to do about the SubPlan side, for which we may not
have any reliable ndistinct estimates at all.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
                                                                                
     QUERY PLAN                                                                 
                     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1901120.62..1901120.62 rows=1 width=192) (actual 
time=10998.473..11000.664 rows=4737 loops=1)
   Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, 
part.p_partkey
   Sort Method: quicksort  Memory: 1469kB
   ->  Merge Join  (cost=638655.36..1901120.61 rows=1 width=192) (actual 
time=7299.121..10993.517 rows=4737 loops=1)
         Merge Cond: (part.p_partkey = partsupp.ps_partkey)
         Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
         Rows Removed by Join Filter: 1661
         ->  Index Scan using part_pkey on part  (cost=0.43..102913.21 
rows=8829 width=30) (actual time=0.142..412.621 rows=8046 loops=1)
               Filter: (((p_type)::text ~~ '%NICKEL'::text) AND (p_size = 16))
               Rows Removed by Filter: 1991954
         ->  Materialize  (cost=638654.72..646656.44 rows=1600344 width=172) 
(actual time=7298.795..9539.910 rows=1603828 loops=1)
               ->  Sort  (cost=638654.72..642655.58 rows=1600344 width=172) 
(actual time=7298.781..8178.512 rows=1603828 loops=1)
                     Sort Key: partsupp.ps_partkey
                     Sort Method: external merge  Disk: 293504kB
                     ->  Nested Loop  (cost=42.10..200242.67 rows=1600344 
width=172) (actual time=2.873..5427.412 rows=1604080 loops=1)
                           ->  Nested Loop  (cost=41.67..2184.07 rows=20000 
width=166) (actual time=2.805..74.435 rows=20051 loops=1)
                                 ->  Nested Loop  (cost=0.14..13.95 rows=5 
width=30) (actual time=0.112..0.192 rows=5 loops=1)
                                       Join Filter: (nation.n_regionkey = 
region.r_regionkey)
                                       Rows Removed by Join Filter: 20
                                       ->  Index Scan using nation_pkey on 
nation  (cost=0.14..12.51 rows=25 width=34) (actual time=0.025..0.047 rows=25 
loops=1)
                                       ->  Materialize  (cost=0.00..1.07 rows=1 
width=4) (actual time=0.003..0.003 rows=1 loops=25)
                                             ->  Seq Scan on region  
(cost=0.00..1.06 rows=1 width=4) (actual time=0.027..0.032 rows=1 loops=1)
                                                   Filter: (r_name = 
'AMERICA'::bpchar)
                                                   Rows Removed by Filter: 4
                                 ->  Bitmap Heap Scan on supplier  
(cost=41.53..394.02 rows=4000 width=144) (actual time=0.853..10.520 rows=4010 
loops=5)
                                       Recheck Cond: (s_nationkey = 
nation.n_nationkey)
                                       Heap Blocks: exact=9388
                                       ->  Bitmap Index Scan on 
idx_supplier_nation_key  (cost=0.00..40.53 rows=4000 width=0) (actual 
time=0.532..0.532 rows=4010 loops=5)
                                             Index Cond: (s_nationkey = 
nation.n_nationkey)
                           ->  Index Scan using idx_partsupp_suppkey on 
partsupp  (cost=0.43..9.09 rows=81 width=14) (actual time=0.006..0.195 rows=80 
loops=20051)
                                 Index Cond: (ps_suppkey = supplier.s_suppkey)
         SubPlan 1
           ->  Aggregate  (cost=162.40..162.41 rows=1 width=32) (actual 
time=0.043..0.044 rows=1 loops=6398)
                 ->  Nested Loop  (cost=0.86..162.39 rows=4 width=6) (actual 
time=0.022..0.040 rows=2 loops=6398)
                       Join Filter: (nation_1.n_regionkey = 
region_1.r_regionkey)
                       Rows Removed by Join Filter: 2
                       ->  Seq Scan on region region_1  (cost=0.00..1.06 rows=1 
width=4) (actual time=0.001..0.002 rows=1 loops=6398)
                             Filter: (r_name = 'AMERICA'::bpchar)
                             Rows Removed by Filter: 4
                       ->  Nested Loop  (cost=0.86..161.10 rows=18 width=10) 
(actual time=0.012..0.034 rows=4 loops=6398)
                             ->  Nested Loop  (cost=0.72..158.33 rows=18 
width=10) (actual time=0.010..0.024 rows=4 loops=6398)
                                   ->  Index Scan using idx_partsupp_partkey on 
partsupp partsupp_1  (cost=0.43..8.75 rows=18 width=10) (actual 
time=0.005..0.008 rows=4 loops=6398)
                                         Index Cond: (ps_partkey = 
part.p_partkey)
                                   ->  Index Scan using supplier_pkey on 
supplier supplier_1  (cost=0.29..8.31 rows=1 width=8) (actual time=0.002..0.002 
rows=1 loops=25592)
                                         Index Cond: (s_suppkey = 
partsupp_1.ps_suppkey)
                             ->  Index Scan using nation_pkey on nation 
nation_1  (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 
loops=25592)
                                   Index Cond: (n_nationkey = 
supplier_1.s_nationkey)
 Planning Time: 3.406 ms
 Execution Time: 11034.252 ms
(49 rows)
                                                                        QUERY 
PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7781667.06..7781667.06 rows=1 width=192)
   Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, 
part.p_partkey
   ->  Nested Loop  (cost=0.43..7781667.05 rows=1 width=192)
         Join Filter: (nation.n_regionkey = region.r_regionkey)
         ->  Nested Loop  (cost=0.43..7781665.97 rows=1 width=196)
               Join Filter: (supplier.s_nationkey = nation.n_nationkey)
               ->  Nested Loop  (cost=0.43..7781664.41 rows=1 width=170)
                     Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey)
                     ->  Nested Loop  (cost=0.43..7777198.41 rows=1 width=34)
                           ->  Seq Scan on part  (cost=0.00..70961.83 rows=8829 
width=30)
                                 Filter: (((p_type)::text ~~ '%NICKEL'::text) 
AND (p_size = 16))
                           ->  Index Scan using idx_partsupp_partkey on 
partsupp  (cost=0.43..872.82 rows=1 width=14)
                                 Index Cond: (ps_partkey = part.p_partkey)
                                 Filter: ((SubPlan 1) = ps_supplycost)
                                 SubPlan 1
                                   ->  Aggregate  (cost=48.33..48.34 rows=1 
width=32)
                                         ->  Nested Loop  (cost=0.72..48.32 
rows=4 width=6)
                                               Join Filter: 
(supplier_1.s_nationkey = nation_1.n_nationkey)
                                               ->  Nested Loop  
(cost=0.72..44.33 rows=18 width=10)
                                                     ->  Index Scan using 
idx_partsupp_partkey on partsupp partsupp_1  (cost=0.43..2.75 rows=18 width=10)
                                                           Index Cond: 
(ps_partkey = part.p_partkey)
                                                     ->  Index Scan using 
supplier_pkey on supplier supplier_1  (cost=0.29..2.31 rows=1 width=8)
                                                           Index Cond: 
(s_suppkey = partsupp_1.ps_suppkey)
                                               ->  Materialize  
(cost=0.00..2.65 rows=5 width=4)
                                                     ->  Nested Loop  
(cost=0.00..2.62 rows=5 width=4)
                                                           Join Filter: 
(nation_1.n_regionkey = region_1.r_regionkey)
                                                           ->  Seq Scan on 
region region_1  (cost=0.00..1.06 rows=1 width=4)
                                                                 Filter: 
(r_name = 'AMERICA'::bpchar)
                                                           ->  Seq Scan on 
nation nation_1  (cost=0.00..1.25 rows=25 width=8)
                     ->  Seq Scan on supplier  (cost=0.00..3216.00 rows=100000 
width=144)
               ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=34)
         ->  Seq Scan on region  (cost=0.00..1.06 rows=1 width=4)
               Filter: (r_name = 'AMERICA'::bpchar)
(33 rows)
                                                                    QUERY PLAN  
                                                                   
---------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=89761.10..1239195.66 rows=17 width=4) (actual 
time=15379.356..29315.436 rows=1182889 loops=1)
   Hash Cond: ((t.ps_partkey = part.p_partkey) AND (t.ps_supplycost = (SubPlan 
1)))
   ->  Seq Scan on t  (cost=0.00..123281.20 rows=1593920 width=10) (actual 
time=0.144..934.971 rows=1604080 loops=1)
   ->  Hash  (cost=51948.26..51948.26 rows=1999989 width=4) (actual 
time=15376.793..15376.794 rows=1182884 loops=1)
         Buckets: 131072  Batches: 32  Memory Usage: 2313kB
         ->  Index Only Scan using part_pkey on part  (cost=0.43..51948.26 
rows=1999989 width=4) (actual time=0.057..983.194 rows=2000000 loops=1)
               Heap Fetches: 0
         SubPlan 1
           ->  Aggregate  (cost=12.47..12.48 rows=1 width=32) (actual 
time=0.006..0.007 rows=1 loops=3182889)
                 ->  Index Scan using t_ps_partkey_idx on t t_1  
(cost=0.43..12.46 rows=2 width=6) (actual time=0.003..0.004 rows=1 
loops=3182889)
                       Index Cond: (ps_partkey = part.p_partkey)
 Planning Time: 0.839 ms
 Execution Time: 29818.482 ms
(13 rows)
                                                                   QUERY PLAN   
                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.88..117183.33 rows=1593920 width=4) (actual 
time=0.090..4127.465 rows=1604080 loops=1)
   Merge Cond: (part.p_partkey = t.ps_partkey)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..51948.26 
rows=1999989 width=4) (actual time=0.038..928.974 rows=2000000 loops=1)
         Heap Fetches: 0
   ->  Index Only Scan using t_ps_partkey_idx on t  (cost=0.43..40321.23 
rows=1593920 width=4) (actual time=0.032..870.579 rows=1604080 loops=1)
         Heap Fetches: 0
 Planning Time: 0.567 ms
 Execution Time: 4795.921 ms
(8 rows)

Reply via email to