Hi,
I've been re-running the TPC-H benchmark, to remind myself the common
issues with OLAP workloads, and one of the most annoying problems seems
to be the misestimates in Q2. The query is not particularly complex,
although it does have a correlated subquery with an aggregate, but it's
one of the queries prone to a cascade of nested loops running forever.
I wonder if there's something we could do to handle this better.
A raw Q2 looks like this:
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 16
and p_type like '%NICKEL'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey;
and the full query plan is attached (q2-original-plan.txt).
The relevant part of the plan is the final join, which also considers
the subplan result (all estimates are for scale 10):
-> Merge Join (cost=638655.36..1901120.61 rows=1 width=192)
(actual time=7299.121..10993.517 rows=4737 loops=1)
Merge Cond: (part.p_partkey = partsupp.ps_partkey)
Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
Rows Removed by Join Filter: 1661
Yeah, this is estimated as 1 row but actually returns 4737 rows. All
the other nodes are estimated very accurately, it's just this final join
that is entirely wrong.
If you tweak the costs a bit (e.g. reducing random_page_cost etc.) the
plan can easily switch to nested loops, with this join much deeper in
the plan. See the attached q2-nested-loops.txt for an example (I had to
disable merge/hash joins to trigger this on scale 10, on larger scales
it can happen much easier).
Now, the query seems a bit complex, but we can easily simplify it by
creating an extra table and reducing the number of joins:
create table foo as select
*
from
partsupp,
supplier,
nation,
region
where
s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA';
reate index on t (ps_partkey);
which allows us to rewrite Q2 like this (this also ditches the ORDER BY
and LIMIT clauses):
select
1
from
part,
t
where
p_partkey = ps_partkey
and p_size = 16
and p_type like '%NICKEL'
and ps_supplycost = (
select
min(ps_supplycost)
from t
where
p_partkey = ps_partkey
);
in fact, we can ditch even the conditions on p_size/p_type which makes
the issue even more severe:
select
1
from
part,
t
where
p_partkey = ps_partkey
and ps_supplycost = (
select
min(ps_supplycost)
from t
where
p_partkey = ps_partkey
);
with the join estimated like this:
Hash Join (cost=89761.10..1239195.66 rows=17 width=4)
(actual time=15379.356..29315.436 rows=1182889 loops=1)
Hash Cond: ((t.ps_partkey = part.p_partkey) AND
(t.ps_supplycost = (SubPlan 1)))
Yeah, that's underestimated by a factor of 70000 :-(
An interesting observation is that if you remove the condition on supply
cost (with the correlated subquery), the estimates get perfect atain. So
this seems to be about this particular condition, or how we combine the
selectivities ...
I'm not sure I've figured all the details yet, but this seems to be due
to a dependency between the ps_partkey and ps_supplycost columns.
When estimating the second condition, we end up calling eqjoinsel()
with SubPlan and Var arguments. We clearly won't have ndistinct of MCVs
for the SubPlan, so we use
nd1 = 200; /* default */
nd2 = 94005; /* n_distinct for t.ps_supplycost */
and end up (thanks to eqjoinsel_inner and no NULLs in data) with
selec_inner = 0.00001 = Min(1/nd1, 1/nd2)
But that's entirely bogus, because while there are ~100k distinct values
in t.ps_supplycost, those are for *all* ps_partkey values combined. But
each ps_partkey value has only about ~1.4 distinct ps_supplycost values
on average:
select avg(x) from (select count(distinct ps_supplycost) as x
from t group by ps_partkey) foo;
avg
--------------------
1.3560712631162481
(1 row)
Which I think is the root cause here ...
The fact that we're using the same table "t" in both the main query and
the correlated subquery seems rather irrelevant here, because we might
also create
create table s as select
ps_partkey,
min(ps_supplycost) as min_ps_supplycost
from t group by ps_partkey;
and use that instead, and we'd still have the same issue. It's just the
fact for a given ps_partkey value there's only a couple ps_supplycost
values, not the 100k we have for a table.
I wonder if we could use the ndistinct coefficients to improve this,
somehow. I suppose eqjoinsel/eqjoinsel_inner could look at
ndistinct(ps_partkey, ps_supplycost) / ndistinct(ps_partkey)
when estimating the (SubPlan = Var) condition, and tweak selec_inner
accordingly.
I do see two challenges with this, though:
1) This probably requires considering all join clauses at once (just
like we do for regular clauses), but eqjoinsel and friends seem rather
heavily designed for inspecting the clauses one by one.
2) I'm not sure what to do about the SubPlan side, for which we may not
have any reliable ndistinct estimates at all.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1901120.62..1901120.62 rows=1 width=192) (actual
time=10998.473..11000.664 rows=4737 loops=1)
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name,
part.p_partkey
Sort Method: quicksort Memory: 1469kB
-> Merge Join (cost=638655.36..1901120.61 rows=1 width=192) (actual
time=7299.121..10993.517 rows=4737 loops=1)
Merge Cond: (part.p_partkey = partsupp.ps_partkey)
Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
Rows Removed by Join Filter: 1661
-> Index Scan using part_pkey on part (cost=0.43..102913.21
rows=8829 width=30) (actual time=0.142..412.621 rows=8046 loops=1)
Filter: (((p_type)::text ~~ '%NICKEL'::text) AND (p_size = 16))
Rows Removed by Filter: 1991954
-> Materialize (cost=638654.72..646656.44 rows=1600344 width=172)
(actual time=7298.795..9539.910 rows=1603828 loops=1)
-> Sort (cost=638654.72..642655.58 rows=1600344 width=172)
(actual time=7298.781..8178.512 rows=1603828 loops=1)
Sort Key: partsupp.ps_partkey
Sort Method: external merge Disk: 293504kB
-> Nested Loop (cost=42.10..200242.67 rows=1600344
width=172) (actual time=2.873..5427.412 rows=1604080 loops=1)
-> Nested Loop (cost=41.67..2184.07 rows=20000
width=166) (actual time=2.805..74.435 rows=20051 loops=1)
-> Nested Loop (cost=0.14..13.95 rows=5
width=30) (actual time=0.112..0.192 rows=5 loops=1)
Join Filter: (nation.n_regionkey =
region.r_regionkey)
Rows Removed by Join Filter: 20
-> Index Scan using nation_pkey on
nation (cost=0.14..12.51 rows=25 width=34) (actual time=0.025..0.047 rows=25
loops=1)
-> Materialize (cost=0.00..1.07 rows=1
width=4) (actual time=0.003..0.003 rows=1 loops=25)
-> Seq Scan on region
(cost=0.00..1.06 rows=1 width=4) (actual time=0.027..0.032 rows=1 loops=1)
Filter: (r_name =
'AMERICA'::bpchar)
Rows Removed by Filter: 4
-> Bitmap Heap Scan on supplier
(cost=41.53..394.02 rows=4000 width=144) (actual time=0.853..10.520 rows=4010
loops=5)
Recheck Cond: (s_nationkey =
nation.n_nationkey)
Heap Blocks: exact=9388
-> Bitmap Index Scan on
idx_supplier_nation_key (cost=0.00..40.53 rows=4000 width=0) (actual
time=0.532..0.532 rows=4010 loops=5)
Index Cond: (s_nationkey =
nation.n_nationkey)
-> Index Scan using idx_partsupp_suppkey on
partsupp (cost=0.43..9.09 rows=81 width=14) (actual time=0.006..0.195 rows=80
loops=20051)
Index Cond: (ps_suppkey = supplier.s_suppkey)
SubPlan 1
-> Aggregate (cost=162.40..162.41 rows=1 width=32) (actual
time=0.043..0.044 rows=1 loops=6398)
-> Nested Loop (cost=0.86..162.39 rows=4 width=6) (actual
time=0.022..0.040 rows=2 loops=6398)
Join Filter: (nation_1.n_regionkey =
region_1.r_regionkey)
Rows Removed by Join Filter: 2
-> Seq Scan on region region_1 (cost=0.00..1.06 rows=1
width=4) (actual time=0.001..0.002 rows=1 loops=6398)
Filter: (r_name = 'AMERICA'::bpchar)
Rows Removed by Filter: 4
-> Nested Loop (cost=0.86..161.10 rows=18 width=10)
(actual time=0.012..0.034 rows=4 loops=6398)
-> Nested Loop (cost=0.72..158.33 rows=18
width=10) (actual time=0.010..0.024 rows=4 loops=6398)
-> Index Scan using idx_partsupp_partkey on
partsupp partsupp_1 (cost=0.43..8.75 rows=18 width=10) (actual
time=0.005..0.008 rows=4 loops=6398)
Index Cond: (ps_partkey =
part.p_partkey)
-> Index Scan using supplier_pkey on
supplier supplier_1 (cost=0.29..8.31 rows=1 width=8) (actual time=0.002..0.002
rows=1 loops=25592)
Index Cond: (s_suppkey =
partsupp_1.ps_suppkey)
-> Index Scan using nation_pkey on nation
nation_1 (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1
loops=25592)
Index Cond: (n_nationkey =
supplier_1.s_nationkey)
Planning Time: 3.406 ms
Execution Time: 11034.252 ms
(49 rows)
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=7781667.06..7781667.06 rows=1 width=192)
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name,
part.p_partkey
-> Nested Loop (cost=0.43..7781667.05 rows=1 width=192)
Join Filter: (nation.n_regionkey = region.r_regionkey)
-> Nested Loop (cost=0.43..7781665.97 rows=1 width=196)
Join Filter: (supplier.s_nationkey = nation.n_nationkey)
-> Nested Loop (cost=0.43..7781664.41 rows=1 width=170)
Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey)
-> Nested Loop (cost=0.43..7777198.41 rows=1 width=34)
-> Seq Scan on part (cost=0.00..70961.83 rows=8829
width=30)
Filter: (((p_type)::text ~~ '%NICKEL'::text)
AND (p_size = 16))
-> Index Scan using idx_partsupp_partkey on
partsupp (cost=0.43..872.82 rows=1 width=14)
Index Cond: (ps_partkey = part.p_partkey)
Filter: ((SubPlan 1) = ps_supplycost)
SubPlan 1
-> Aggregate (cost=48.33..48.34 rows=1
width=32)
-> Nested Loop (cost=0.72..48.32
rows=4 width=6)
Join Filter:
(supplier_1.s_nationkey = nation_1.n_nationkey)
-> Nested Loop
(cost=0.72..44.33 rows=18 width=10)
-> Index Scan using
idx_partsupp_partkey on partsupp partsupp_1 (cost=0.43..2.75 rows=18 width=10)
Index Cond:
(ps_partkey = part.p_partkey)
-> Index Scan using
supplier_pkey on supplier supplier_1 (cost=0.29..2.31 rows=1 width=8)
Index Cond:
(s_suppkey = partsupp_1.ps_suppkey)
-> Materialize
(cost=0.00..2.65 rows=5 width=4)
-> Nested Loop
(cost=0.00..2.62 rows=5 width=4)
Join Filter:
(nation_1.n_regionkey = region_1.r_regionkey)
-> Seq Scan on
region region_1 (cost=0.00..1.06 rows=1 width=4)
Filter:
(r_name = 'AMERICA'::bpchar)
-> Seq Scan on
nation nation_1 (cost=0.00..1.25 rows=25 width=8)
-> Seq Scan on supplier (cost=0.00..3216.00 rows=100000
width=144)
-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=34)
-> Seq Scan on region (cost=0.00..1.06 rows=1 width=4)
Filter: (r_name = 'AMERICA'::bpchar)
(33 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=89761.10..1239195.66 rows=17 width=4) (actual
time=15379.356..29315.436 rows=1182889 loops=1)
Hash Cond: ((t.ps_partkey = part.p_partkey) AND (t.ps_supplycost = (SubPlan
1)))
-> Seq Scan on t (cost=0.00..123281.20 rows=1593920 width=10) (actual
time=0.144..934.971 rows=1604080 loops=1)
-> Hash (cost=51948.26..51948.26 rows=1999989 width=4) (actual
time=15376.793..15376.794 rows=1182884 loops=1)
Buckets: 131072 Batches: 32 Memory Usage: 2313kB
-> Index Only Scan using part_pkey on part (cost=0.43..51948.26
rows=1999989 width=4) (actual time=0.057..983.194 rows=2000000 loops=1)
Heap Fetches: 0
SubPlan 1
-> Aggregate (cost=12.47..12.48 rows=1 width=32) (actual
time=0.006..0.007 rows=1 loops=3182889)
-> Index Scan using t_ps_partkey_idx on t t_1
(cost=0.43..12.46 rows=2 width=6) (actual time=0.003..0.004 rows=1
loops=3182889)
Index Cond: (ps_partkey = part.p_partkey)
Planning Time: 0.839 ms
Execution Time: 29818.482 ms
(13 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=1.88..117183.33 rows=1593920 width=4) (actual
time=0.090..4127.465 rows=1604080 loops=1)
Merge Cond: (part.p_partkey = t.ps_partkey)
-> Index Only Scan using part_pkey on part (cost=0.43..51948.26
rows=1999989 width=4) (actual time=0.038..928.974 rows=2000000 loops=1)
Heap Fetches: 0
-> Index Only Scan using t_ps_partkey_idx on t (cost=0.43..40321.23
rows=1593920 width=4) (actual time=0.032..870.579 rows=1604080 loops=1)
Heap Fetches: 0
Planning Time: 0.567 ms
Execution Time: 4795.921 ms
(8 rows)