I'm trying to understand a bad estimate by the planner, and what I can do about 
it. The anonymized plan is here: https://explain.depesz.com/s/0MDz

The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 
(multiplied by 1062 loops). I'm confused about two things in this node.

The first is Postgres' estimate. The condition for this index scan contains 
three expressions --

(five_uniform = zulu_five.five_uniform) AND
(whiskey_mike = juliet_india.whiskey_mike) AND
(bravo = 'mike'::text)

The columns in the first two expressions (five_uniform and whiskey_mike) are 
NOT NULL, and have foreign key constraints to their respective tables 
(zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know in 
advance that 100% of the rows in the table will satisfy those criteria.

For the third expression (bravo = 'mike'), Postgres has excellent statistics. 
The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so 
Postgres' estimate is only off by a tiny amount (0.014%).

From what I understand, the planner has all the information it needs to make a 
very accurate estimate here, but it's off by quite a lot. What information am I 
failing to give to the planner?

My second point of confusion is related. There are 564,071 rows in the source 
table (xray_india, aliased as papa) that satisfy the condition bravo = 'mike'. 
EXPLAIN reports the actual number of rows returned as 896*1062 ~= 951,552. I 
understand that the number reported by EXPLAIN is often a bit bigger, but this 
discrepancy is much larger than I'm expecting. What am I missing here?

Thanks,
Philip

Reply via email to