On 3/9/2024 14:58, Andrei Lepikhov wrote:
On 17/6/2024 18:10, Tomas Vondra wrote: x = $1 AND y = $2 AND ... As I see, current patch doesn't resolve this issue currently.
Let's explain my previous argument with an example (see in attachment).
The query designed to be executed with parameterised NL join: EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM test t1 NATURAL JOIN test1 t2 WHERE t2.x1 < 1;After applying the topmost patch from the patchset we can see two different estimations (explain tuned a little bit) before and after extended statistics:
-- before: Nested Loop (rows=1) (actual rows=10000 loops=1) -> Seq Scan on test1 t2 (rows=100) (actual rows=100 loops=1) Filter: (x1 < 1) -> Memoize (rows=1) (actual rows=100 loops=100) Cache Key: t2.x1, t2.x2, t2.x3, t2.x4 -> Index Scan using test_x1_x2_x3_x4_idx on test t1 (rows=1 width=404) (actual rows=100 loops=1) Index Cond: ((x1 = t2.x1) AND (x2 = t2.x2) AND (x3 = t2.x3) AND (x4 = t2.x4)) -- after: Nested Loop (rows=10000) (actual rows=10000 loops=1) -> Seq Scan on test1 t2 (rows=100) (actual rows=100 loops=1) Filter: (x1 < 1) -> Memoize (rows=1) (actual rows=100 loops=100) Cache Key: t2.x1, t2.x2, t2.x3, t2.x4 -> Index Scan using test_x1_x2_x3_x4_idx on test t1 (rows=1) (actual rows=100 loops=1) Index Cond: ((x1 = t2.x1) AND (x2 = t2.x2) AND (x3 = t2.x3) AND (x4 = t2.x4))You can see, that index condition was treated as join clause and PNL estimated correctly by an MCV on both sides.
But scan estimation is incorrect.Moreover, sometimes we don't have MCV at all. And the next step for this patch should be implementation of bare estimation by the only ndistinct on each side.
What to do with the scan filter? Not sure so far, but it looks like here may be used the logic similar to var_eq_non_const().
-- regards, Andrei Lepikhov
example.sql
Description: application/sql