On Tue, Dec 24, 2024 at 5:00 PM Richard Guo <guofengli...@gmail.com> wrote: > It seems to me that when estimating the number of groups, we do not > need to concern ourselves with the outer joins that could null the > Vars/PHVs contained in the grouping expressions, and we should not > count the same Var more than once.
Furthermore, accounting for nullingrels in the grouping expressions could prevent us from matching a grouping expression to expressional index columns or to the expressions in extended statistics, as these expressions are not decorated with any nullingrels bits. As an example, consider create table t (a int, b int); insert into t select i%10, i%10 from generate_series(1,1000)i; create statistics s (ndistinct) on (a+b), (a-b) from t; analyze; -- after v16 explain (costs on) select count(*) from t t1 left join t t2 on t1.a = t2.a group by t2.a+t2.b; QUERY PLAN -------------------------------------------------------------------------- HashAggregate (cost=1920.00..1921.25 rows=100 width=12) Group Key: (t2.a + t2.b) -> Hash Left Join (cost=27.50..1420.00 rows=100000 width=4) Hash Cond: (t1.a = t2.a) -> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=4) -> Hash (cost=15.00..15.00 rows=1000 width=8) -> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8) (7 rows) -- before v16 explain (costs on) select count(*) from t t1 left join t t2 on t1.a = t2.a group by t2.a+t2.b; QUERY PLAN -------------------------------------------------------------------------- HashAggregate (cost=1920.00..1920.12 rows=10 width=12) Group Key: (t2.a + t2.b) -> Hash Left Join (cost=27.50..1420.00 rows=100000 width=4) Hash Cond: (t1.a = t2.a) -> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=4) -> Hash (cost=15.00..15.00 rows=1000 width=8) -> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8) (7 rows) Attached is a more formal patch to strip out all the nullingrels from the grouping expressions before we estimate number of groups. Thanks Richard
v1-0001-Do-not-account-for-nullingrels-when-estimating-number-of-groups.patch
Description: Binary data