On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofengli...@gmail.com> wrote: > On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepi...@gmail.com> wrote: > > 2. It is ok for Vars. But what about expressions? We use equal() in > > distinct, MCV and dependencies modules. Do we need to remove nulls > > before using extended statistics as a general rule? > > AFAIU, the expressions in extended statistics are not decorated with > any nullingrels bits, are they?
I've just realized that there may be other places with similar issues, not just in estimate_num_groups. For instance, -- after v16 explain (costs on) select * from t t1 left join t t2 on true where (t2.a+t2.b) is null; QUERY PLAN -------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..15032.50 rows=5000 width=16) Filter: ((t2.a + t2.b) IS NULL) -> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8) -> Materialize (cost=0.00..20.00 rows=1000 width=8) -> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8) (5 rows) -- before v16 explain (costs on) select * from t t1 left join t t2 on true where (t2.a+t2.b) is null; QUERY PLAN -------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..15032.50 rows=1 width=16) Filter: ((t2.a + t2.b) IS NULL) -> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8) -> Materialize (cost=0.00..20.00 rows=1000 width=8) -> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8) (5 rows) In v16 and later, the nullingrels within the expression "t2.a + t2.b" prevent it from being matched to the corresponding expression in extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005). It seems that we need to strip out the nullingrels bits from expressions before matching them to extended statistics or expressional index columns in more places. Thanks Richard