I ran into this error in estimate_multivariate_ndistinct, and it can be reproduced with the query below.
create table t (a int, b int); insert into t select 1, 1; create statistics s (ndistinct) on a, b from t; analyze; explain select 1 from t t1 left join (select a c1, coalesce(a) c2 from t t2) s on true group by s.c1, s.c2; ERROR: corrupt MVNDistinct entry And the first bad commit is: 2489d76c4906f4461a364ca8ad7e0751ead8aa0d is the first bad commit commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d Author: Tom Lane <t...@sss.pgh.pa.us> Date: Mon Jan 30 13:16:20 2023 -0500 Make Vars be outer-join-aware. So in this query, there are two grouping expressions: s.c1 is Var t2.a with nullingrels set to {3}; s.c2 is a PHV with nullingrels also being {3}, and its contained expression is Var t2.a with empty nullingrels. This eventually leads to estimate_num_groups creating two separate GroupVarInfos for Var t2.a: one with nullingrels {3}, and another with empty nullingrels. As a result, estimate_multivariate_ndistinct incorrectly assumes there are two matching expressions. When it later fails to find the exact match for the combination, it mistakenly concludes that there is a corrupt MVNDistinct entry. It seems to me that when estimating the number of groups, we do not need to concern ourselves with the outer joins that could null the Vars/PHVs contained in the grouping expressions, and we should not count the same Var more than once. So I wonder if we can fix this issue by removing the nullingrels within the grouping expressions first in estimate_num_groups. Such as: --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -121,6 +121,7 @@ #include "parser/parse_clause.h" #include "parser/parse_relation.h" #include "parser/parsetree.h" +#include "rewrite/rewriteManip.h" #include "statistics/statistics.h" #include "storage/bufmgr.h" #include "utils/acl.h" @@ -3446,6 +3447,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows, if (groupExprs == NIL || (pgset && *pgset == NIL)) return 1.0; + groupExprs = (List *) remove_nulling_relids((Node *) groupExprs, + root->outer_join_rels, + NULL); + /* * Count groups derived from boolean grouping expressions. For other * expressions, find the unique Vars used, treating an expression as a Var Any thoughts? Thanks Richard