On 24/09/2024 08:08, Andrei Lepikhov wrote:
On 19/9/2024 09:55, Andrei Lepikhov wrote:
This wrong prediction makes things much worse if the query has
additional upper query blocks stacked on top.
His question was: Why not consider the grouping column unique in the
upper query block? It could improve the estimates.
After a thorough investigation, I discovered that in commit
4767bc8ff2 most of the work was already done for DISTINCT clauses. So,
why not do the same for grouping? A sketch of the patch is attached.
As I see it, grouping in this sense works quite similarly to DISTINCT,
and we have no reason to ignore it. After applying the patch, you can
see that the estimate has improved:
Hash Right Join (cost=5.62..162.56 rows=50 width=36)
A regression test is added in the new version.
The code is tiny, simple, and non-invasive; it will be easy to commit
or reject. So I have added it to the next commitfest.
Looks good at a quick glance.
@@ -5843,11 +5852,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
 	}
 
 	/*
-	 * If there is a unique index or DISTINCT clause for the variable, assume
-	 * it is unique no matter what pg_statistic says; the statistics could be
-	 * out of date, or we might have found a partial unique index that proves
-	 * the var is unique for this query. However, we'd better still believe
-	 * the null-fraction statistic.
+	 * If there is a unique index, DISTINCT or GROUP-BY clause for the
+	 * variable, assume it is unique no matter what pg_statistic says; the
+	 * statistics could be out of date, or we might have found a partial
+	 * unique index that proves the var is unique for this query. However,
+	 * we'd better still believe the null-fraction statistic.
 	 */
 	if (vardata->isunique)
 		stadistinct = -1.0 * (1.0 - stanullfrac);
I wonder about the "we'd better still believe the null-fraction
statistic" part. It makes sense for a unique index, but a DISTINCT or
GROUP BY collapses all the NULLs to a single row. So I think there's
some more work to be done here.
--
Heikki Linnakangas
Neon (https://neon.tech)