On 8/20/21 8:56 PM, Robert Haas wrote:
> On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra
> <tomas.von...@enterprisedb.com> wrote:
>> After looking at this for a while, it's clear the main issue is
>> handling of clauses referencing the same Var twice, such as (a = a)
>> or (a < a). But it's not clear to me if this is something worth
>> fixing, or if extended statistics is the right place to do it.
>>
>> If those clauses are worth the effort, why not handle them better
>> even without extended statistics? We can easily evaluate these
>> clauses on the per-column MCV, because they only reference a single
>> Var.
> +1.
>
> It seems to me that what we ought to do is make "a < a", "a > a", and
> "a != 0" all have an estimate of zero, and make "a <= a", "a >= a",
> and "a = a" estimate 1-nullfrac. The extended statistics mechanism
> can just ignore the first three types of clauses; the zero estimate
> has to be 100% correct. It can't necessarily ignore the second three
> cases, though. If the query says "WHERE a = a AND b = 1", "b = 1" may
> be more or less likely given that a is known to be not null, and
> extended statistics can tell us that.

Yeah, I agree this seems like the right approach (except I guess you
meant "a != a" and not "a != 0"). That's assuming we want to do
anything about these clauses at all; I'm still wondering whether such
clauses are common in practice or merely synthetic.
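
To make the proposed rules concrete, here's a minimal standalone
sketch (illustrative only: the function name and matching operators by
string are stand-ins, and an actual patch would match operators by OID
somewhere in the planner's clause selectivity code):

    #include <stdio.h>
    #include <string.h>

    /*
     * Proposed constant estimates for clauses comparing a Var to
     * itself.  "a < a", "a > a" and "a != a" can never be true, so
     * their selectivity is 0.  "a <= a", "a >= a" and "a = a" hold
     * for every non-NULL value, so their selectivity is
     * (1 - nullfrac).
     */
    static double
    self_comparison_selectivity(const char *op, double nullfrac)
    {
        if (strcmp(op, "<") == 0 || strcmp(op, ">") == 0 ||
            strcmp(op, "!=") == 0)
            return 0.0;             /* provably always false */

        if (strcmp(op, "<=") == 0 || strcmp(op, ">=") == 0 ||
            strcmp(op, "=") == 0)
            return 1.0 - nullfrac;  /* true whenever the column is not NULL */

        return -1.0;                /* not a recognized self-comparison */
    }

    int
    main(void)
    {
        /* with nullfrac = 0.1, "a = a" should match 90% of the rows */
        printf("a = a  -> %.2f\n", self_comparison_selectivity("=", 0.1));
        printf("a < a  -> %.2f\n", self_comparison_selectivity("<", 0.1));
        return 0;
    }

The zero estimates are guaranteed correct, which is why extended
statistics could simply drop those clauses, while the (1 - nullfrac)
cases would still have to be fed into the multivariate machinery, per
the "WHERE a = a AND b = 1" example above.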
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company