On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> After looking at this for a while, it's clear the main issue is handling
> of clauses referencing the same Var twice, like for example (a = a) or
> (a < a). But it's not clear to me if this is something worth fixing, or
> if extended statistics is the right place to do it.
>
> If those clauses are worth the effort, why not to handle them better
> even without extended statistics? We can easily evaluate these clauses
> on per-column MCV, because they only reference a single Var.

+1. It seems to me that what we ought to do is make "a < a", "a > a", and "a != a" all have an estimate of zero, and make "a <= a", "a >= a", and "a = a" have an estimate of 1 - nullfrac. The extended statistics mechanism can just ignore the first three types of clauses; the zero estimate has to be 100% correct. It can't necessarily ignore the second three cases, though. If the query says "WHERE a = a AND b = 1", "b = 1" may be more or less likely given that a is known to be not null, and extended statistics can tell us that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
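A minimal C sketch of the rule proposed above, plus a toy illustration of why "a = a" cannot simply be dropped when combined with another clause. Everything here is hypothetical: SelfCmpOp, self_comparison_selectivity, Row, measure and toy_rows are illustrative names, not PostgreSQL planner APIs. It assumes the column type's comparison operators behave like a total order (so equality is reflexive), and that nullfrac is the column's fraction of NULLs. The toy rows are made-up data, not measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for a clause of the form "a OP a". */
typedef enum { OP_LT, OP_GT, OP_NE, OP_LE, OP_GE, OP_EQ } SelfCmpOp;

/*
 * Selectivity of "a OP a". For a non-null a, a < a, a > a and a != a are
 * always false, while a <= a, a >= a and a = a are always true (assuming a
 * well-behaved total order). For a NULL a every comparison yields NULL,
 * which a WHERE clause treats as not satisfied.
 */
static double self_comparison_selectivity(SelfCmpOp op, double nullfrac)
{
    switch (op)
    {
        case OP_LT:
        case OP_GT:
        case OP_NE:
            return 0.0;         /* never true: exact, safe to ignore */
        case OP_LE:
        case OP_GE:
        case OP_EQ:
            return 1.0 - nullfrac;  /* equivalent to "a IS NOT NULL" */
    }
    return 0.0;                 /* not reached */
}

/* Hypothetical toy row: whether a is non-null, and the value of b. */
typedef struct { bool a_not_null; int b; } Row;

/* Observed fractions for "a = a", "b = v", and their conjunction. */
typedef struct { double a_not_null, b_eq, both; } Fractions;

static Fractions measure(const Row *rows, size_t n, int v)
{
    assert(n > 0);              /* avoid dividing by zero */
    size_t cnt_a = 0, cnt_b = 0, cnt_both = 0;
    for (size_t i = 0; i < n; i++)
    {
        bool b_match = rows[i].b == v;
        cnt_a += rows[i].a_not_null;
        cnt_b += b_match;
        cnt_both += rows[i].a_not_null && b_match;
    }
    Fractions f = { (double) cnt_a / n, (double) cnt_b / n,
                    (double) cnt_both / n };
    return f;
}

/* Made-up sample in which a tends to be non-null when b = 1. */
static const Row toy_rows[] = {
    {true, 1}, {true, 1}, {true, 1}, {true, 2},
    {false, 2}, {false, 2}, {false, 2}, {false, 1},
};
```

With toy_rows, "a = a" and "b = 1" each match half the rows, so assuming independence gives 0.5 × 0.5 = 0.25 for "WHERE a = a AND b = 1", while the actual fraction is 3/8 = 0.375. That gap is the kind of correlation extended statistics could capture, whereas the zero-selectivity clauses need no such correction.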