extended statistics - functional dependencies vs. MCV lists

Tomas Vondra Thu, 10 Dec 2020 13:10:52 -0800

Hi,

over in the pgsql-general channel, Michael Lewis reported [1] a bit
strange behavior switching between good/bad estimates with extended
statistics.


The crux of the issue is that with statistics containing both MCV and
functional dependencies, we prefer applying the MCV. And functional
dependencies are used only for the remaining clauses on columns not
covered by the MCV list.

This works perfectly fine when the clauses match a MCV item (or even
multiple of them). But if there's no matching MCV item, this may be
problematic - statext_mcv_clauselist_selectivity tries to be smart, but
when the MCV represents only a small fraction of the data set the
results may not be far from just a product of selectivities (as if the
clauses were independent).

So I'm wondering about two things:

1) Does it actually make sense to define extended statistics with both
MCV and functional dependencies? ISTM the MCV part will always filter
all the clauses, before we even try to apply the dependencies.

2) Could we consider the functional dependencies when estimating the
part not covered by the MCV list. Of course, this could only help with
equality clauses (as supported by functional dependencies).


regards


[1]
https://www.postgresql.org/message-id/CAMcsB%3Dy%3D3G_%2Bs_zFYPu2-O6OMWOvOkb3t1MU%3D907yk5RC_RaYw%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

extended statistics - functional dependencies vs. MCV lists

Reply via email to