"Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> Alright. I'm attaching the latest version of this patch split in two
> parts: the first one is NULLs-related bugfix and the second is the
> "improvement" part, which applies on top of the first one.

I've applied the first of these patches, broken into two parts first
because it seemed like there were two issues and second because Tomas
deserved primary credit for one part, ie realizing we were using the
Haas-Stokes formula wrong.

As for the other part, I committed it with one non-cosmetic change:
I do not think it is right to omit "too wide" values when considering
the threshold for MCVs. As submitted, the patch was inconsistent on
that point anyway since it did it differently in compute_distinct_stats
and compute_scalar_stats. But the larger picture here is that we define
the MCV population to exclude nulls, so it's reasonable to consider a
value as an MCV even if it's greatly outnumbered by nulls. There is no
such exclusion for "too wide" values; those things are just an
implementation limitation in analyze.c, not something that is part of
the pg_statistic definition. If there are a lot of "too wide" values
in the sample, we don't know whether any of them are duplicates, but we
do know that the frequencies of the normal-width values have to be
discounted appropriately.

Haven't looked at 0002 yet.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
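
To illustrate the point about "too wide" values, here is a rough sketch
of how the MCV cutoff moves depending on whether those values are
counted in the non-null population. It is not the analyze.c code. The
function name, the 1.25 factor, the floor of 2, the sample numbers, and
the choice to hold the distinct-value estimate fixed are all assumptions
made for the illustration.

```python
def mcv_min_count(sample_rows, null_cnt, toowide_cnt, ndistinct_est,
                  omit_toowide):
    """Hypothetical MCV cutoff: a value must appear at least this often
    in the sample to be kept as a most-common value."""
    # Nulls are excluded from the MCV population by definition.
    population = sample_rows - null_cnt
    # Dropping too-wide values shrinks the population further; this is
    # the behaviour the message argues against.
    if omit_toowide:
        population -= toowide_cnt
    # Expected sample count of a "typical" non-null value.
    avg_count = population / ndistinct_est
    # Assumed rule: noticeably above average, and at least 2 occurrences.
    return max(avg_count * 1.25, 2.0)


# Made-up sample: 1000 rows, 900 nulls, 60 too-wide, 10 distinct values.
with_toowide = mcv_min_count(1000, 900, 60, 10, omit_toowide=False)
without_toowide = mcv_min_count(1000, 900, 60, 10, omit_toowide=True)

# A value seen 8 times passes only if too-wide values are left out.
seen = 8
print(with_toowide, without_toowide, seen >= with_toowide,
      seen >= without_toowide)
```

With these numbers the 900 nulls do not affect the cutoff at all, while
leaving out the 60 too-wide values lowers it enough that a value seen 8
times would be treated as an MCV.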