On 2023/6/17 06:46, Tom Lane wrote:
> Quan Zongliang <quanzongli...@yeah.net> writes:
>> Perhaps we should discard this (dups_cnt > 1) restriction?
>
> That's not going to happen on the basis of one test case that you
> haven't even shown us. The implications of doing it are very unclear.
> In particular, I seem to recall that there are bits of logic that
> depend on the assumption that MCV entries always represent more than
> one row. The nmultiple calculation Tomas referred to may be failing
> because of that, but I'm worried about there being other places.
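For reference, the restriction in question is the dups_cnt > 1 gate in
compute_scalar_stats() (src/backend/commands/analyze.c): a sampled
value is considered for the MCV list only if it occurs more than once
in the sample. A simplified standalone sketch of just that gate (my
illustration, not the actual analyze.c code, which tracks candidates
in a fixed-size array and applies further frequency tests):

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical sorted sample of a nearly-unique column */
        int sample[] = {1, 2, 2, 3, 4, 5, 5, 6, 7, 8};
        int n = sizeof(sample) / sizeof(sample[0]);

        for (int i = 0; i < n; )
        {
            /* count consecutive duplicates in the sorted sample */
            int dups_cnt = 1;
            while (i + dups_cnt < n && sample[i + dups_cnt] == sample[i])
                dups_cnt++;

            if (dups_cnt > 1)   /* the restriction under discussion */
                printf("MCV candidate: %d (count %d)\n",
                       sample[i], dups_cnt);
            i += dups_cnt;
        }
        return 0;
    }

On a nearly-unique column almost nothing passes this gate, so the MCV
list stays empty or very short even when a few values do repeat.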
The statistics for the other table look like this:

    stadistinct | 6
    stanumbers1 | {0.50096667,0.49736667,0.0012}
    stavalues1  | {v22,v23,v5}
The values that appear twice in the small table (v1 and v2) do not
appear here. And stadistinct is wrong: its true value is 18, not 6
(three of the small table's values do not appear here either).
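As a cross-check on the numbers used below: if the small table has 23
rows and its two MCVs (v1 and v2) each account for 2 rows, then the
non-MCV fraction is otherfreq2 = 1 - 4/23 = 19/23 = 0.8260869..., which
is exactly the otherfreq2 value that shows up in the calculation.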
When calculating the selectivity, this branch in eqjoinsel_inner()
(src/backend/utils/adt/selfuncs.c) is reached:

    if (nd2 > sslot2->nvalues)
        totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - sslot2->nvalues);
With these inputs:

    totalsel1       = 0 (before this branch)
    nd2             = 21
    sslot2->nvalues = 2
    unmatchfreq1    = 0.99990002016420476
    otherfreq2      = 0.82608695328235626

the result is:

    totalsel1 = 0.99990002016420476 * 0.82608695328235626 / (21 - 2)
              = 0.043473913749706022
    rows = 0.043473913749706022 * 23 * 2,000,000 = 1999800
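To make the failure mode easy to reproduce, here is a minimal
standalone sketch of the arithmetic (mine, not the PostgreSQL source;
the variable names follow eqjoinsel_inner()):

    #include <stdio.h>

    int main(void)
    {
        /* numbers taken from the example above */
        double unmatchfreq1 = 0.99990002016420476; /* unmatched MCV freq, side 1 */
        double otherfreq2   = 0.82608695328235626; /* non-MCV freq, side 2 */
        double nd2          = 21.0;                /* distinct values, side 2 */
        int    nvalues2     = 2;                   /* MCV entries, side 2 */
        double totalsel1    = 0.0;

        /* the branch quoted above: spread side 1's unmatched MCV mass
         * evenly over side 2's non-MCV distinct values */
        if (nd2 > nvalues2)
            totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - nvalues2);

        printf("totalsel1 = %.18f\n", totalsel1);          /* ~0.04347 */
        printf("rows = %.0f\n", totalsel1 * 23 * 2000000); /* ~1999800 */
        return 0;
    }

Nearly all of the large table's MCV mass (unmatchfreq1 ~ 1.0) gets
spread over only nd2 - nvalues2 = 19 distinct values, so the estimate
balloons to about two million rows.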
> Basically, you're proposing a rather fundamental change in the rules
> by which Postgres has gathered statistics for decades. You need to
> bring some pretty substantial evidence to support that. The burden
> of proof is on you, not on the status quo.
>
> regards, tom lane