On 19.11.2025 18:38, David Geier wrote:
On 19.11.2025 03:19, Tom Lane wrote:
I spent some effort on actually measuring timings of the v6 patch,
and concluded that this is all splitting hairs that we don't need
to split. The actual crossover between hash-loses and hash-wins
is more than what my theoretical argument suggested, but still
probably less than 100 MCVs on each side. I think we should go with
(sslot1.nvalues + sslot2.nvalues) >= 200
and call it good.
To arrive at this result, I built the v6 patchset with
EQJOINSEL_MCV_HASH_THRESHOLD changed to either 0 (to force hashing)
or 1000000 (to prevent it). I then ran the attached scripts with
different values of "nstats" and collected timings from the postmaster
log output produced by the 0001 patch.
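For anyone following the thread without the patch at hand, below is a
minimal standalone sketch of the two strategies the threshold chooses
between: the classic nested-loop comparison of the two MCV lists versus
building a hash table over one side and probing it with the other. This
is not the patch's code; the function names, the toy open-addressing
table, and the fabricated data are assumptions made purely for
illustration, but they show why hashing carries a fixed setup cost that
only pays off once the combined MCV count is large enough.

/*
 * Standalone illustration (NOT the patch's code) of the two
 * MCV-matching strategies discussed above.  Assumes non-negative
 * integer MCVs so that -1 can mark empty buckets.
 *
 * Build: cc -O2 mcv_match_sketch.c -o mcv_match_sketch
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Classic approach: compare every pair, O(n1 * n2), no setup cost. */
static int
match_nested(const int *mcv1, int n1, const int *mcv2, int n2)
{
	int		nmatches = 0;

	for (int i = 0; i < n1; i++)
		for (int j = 0; j < n2; j++)
			if (mcv1[i] == mcv2[j])
				nmatches++;
	return nmatches;
}

/* Hashed approach: O(n1 + n2) probes, but pays a per-call setup cost. */
static int
match_hashed(const int *mcv1, int n1, const int *mcv2, int n2)
{
	int		nbuckets = 4;
	int		nmatches = 0;
	int	   *buckets;

	while (nbuckets < 2 * n1)
		nbuckets <<= 1;			/* power of two so we can mask */
	buckets = malloc(nbuckets * sizeof(int));
	memset(buckets, -1, nbuckets * sizeof(int));	/* -1 = empty */

	for (int i = 0; i < n1; i++)	/* insert side 1 */
	{
		unsigned	h = ((unsigned) mcv1[i] * 2654435761u) & (nbuckets - 1);

		while (buckets[h] != -1)
			h = (h + 1) & (nbuckets - 1);
		buckets[h] = mcv1[i];
	}
	for (int j = 0; j < n2; j++)	/* probe with side 2 */
	{
		unsigned	h = ((unsigned) mcv2[j] * 2654435761u) & (nbuckets - 1);

		while (buckets[h] != -1 && buckets[h] != mcv2[j])
			h = (h + 1) & (nbuckets - 1);
		if (buckets[h] == mcv2[j])
			nmatches++;
	}
	free(buckets);
	return nmatches;
}

int
main(void)
{
	int		mcv1[100],
			mcv2[100];

	for (int i = 0; i < 100; i++)
	{
		mcv1[i] = i * 3;		/* fabricated MCV lists with partial overlap */
		mcv2[i] = i * 2;
	}
	printf("nested: %d, hashed: %d\n",
		   match_nested(mcv1, 100, mcv2, 100),
		   match_hashed(mcv1, 100, mcv2, 100));
	return 0;
}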
Thanks for working out the details!
I've run your script on my development machine with 1000, 100, and 50
MCVs; the results are below. Since the runtimes had quite a bit of
variance, I didn't try more variations. I think your proposal to go
with 200 is fine.
 nstats | off INT | off TEXT | on INT | on TEXT
--------+---------+----------+--------+---------
   1000 |     697 |     8907 |     14 |    2417
    100 |    13.7 |      213 |    2.3 |     239
     50 |     1.4 |      7.6 |    1.5 |      49
The results suggest that the hash function for the nondeterministic
collation is really slow. If we could properly account for the operator
cost, we could enable the optimization earlier for simple data types
such as INT. That can be future work.
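Purely as a sketch of that future-work idea (nothing here is from the
patch; the function, constants, and cost numbers are invented for
illustration), a cost-aware cutoff could weigh the per-comparison
operator cost against the per-value hashing cost, so that cheap types
like INT would switch to hashing earlier than types with expensive hash
functions, such as TEXT under a nondeterministic collation:

/*
 * Hypothetical cost-aware cutoff (NOT part of the patch): pick the
 * strategy with the lower rough cost estimate.  All constants are
 * made up for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

static bool
use_hashed_mcv_matching(int nvalues1, int nvalues2,
						double op_cost,		/* one equality comparison */
						double hash_cost)	/* hashing one value */
{
	double	nested = (double) nvalues1 * nvalues2 * op_cost;
	double	hashed = (nvalues1 + nvalues2) * (hash_cost + op_cost) +
		100.0 * hash_cost;		/* invented setup overhead */

	return hashed < nested;
}

int
main(void)
{
	/* cheap type (think INT): hashing pays off even for small MCV lists */
	printf("cheap type, 50+50 MCVs: %s\n",
		   use_hashed_mcv_matching(50, 50, 1.0, 2.0) ? "hash" : "nested loop");
	/* expensive hash (think TEXT with a nondeterministic collation) */
	printf("costly type, 50+50 MCVs: %s\n",
		   use_hashed_mcv_matching(50, 50, 5.0, 100.0) ? "hash" : "nested loop");
	return 0;
}

With the same MCV counts, these made-up numbers flip the decision
depending on how expensive hashing is, which matches the pattern the
table above hints at for INT versus TEXT.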
LGTM
For simple types (integer columns), both algorithms finish in a couple
of milliseconds when the MCV counts are under 100, and the difference
between them is very small (JOB results show the same trend). For text
types, the planning time shifts gradually from one algorithm to the
other around that range, without any sharp transition. So it seems to
me that the current criterion is a reasonable compromise, without
requiring us to complicate the threshold any further.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/