Thanks for the detailed feedback!
On 04.11.2025 00:55, Tom Lane wrote:
> Hmm. Those results sure look like there is a performance regression
> up to at least 100 MCVs ... not a large one, but consistently a few
> percent. That's a bit sad for a patch purporting to improve
> performance. It looks to me like perhaps we should stick to the old
> algorithm up to 100 or possibly even more MCVs. Certainly the
> threshold needs to be higher than 1, as you have it now.
I re-ran the benchmark on JOB with a threshold of 100. Here are the
updated results:
 default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
---------------------------+---------------------+---------------------+--------------------
                         1 |                1.00 |            2320.412 |           2318.377
                         5 |                0.99 |            2335.894 |           2360.890
                        10 |                1.00 |            2350.612 |           2347.154
                        25 |                1.01 |            2365.977 |           2342.312
                        50 |                0.99 |            2381.554 |           2405.262
                        75 |                1.00 |            2396.481 |           2399.828
                       100 |                1.00 |            2410.367 |           2412.456
                      1000 |                1.11 |            2850.853 |           2564.303
                      2500 |                2.04 |            5571.688 |           2731.545
                      5000 |                6.05 |           18850.084 |           3114.692
                      7500 |               11.96 |           39160.898 |           3273.688
                     10000 |               19.04 |           71334.113 |           3745.955
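
To put those numbers in perspective (rough back-of-the-envelope, assuming
both MCV lists are filled to the target): at default_statistics_target =
10000 the nested-loop comparison can perform up to 10000 * 10000 = 10^8
operator calls per join clause, while the hash path needs on the order of
10000 inserts plus 10000 probes. With the threshold set to 100, everything
up to target 100 still takes the unchanged nested-loop path, so the
before/after numbers there differ only by noise.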
> I eyeballed the patch itself very briefly, and have a couple
> quick comments:
> * Is hash_msv a typo for hash_mcv? If not, maybe spell out what
> it's supposed to mean.
Yes, that was a typo — fixed.
> * The patch would be easier to read if it didn't reindent a couple
> large chunks of existing code. Can we change the factorization
> to avoid that? If not, I'd recommend submitting without that
> reindentation, and reminding the committer to reindent at the last
> moment.
Fixed as well. I’ve removed all reindentation changes. I will keep that
in mind for future submissions.
> * The calculation loops in eqjoinsel_inner and eqjoinsel_semi
> are not identical, which makes it look quite weird to be
> writing just one function that conditionally replaces both.
> I wonder if we should refactor to have just one copy (and
> just eat the extra cycles of calculating matchprodfreq).
Agreed. I’ve dropped the attempt to merge them into a single function.
> * In fact ... I wonder if we should try harder to not do essentially
> identical calculations twice, remembering that eqjoinsel_semi is
> always used alongside eqjoinsel_inner. (Of course, we could only do
> that if clamped_nvalues2 is the same as sslot2->nvalues, but that's
> frequently true.) I think the reason it's duplicative right now
> is that we regarded this semijoin calculation as an experiment and
> so didn't want to invest a lot of effort in it ... but this patch
> is exactly a lot of effort, so maybe it's time to deal with that
> unfinished business.
>
> regards, tom lane
Good point. I addressed this in a separate patch: eqjoinsel_inner() now
saves matchfreq1, matchfreq2, and nmatches so that eqjoinsel_semi() can
reuse them when (clamped_nvalues2 == sslot2->nvalues). If the RHS MCV
list is clamped, we still recompute locally. If you have a cleaner idea
for how to share these values between the two functions without passing
them explicitly, I'd be happy to consider it.
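
To make the 0001 change easier to review at a glance, the flow in
eqjoinsel() now looks like this (condensed from the patch; the
reversed-join branch, which passes matchfreq_mcvs2 instead, is omitted):

    double  matchfreq_mcvs1 = 0.0;
    double  matchfreq_mcvs2 = 0.0;
    int     nmatches_mcvs = 0;

    /* the inner-join estimate also fills in the MCV match statistics */
    selec_inner = eqjoinsel_inner(opfuncoid, collation, &vardata1, &vardata2,
                                  nd1, nd2, isdefault1, isdefault2,
                                  &sslot1, &sslot2, stats1, stats2,
                                  have_mcvs1, have_mcvs2,
                                  &matchfreq_mcvs1, &matchfreq_mcvs2,
                                  &nmatches_mcvs);

    /* semi/anti join reuses them unless the RHS MCV list had to be clamped */
    selec = eqjoinsel_semi(opfuncoid, collation, &vardata1, &vardata2,
                           nd1, nd2, isdefault1, isdefault2,
                           &sslot1, &sslot2, stats1, stats2,
                           have_mcvs1, have_mcvs2,
                           matchfreq_mcvs1, nmatches_mcvs,
                           inner_rel);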
I’m attaching two patches:
1. v4-0001-Avoid-duplicate-MCV-matching-in-eqjoinsel_semi-an.patch -
removes redundant MCV matching for semi/anti joins;
2. v4-0002-Optimize-MCV-matching-in-eqjoinsel_inner-and-eqjo.patch -
adds hash-based MCV matching behind a compile-time threshold
(EQJOINSEL_MCV_HASH_THRESHOLD, currently 100) and includes fixes based
on your comments; a short sketch of the hash path follows below.
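
The core of the hash path in 0002, condensed from eqjoinsel_inner() in the
patch (the eqjoinsel_semi() copy is analogous; the existing nested loop
remains as the fallback for non-hashable types and small lists):

    if (OidIsValid(hashLeft) && hashLeft == hashRight &&
        Min(sslot1->nvalues, sslot2->nvalues) > EQJOINSEL_MCV_HASH_THRESHOLD)
    {
        /* build a simplehash table over the smaller MCV array: O(N) */
        hashTable = McvHashTable_create(CurrentMemoryContext, nvaluesInner,
                                        &hashContext);
        for (i = 0; i < nvaluesInner; i++)
            McvHashTable_insert(hashTable, statsInner->values[i], &found)->index = i;

        /* probe it with the other MCV array: also O(N) */
        for (i = 0; i < nvaluesOuter; i++)
        {
            McvHashEntry *entry = McvHashTable_lookup(hashTable,
                                                      statsOuter->values[i]);

            if (entry != NULL)
            {
                hasMatchInner[entry->index] = hasMatchOuter[i] = true;
                nmatches++;
                matchprodfreq += statsInner->numbers[entry->index] *
                                 statsOuter->numbers[i];
            }
        }
        McvHashTable_destroy(hashTable);
    }
    else
    {
        /* existing O(N^2) nested-loop comparison */
    }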
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
From 71f59bd83e559df6b36720a4592bf3fdd689504a Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Mon, 10 Nov 2025 16:11:43 +0300
Subject: [PATCH v4 1/2] Avoid duplicate MCV matching in eqjoinsel_semi and
eqjoinsel_inner.
Previously both eqjoinsel_inner() and eqjoinsel_semi() performed essentially
identical O(N^2) loops over the MCV lists, even though the semi-join case
always follows the inner-join case in eqjoinsel(). Now the MCV matching
results from eqjoinsel_inner() are reused in eqjoinsel_semi() when possible
(i.e., when the RHS MCV list is not clamped). This avoids redundant
computation and simplifies the code.
Author: Ilia Evdokimov <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: David Geier <[email protected]>
---
src/backend/utils/adt/selfuncs.c | 36 ++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cb23ad52782..55cd0486bf9 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -154,7 +154,9 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
bool isdefault1, bool isdefault2,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
- bool have_mcvs1, bool have_mcvs2);
+ bool have_mcvs1, bool have_mcvs2,
+ double *matchfreq_mcvs1, double *matchfreq_mcvs2,
+ int *nmatches_mcvs);
static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
@@ -162,6 +164,7 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
+ double matchfreq1, int nmatches,
RelOptInfo *inner_rel);
static bool estimate_multivariate_ndistinct(PlannerInfo *root,
RelOptInfo *rel, List **varinfos, double *ndistinct);
@@ -2313,6 +2316,9 @@ eqjoinsel(PG_FUNCTION_ARGS)
bool get_mcv_stats;
bool join_is_reversed;
RelOptInfo *inner_rel;
+ int nmatches_mcvs = 0;
+ double matchfreq_mcvs1 = 0.0;
+ double matchfreq_mcvs2 = 0.0;
get_join_variables(root, args, sjinfo,
&vardata1, &vardata2, &join_is_reversed);
@@ -2367,7 +2373,9 @@ eqjoinsel(PG_FUNCTION_ARGS)
isdefault1, isdefault2,
&sslot1, &sslot2,
stats1, stats2,
- have_mcvs1, have_mcvs2);
+ have_mcvs1, have_mcvs2,
+ &matchfreq_mcvs1, &matchfreq_mcvs2,
+ &nmatches_mcvs);
switch (sjinfo->jointype)
{
@@ -2395,6 +2403,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
&sslot1, &sslot2,
stats1, stats2,
have_mcvs1, have_mcvs2,
+ matchfreq_mcvs1, nmatches_mcvs,
inner_rel);
else
{
@@ -2408,6 +2417,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
&sslot2, &sslot1,
stats2, stats1,
have_mcvs2, have_mcvs1,
+ matchfreq_mcvs2, nmatches_mcvs,
inner_rel);
}
@@ -2455,7 +2465,9 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
bool isdefault1, bool isdefault2,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
- bool have_mcvs1, bool have_mcvs2)
+ bool have_mcvs1, bool have_mcvs2,
+ double *matchfreq_mcvs1, double *matchfreq_mcvs2,
+ int *nmatches_mcvs)
{
double selec;
@@ -2595,6 +2607,11 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
totalsel2 += otherfreq2 * (otherfreq1 + unmatchfreq1) /
(nd1 - nmatches);
+ /* Save MCV match statistics for possible reuse by eqjoinsel_semi() */
+ *matchfreq_mcvs1 = matchfreq1;
+ *matchfreq_mcvs2 = matchfreq2;
+ *nmatches_mcvs = nmatches;
+
/*
* Use the smaller of the two estimates. This can be justified in
* essentially the same terms as given below for the no-stats case: to
@@ -2653,6 +2670,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
AttStatsSlot *sslot1, AttStatsSlot *sslot2,
Form_pg_statistic stats1, Form_pg_statistic stats2,
bool have_mcvs1, bool have_mcvs2,
+ double matchfreq1, int nmatches,
RelOptInfo *inner_rel)
{
double selec;
@@ -2705,11 +2723,9 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
bool *hasmatch1;
bool *hasmatch2;
double nullfrac1 = stats1->stanullfrac;
- double matchfreq1,
- uncertainfrac,
+ double uncertainfrac,
uncertain;
int i,
- nmatches,
clamped_nvalues2;
/*
@@ -2721,6 +2737,13 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
clamped_nvalues2 = Min(sslot2->nvalues, nd2);
+ /*
+ * eqjoinsel_inner() normally already did the full MCV comparison,
+ * so we reuse its results unless RHS MCVs were clamped, in which
+ * case we must redo the loop for the reduced list.
+ */
+ if (clamped_nvalues2 != sslot2->nvalues)
+ {
fmgr_info(opfuncoid, &eqproc);
/*
@@ -2777,6 +2800,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
CLAMP_PROBABILITY(matchfreq1);
pfree(hasmatch1);
pfree(hasmatch2);
+ }
/*
* Now we need to estimate the fraction of relation 1 that has at
--
2.34.1
From 27c6f4b6e947af27ccc9f6ae881407010f141bbe Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Mon, 10 Nov 2025 17:06:34 +0300
Subject: [PATCH v4 2/2] Optimize MCV matching in eqjoinsel_inner() and
eqjoinsel_semi()
Previously, MCV values from both sides of a join were compared using
an O(N^2) nested-loop algorithm. When default_statistics_target was
set to large values, this could make query planning noticeably slower.
This patch introduces a hash-based O(N) algorithm for matching MCVs.
The planner now switches to the hash method when the MCV lists are
large enough to amortize hash setup costs, while keeping the old
nested-loop path for small lists. The threshold is currently set
to 100 entries.
For data types that do not support hashing, the code still falls back
to the O(N^2) algorithm.
Author: David Geier <[email protected]>
Author: Ilia Evdokimov <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
---
src/backend/utils/adt/selfuncs.c | 211 +++++++++++++++++++++++++++++--
1 file changed, 202 insertions(+), 9 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 55cd0486bf9..b22e2e5d920 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -143,12 +143,20 @@
#define DEFAULT_PAGE_CPU_MULTIPLIER 50.0
+/*
+ * Switch to hash-based MCV matching when lists are large enough
+ * to amortize hash setup cost.
+ */
+#define EQJOINSEL_MCV_HASH_THRESHOLD 100
+
+struct McvHashTable_hash;
+
/* Hooks for plugins to get control when we ask for stats */
get_relation_stats_hook_type get_relation_stats_hook = NULL;
get_index_stats_hook_type get_index_stats_hook = NULL;
static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
+static double eqjoinsel_inner(Oid operator, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -157,7 +165,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
bool have_mcvs1, bool have_mcvs2,
double *matchfreq_mcvs1, double *matchfreq_mcvs2,
int *nmatches_mcvs);
-static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
+static double eqjoinsel_semi(Oid operator, Oid opfuncoid, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -220,6 +228,55 @@ static bool get_actual_variable_endpoint(Relation heapRel,
static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
static double btcost_correlation(IndexOptInfo *index,
VariableStatData *vardata);
+static uint32 hash_mcv(struct McvHashTable_hash *hashTable, Datum key);
+static bool are_mcvs_equal(struct McvHashTable_hash *hashTable, Datum value1, Datum value2);
+
+typedef struct McvHashEntry
+{
+ Datum value;
+ uint32 index;
+ uint32 hash;
+ char status;
+} McvHashEntry;
+
+typedef struct McvHashContext
+{
+ FmgrInfo equal_proc;
+ FmgrInfo hash_proc;
+ Oid collation;
+} McvHashContext;
+
+#define SH_PREFIX McvHashTable
+#define SH_ELEMENT_TYPE McvHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_KEY value
+#define SH_HASH_KEY(mcvs, key) hash_mcv(mcvs, key)
+#define SH_EQUAL(mcvs, key0, key1) are_mcvs_equal(mcvs, key0, key1)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(mcvs, key) key->hash
+#define SH_DEFINE
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32
+hash_mcv(struct McvHashTable_hash *hashTable, Datum key)
+{
+ McvHashContext *context = (McvHashContext *)hashTable->private_data;
+ return DatumGetUInt32(FunctionCall1Coll(&context->hash_proc, context->collation, key));
+}
+
+static bool
+are_mcvs_equal(struct McvHashTable_hash *hashTable, Datum value1, Datum value2)
+{
+ /*
+ * We can safely use FunctionCall2Coll(), which will complain if the
+ * function returns NULL, because MCV arrays in pg_statistic never
+ * contain NULL values.
+ */
+ McvHashContext *context = (McvHashContext *)hashTable->private_data;
+ return DatumGetBool(FunctionCall2Coll(&context->equal_proc, context->collation, value1, value2));
+}
/*
@@ -2367,7 +2424,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
}
/* We need to compute the inner-join selectivity in all cases */
- selec_inner = eqjoinsel_inner(opfuncoid, collation,
+ selec_inner = eqjoinsel_inner(operator, collation,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
@@ -2396,7 +2453,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
inner_rel = find_join_input_rel(root, sjinfo->min_righthand);
if (!join_is_reversed)
- selec = eqjoinsel_semi(opfuncoid, collation,
+ selec = eqjoinsel_semi(operator, opfuncoid, collation,
&vardata1, &vardata2,
nd1, nd2,
isdefault1, isdefault2,
@@ -2410,7 +2467,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
Oid commop = get_commutator(operator);
Oid commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;
- selec = eqjoinsel_semi(commopfuncoid, collation,
+ selec = eqjoinsel_semi(operator, commopfuncoid, collation,
&vardata2, &vardata1,
nd2, nd1,
isdefault2, isdefault1,
@@ -2459,7 +2516,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
* that it's worth trying to distinguish them here.
*/
static double
-eqjoinsel_inner(Oid opfuncoid, Oid collation,
+eqjoinsel_inner(Oid operator, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -2502,8 +2559,11 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
totalsel2;
int i,
nmatches;
+ Oid hashLeft = InvalidOid;
+ Oid hashRight = InvalidOid;
- fmgr_info(opfuncoid, &eqproc);
+ fmgr_info(get_opcode(operator), &eqproc);
+ get_op_hash_functions(operator, &hashLeft, &hashRight);
/*
* Save a few cycles by setting up the fcinfo struct just once. Using
@@ -2527,6 +2587,70 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
*/
matchprodfreq = 0.0;
nmatches = 0;
+
+ /*
+ * If the smaller MCV array has no more than EQJOINSEL_MCV_HASH_THRESHOLD
+ * entries, skip the hash table: at that size the nested loop is cheap
+ * and the hash setup cost wouldn't be amortized.
+ */
+ if (OidIsValid(hashLeft) && hashLeft == hashRight &&
+ Min(sslot1->nvalues, sslot2->nvalues) > EQJOINSEL_MCV_HASH_THRESHOLD)
+ {
+ AttStatsSlot *statsInner = sslot2;
+ AttStatsSlot *statsOuter = sslot1;
+ bool *hasMatchInner = hasmatch2;
+ bool *hasMatchOuter = hasmatch1;
+ int nvaluesInner = sslot2->nvalues;
+ int nvaluesOuter = sslot1->nvalues;
+ McvHashContext hashContext;
+ McvHashTable_hash *hashTable;
+
+ /* Make sure we build the hash table on the smaller array. */
+ if (sslot1->nvalues < sslot2->nvalues)
+ {
+ statsInner = sslot1;
+ statsOuter = sslot2;
+ hasMatchInner = hasmatch1;
+ hasMatchOuter = hasmatch2;
+ nvaluesInner = sslot1->nvalues;
+ nvaluesOuter = sslot2->nvalues;
+ }
+
+ /* 1. Build a hash table over the smaller MCV array. That's O(N). */
+ fmgr_info(get_opcode(operator), &hashContext.equal_proc);
+ fmgr_info(hashLeft, &hashContext.hash_proc); /* hashLeft == hashRight */
+ hashContext.collation = collation;
+
+ hashTable = McvHashTable_create(CurrentMemoryContext, nvaluesInner, &hashContext);
+
+ for (i = 0; i < nvaluesInner; i++)
+ {
+ bool found = false;
+ McvHashEntry *entry = McvHashTable_insert(hashTable, statsInner->values[i], &found);
+
+ Assert(!found);
+
+ entry->index = i;
+ }
+
+ /* 2. Look up values from the other MCV array in the hash table to find matches. */
+ for (i = 0; i < nvaluesOuter; i++)
+ {
+ McvHashEntry *entry = McvHashTable_lookup(hashTable, statsOuter->values[i]);
+
+ if (entry != NULL)
+ {
+ hasMatchInner[entry->index] = hasMatchOuter[i] = true;
+ nmatches++;
+ matchprodfreq += statsInner->numbers[entry->index] * statsOuter->numbers[i];
+ }
+ }
+
+ McvHashTable_destroy(hashTable);
+ }
+ else
+ {
+ /* Fall back to the O(N^2) algorithm when the hash-based path isn't usable. */
for (i = 0; i < sslot1->nvalues; i++)
{
int j;
@@ -2551,6 +2675,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
}
}
}
+ }
CLAMP_PROBABILITY(matchprodfreq);
/* Sum up frequencies of matched and unmatched MCVs */
matchfreq1 = unmatchfreq1 = 0.0;
@@ -2663,7 +2788,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
* Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
*/
static double
-eqjoinsel_semi(Oid opfuncoid, Oid collation,
+eqjoinsel_semi(Oid operator, Oid opfuncoid, Oid collation,
VariableStatData *vardata1, VariableStatData *vardata2,
double nd1, double nd2,
bool isdefault1, bool isdefault2,
@@ -2744,7 +2869,11 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
*/
if (clamped_nvalues2 != sslot2->nvalues)
{
- fmgr_info(opfuncoid, &eqproc);
+ Oid hashLeft = InvalidOid;
+ Oid hashRight = InvalidOid;
+
+ fmgr_info(get_opcode(operator), &eqproc);
+ get_op_hash_functions(operator, &hashLeft, &hashRight);
/*
* Save a few cycles by setting up the fcinfo struct just once. Using
@@ -2767,6 +2896,69 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
* and because the math wouldn't add up...
*/
nmatches = 0;
+
+ /*
+ * If the smaller MCV array has no more than EQJOINSEL_MCV_HASH_THRESHOLD
+ * entries, skip the hash table: at that size the nested loop is cheap
+ * and the hash setup cost wouldn't be amortized.
+ */
+ if (OidIsValid(hashLeft) && hashLeft == hashRight &&
+ Min(sslot1->nvalues, clamped_nvalues2) > EQJOINSEL_MCV_HASH_THRESHOLD)
+ {
+ AttStatsSlot *statsInner = sslot2;
+ AttStatsSlot *statsOuter = sslot1;
+ bool *hasMatchInner = hasmatch2;
+ bool *hasMatchOuter = hasmatch1;
+ int nvaluesInner = clamped_nvalues2;
+ int nvaluesOuter = sslot1->nvalues;
+ McvHashContext hashContext;
+ McvHashTable_hash *hashTable;
+
+ /* Make sure we build the hash table on the smaller array. */
+ if (sslot1->nvalues < clamped_nvalues2)
+ {
+ statsInner = sslot1;
+ statsOuter = sslot2;
+ hasMatchInner = hasmatch1;
+ hasMatchOuter = hasmatch2;
+ nvaluesInner = sslot1->nvalues;
+ nvaluesOuter = clamped_nvalues2;
+ }
+
+ /* 1. Build a hash table over the smaller MCV array. That's O(N). */
+ fmgr_info(get_opcode(operator), &hashContext.equal_proc);
+ fmgr_info(hashLeft, &hashContext.hash_proc); /* hashLeft == hashRight */
+ hashContext.collation = collation;
+
+ hashTable = McvHashTable_create(CurrentMemoryContext, nvaluesInner, &hashContext);
+
+ for (i = 0; i < nvaluesInner; i++)
+ {
+ bool found = false;
+ McvHashEntry *entry = McvHashTable_insert(hashTable, statsInner->values[i], &found);
+
+ Assert(!found);
+
+ entry->index = i;
+ }
+
+ /* 2. Look up values from the other MCV array in the hash table to find matches. */
+ for (i = 0; i < nvaluesOuter; i++)
+ {
+ McvHashEntry *entry = McvHashTable_lookup(hashTable, statsOuter->values[i]);
+
+ if (entry != NULL)
+ {
+ hasMatchInner[entry->index] = hasMatchOuter[i] = true;
+ nmatches++;
+ }
+ }
+
+ McvHashTable_destroy(hashTable);
+ }
+ else
+ {
+ /* Fall back to the O(N^2) algorithm when the hash-based path isn't usable. */
for (i = 0; i < sslot1->nvalues; i++)
{
int j;
@@ -2790,6 +2982,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
}
}
}
+ }
/* Sum up frequencies of matched MCVs */
matchfreq1 = 0.0;
for (i = 0; i < sslot1->nvalues; i++)
--
2.34.1