Hi hackers,
eqjoinsel() can be optimized by not reading MCV stats if at least one of
the two join attributes is unique. As primary keys are implicitly unique
this situation can occur frequently. For unique columns no MCV stats are
stored and eqjoinsel_inner() and eqjoinsel_semi(), called from
eqjoinsel(), only consider MCV stats in the join selectivity computation
if they're present on both columns. Attached is a small patch that
implements the skipping.
With this change we saw some queries improve planning time by more than
2x, especially with larger values for default_statistics_target. That's
because get_attstatsslot() deconstructs the array holding the MCV. The
size of that array depends on default_statistics_target.
Thanks for your consideration!
--
David Geier
(ServiceNow)
From 7a8176b9eb9dd9982662d83b86996c5402378674 Mon Sep 17 00:00:00 2001
From: David Geier <geidav...@gmail.com>
Date: Fri, 11 Nov 2022 12:33:11 +0100
Subject: [PATCH] Skip reading MCV stats for unique join attributes
MCV stats are not stored for unique columns. If one of the two join
attributes is unique, reading the MCV stats can be skipped for both
columns because they are only ever used during join selectivity
estimation if they're present on both attributes.
---
src/backend/utils/adt/selfuncs.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d597b7e81f..3c9007c8d1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2263,6 +2263,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
bool have_mcvs2 = false;
bool join_is_reversed;
RelOptInfo *inner_rel;
+ bool get_mcv_stats;
get_join_variables(root, args, sjinfo,
&vardata1, &vardata2, &join_is_reversed);
@@ -2270,6 +2271,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
nd1 = get_variable_numdistinct(&vardata1, &isdefault1);
nd2 = get_variable_numdistinct(&vardata2, &isdefault2);
+ get_mcv_stats = !vardata1.isunique && !vardata2.isunique;
opfuncoid = get_opcode(operator);
memset(&sslot1, 0, sizeof(sslot1));
@@ -2279,7 +2281,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
{
/* note we allow use of nullfrac regardless of security check */
stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
- if (statistic_proc_security_check(&vardata1, opfuncoid))
+ if (get_mcv_stats && statistic_proc_security_check(&vardata1, opfuncoid))
have_mcvs1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
STATISTIC_KIND_MCV, InvalidOid,
ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
@@ -2289,7 +2291,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
{
/* note we allow use of nullfrac regardless of security check */
stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
- if (statistic_proc_security_check(&vardata2, opfuncoid))
+ if (get_mcv_stats && statistic_proc_security_check(&vardata2, opfuncoid))
have_mcvs2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
STATISTIC_KIND_MCV, InvalidOid,
ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
--
2.34.1