Hi hackers,

eqjoinsel() can be optimized by not reading MCV stats if at least one of the two join attributes is unique. As primary keys are implicitly unique this situation can occur frequently. For unique columns no MCV stats are stored and eqjoinsel_inner() and eqjoinsel_semi(), called from eqjoinsel(), only consider MCV stats in the join selectivity computation if they're present on both columns. Attached is a small patch that implements the skipping.

With this change we saw some queries improve planning time by more than 2x, especially with larger values for default_statistics_target. That's because get_attstatsslot() deconstructs the array holding the MCV. The size of that array depends on default_statistics_target.

Thanks for your consideration!

--
David Geier
(ServiceNow)
From 7a8176b9eb9dd9982662d83b86996c5402378674 Mon Sep 17 00:00:00 2001
From: David Geier <geidav...@gmail.com>
Date: Fri, 11 Nov 2022 12:33:11 +0100
Subject: [PATCH] Skip reading MCV stats for unique join attributes

MCV stats are not stored for unique columns. If one of the two join
attributes is unique, reading the MCV stats can be skipped for both
columns because they are only ever used during join selectivity
estimation if they're present on both attributes.
---
 src/backend/utils/adt/selfuncs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d597b7e81f..3c9007c8d1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2263,6 +2263,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	bool		have_mcvs2 = false;
 	bool		join_is_reversed;
 	RelOptInfo *inner_rel;
+	bool		get_mcv_stats;
 
 	get_join_variables(root, args, sjinfo,
 					   &vardata1, &vardata2, &join_is_reversed);
@@ -2270,6 +2271,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	nd1 = get_variable_numdistinct(&vardata1, &isdefault1);
 	nd2 = get_variable_numdistinct(&vardata2, &isdefault2);
 
+	get_mcv_stats = !vardata1.isunique && !vardata2.isunique;
 	opfuncoid = get_opcode(operator);
 
 	memset(&sslot1, 0, sizeof(sslot1));
@@ -2279,7 +2281,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	{
 		/* note we allow use of nullfrac regardless of security check */
 		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
-		if (statistic_proc_security_check(&vardata1, opfuncoid))
+		if (get_mcv_stats && statistic_proc_security_check(&vardata1, opfuncoid))
 			have_mcvs1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
@@ -2289,7 +2291,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	{
 		/* note we allow use of nullfrac regardless of security check */
 		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
-		if (statistic_proc_security_check(&vardata2, opfuncoid))
+		if (get_mcv_stats && statistic_proc_security_check(&vardata2, opfuncoid))
 			have_mcvs2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
-- 
2.34.1

Reply via email to