Thanks everyone for the great feedback and suggestions.
Yes, it is.  Using zero flag would short-cut get_attstatsslot() to just
return whether the slot type exists without loading it.  Do you think we
need to emphasize this use case in the comments for 'flags'?
Perhaps, it's not really obvious now.
Comment added.


I wonder whether we need to also check statistic_proc_security_check()
when determining if MCVs exists in both sides.
Yeah, I thought about hoisting the statistic_proc_security_check
tests up into get_mcv_stats.  I don't think it's a great idea
though.  Again, it'd complicate untangling this if we ever
generalize the use of MCVs in this function.  Also, I don't
think we should be micro-optimizing the case where the security
check doesn't pass --- if it doesn't, you're going to be hurting
from bad plans a lot more than you are from some wasted cycles
here.
Sounds reasonable.

Attached is v2 of the patch.
This is basically Tom's version plus a comment for the flags of get_attstatslot() as suggested by Richard.
I couldn't come up with any reasonable way of writing an automated test 
for that.
Any ideas?

--
David Geier
(ServiceNow)
From 66b16792e8e16dad33adb5614a0936cc41002f4c Mon Sep 17 00:00:00 2001
From: David Geier <geidav...@gmail.com>
Date: Fri, 18 Nov 2022 09:35:08 +0100
Subject: [PATCH] Don't read MCV stats needlessly in join selectivity
 estimation

Join selectivity estimation only uses MCV stats if they're present
on both join attributes. Therefore, if MCV stats are not available
on one of the two columns, skip reading MCV stats altogether.

Frequently encountered situations in which MCV stats not available
is unique columns or columns for which all values have roughly
the same frequency and therefore ANALYZE decided to not store them.
---
 src/backend/utils/adt/selfuncs.c    | 15 +++++++++++++--
 src/backend/utils/cache/lsyscache.c |  4 ++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d597b7e81f..5baab4e3ac 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2263,6 +2263,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	bool		have_mcvs2 = false;
 	bool		join_is_reversed;
 	RelOptInfo *inner_rel;
+	bool		get_mcv_stats;
 
 	get_join_variables(root, args, sjinfo,
 					   &vardata1, &vardata2, &join_is_reversed);
@@ -2275,11 +2276,21 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	memset(&sslot1, 0, sizeof(sslot1));
 	memset(&sslot2, 0, sizeof(sslot2));
 
+	/*
+	 * There is no use in fetching one side's MCVs if we lack MCVs for the
+	 * other side, so do a quick check to verify that both stats exist.
+	 */
+	get_mcv_stats = (HeapTupleIsValid(vardata1.statsTuple) &&
+			 HeapTupleIsValid(vardata2.statsTuple) &&
+			 get_attstatsslot(&sslot1, vardata1.statsTuple, STATISTIC_KIND_MCV, InvalidOid, 0) &&
+			 get_attstatsslot(&sslot2, vardata2.statsTuple, STATISTIC_KIND_MCV, InvalidOid, 0));
+    
+    
 	if (HeapTupleIsValid(vardata1.statsTuple))
 	{
 		/* note we allow use of nullfrac regardless of security check */
 		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
-		if (statistic_proc_security_check(&vardata1, opfuncoid))
+		if (get_mcv_stats && statistic_proc_security_check(&vardata1, opfuncoid))
 			have_mcvs1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
@@ -2289,7 +2300,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	{
 		/* note we allow use of nullfrac regardless of security check */
 		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
-		if (statistic_proc_security_check(&vardata2, opfuncoid))
+		if (get_mcv_stats && statistic_proc_security_check(&vardata2, opfuncoid))
 			have_mcvs2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index a16a63f495..191f68f7b4 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -3157,6 +3157,10 @@ get_attavgwidth(Oid relid, AttrNumber attnum)
  * reqkind: STAKIND code for desired statistics slot kind.
  * reqop: STAOP value wanted, or InvalidOid if don't care.
  * flags: bitmask of ATTSTATSSLOT_VALUES and/or ATTSTATSSLOT_NUMBERS.
+ *        Passing 0 will only check if the requested slot type exists but not
+ *        extract any content. This is useful in cases where MCV stats are only
+ *        useful if they exists on all involved columns (e.g. during join
+ *        selectivity computation).
  *
  * If a matching slot is found, true is returned, and *sslot is filled thus:
  * staop: receives the actual STAOP value.
-- 
2.34.1

Reply via email to