PoC: Using Count-Min Sketch for join cardinality estimation

Tomas Vondra Wed, 16 Jun 2021 09:23:51 -0700

Hi,

During the recent "CMU vaccination" talk given by Robert [1], a coupleof the attendees (some of which were engineers working on various otherdatabase systems) asked whether PostgreSQL optimizer uses sketches.Which it does not, as far as I'm aware. Perhaps some of our statisticscould be considered sketches, but we've not using data structures likehyperloglog, count-min sketch, etc.

But it reminded me that I thought about using one of the common sketchesin the past, namely the Count-Min sketch [2], which is often mentionedas useful to estimating join cardinalities. There's a couple papersexplaining how it works [3], [4], [5], but the general idea is that itapproximates frequency table, i.e. a table tracking frequencies for allvalues. Our MCV list is one way to do that, but that only keeps alimited number of common values - for the rest we approximate thefrequencies as uniform distribution. When the MCV covers only a tinyfraction of the data, or missing entirely, this may be an issue.

We can't possibly store exact frequencies all values for tables withmany distinct values. The Count-Min sketch works around this by trackingfrequencies in a limited number of counters - imagine you have 128counters. To add a value to the sketch, we hash it and the hash sayswhich counter to increment.

To estimate a join size, we simply calculate "dot product" of the twosketches (which need to use the same number of counters):


  S = sum(s1(i) * s2(i) for i in 1 .. 128)

The actual sketches have multiple of those arrays (e.g. 8) usingdifferent hash functions, and we use the minimum of the sums. Thatlimits the error, but I'll ignore it here for simplicity.

The attached patch is a very simple (and perhaps naive) implementationadding count-min sketch to pg_statistic for all attributes with a hashfunction (as a new statistics slot kind), and considering it inequijoinsel_inner. There's a GUC use_count_min_sketch to make it easierto see how it works.


A simple example

  create table t1 (a int, b int);
  create table t2 (a int, b int);

  insert into t1 select pow(random(), 2) * 1000, i
    from generate_series(1,30000) s(i);
  insert into t2 select pow(random(), 2) * 1000, i
    from generate_series(1,30000) s(i);

  analyze t1, t2;

  explain analyze select * from t1 join t2 using (a);

                             QUERY PLAN
  ------------------------------------------------------------------
   Hash Join  (cost=808.00..115470.35 rows=8936685 width=12)
              (actual time=31.231..1083.330 rows=2177649 loops=1)

So it's about 4x over-estimated, while without the count-min sketch it'sabout 2x under-estimated:


  set use_count_min_sketch = false;

                             QUERY PLAN
  ------------------------------------------------------------------
   Merge Join  (cost=5327.96..18964.16 rows=899101 width=12)
               (actual time=60.780..2896.829 rows=2177649 loops=1)

More about this a bit later.

The nice thing on count-min sketch is that there are pretty clearboundaries for error:


  size(t1,t2) <= dot_product(s1,2) <= epsilon * size(t1) * size(t2)

where s1/s2 are sketches on t1/t2, and epsilon is relative error. Usermay pick epsilon, and that determines size of the necessary sketch as2/epsilon. So with 128 buckets, the relative error is ~1.6%.

The trouble here is that this is relative to cartesian product of thetwo relations. So with two relations, each 30k rows, the error is up to~14.5M. Which is not great. We can pick lower epsilon value, but thatincreases the sketch size.

Where does the error come from? Each counter combines frequencies formultiple distinct values. So for example with 128 counters and 1024distinct values, each counter is representing ~4 values on average. Butthe dot product ignores this - it treats as if all the frequency was fora single value. It calculates the worst case for the bucket, because ifyou split the frequency e.g. in half, the estimate is always lower


   (f/2)^2 + (f/2)^2 < f^2

So maybe this could calculate the average number of items per counterand correct for this, somehow. We'd lose some of the sketch guarantees,but maybe it's the right thing to do.

There's a bunch of commented-out code doing this in different ways, andwith the geometric mean variant the result looks like this:


                             QUERY PLAN
  ------------------------------------------------------------------
   Merge Join  (cost=5328.34..53412.58 rows=3195688 width=12)
               (actual time=64.037..2937.818 rows=2177649 loops=1)

which is much closer, but of course that depends on how exactly is thedata set skewed.



There's a bunch of other open questions:

1) The papers about count-min sketch seem to be written for streaminguse cases, which implies all the inserted data pass through the sketch.This patch only builds the sketch on analyze sample, which makes it lessreliable. I doubt we want to do something different (e.g. because it'drequire handling deletes, etc.).

2) The patch considers the sketch before MCVs, simply because it makesit much simpler to enable/disable the sketch, and compare it to MCVs.That's probably not what should be done - if we have MCVs, we shouldprefer using that, simply because it determines the frequencies moreaccurately than the sketch. And only use the sketch as a fallback, whenwe don't have MCVs on both sides of the join, instead of just assuminguniform distribution and relying on ndistinct.

We may have histograms, but AFAIK we don't use those when estimatingjoins (at least not equijoins). That's another thing we might maybe lookinto, comparing the histograms to verify how much they overlap. Butthat's irrelevant here.

Anyway, count-min sketches would be a better way to estimate the partnot covered by MCVs - we might even assume the uniform distribution forindividual counters, because that's what we do without MCVs anyway.

3) It's not clear to me how to extend this for multiple columns, so thatit can be used to estimate joins on multiple correlated columns. ForMCVs it was pretty simple, but let's say we add this as a new extendedstatistics kind, and user does


    CREATE STATISTICS s (cmsketch) ON a, b, c FROM t;

Should that build sketch on (a,b,c) or something else? The trouble is asketch on (a,b,c) is useless for joins on (a,b).

We might do something like for ndistinct coefficients, and build asketch for each combination of the columns. The sketches are much largerthan ndistinct coefficients, though. But maybe that's fine - with 8columns we'd need ~56 sketches, each ~8kB. So that's not extreme.



regards

[1]https://db.cs.cmu.edu/events/vaccination-2021-postgresql-optimizer-methodology-robert-haas/


[2] https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch

[3] https://dsf.berkeley.edu/cs286/papers/countmin-latin2004.pdf

[4] http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf

[5] http://dimacs.rutgers.edu/~graham/pubs/papers/cmz-sdm.pdf

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..81c7975f31 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -201,6 +201,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 1 THEN stavalues3
             WHEN stakind4 = 1 THEN stavalues4
             WHEN stakind5 = 1 THEN stavalues5
+            WHEN stakind6 = 1 THEN stavalues6
         END AS most_common_vals,
         CASE
             WHEN stakind1 = 1 THEN stanumbers1
@@ -208,6 +209,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 1 THEN stanumbers3
             WHEN stakind4 = 1 THEN stanumbers4
             WHEN stakind5 = 1 THEN stanumbers5
+            WHEN stakind6 = 1 THEN stanumbers6
         END AS most_common_freqs,
         CASE
             WHEN stakind1 = 2 THEN stavalues1
@@ -215,6 +217,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 2 THEN stavalues3
             WHEN stakind4 = 2 THEN stavalues4
             WHEN stakind5 = 2 THEN stavalues5
+            WHEN stakind6 = 2 THEN stavalues6
         END AS histogram_bounds,
         CASE
             WHEN stakind1 = 3 THEN stanumbers1[1]
@@ -222,6 +225,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 3 THEN stanumbers3[1]
             WHEN stakind4 = 3 THEN stanumbers4[1]
             WHEN stakind5 = 3 THEN stanumbers5[1]
+            WHEN stakind6 = 3 THEN stanumbers6[1]
         END AS correlation,
         CASE
             WHEN stakind1 = 4 THEN stavalues1
@@ -229,6 +233,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 4 THEN stavalues3
             WHEN stakind4 = 4 THEN stavalues4
             WHEN stakind5 = 4 THEN stavalues5
+            WHEN stakind6 = 4 THEN stavalues6
         END AS most_common_elems,
         CASE
             WHEN stakind1 = 4 THEN stanumbers1
@@ -236,6 +241,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 4 THEN stanumbers3
             WHEN stakind4 = 4 THEN stanumbers4
             WHEN stakind5 = 4 THEN stanumbers5
+            WHEN stakind6 = 4 THEN stanumbers6
         END AS most_common_elem_freqs,
         CASE
             WHEN stakind1 = 5 THEN stanumbers1
@@ -243,7 +249,16 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+            WHEN stakind6 = 5 THEN stanumbers6
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 8 THEN stanumbers1
+            WHEN stakind2 = 8 THEN stanumbers2
+            WHEN stakind3 = 8 THEN stanumbers3
+            WHEN stakind4 = 8 THEN stanumbers4
+            WHEN stakind5 = 8 THEN stanumbers5
+            WHEN stakind6 = 8 THEN stanumbers6
+        END AS count_min_sketch
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 426c1e6710..a7970adaa6 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -66,6 +66,7 @@
 #include "utils/spccache.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
+#include "utils/typcache.h"
 
 
 /* Per-index data for ANALYZE */
@@ -2369,6 +2370,75 @@ compute_distinct_stats(VacAttrStatsP stats,
 }
 
 
+/*
+ * depth 8 and width 128 is sufficient for relative error ~1.5% with a
+ * probability of approximately 99.6%
+ */
+#define	CM_SKETCH_DEPTH	8
+#define	CM_SKETCH_WIDTH	128
+
+/* hard-coded seeds to create CM_SKETCH_DEPTH hash functions */
+static int64 coun_min_sketch_seeds[] = {460301880, 158177425, 666659290, 607370179,
+										282915002, 235039873, 62050793, 177805379};
+
+typedef struct count_min_sketch {
+	int	nvalues;
+	int	depth;
+	int	width;
+	int	counters[FLEXIBLE_ARRAY_MEMBER];
+} count_min_sketch;
+
+static count_min_sketch *
+count_min_sketch_alloc(void)
+{
+	count_min_sketch *sketch;
+
+	sketch = palloc0(offsetof(count_min_sketch, counters) +
+					 sizeof(int) * CM_SKETCH_DEPTH * CM_SKETCH_WIDTH);
+
+	sketch->depth = CM_SKETCH_DEPTH;
+	sketch->width = CM_SKETCH_WIDTH;
+
+	return sketch;
+}
+
+static void
+count_min_sketch_add(count_min_sketch *sketch,
+					 TypeCacheEntry *typentry, Oid collation, Datum value)
+{
+	int	i;
+
+	if (!sketch)
+		return;
+
+	sketch->nvalues++;
+
+	for (i = 0; i < CM_SKETCH_DEPTH; i++)
+	{
+		LOCAL_FCINFO(locfcinfo, 2);
+		uint64	hash_value;
+		uint64	index;
+
+		InitFunctionCallInfoData(*locfcinfo, &typentry->hash_extended_proc_finfo, 2,
+								 collation, NULL, NULL);
+		locfcinfo->args[0].value = value;
+		locfcinfo->args[0].isnull = false;
+
+		locfcinfo->args[1].value = Int64GetDatum(coun_min_sketch_seeds[i]);
+		locfcinfo->args[0].isnull = false;
+
+		hash_value = DatumGetUInt64(FunctionCallInvoke(locfcinfo));
+
+		/* We don't expect hash support functions to return null */
+		Assert(!locfcinfo->isnull);
+
+		/* update the right counter */
+		index = i * CM_SKETCH_WIDTH + (hash_value % CM_SKETCH_WIDTH);
+
+		sketch->counters[index] += 1;
+	}
+}
+
 /*
  *	compute_scalar_stats() -- compute column statistics
  *
@@ -2407,6 +2477,10 @@ compute_scalar_stats(VacAttrStatsP stats,
 	int			num_bins = stats->attr->attstattarget;
 	StdAnalyzeData *mystats = (StdAnalyzeData *) stats->extra_data;
 
+	/* count-min sketch build info */
+	TypeCacheEntry *typentry;
+	count_min_sketch *sketch = NULL;
+
 	values = (ScalarItem *) palloc(samplerows * sizeof(ScalarItem));
 	tupnoLink = (int *) palloc(samplerows * sizeof(int));
 	track = (ScalarMCVItem *) palloc(num_mcv * sizeof(ScalarMCVItem));
@@ -2416,6 +2490,14 @@ compute_scalar_stats(VacAttrStatsP stats,
 	ssup.ssup_collation = stats->attrcollid;
 	ssup.ssup_nulls_first = false;
 
+	/* hashing for count-min sketch */
+	typentry = lookup_type_cache(stats->attrtype->oid, TYPECACHE_HASH_EXTENDED_PROC_FINFO);
+
+	if (OidIsValid(typentry->hash_extended_proc_finfo.fn_oid))
+		sketch = count_min_sketch_alloc();
+	else
+		elog(WARNING, "no hash_extended_proc found for type %d", stats->attrtype->oid);
+
 	/*
 	 * For now, don't perform abbreviated key conversion, because full values
 	 * are required for MCV slot generation.  Supporting that optimization
@@ -2443,6 +2525,8 @@ compute_scalar_stats(VacAttrStatsP stats,
 		}
 		nonnull_cnt++;
 
+		count_min_sketch_add(sketch, typentry, stats->attrcollid, value);
+
 		/*
 		 * If it's a variable-width field, add up widths for average width
 		 * calculation.  Note that if the value is toasted, we use the toasted
@@ -2871,6 +2955,45 @@ compute_scalar_stats(VacAttrStatsP stats,
 			stats->numnumbers[slot_idx] = 1;
 			slot_idx++;
 		}
+
+		/*
+		 * Finally store the count-min sketch (if built) as a simple sequence
+		 * of float4 values
+		 */
+		if (sketch)
+		{
+			int			i;
+			int			nvalues;
+			float4	   *values;
+			MemoryContext old_context;
+
+			nvalues = 3 + CM_SKETCH_DEPTH * CM_SKETCH_WIDTH;
+
+			/* Must copy the target values into anl_context */
+			old_context = MemoryContextSwitchTo(stats->anl_context);
+			values = (float4 *) palloc(nvalues * sizeof(float4));
+			MemoryContextSwitchTo(old_context);
+
+			values[0] = sketch->nvalues;
+			values[1] = sketch->depth;
+			values[2] = sketch->width;
+
+			for (i = 0; i < CM_SKETCH_DEPTH * CM_SKETCH_WIDTH; i++)
+				values[3+i] = sketch->counters[i];
+
+			stats->stakind[slot_idx] = STATISTIC_KIND_COUNT_MIN_SKETCH;
+			stats->staop[slot_idx] = mystats->eqopr;
+			stats->stacoll[slot_idx] = stats->attrcollid;
+			stats->stanumbers[slot_idx] = values;
+			stats->numnumbers[slot_idx] = nvalues;
+
+			/*
+			 * Accept the defaults for stats->statypid and others. They have
+			 * been set before we were called (see vacuum.h)
+			 */
+			slot_idx++;
+		}
+
 	}
 	else if (nonnull_cnt > 0)
 	{
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0c8c05f6c2..ddd594876d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -151,7 +151,8 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 							  bool isdefault1, bool isdefault2,
 							  AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 							  Form_pg_statistic stats1, Form_pg_statistic stats2,
-							  bool have_mcvs1, bool have_mcvs2);
+							  bool have_mcvs1, bool have_mcvs2,
+							  bool have_cms1, bool have_cms2);
 static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 							 VariableStatData *vardata1, VariableStatData *vardata2,
 							 double nd1, double nd2,
@@ -212,6 +213,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
 
 
+bool use_count_min_sketch = true;
+
 /*
  *		eqsel			- Selectivity of "=" for any data types.
  *
@@ -2260,6 +2263,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	Form_pg_statistic stats2 = NULL;
 	bool		have_mcvs1 = false;
 	bool		have_mcvs2 = false;
+	bool		have_cms1 = false;
+	bool		have_cms2 = false;
 	bool		join_is_reversed;
 	RelOptInfo *inner_rel;
 
@@ -2279,9 +2284,14 @@ eqjoinsel(PG_FUNCTION_ARGS)
 		/* note we allow use of nullfrac regardless of security check */
 		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
 		if (statistic_proc_security_check(&vardata1, opfuncoid))
+		{
 			have_mcvs1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
+			have_cms1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
+										  STATISTIC_KIND_COUNT_MIN_SKETCH, InvalidOid,
+										  ATTSTATSSLOT_NUMBERS);
+		}
 	}
 
 	if (HeapTupleIsValid(vardata2.statsTuple))
@@ -2289,9 +2299,14 @@ eqjoinsel(PG_FUNCTION_ARGS)
 		/* note we allow use of nullfrac regardless of security check */
 		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
 		if (statistic_proc_security_check(&vardata2, opfuncoid))
+		{
 			have_mcvs2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
+			have_cms2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
+										  STATISTIC_KIND_COUNT_MIN_SKETCH, InvalidOid,
+										  ATTSTATSSLOT_NUMBERS);
+		}
 	}
 
 	/* We need to compute the inner-join selectivity in all cases */
@@ -2301,7 +2316,8 @@ eqjoinsel(PG_FUNCTION_ARGS)
 								  isdefault1, isdefault2,
 								  &sslot1, &sslot2,
 								  stats1, stats2,
-								  have_mcvs1, have_mcvs2);
+								  have_mcvs1, have_mcvs2,
+								  have_cms1, have_cms2);
 
 	switch (sjinfo->jointype)
 	{
@@ -2389,11 +2405,90 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 				bool isdefault1, bool isdefault2,
 				AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 				Form_pg_statistic stats1, Form_pg_statistic stats2,
-				bool have_mcvs1, bool have_mcvs2)
+				bool have_mcvs1, bool have_mcvs2,
+				bool have_cms1, bool have_cms2)
 {
 	double		selec;
 
-	if (have_mcvs1 && have_mcvs2)
+	if (have_cms1 && have_cms2 && use_count_min_sketch)
+	{
+		int	i;
+		int	num1 = sslot1->numbers[0];
+		int	num2 = sslot2->numbers[0];
+		double	cross_size = (double) num1 * num2;
+		double	estimate = 0;
+
+//		double	error_frac, error, certainty;
+
+		/*
+		 * This is wrong, because the ndistinct esimates are for the whole
+		 * data set, not just for the sample (which is what the sketch is
+		 * calculated from)
+		 */
+//		double	nd1_per_bucket = nd1 / sslot1->numbers[2];
+//		double	nd2_per_bucket = nd2 / sslot2->numbers[2];
+
+/* keep the same values as in analyze.c */
+#define	CM_SKETCH_DEPTH	8
+#define	CM_SKETCH_WIDTH	128
+
+		Assert(sslot1->numbers[1] == sslot2->numbers[1]);
+		Assert(sslot1->numbers[2] == sslot2->numbers[2]);
+
+		Assert(sslot1->numbers[1] == CM_SKETCH_DEPTH);
+		Assert(sslot1->numbers[2] == CM_SKETCH_WIDTH);
+
+//		error_frac = 2.0 / sslot1->numbers[2];
+//		error = error_frac * sslot1->numbers[0] * sslot2->numbers[0];
+//		certainty = 1 - pow(0.5, sslot1->numbers[1]);
+
+// elog(WARNING, "relative error = %f (%f)", error_frac, error);
+// elog(WARNING, "certainty = %f", certainty);
+
+		for (i = 0; i < CM_SKETCH_DEPTH; i++)
+		{
+			int j;
+			double sum = 0;
+			for (j = 0; j < CM_SKETCH_WIDTH; j++)
+			{
+				double count1 = sslot1->numbers[3 + i * CM_SKETCH_WIDTH + j];
+				double count2 = sslot2->numbers[3 + i * CM_SKETCH_WIDTH + j];
+
+				/* Assume all groups in the bucket are of equal size */
+				// double count1_avg = (count1 / nd1_per_bucket);
+				// double count2_avg = (count2 / nd2_per_bucket);
+
+				/*
+				 * Geometric mean between average and "single group" in the
+				 * bucket - models somewhat skewed distribution with smaller
+				 * and larger groups.
+				 */
+				// double count1_avg = sqrt(count1 * (count1 / nd1_per_bucket));
+				// double count2_avg = sqrt(count2 * (count2 / nd2_per_bucket));
+
+				/*
+				 * Correction coefficient - number of groups to consider, we pick
+				 * minimum because if there are A and B items, (A < B) then we
+				 * can't join more than A groups.
+				 */
+				// double nd_min = Min(count1 / count1_avg, count2 / count2_avg);
+				// sum += count1_avg * count2_avg * nd_min;
+
+				/*
+				 * This is what the paper does (more or less considers the
+				 * whole bucket at a single group, matching everything from
+				 * the other side.
+				 */
+				sum += count1 * count2;
+			}
+
+			if ((i == 0) || (sum < estimate))
+				estimate = sum;
+		}
+
+		selec = estimate / cross_size;
+	}
+	else if (have_mcvs1 && have_mcvs2)
 	{
 		/*
 		 * We have most-common-value lists for both relations.  Run through
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 68b62d523d..e41a1a2ca1 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -103,6 +103,7 @@
 #include "utils/ps_status.h"
 #include "utils/queryjumble.h"
 #include "utils/rls.h"
+#include "utils/selfuncs.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
 #include "utils/inval.h"
@@ -2109,6 +2110,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"use_count_min_sketch", PGC_SUSET, DEVELOPER_OPTIONS,
+			gettext_noop("use Count-Min sketch for join estimates"),
+		},
+		&use_count_min_sketch,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index d1827858e2..8d28e7dc49 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -90,18 +90,21 @@ CATALOG(pg_statistic,2619,StatisticRelationId)
 	int16		stakind3;
 	int16		stakind4;
 	int16		stakind5;
+	int16		stakind6;
 
 	Oid			staop1 BKI_LOOKUP_OPT(pg_operator);
 	Oid			staop2 BKI_LOOKUP_OPT(pg_operator);
 	Oid			staop3 BKI_LOOKUP_OPT(pg_operator);
 	Oid			staop4 BKI_LOOKUP_OPT(pg_operator);
 	Oid			staop5 BKI_LOOKUP_OPT(pg_operator);
+	Oid			staop6 BKI_LOOKUP_OPT(pg_operator);
 
 	Oid			stacoll1 BKI_LOOKUP_OPT(pg_collation);
 	Oid			stacoll2 BKI_LOOKUP_OPT(pg_collation);
 	Oid			stacoll3 BKI_LOOKUP_OPT(pg_collation);
 	Oid			stacoll4 BKI_LOOKUP_OPT(pg_collation);
 	Oid			stacoll5 BKI_LOOKUP_OPT(pg_collation);
+	Oid			stacoll6 BKI_LOOKUP_OPT(pg_collation);
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
 	float4		stanumbers1[1];
@@ -109,6 +112,7 @@ CATALOG(pg_statistic,2619,StatisticRelationId)
 	float4		stanumbers3[1];
 	float4		stanumbers4[1];
 	float4		stanumbers5[1];
+	float4		stanumbers6[1];
 
 	/*
 	 * Values in these arrays are values of the column's data type, or of some
@@ -121,10 +125,11 @@ CATALOG(pg_statistic,2619,StatisticRelationId)
 	anyarray	stavalues3;
 	anyarray	stavalues4;
 	anyarray	stavalues5;
+	anyarray	stavalues6;
 #endif
 } FormData_pg_statistic;
 
-#define STATISTIC_NUM_SLOTS  5
+#define STATISTIC_NUM_SLOTS  6
 
 
 /* ----------------
@@ -278,6 +283,19 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  */
 #define STATISTIC_KIND_BOUNDS_HISTOGRAM  7
 
+/*
+ * A "Count-Min Sketch" slot is storing sketch used to estimate frequencies
+ * of a particular value. staop is the OID of the "=" operator used to decide
+ * whether values are the same or not, and stacoll is the collation used
+ * (same as column's collation).  stanumbers contains values encoding the
+ * Count-Min Sketch. Number of items, depth, width, and (depth x width)
+ * counters. In principle the values are integers, but we store them as
+ * float4 - that's simple, and float4 can store exactly integers with up
+ * to 7 decimal digits (that's enough for 3000000 rows, which is the max
+ * with statistics target 10000).
+ */
+#define STATISTIC_KIND_COUNT_MIN_SKETCH  8
+
 #endif							/* EXPOSE_TO_CLIENT_CODE */
 
 #endif							/* PG_STATISTIC_H */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 9dd444e1ff..9301dcc3a5 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -133,6 +133,8 @@ typedef struct
 	double		num_sa_scans;	/* # indexscans from ScalarArrayOpExprs */
 } GenericCosts;
 
+extern bool use_count_min_sketch;
+
 /* Hooks for plugins to get control when we ask for stats */
 typedef bool (*get_relation_stats_hook_type) (PlannerInfo *root,
 											  RangeTblEntry *rte,

PoC: Using Count-Min Sketch for join cardinality estimation

Reply via email to