Hello Tom & Tomas,

If the choice is between reporting the failure to the user, and
addressing the failure, surely the latter would be the default option?
Particularly if the user can't really address the issue easily
(recompiling psql is not very practical solution).

I remain of the opinion that we ought to simply rip out support for
zipfian with s < 1.

+1 to that

If this is done, some people with zipfian distribution that currently work might be unhappy.

After giving it some thought, I think that this cannot be fully fixed for 12.

The attached patch removes the code for param in (0, 1), and slightly improve the documentation about the performance, if you want to proceed.

For s > 1, there is no such constraint, and it works fine, there is no reason to remove it.

Given the constraint of Jim Gray's approximated method for s in (0, 1), which really does zipfian for the first two integers and then uses an exponential approximation, the only approach is that the parameters must be computed in a partial eval preparation phase before the bench code is run. This means that only (mostly) constants would be allowed as parameters when s is in (0, 1), but I think that this is acceptable because anyway the method fundamentaly requires it. I think that it can be implemented reasonably well (meaning not too much code), but would requires a few round of reviews if someone implements it (for a reminder, I was only the reviewer on this one). An added benefit would be that the parameter cache could be shared between thread, which would be a good thing.

The attached other attached patch illustrate what I call poor performance for stupid parameters (no point in doing zipfian on 2 integers…) :

  ./pgbench -T 3 -D n=2 -D s=1.01 -f zipf_perf.sql   # 46981 tps
  ./pgbench -T 3 -D n=2 -D s=1.001 -f zipf_perf.sql   # 6187 tps
  ./pgbench -T 3 -D n=2 -D s=1.0001 -f zipf_perf.sql   # 710 tps

  ./pgbench -T 3 -D n=100 -D s=1.01 -f zipf_perf.sql  # 142910 tps
  ./pgbench -T 3 -D n=100 -D s=1.001 -f zipf_perf.sql  # 21214 tps
  ./pgbench -T 3 -D n=100 -D s=1.0001 -f zipf_perf.sql  # 2466 tps

  ./pgbench -T 3 -D n=1000000 -D s=1.01 -f zipf_perf.sql # 376453 tps
  ./pgbench -T 3 -D n=1000000 -D s=1.001 -f zipf_perf.sql # 57441 tps
  ./pgbench -T 3 -D n=1000000 -D s=1.0001 -f zipf_perf.sql # 6780 tps

Maybe the implementation could impose that s is at least 1.001 to avoid
the lower performance?

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 24833f46bc..6241b9f9c5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1598,23 +1598,18 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
     </listitem>
     <listitem>
      <para>
-      <literal>random_zipfian</literal> generates an approximated bounded Zipfian
-      distribution. For <replaceable>parameter</replaceable> in (0, 1), an
-      approximated algorithm is taken from
-      "Quickly Generating Billion-Record Synthetic Databases",
-      Jim Gray et al, SIGMOD 1994. For <replaceable>parameter</replaceable>
-      in (1, 1000), a rejection method is used, based on
+      <literal>random_zipfian</literal> generates an bounded Zipfian distribution,
+      for <replaceable>parameter</replaceable> in (1, 1000].
+      A rejection method is used, based on
       "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
       Springer 1986. The distribution is not defined when the parameter's
-      value is 1.0. The function's performance is poor for parameter values
-      close and above 1.0 and on a small range.
+      value is 1.0.  The function's drawing performance is poor for parameter
+      values close and above 1.0 (eg under 1.001) and on a small range (eg 10).
      </para>
      <para>
       <replaceable>parameter</replaceable> defines how skewed the distribution
       is. The larger the <replaceable>parameter</replaceable>, the more
       frequently values closer to the beginning of the interval are drawn.
-      The closer to 0 <replaceable>parameter</replaceable> is,
-      the flatter (more uniform) the output distribution.
       The distribution is such that, assuming the range starts from 1,
       the ratio of the probability of drawing <replaceable>k</replaceable>
       versus drawing <replaceable>k+1</replaceable> is
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4789ab92ee..e62a7d1bff 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -409,35 +409,6 @@ typedef struct
 	int			ecnt;			/* error count */
 } CState;
 
-/*
- * Cache cell for random_zipfian call
- */
-typedef struct
-{
-	/* cell keys */
-	double		s;				/* s - parameter of random_zipfian function */
-	int64		n;				/* number of elements in range (max - min + 1) */
-
-	double		harmonicn;		/* generalizedHarmonicNumber(n, s) */
-	double		alpha;
-	double		beta;
-	double		eta;
-
-	uint64		last_used;		/* last used logical time */
-} ZipfCell;
-
-/*
- * Zipf cache for zeta values
- */
-typedef struct
-{
-	uint64		current;		/* counter for LRU cache replacement algorithm */
-
-	int			nb_cells;		/* number of filled cells */
-	int			overflowCount;	/* number of cache overflows */
-	ZipfCell	cells[ZIPF_CACHE_SIZE];
-} ZipfCache;
-
 /*
  * Thread state
  */
@@ -459,8 +430,6 @@ typedef struct
 
 	int64		throttle_trigger;	/* previous/next throttling (us) */
 	FILE	   *logfile;		/* where to log, or NULL */
-	ZipfCache	zipf_cache;		/* for thread-safe  zipfian random number
-								 * generation */
 
 	/* per thread collected stats */
 	instr_time	start_time;		/* thread start time */
@@ -982,77 +951,12 @@ getPoissonRand(RandomState *random_state, double center)
 	return (int64) (-log(uniform) * center + 0.5);
 }
 
-/* helper function for getZipfianRand */
-static double
-generalizedHarmonicNumber(int64 n, double s)
-{
-	int			i;
-	double		ans = 0.0;
-
-	for (i = n; i > 1; i--)
-		ans += pow(i, -s);
-	return ans + 1.0;
-}
-
-/* set harmonicn and other parameters to cache cell */
-static void
-zipfSetCacheCell(ZipfCell *cell, int64 n, double s)
-{
-	double		harmonic2;
-
-	cell->n = n;
-	cell->s = s;
-
-	harmonic2 = generalizedHarmonicNumber(2, s);
-	cell->harmonicn = generalizedHarmonicNumber(n, s);
-
-	cell->alpha = 1.0 / (1.0 - s);
-	cell->beta = pow(0.5, s);
-	cell->eta = (1.0 - pow(2.0 / n, 1.0 - s)) / (1.0 - harmonic2 / cell->harmonicn);
-}
-
-/*
- * search for cache cell with keys (n, s)
- * and create new cell if it does not exist
- */
-static ZipfCell *
-zipfFindOrCreateCacheCell(ZipfCache *cache, int64 n, double s)
-{
-	int			i,
-				least_recently_used = 0;
-	ZipfCell   *cell;
-
-	/* search cached cell for given parameters */
-	for (i = 0; i < cache->nb_cells; i++)
-	{
-		cell = &cache->cells[i];
-		if (cell->n == n && cell->s == s)
-			return &cache->cells[i];
-
-		if (cell->last_used < cache->cells[least_recently_used].last_used)
-			least_recently_used = i;
-	}
-
-	/* create new one if it does not exist */
-	if (cache->nb_cells < ZIPF_CACHE_SIZE)
-		i = cache->nb_cells++;
-	else
-	{
-		/* replace LRU cell if cache is full */
-		i = least_recently_used;
-		cache->overflowCount++;
-	}
-
-	zipfSetCacheCell(&cache->cells[i], n, s);
-
-	cache->cells[i].last_used = cache->current++;
-	return &cache->cells[i];
-}
-
 /*
  * Computing zipfian using rejection method, based on
  * "Non-Uniform Random Variate Generation",
  * Luc Devroye, p. 550-551, Springer 1986.
+ *
+ * This works for s > 1.0
  */
 static int64
 computeIterativeZipfian(RandomState *random_state, int64 n, double s)
@@ -1079,39 +983,16 @@ computeIterativeZipfian(RandomState *random_state, int64 n, double s)
 	return (int64) x;
 }
 
-/*
- * Computing zipfian using harmonic numbers, based on algorithm described in
- * "Quickly Generating Billion-Record Synthetic Databases",
- * Jim Gray et al, SIGMOD 1994
- */
-static int64
-computeHarmonicZipfian(ZipfCache *zipf_cache, RandomState *random_state,
-					   int64 n, double s)
-{
-	ZipfCell   *cell = zipfFindOrCreateCacheCell(zipf_cache, n, s);
-	double		uniform = pg_erand48(random_state->xseed);
-	double		uz = uniform * cell->harmonicn;
-
-	if (uz < 1.0)
-		return 1;
-	if (uz < 1.0 + cell->beta)
-		return 2;
-	return 1 + (int64) (cell->n * pow(cell->eta * uniform - cell->eta + 1.0, cell->alpha));
-}
-
 /* random number generator: zipfian distribution from min to max inclusive */
 static int64
-getZipfianRand(ZipfCache *zipf_cache, RandomState *random_state, int64 min,
-			   int64 max, double s)
+getZipfianRand(RandomState *random_state, int64 min, int64 max, double s)
 {
 	int64		n = max - min + 1;
 
 	/* abort if parameter is invalid */
-	Assert(s > 0.0 && s != 1.0 && s <= MAX_ZIPFIAN_PARAM);
+	Assert(1.0 <= s && s <= MAX_ZIPFIAN_PARAM);
 
-	return min - 1 + ((s > 1)
-					  ? computeIterativeZipfian(random_state, n, s)
-					  : computeHarmonicZipfian(zipf_cache, random_state, n, s));
+	return min - 1 + computeIterativeZipfian(random_state, n, s);
 }
 
 /*
@@ -2433,17 +2314,15 @@ evalStandardFunc(TState *thread, CState *st,
 					}
 					else if (func == PGBENCH_RANDOM_ZIPFIAN)
 					{
-						if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
+						if (param <= 1.0 || param > MAX_ZIPFIAN_PARAM)
 						{
 							fprintf(stderr,
-									"zipfian parameter must be in range (0, 1) U (1, %d]"
+									"zipfian parameter must be in range (1, %d]"
 									" (got %f)\n", MAX_ZIPFIAN_PARAM, param);
 							return false;
 						}
 						setIntValue(retval,
-									getZipfianRand(&thread->zipf_cache,
-												   &st->cs_func_rs,
-												   imin, imax, param));
+									getZipfianRand(&st->cs_func_rs, imin, imax, param));
 					}
 					else		/* exponential */
 					{
@@ -5012,8 +4891,6 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 				tps_include,
 				tps_exclude;
 	int64		ntx = total->cnt - total->skipped;
-	int			i,
-				totalCacheOverflows = 0;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
 
@@ -5041,15 +4918,6 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 		printf("number of transactions actually processed: " INT64_FORMAT "\n",
 			   ntx);
 	}
-	/* Report zipfian cache overflow */
-	for (i = 0; i < nthreads; i++)
-	{
-		totalCacheOverflows += threads[i].zipf_cache.overflowCount;
-	}
-	if (totalCacheOverflows > 0)
-	{
-		printf("zipfian cache array overflowed %d time(s)\n", totalCacheOverflows);
-	}
 
 	/* Remaining stats are nonsensical if we failed to execute any xacts */
 	if (total->cnt <= 0)
@@ -5937,9 +5805,6 @@ main(int argc, char **argv)
 		initRandomState(&thread->ts_sample_rs);
 		thread->logfile = NULL; /* filled in later */
 		thread->latency_late = 0;
-		thread->zipf_cache.nb_cells = 0;
-		thread->zipf_cache.current = 0;
-		thread->zipf_cache.overflowCount = 0;
 		initStats(&thread->stats, 0);
 
 		nclients_dealt += thread->nstate;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 45888dc12e..5283a04302 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -676,13 +676,13 @@ SELECT LEAST(}.join(', ', (':i') x 256).q{)}
 	[
 		'set zipfian param to 1',
 		2,
-		[qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}],
+		[qr{zipfian parameter must be in range \(1, \d+\]}],
 		q{\set i random_zipfian(0, 10, 1)}
 	],
 	[
 		'set zipfian param too large',
 		2,
-		[qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}],
+		[qr{zipfian parameter must be in range \(1, \d+\]}],
 		q{\set i random_zipfian(0, 10, 1000000)}
 	],
 	[
@@ -815,33 +815,6 @@ for my $e (@errors)
 		{ $n => $script });
 }
 
-# zipfian cache array overflow
-pgbench(
-	'-t 1', 0,
-	[ qr{processed: 1/1}, qr{zipfian cache array overflowed 1 time\(s\)} ],
-	[qr{^}],
-	'pgbench zipfian array overflow on random_zipfian',
-	{
-		'001_pgbench_random_zipfian' => q{
-\set i random_zipfian(1, 100, 0.5)
-\set i random_zipfian(2, 100, 0.5)
-\set i random_zipfian(3, 100, 0.5)
-\set i random_zipfian(4, 100, 0.5)
-\set i random_zipfian(5, 100, 0.5)
-\set i random_zipfian(6, 100, 0.5)
-\set i random_zipfian(7, 100, 0.5)
-\set i random_zipfian(8, 100, 0.5)
-\set i random_zipfian(9, 100, 0.5)
-\set i random_zipfian(10, 100, 0.5)
-\set i random_zipfian(11, 100, 0.5)
-\set i random_zipfian(12, 100, 0.5)
-\set i random_zipfian(13, 100, 0.5)
-\set i random_zipfian(14, 100, 0.5)
-\set i random_zipfian(15, 100, 0.5)
-\set i random_zipfian(16, 100, 0.5)
-}
-	});
-
 # throttling
 pgbench(
 	'-t 100 -S --rate=100000 --latency-limit=1000000 -c 2 -n -r',

Attachment: zipf_perf.sql
Description: application/sql

Reply via email to