Hello Tom & Tomas,
If the choice is between reporting the failure to the user, and addressing the failure, surely the latter would be the default option? Particularly if the user can't really address the issue easily (recompiling psql is not very practical solution).I remain of the opinion that we ought to simply rip out support for zipfian with s < 1.+1 to thatIf this is done, some people with zipfian distribution that currently work might be unhappy.
After giving it some thought, I think that this cannot be fully fixed for 12.
The attached patch removes the code for param in (0, 1), and slightly improve the documentation about the performance, if you want to proceed.
For s > 1, there is no such constraint, and it works fine, there is no reason to remove it.
Given the constraint of Jim Gray's approximated method for s in (0, 1), which really does zipfian for the first two integers and then uses an exponential approximation, the only approach is that the parameters must be computed in a partial eval preparation phase before the bench code is run. This means that only (mostly) constants would be allowed as parameters when s is in (0, 1), but I think that this is acceptable because anyway the method fundamentaly requires it. I think that it can be implemented reasonably well (meaning not too much code), but would requires a few round of reviews if someone implements it (for a reminder, I was only the reviewer on this one). An added benefit would be that the parameter cache could be shared between thread, which would be a good thing.
The attached other attached patch illustrate what I call poor performance for stupid parameters (no point in doing zipfian on 2 integers…) :
./pgbench -T 3 -D n=2 -D s=1.01 -f zipf_perf.sql # 46981 tps ./pgbench -T 3 -D n=2 -D s=1.001 -f zipf_perf.sql # 6187 tps ./pgbench -T 3 -D n=2 -D s=1.0001 -f zipf_perf.sql # 710 tps ./pgbench -T 3 -D n=100 -D s=1.01 -f zipf_perf.sql # 142910 tps ./pgbench -T 3 -D n=100 -D s=1.001 -f zipf_perf.sql # 21214 tps ./pgbench -T 3 -D n=100 -D s=1.0001 -f zipf_perf.sql # 2466 tps ./pgbench -T 3 -D n=1000000 -D s=1.01 -f zipf_perf.sql # 376453 tps ./pgbench -T 3 -D n=1000000 -D s=1.001 -f zipf_perf.sql # 57441 tps ./pgbench -T 3 -D n=1000000 -D s=1.0001 -f zipf_perf.sql # 6780 tps Maybe the implementation could impose that s is at least 1.001 to avoid the lower performance? -- Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 24833f46bc..6241b9f9c5 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1598,23 +1598,18 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / </listitem> <listitem> <para> - <literal>random_zipfian</literal> generates an approximated bounded Zipfian - distribution. For <replaceable>parameter</replaceable> in (0, 1), an - approximated algorithm is taken from - "Quickly Generating Billion-Record Synthetic Databases", - Jim Gray et al, SIGMOD 1994. For <replaceable>parameter</replaceable> - in (1, 1000), a rejection method is used, based on + <literal>random_zipfian</literal> generates an bounded Zipfian distribution, + for <replaceable>parameter</replaceable> in (1, 1000]. + A rejection method is used, based on "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551, Springer 1986. The distribution is not defined when the parameter's - value is 1.0. The function's performance is poor for parameter values - close and above 1.0 and on a small range. + value is 1.0. The function's drawing performance is poor for parameter + values close and above 1.0 (eg under 1.001) and on a small range (eg 10). </para> <para> <replaceable>parameter</replaceable> defines how skewed the distribution is. The larger the <replaceable>parameter</replaceable>, the more frequently values closer to the beginning of the interval are drawn. - The closer to 0 <replaceable>parameter</replaceable> is, - the flatter (more uniform) the output distribution. The distribution is such that, assuming the range starts from 1, the ratio of the probability of drawing <replaceable>k</replaceable> versus drawing <replaceable>k+1</replaceable> is diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 4789ab92ee..e62a7d1bff 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -409,35 +409,6 @@ typedef struct int ecnt; /* error count */ } CState; -/* - * Cache cell for random_zipfian call - */ -typedef struct -{ - /* cell keys */ - double s; /* s - parameter of random_zipfian function */ - int64 n; /* number of elements in range (max - min + 1) */ - - double harmonicn; /* generalizedHarmonicNumber(n, s) */ - double alpha; - double beta; - double eta; - - uint64 last_used; /* last used logical time */ -} ZipfCell; - -/* - * Zipf cache for zeta values - */ -typedef struct -{ - uint64 current; /* counter for LRU cache replacement algorithm */ - - int nb_cells; /* number of filled cells */ - int overflowCount; /* number of cache overflows */ - ZipfCell cells[ZIPF_CACHE_SIZE]; -} ZipfCache; - /* * Thread state */ @@ -459,8 +430,6 @@ typedef struct int64 throttle_trigger; /* previous/next throttling (us) */ FILE *logfile; /* where to log, or NULL */ - ZipfCache zipf_cache; /* for thread-safe zipfian random number - * generation */ /* per thread collected stats */ instr_time start_time; /* thread start time */ @@ -982,77 +951,12 @@ getPoissonRand(RandomState *random_state, double center) return (int64) (-log(uniform) * center + 0.5); } -/* helper function for getZipfianRand */ -static double -generalizedHarmonicNumber(int64 n, double s) -{ - int i; - double ans = 0.0; - - for (i = n; i > 1; i--) - ans += pow(i, -s); - return ans + 1.0; -} - -/* set harmonicn and other parameters to cache cell */ -static void -zipfSetCacheCell(ZipfCell *cell, int64 n, double s) -{ - double harmonic2; - - cell->n = n; - cell->s = s; - - harmonic2 = generalizedHarmonicNumber(2, s); - cell->harmonicn = generalizedHarmonicNumber(n, s); - - cell->alpha = 1.0 / (1.0 - s); - cell->beta = pow(0.5, s); - cell->eta = (1.0 - pow(2.0 / n, 1.0 - s)) / (1.0 - harmonic2 / cell->harmonicn); -} - -/* - * search for cache cell with keys (n, s) - * and create new cell if it does not exist - */ -static ZipfCell * -zipfFindOrCreateCacheCell(ZipfCache *cache, int64 n, double s) -{ - int i, - least_recently_used = 0; - ZipfCell *cell; - - /* search cached cell for given parameters */ - for (i = 0; i < cache->nb_cells; i++) - { - cell = &cache->cells[i]; - if (cell->n == n && cell->s == s) - return &cache->cells[i]; - - if (cell->last_used < cache->cells[least_recently_used].last_used) - least_recently_used = i; - } - - /* create new one if it does not exist */ - if (cache->nb_cells < ZIPF_CACHE_SIZE) - i = cache->nb_cells++; - else - { - /* replace LRU cell if cache is full */ - i = least_recently_used; - cache->overflowCount++; - } - - zipfSetCacheCell(&cache->cells[i], n, s); - - cache->cells[i].last_used = cache->current++; - return &cache->cells[i]; -} - /* * Computing zipfian using rejection method, based on * "Non-Uniform Random Variate Generation", * Luc Devroye, p. 550-551, Springer 1986. + * + * This works for s > 1.0 */ static int64 computeIterativeZipfian(RandomState *random_state, int64 n, double s) @@ -1079,39 +983,16 @@ computeIterativeZipfian(RandomState *random_state, int64 n, double s) return (int64) x; } -/* - * Computing zipfian using harmonic numbers, based on algorithm described in - * "Quickly Generating Billion-Record Synthetic Databases", - * Jim Gray et al, SIGMOD 1994 - */ -static int64 -computeHarmonicZipfian(ZipfCache *zipf_cache, RandomState *random_state, - int64 n, double s) -{ - ZipfCell *cell = zipfFindOrCreateCacheCell(zipf_cache, n, s); - double uniform = pg_erand48(random_state->xseed); - double uz = uniform * cell->harmonicn; - - if (uz < 1.0) - return 1; - if (uz < 1.0 + cell->beta) - return 2; - return 1 + (int64) (cell->n * pow(cell->eta * uniform - cell->eta + 1.0, cell->alpha)); -} - /* random number generator: zipfian distribution from min to max inclusive */ static int64 -getZipfianRand(ZipfCache *zipf_cache, RandomState *random_state, int64 min, - int64 max, double s) +getZipfianRand(RandomState *random_state, int64 min, int64 max, double s) { int64 n = max - min + 1; /* abort if parameter is invalid */ - Assert(s > 0.0 && s != 1.0 && s <= MAX_ZIPFIAN_PARAM); + Assert(1.0 <= s && s <= MAX_ZIPFIAN_PARAM); - return min - 1 + ((s > 1) - ? computeIterativeZipfian(random_state, n, s) - : computeHarmonicZipfian(zipf_cache, random_state, n, s)); + return min - 1 + computeIterativeZipfian(random_state, n, s); } /* @@ -2433,17 +2314,15 @@ evalStandardFunc(TState *thread, CState *st, } else if (func == PGBENCH_RANDOM_ZIPFIAN) { - if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM) + if (param <= 1.0 || param > MAX_ZIPFIAN_PARAM) { fprintf(stderr, - "zipfian parameter must be in range (0, 1) U (1, %d]" + "zipfian parameter must be in range (1, %d]" " (got %f)\n", MAX_ZIPFIAN_PARAM, param); return false; } setIntValue(retval, - getZipfianRand(&thread->zipf_cache, - &st->cs_func_rs, - imin, imax, param)); + getZipfianRand(&st->cs_func_rs, imin, imax, param)); } else /* exponential */ { @@ -5012,8 +4891,6 @@ printResults(TState *threads, StatsData *total, instr_time total_time, tps_include, tps_exclude; int64 ntx = total->cnt - total->skipped; - int i, - totalCacheOverflows = 0; time_include = INSTR_TIME_GET_DOUBLE(total_time); @@ -5041,15 +4918,6 @@ printResults(TState *threads, StatsData *total, instr_time total_time, printf("number of transactions actually processed: " INT64_FORMAT "\n", ntx); } - /* Report zipfian cache overflow */ - for (i = 0; i < nthreads; i++) - { - totalCacheOverflows += threads[i].zipf_cache.overflowCount; - } - if (totalCacheOverflows > 0) - { - printf("zipfian cache array overflowed %d time(s)\n", totalCacheOverflows); - } /* Remaining stats are nonsensical if we failed to execute any xacts */ if (total->cnt <= 0) @@ -5937,9 +5805,6 @@ main(int argc, char **argv) initRandomState(&thread->ts_sample_rs); thread->logfile = NULL; /* filled in later */ thread->latency_late = 0; - thread->zipf_cache.nb_cells = 0; - thread->zipf_cache.current = 0; - thread->zipf_cache.overflowCount = 0; initStats(&thread->stats, 0); nclients_dealt += thread->nstate; diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index 45888dc12e..5283a04302 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -676,13 +676,13 @@ SELECT LEAST(}.join(', ', (':i') x 256).q{)} [ 'set zipfian param to 1', 2, - [qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}], + [qr{zipfian parameter must be in range \(1, \d+\]}], q{\set i random_zipfian(0, 10, 1)} ], [ 'set zipfian param too large', 2, - [qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}], + [qr{zipfian parameter must be in range \(1, \d+\]}], q{\set i random_zipfian(0, 10, 1000000)} ], [ @@ -815,33 +815,6 @@ for my $e (@errors) { $n => $script }); } -# zipfian cache array overflow -pgbench( - '-t 1', 0, - [ qr{processed: 1/1}, qr{zipfian cache array overflowed 1 time\(s\)} ], - [qr{^}], - 'pgbench zipfian array overflow on random_zipfian', - { - '001_pgbench_random_zipfian' => q{ -\set i random_zipfian(1, 100, 0.5) -\set i random_zipfian(2, 100, 0.5) -\set i random_zipfian(3, 100, 0.5) -\set i random_zipfian(4, 100, 0.5) -\set i random_zipfian(5, 100, 0.5) -\set i random_zipfian(6, 100, 0.5) -\set i random_zipfian(7, 100, 0.5) -\set i random_zipfian(8, 100, 0.5) -\set i random_zipfian(9, 100, 0.5) -\set i random_zipfian(10, 100, 0.5) -\set i random_zipfian(11, 100, 0.5) -\set i random_zipfian(12, 100, 0.5) -\set i random_zipfian(13, 100, 0.5) -\set i random_zipfian(14, 100, 0.5) -\set i random_zipfian(15, 100, 0.5) -\set i random_zipfian(16, 100, 0.5) -} - }); - # throttling pgbench( '-t 100 -S --rate=100000 --latency-limit=1000000 -c 2 -n -r',
zipf_perf.sql
Description: application/sql