On 01/02/18 05:57, Fabien COELHO wrote: >> Here is a new version which output use used seed when a seed is >> explicitely set with an option or from the environment.
This is a simple patch that does what it says on the tin. I ran into trouble with the pgbench TAP test *even before applying the patch*, but only because I was doing a VPATH build as a user without 'write' on the source tree (001_pgbench_with_server.pl tried to make pgbench create log files there). Bad me. Oddly, that was the only test in the whole tree to have such an issue, so here I add a pre-patch to fix that. Now my review needs a review. :) With that taken care of, the added tests all pass for me. I wonder, though: > The tests assume that stdlib random/srandom behavior is standard thus > deterministic between platform. Is the behavior of srandom() and the system generator really so precisely specified that seed 5432 will produce the same values hardcoded in the tests on all platforms? It does on mine, but could it produce spurious test failures for others? I guess the (or a) spec would be here: http://pubs.opengroup.org/onlinepubs/7908799/xsh/initstate.html It specifies a "non-linear additive feedback random-number generator employing a default state array size of 31 long integers", but it does not pin down parameters or claim only one candidate exists. To have the test run pgbench twice with the same seed and compare the results sounds safer. This revised pgbench-seed-4.patch changes the code not at all, and the TAP test only in whitespace. I did some wordsmithing of the doc, which I hope was not presumptuous of me, only as a conversation starter. I expanded the second sentence because on my first reading I wasn't quite sure of its meaning. Once I had looked at the code, I could see that the sentence was economical and precise already, but I tried to find a version that would also have been clear to the me-before-I-looked. The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=) will have the same effect as 'time', and indeed, I really think it should be unset (defaulting to 'time'), or 'time', or 'rand', or an integer, or an error. The error, right now, says only "expecting an unsigned integer"; it should also mention time and rand. Should 'rand' be something that conveys more about its meaning, 'strong' perhaps? The documentation doesn't mention the allowed range of the unsigned integer (which I get, because 'unsigned int' is exactly the signature for srandom, but somebody might read "unsigned integer" in a more generic sense). I'm not sure what would be a better way to say it. The base (only decimal, as now in the code) isn't specified either. Maybe the documentation should mention that the output now includes the random seed being used, so that (even without planning ahead) a session can be re-run with the same seed if necessary. It could just say "an unsigned integer in decimal, as it is shown in pgbench's output" or something like that. Something more may need to be said in the doc about reproducibility. I think the real story is that a run can be reproduced if the number of clients is equal to the number of threads. Although each thread has its own generator state, each client does not (if there is more than one per thread), and as soon as real select() delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a given thread may not be hitting that thread's generator in a deterministic order. -Chap
>From 4fe8f8563fedcb04b167db476adaf3bc398f879d Mon Sep 17 00:00:00 2001 From: Chapman Flack <c...@anastigmatix.net> Date: Mon, 8 Jan 2018 23:15:01 -0500 Subject: [PATCH 1/2] Survive VPATH build without w on source tree. A couple tests pass log options to pgbench, with a bare filename for --log-prefix, so it would be created in the current directory. In a VPATH build, inconveniently, the Makefile chdirs into the original source tree before running this, and if the build user has no write access there, the tests fail. Chdir back to a nice writable $TESTDIR/tmp_check to avoid that. --- src/bin/pgbench/t/001_pgbench_with_server.pl | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index 3dd080e..8305db3 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -5,6 +5,11 @@ use PostgresNode; use TestLib; use Test::More; +if ( exists $ENV{'TESTDIR'} ) { + chdir $ENV{'TESTDIR'}; + -d 'tmp_check' && -w 'tmp_check' && chdir 'tmp_check'; +} + # start a pgbench specific server my $node = get_new_node('main'); $node->init; -- 2.7.3
>From 33d5ec2b863fe3e8198d1c89de91bc36b52db5cc Mon Sep 17 00:00:00 2001 From: Chapman Flack <c...@anastigmatix.net> Date: Tue, 9 Jan 2018 01:29:43 -0500 Subject: [PATCH 2/2] New pgbench --random-seed option for reproducibility. --- doc/src/sgml/ref/pgbench.sgml | 33 +++++++++++++++++ src/bin/pgbench/pgbench.c | 53 +++++++++++++++++++++++++--- src/bin/pgbench/t/001_pgbench_with_server.pl | 11 ++++-- 3 files changed, 90 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 1519fe7..13b1a0a 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -680,6 +680,39 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d </varlistentry> <varlistentry> + <term><option>--random-seed=</option><replaceable>SEED</replaceable></term> + <listitem> + <para> + Set random generator seed. Seeds the system random number generator, + which then produces a sequence of initial generator states, one for + each thread. + Values for <replaceable>SEED</replaceable> may be: + <literal>time</literal> (the default, the seed is based on the current time), + <literal>rand</literal> (use a strong random source, failing if none + is available), or any unsigned integer value. + The random generator is invoked explicitly from a pgbench script + (<literal>random...</literal> functions) or implicitly (for instance option + <option>--rate</option> uses it to schedule transactions). + Any value allowed for <replaceable>SEED</replaceable> may also be + provided through the environment variable + <literal>PGBENCH_RANDOM_SEED</literal>. + To ensure that the provided seed impacts all possible uses, put this option + first or use the environment variable. + </para> + <para> + Setting the seed explicitly allows to reproduce a <command>pgbench</command> + run exactly, as far as random numbers are concerned. + From a statistical viewpoint this is a bad idea because it can hide the + performance variability or improve performance unduly, e.g. by hitting + the same pages as a previous run. + However, it may also be of great help for debugging, for instance + re-running a tricky case which leads to an error. + Use wisely. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term> <listitem> <para> diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index fc2c734..e1da3fd 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -557,6 +557,7 @@ usage(void) " --log-prefix=PREFIX prefix for transaction time log file\n" " (default: \"pgbench_log\")\n" " --progress-timestamp use Unix epoch timestamps for progress\n" + " --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n" " --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n" "\nCommon options:\n" " -d, --debug print debugging output\n" @@ -4010,6 +4011,46 @@ printResults(TState *threads, StatsData *total, instr_time total_time, } } +/* call srandom based on some seed. NULL triggers the default behavior. */ +static void +set_random_seed(const char *seed) +{ + unsigned int iseed; + + if (seed == NULL || *seed == '\0' || strcmp(seed, "time") == 0) + { + /* rely on current time */ + instr_time now; + INSTR_TIME_SET_CURRENT(now); + iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now); + } + else if (strcmp(seed, "rand") == 0) + { + /* use some "strong" random source */ + if (!pg_strong_random(&iseed, sizeof(iseed))) + { + fprintf(stderr, "cannot seed random from a strong source\n"); + exit(1); + } + } + else + { + /* parse seed value coming either from option or environment */ + char garbage; + if (sscanf(seed, "%u%c", &iseed, &garbage) != 1) + { + fprintf(stderr, + "error while scanning '%s', expecting an unsigned integer\n", + seed); + exit(1); + } + } + + if (seed != NULL) + fprintf(stderr, "setting random seed to %u\n", iseed); + srandom(iseed); +} + int main(int argc, char **argv) @@ -4052,6 +4093,7 @@ main(int argc, char **argv) {"progress-timestamp", no_argument, NULL, 6}, {"log-prefix", required_argument, NULL, 7}, {"foreign-keys", no_argument, NULL, 8}, + {"random-seed", required_argument, NULL, 9}, {NULL, 0, NULL, 0} }; @@ -4120,6 +4162,9 @@ main(int argc, char **argv) state = (CState *) pg_malloc(sizeof(CState)); memset(state, 0, sizeof(CState)); + /* set random seed early, because it may be used while parsing scripts. */ + set_random_seed(getenv("PGBENCH_RANDOM_SEED")); + while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1) { char *script; @@ -4392,6 +4437,10 @@ main(int argc, char **argv) initialization_option_set = true; foreign_keys = true; break; + case 9: /* random-seed */ + benchmarking_option_set = true; + set_random_seed(optarg); + break; default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit(1); @@ -4698,10 +4747,6 @@ main(int argc, char **argv) } PQfinish(con); - /* set random seed */ - INSTR_TIME_SET_CURRENT(start_time); - srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time)); - /* set up thread data structures */ threads = (TState *) pg_malloc(sizeof(TState) * nthreads); nclients_dealt = 0; diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index 8305db3..9497818 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -215,11 +215,16 @@ COMMIT; } }); # test expressions +# command 1..3 and 23 depend on random seed which is used to call srandom. pgbench( - '-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808', + '--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808', 0, [ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ], - [ qr{command=4.: int 4\b}, + [ qr{setting random seed to 5432\b}, + qr{command=1.: int 28\b}, # uniform random + qr{command=2.: int 7\b}, # exponential random + qr{command=3.: int 47\b}, # gaussian random + qr{command=4.: int 4\b}, qr{command=5.: int 5\b}, qr{command=6.: int 6\b}, qr{command=7.: int 7\b}, @@ -237,7 +242,7 @@ pgbench( qr{command=19.: double 19\b}, qr{command=20.: double 20\b}, qr{command=21.: int 9223372036854775807\b}, - qr{command=23.: int [1-9]\b}, + qr{command=23.: int 1\b}, # zipfian random qr{command=24.: double -27\b}, qr{command=25.: double 1024\b}, qr{command=26.: double 1\b}, -- 2.7.3