Hello Fabien,
17/01/2018 10:52, Fabien COELHO пишет: >> Here is a new version of patch. I've splitted it into two parts. The >> first one is almost the same as v4 from [1] with some refactoring. >> The second part introduces random_seed variable as you proposed. > > Patch 1 applies. Compilations fails, there are two "hash_seed" > declared in "pgbench.c". > > Patch 2 applies cleanly on top of the previous patch and compiles, > because the variable is removed... > > If an ":hash_seed" pgbench variable is used, ISTM that there is no > need for a global variable at all, so the two patches are going back > and forth, which is unhelpful. ISTM better to provide just one > combined patch for the feature. > > If the hash_seed variable really needs to be kept, it should be an > "int64" variable, like other pgbench values. > > The len < 1 || len > 2 is checked twice, once in the "switch", on in > an "if" just after the "switch". Once is enough. I totally messed up doing git rebase and didn't double check the code. *facepalm* There shouldn't be hash_seed variable and the second 'len < 1 || len > 2' check. Sorry for that, fixed in the attached patch. > Calling random just usually initializes about 31 bits, so random > should be called 2 or maybe 3 times? Or maybe use the internal getrand > which has 48 pseudorandom bits? Done. I used code from get_random_uint64() as an example. > > For me "random seed" is what is passed to srandom, so the variable > should rather be named hash_seed because there could also be a random > seed (actually, there is in another parallel patch:-). Moreover, this > seed may or may not be random if set, so calling it "random_seed" is > not desirable. > My intention was to introduce seed variable which potentially could be used in different contexts, not only for hash functions. I renamed it to 'hash_seed' for now. But what do you think about naming it simply 'seed' or choosing some other general name? >> I didn't do the executor simplification thing yet because I'm a >> little concerned about inventive users, who may want to change >> random_seed variable in runtime (which is possible since pgbench >> doesn't have read only variables aka constants AFAIK). > > If the user choses to overide hash_seed in their script, it is their > decision, the documentation has only to be clear about :hash_seed > being the default seed. I see no clear reason to work around this > possibility by evaluating the seed at parse time, especially as the > variable may not have its final value yet depending on the option > order. I'd suggest to just use make_variable("hash_seed") for the > default second argument and simplify the executor. That is a great idea, I didn't see that possibility. Done. > > The seed variable is not tested explicitely in the script, you could add > a "hash(5432) == hash(5432, :hash_seed)" for instance. > > It would be nice if an native English speaker could proofread the > documentation text. I'd suggest: "*an* optional seed parameter". "In > case *the* seed...". "<literal>:hash_seed</literal>". "shared for" -> > "shared by". "following listing" -> "following pgbench script". "few > accounts generates" -> "few accounts generate". > Done as well. > For the document example, I'd use larger values for the random & > modulo, eg 100000000 and 1000000. The drawback is that zipfian does a > costly computation of the generalized harmonic number when the > parameter is lower than 1.0. For cities, the parameter found by Zipf > was 1.07 (says Wikipedia). Maybe use this historical value. Or maybe > use an exponential distribution in the example. > Changed parameter to 1.07. Thanks! -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..c31254c 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -882,6 +882,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d <entry> <literal>client_id</literal> </entry> <entry>unique number identifying the client session (starts from zero)</entry> </row> + + <row> + <entry> <literal>hash_seed</literal> </entry> + <entry>random seed used in hash functions by default</entry> + </row> </tbody> </tgroup> </table> @@ -1246,6 +1251,27 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d <entry><literal>5</literal></entry> </row> <row> + <entry><literal><function>hash(<replaceable>a</replaceable> [, <replaceable>seed</replaceable> ] )</function></literal></entry> + <entry>integer</entry> + <entry>alias for <literal>hash_murmur2()</literal></entry> + <entry><literal>hash(10, 5432)</literal></entry> + <entry><literal>-5817877081768721676</literal></entry> + </row> + <row> + <entry><literal><function>hash_fnv1a(<replaceable>a</replaceable> [, <replaceable>seed</replaceable> ] )</function></literal></entry> + <entry>integer</entry> + <entry><literal>FNV</literal> hash</entry> + <entry><literal>hash_fnv1a(10, 5432)</literal></entry> + <entry><literal>-7793829335365542153</literal></entry> + </row> + <row> + <entry><literal><function>hash_murmur2(<replaceable>a</replaceable> [, <replaceable>seed</replaceable> ] )</function></literal></entry> + <entry>integer</entry> + <entry><literal>murmur2</literal> hash</entry> + <entry><literal>hash_murmur2(10, 5432)</literal></entry> + <entry><literal>-5817877081768721676</literal></entry> + </row> + <row> <entry><literal><function>int(<replaceable>x</replaceable>)</function></literal></entry> <entry>integer</entry> <entry>cast to int</entry> @@ -1423,6 +1449,22 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / </itemizedlist> <para> + Hash functions accepts an input value and an optional seed parameter. In case the seed isn't provided <literal>hash_seed</literal> value is used. Hash functions can be used, for example, to modify the distribution of <literal>random_zipfian</literal> or <literal>random_exponential</literal> functions in order to obtain scattered distribution. Thus the following pgbench script simulates possible real world workload typical for social media and blogging platforms where few accounts generate excessive load: + +<programlisting> +\set r random_zipfian(0, 100000000, 1.07) +\set k abs(hash(:r)) % 1000000 +</programlisting> + + In some cases several distinct distributions are needed which don't correlate with each other and this is when implicit seed parameter comes in handy: + +<programlisting> +\set k1 abs(hash(:r), :hash_seed + 123) % 1000000 +\set k2 abs(hash(:r), :hash_seed + 321) % 1000000 +</programlisting> + </para> + + <para> As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..b5e150f 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd + * - nargs: number of arguments. Special cases: + * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest + * meaning #args >= 1; + * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." function, which + * has #args >= 3 and odd; + * - PGBENCH_NARGS_HASH is for hash functions, which have one required + * and one optional argument; * - tag: function identifier from PgBenchFunction enum */ static const struct @@ -259,10 +267,10 @@ static const struct "abs", 1, PGBENCH_ABS }, { - "least", -1, PGBENCH_LEAST + "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST }, { - "greatest", -1, PGBENCH_GREATEST + "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST }, { "debug", 1, PGBENCH_DEBUG @@ -347,7 +355,25 @@ static const struct }, /* "case when ... then ... else ... end" construction */ { - "!case_end", -2, PGBENCH_CASE + "!case_end", PGBENCH_NARGS_CASE, PGBENCH_CASE + }, + { + "hash", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2 + }, + { + "hash_murmur2", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2 + }, + { + "hash_fnv1a", PGBENCH_NARGS_HASH, PGBENCH_HASH_FNV1A + }, + { + "hash", -3, PGBENCH_HASH_MURMUR2 + }, + { + "hash_murmur2", -3, PGBENCH_HASH_MURMUR2 + }, + { + "hash_fnv1a", -3, PGBENCH_HASH_FNV1A }, /* keep as last array element */ { @@ -423,29 +449,49 @@ elist_length(PgBenchExprList *list) static PgBenchExpr * make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args) { + int len = elist_length(args); + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); Assert(fnumber >= 0); - if (PGBENCH_FUNCTIONS[fnumber].nargs >= 0 && - PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args)) - expr_yyerror_more(yyscanner, "unexpected number of arguments", - PGBENCH_FUNCTIONS[fnumber].fname); - - /* check at least one arg for least & greatest */ - if (PGBENCH_FUNCTIONS[fnumber].nargs == -1 && - elist_length(args) == 0) - expr_yyerror_more(yyscanner, "at least one argument expected", - PGBENCH_FUNCTIONS[fnumber].fname); - /* special case: case (when ... then ...)+ (else ...)? end */ - if (PGBENCH_FUNCTIONS[fnumber].nargs == -2) - { - int len = elist_length(args); - - /* 'else' branch is always present, but could be a NULL-constant */ - if (len < 3 || len % 2 != 1) - expr_yyerror_more(yyscanner, "odd and >= 3 number of arguments expected", - "case control structure"); + /* validate arguments number including few special cases */ + switch (PGBENCH_FUNCTIONS[fnumber].nargs) + { + /* check at least one arg for least & greatest */ + case PGBENCH_NARGS_VARIABLE: + if (len == 0) + expr_yyerror_more(yyscanner, "at least one argument expected", + PGBENCH_FUNCTIONS[fnumber].fname); + break; + + /* case (when ... then ...)+ (else ...)? end */ + case PGBENCH_NARGS_CASE: + /* 'else' branch is always present, but could be a NULL-constant */ + if (len < 3 || len % 2 != 1) + expr_yyerror_more(yyscanner, + "odd and >= 3 number of arguments expected", + "case control structure"); + break; + + /* hash functions with optional seed argument */ + case PGBENCH_NARGS_HASH: + if (len > 2) + expr_yyerror_more(yyscanner, "unexpected number of arguments", + PGBENCH_FUNCTIONS[fnumber].fname); + + if (len == 1) + { + PgBenchExpr *var = make_variable("hash_seed"); + args = make_elist(var, args); + } + break; + + /* common case: positive arguments number */ + default: + if (PGBENCH_FUNCTIONS[fnumber].nargs != len) + expr_yyerror_more(yyscanner, "unexpected number of arguments", + PGBENCH_FUNCTIONS[fnumber].fname); } expr->etype = ENODE_FUNCTION; diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 31ea6ca..8e6e0ad 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -61,6 +61,14 @@ #define ERRCODE_UNDEFINED_TABLE "42P01" /* + * Hashing constants + */ +#define FNV_PRIME 0x100000001b3 +#define FNV_OFFSET_BASIS 0xcbf29ce484222325 +#define MM2_MUL 0xc6a4a7935bd1e995 +#define MM2_ROT 47 + +/* * Multi-platform pthread implementations */ @@ -915,6 +923,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s) } /* + * FNV-1a hash function + */ +static int64 +getHashFnv1a(int64 val, uint64 seed) +{ + int64 result; + int i; + + result = FNV_OFFSET_BASIS ^ seed; + for (i = 0; i < 8; ++i) + { + int32 octet = val & 0xff; + + val = val >> 8; + result = result ^ octet; + result = result * FNV_PRIME; + } + + return result; +} + +/* + * Murmur2 hash function + */ +static int64 +getHashMurmur2(int64 val, uint64 seed) +{ + uint64 result = seed ^ (sizeof(int64) * MM2_MUL); + uint64 k = (uint64) val; + + k *= MM2_MUL; + k ^= k >> MM2_ROT; + k *= MM2_MUL; + + result ^= k; + result *= MM2_MUL; + + result ^= result >> MM2_ROT; + result *= MM2_MUL; + result ^= result >> MM2_ROT; + + return (int64) result; +} + +/* * Initialize the given SimpleStats struct to all zeroes */ static void @@ -2209,6 +2262,28 @@ evalStandardFunc( return true; } + /* hashing */ + case PGBENCH_HASH_FNV1A: + case PGBENCH_HASH_MURMUR2: + { + int64 val; + int64 seed; + int64 result; + + Assert(nargs == 2); + + if (!coerceToInt(&vargs[0], &val)) + return false; + + if (!coerceToInt(&vargs[1], &seed)) + return false; + + result = (func == PGBENCH_HASH_FNV1A) ? + getHashFnv1a(val, seed) : getHashMurmur2(val, seed); + setIntValue(retval, result); + return true; + } + default: /* cannot get here */ Assert(0); @@ -4972,6 +5047,10 @@ main(int argc, char **argv) exit(1); } + /* set random seed */ + INSTR_TIME_SET_CURRENT(start_time); + srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time)); + if (internal_script_used) { /* @@ -5033,6 +5112,21 @@ main(int argc, char **argv) } } + /* set default seed for hash functions */ + if (lookupVariable(&state[0], "hash_seed") == NULL) + { + uint64 seed; + + seed = (uint64) (random() & 0xFFFF) << 48; + seed |= (uint64) (random() & 0xFFFF) << 32; + seed |= (uint64) (random() & 0xFFFF) << 16; + seed |= (uint64) (random() & 0xFFFF); + + for (i = 0; i < nclients; i++) + if (!putVariableInt(&state[i], "startup", "hash_seed", (int64) seed)) + exit(1); + } + if (!is_no_vacuum) { fprintf(stderr, "starting vacuum..."); @@ -5050,10 +5144,6 @@ main(int argc, char **argv) } PQfinish(con); - /* set random seed */ - INSTR_TIME_SET_CURRENT(start_time); - srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time)); - /* set up thread data structures */ threads = (TState *) pg_malloc(sizeof(TState) * nthreads); nclients_dealt = 0; diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h index 0705ccd..6983865 100644 --- a/src/bin/pgbench/pgbench.h +++ b/src/bin/pgbench/pgbench.h @@ -97,7 +97,9 @@ typedef enum PgBenchFunction PGBENCH_LE, PGBENCH_LT, PGBENCH_IS, - PGBENCH_CASE + PGBENCH_CASE, + PGBENCH_HASH_FNV1A, + PGBENCH_HASH_MURMUR2 } PgBenchFunction; typedef struct PgBenchExpr PgBenchExpr; diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index a8b2962..752c299 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -259,6 +259,11 @@ pgbench( qr{command=46.: int 46\b}, qr{command=47.: boolean true\b}, qr{command=48.: boolean true\b}, + qr{command=49.: int -5817877081768721676\b}, + qr{command=50.: boolean true\b}, + qr{command=51.: int -7793829335365542153\b}, + qr{command=52.: int -?\d+\b}, + qr{command=53.: boolean true\b}, ], 'pgbench expressions', { '001_pgbench_expressions' => q{-- integer functions @@ -327,6 +332,12 @@ pgbench( \set n6 debug(:n IS NULL AND NOT :f AND :t) -- conditional truth \set cs debug(CASE WHEN 1 THEN TRUE END AND CASE WHEN 1.0 THEN TRUE END AND CASE WHEN :n THEN NULL ELSE TRUE END) +-- hash functions +\set h0 debug(hash(10, 5432)) +\set h1 debug(:h0 = hash_murmur2(10, 5432)) +\set h3 debug(hash_fnv1a(10, 5432)) +\set h4 debug(hash(10)) +\set h5 debug(hash(10) = hash(10, :hash_seed)) -- lazy evaluation \set zy 0 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)