Hello Fabien,

17/01/2018 10:52, Fabien COELHO пишет:

>> Here is a new version of patch. I've splitted it into two parts. The
>> first one is almost the same as v4 from [1] with some refactoring.
>> The second part introduces random_seed variable as you proposed.
>
> Patch 1 applies. Compilations fails, there are two "hash_seed"
> declared in "pgbench.c".
>
> Patch 2 applies cleanly on top of the previous patch and compiles,
> because the variable is removed...
>
> If an ":hash_seed" pgbench variable is used, ISTM that there is no
> need for a global variable at all, so the two patches are going back
> and forth, which is unhelpful. ISTM better to provide just one
> combined patch for the feature.
>
> If the hash_seed variable really needs to be kept, it should be an
> "int64" variable, like other pgbench values.
>
> The len < 1 || len > 2 is checked twice, once in the "switch", on in
> an "if" just after the "switch". Once is enough.

I totally messed up doing git rebase and didn't double check the code.
*facepalm* There shouldn't be hash_seed variable and the second 'len < 1
|| len > 2' check. Sorry for that, fixed in the attached patch.

> Calling random just usually initializes about 31 bits, so random
> should be called 2 or maybe 3 times? Or maybe use the internal getrand
> which has 48 pseudorandom bits?
Done. I used code from get_random_uint64() as an example.
>
> For me "random seed" is what is passed to srandom, so the variable
> should rather be named hash_seed because there could also be a random
> seed (actually, there is in another parallel patch:-). Moreover, this
> seed may or may not be random if set, so calling it "random_seed" is
> not desirable.
>
My intention was to introduce seed variable which potentially could be
used in different contexts, not only for hash functions. I renamed it to
'hash_seed' for now. But what do you think about naming it simply 'seed'
or choosing some other general name?
>> I didn't do the executor simplification thing yet because I'm a
>> little concerned about inventive users, who may want to change
>> random_seed variable in runtime (which is possible since pgbench
>> doesn't have read only variables aka constants AFAIK).
>
> If the user choses to overide hash_seed in their script, it is their
> decision, the documentation has only to be clear about :hash_seed
> being the default seed. I see no clear reason to work around this
> possibility by evaluating the seed at parse time, especially as the
> variable may not have its final value yet depending on the option
> order. I'd suggest to just use make_variable("hash_seed") for the
> default second argument and simplify the executor.
That is a great idea, I didn't see that possibility. Done.
>
> The seed variable is not tested explicitely in the script, you could add
> a "hash(5432) == hash(5432, :hash_seed)" for instance.
>
> It would be nice if an native English speaker could proofread the
> documentation text. I'd suggest: "*an* optional seed parameter". "In
> case *the* seed...". "<literal>:hash_seed</literal>". "shared for" ->
> "shared by". "following listing" -> "following pgbench script". "few
> accounts generates" -> "few accounts generate".
>
Done as well.
> For the document example, I'd use larger values for the random &
> modulo, eg 100000000 and 1000000. The drawback is that zipfian does a
> costly computation of the generalized harmonic number when the
> parameter is lower than 1.0. For cities, the parameter found by Zipf
> was 1.07 (says Wikipedia). Maybe use this historical value. Or maybe
> use an exponential distribution in the example.
>
Changed parameter to 1.07.

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..c31254c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -882,6 +882,11 @@ pgbench <optional> <replaceable>options</replaceable> 
</optional> <replaceable>d
        <entry> <literal>client_id</literal> </entry>
        <entry>unique number identifying the client session (starts from 
zero)</entry>
       </row>
+
+      <row>
+       <entry> <literal>hash_seed</literal> </entry>
+       <entry>random seed used in hash functions by default</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -1246,6 +1251,27 @@ pgbench <optional> <replaceable>options</replaceable> 
</optional> <replaceable>d
        <entry><literal>5</literal></entry>
       </row>
       <row>
+       <entry><literal><function>hash(<replaceable>a</replaceable> [, 
<replaceable>seed</replaceable> ] )</function></literal></entry>
+       <entry>integer</entry>
+       <entry>alias for <literal>hash_murmur2()</literal></entry>
+       <entry><literal>hash(10, 5432)</literal></entry>
+       <entry><literal>-5817877081768721676</literal></entry>
+      </row>
+      <row>
+       <entry><literal><function>hash_fnv1a(<replaceable>a</replaceable> [, 
<replaceable>seed</replaceable> ] )</function></literal></entry>
+       <entry>integer</entry>
+       <entry><literal>FNV</literal> hash</entry>
+       <entry><literal>hash_fnv1a(10, 5432)</literal></entry>
+       <entry><literal>-7793829335365542153</literal></entry>
+      </row>
+      <row>
+       <entry><literal><function>hash_murmur2(<replaceable>a</replaceable> [, 
<replaceable>seed</replaceable> ] )</function></literal></entry>
+       <entry>integer</entry>
+       <entry><literal>murmur2</literal> hash</entry>
+       <entry><literal>hash_murmur2(10, 5432)</literal></entry>
+       <entry><literal>-5817877081768721676</literal></entry>
+      </row>
+      <row>
        
<entry><literal><function>int(<replaceable>x</replaceable>)</function></literal></entry>
        <entry>integer</entry>
        <entry>cast to int</entry>
@@ -1423,6 +1449,22 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) 
/
    </itemizedlist>
 
   <para>
+    Hash functions accepts an input value and an optional seed parameter. In 
case the seed isn't provided <literal>hash_seed</literal> value is used. Hash 
functions can be used, for example, to modify the distribution of 
<literal>random_zipfian</literal> or <literal>random_exponential</literal> 
functions in order to obtain scattered distribution. Thus the following pgbench 
script simulates possible real world workload typical for social media and 
blogging platforms where few accounts generate excessive load:
+
+<programlisting>
+\set r random_zipfian(0, 100000000, 1.07)
+\set k abs(hash(:r)) % 1000000
+</programlisting>
+
+    In some cases several distinct distributions are needed which don't 
correlate with each other and this is when implicit seed parameter comes in 
handy:
+
+<programlisting>
+\set k1 abs(hash(:r), :hash_seed + 123) % 1000000
+\set k2 abs(hash(:r), :hash_seed + 321) % 1000000
+</programlisting>
+  </para>
+
+  <para>
    As an example, the full definition of the built-in TPC-B-like
    transaction is:
 
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..b5e150f 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE (-1)
+#define PGBENCH_NARGS_CASE             (-2)
+#define PGBENCH_NARGS_HASH             (-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, 
PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- *                     -1 is a special value for least & greatest meaning 
#args >= 1
- *                     -2 is for the "CASE WHEN ..." function, which has #args 
>= 3 and odd
+ * - nargs: number of arguments. Special cases:
+ *                     - PGBENCH_NARGS_VARIABLE is a special value for least & 
greatest
+ *                       meaning #args >= 1;
+ *                     - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." 
function, which
+ *                       has #args >= 3 and odd;
+ *                     - PGBENCH_NARGS_HASH is for hash functions, which have 
one required
+ *                       and one optional argument;
  * - tag: function identifier from PgBenchFunction enum
  */
 static const struct
@@ -259,10 +267,10 @@ static const struct
                "abs", 1, PGBENCH_ABS
        },
        {
-               "least", -1, PGBENCH_LEAST
+               "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST
        },
        {
-               "greatest", -1, PGBENCH_GREATEST
+               "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST
        },
        {
                "debug", 1, PGBENCH_DEBUG
@@ -347,7 +355,25 @@ static const struct
        },
        /* "case when ... then ... else ... end" construction */
        {
-               "!case_end", -2, PGBENCH_CASE
+               "!case_end", PGBENCH_NARGS_CASE, PGBENCH_CASE
+       },
+       {
+               "hash", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_murmur2", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_fnv1a", PGBENCH_NARGS_HASH, PGBENCH_HASH_FNV1A
+       },
+       {
+               "hash", -3, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_murmur2", -3, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_fnv1a", -3, PGBENCH_HASH_FNV1A
        },
        /* keep as last array element */
        {
@@ -423,29 +449,49 @@ elist_length(PgBenchExprList *list)
 static PgBenchExpr *
 make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args)
 {
+       int len = elist_length(args);
+
        PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
 
        Assert(fnumber >= 0);
 
-       if (PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
-               PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args))
-               expr_yyerror_more(yyscanner, "unexpected number of arguments",
-                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
-
-       /* check at least one arg for least & greatest */
-       if (PGBENCH_FUNCTIONS[fnumber].nargs == -1 &&
-               elist_length(args) == 0)
-               expr_yyerror_more(yyscanner, "at least one argument expected",
-                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
-       /* special case: case (when ... then ...)+ (else ...)? end */
-       if (PGBENCH_FUNCTIONS[fnumber].nargs == -2)
-       {
-               int len = elist_length(args);
-
-               /* 'else' branch is always present, but could be a 
NULL-constant */
-               if (len < 3 || len % 2 != 1)
-                       expr_yyerror_more(yyscanner, "odd and >= 3 number of 
arguments expected",
-                                                         "case control 
structure");
+       /* validate arguments number including few special cases */
+       switch (PGBENCH_FUNCTIONS[fnumber].nargs)
+       {
+               /* check at least one arg for least & greatest */
+               case PGBENCH_NARGS_VARIABLE:
+                       if (len == 0)
+                               expr_yyerror_more(yyscanner, "at least one 
argument expected",
+                                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
+                       break;
+
+               /* case (when ... then ...)+ (else ...)? end */
+               case PGBENCH_NARGS_CASE:
+                       /* 'else' branch is always present, but could be a 
NULL-constant */
+                       if (len < 3 || len % 2 != 1)
+                               expr_yyerror_more(yyscanner,
+                                                                 "odd and >= 3 
number of arguments expected",
+                                                                 "case control 
structure");
+                       break;
+
+               /* hash functions with optional seed argument */
+               case PGBENCH_NARGS_HASH:
+                       if (len > 2)
+                               expr_yyerror_more(yyscanner, "unexpected number 
of arguments",
+                                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
+
+                       if (len == 1)
+                       {
+                               PgBenchExpr *var = make_variable("hash_seed");
+                               args = make_elist(var, args);
+                       }
+                       break;
+
+               /* common case: positive arguments number */
+               default:
+                       if (PGBENCH_FUNCTIONS[fnumber].nargs != len)
+                               expr_yyerror_more(yyscanner, "unexpected number 
of arguments",
+                                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
        }
 
        expr->etype = ENODE_FUNCTION;
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 31ea6ca..8e6e0ad 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -61,6 +61,14 @@
 #define ERRCODE_UNDEFINED_TABLE  "42P01"
 
 /*
+ * Hashing constants
+ */
+#define FNV_PRIME 0x100000001b3
+#define FNV_OFFSET_BASIS 0xcbf29ce484222325
+#define MM2_MUL 0xc6a4a7935bd1e995
+#define MM2_ROT 47
+
+/*
  * Multi-platform pthread implementations
  */
 
@@ -915,6 +923,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, 
double s)
 }
 
 /*
+ * FNV-1a hash function
+ */
+static int64
+getHashFnv1a(int64 val, uint64 seed)
+{
+       int64   result;
+       int             i;
+
+       result = FNV_OFFSET_BASIS ^ seed;
+       for (i = 0; i < 8; ++i)
+       {
+               int32 octet = val & 0xff;
+
+               val = val >> 8;
+               result = result ^ octet;
+               result = result * FNV_PRIME;
+       }
+
+       return result;
+}
+
+/*
+ * Murmur2 hash function
+ */
+static int64
+getHashMurmur2(int64 val, uint64 seed)
+{
+       uint64  result = seed ^ (sizeof(int64) * MM2_MUL);
+       uint64  k = (uint64) val;
+
+       k *= MM2_MUL;
+       k ^= k >> MM2_ROT;
+       k *= MM2_MUL;
+
+       result ^= k;
+       result *= MM2_MUL;
+
+       result ^= result >> MM2_ROT;
+       result *= MM2_MUL;
+       result ^= result >> MM2_ROT;
+
+       return (int64) result;
+}
+
+/*
  * Initialize the given SimpleStats struct to all zeroes
  */
 static void
@@ -2209,6 +2262,28 @@ evalStandardFunc(
                                return true;
                        }
 
+                       /* hashing */
+               case PGBENCH_HASH_FNV1A:
+               case PGBENCH_HASH_MURMUR2:
+                       {
+                               int64   val;
+                               int64   seed;
+                               int64   result;
+
+                               Assert(nargs == 2);
+
+                               if (!coerceToInt(&vargs[0], &val))
+                                       return false;
+
+                               if (!coerceToInt(&vargs[1], &seed))
+                                       return false;
+
+                               result = (func == PGBENCH_HASH_FNV1A) ?
+                                       getHashFnv1a(val, seed) : 
getHashMurmur2(val, seed);
+                               setIntValue(retval, result);
+                               return true;
+                       }
+
                default:
                        /* cannot get here */
                        Assert(0);
@@ -4972,6 +5047,10 @@ main(int argc, char **argv)
                exit(1);
        }
 
+       /* set random seed */
+       INSTR_TIME_SET_CURRENT(start_time);
+       srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+
        if (internal_script_used)
        {
                /*
@@ -5033,6 +5112,21 @@ main(int argc, char **argv)
                }
        }
 
+       /* set default seed for hash functions */
+       if (lookupVariable(&state[0], "hash_seed") == NULL)
+       {
+               uint64          seed;
+
+               seed = (uint64) (random() & 0xFFFF) << 48;
+               seed |= (uint64) (random() & 0xFFFF) << 32;
+               seed |= (uint64) (random() & 0xFFFF) << 16;
+               seed |= (uint64) (random() & 0xFFFF);
+
+               for (i = 0; i < nclients; i++)
+                       if (!putVariableInt(&state[i], "startup", "hash_seed", 
(int64) seed))
+                               exit(1);
+       }
+
        if (!is_no_vacuum)
        {
                fprintf(stderr, "starting vacuum...");
@@ -5050,10 +5144,6 @@ main(int argc, char **argv)
        }
        PQfinish(con);
 
-       /* set random seed */
-       INSTR_TIME_SET_CURRENT(start_time);
-       srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
        /* set up thread data structures */
        threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
        nclients_dealt = 0;
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index 0705ccd..6983865 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -97,7 +97,9 @@ typedef enum PgBenchFunction
        PGBENCH_LE,
        PGBENCH_LT,
        PGBENCH_IS,
-       PGBENCH_CASE
+       PGBENCH_CASE,
+       PGBENCH_HASH_FNV1A,
+       PGBENCH_HASH_MURMUR2
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl 
b/src/bin/pgbench/t/001_pgbench_with_server.pl
index a8b2962..752c299 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -259,6 +259,11 @@ pgbench(
                qr{command=46.: int 46\b},
                qr{command=47.: boolean true\b},
                qr{command=48.: boolean true\b},
+               qr{command=49.: int -5817877081768721676\b},
+               qr{command=50.: boolean true\b},
+               qr{command=51.: int -7793829335365542153\b},
+               qr{command=52.: int -?\d+\b},
+               qr{command=53.: boolean true\b},
        ],
        'pgbench expressions',
        {   '001_pgbench_expressions' => q{-- integer functions
@@ -327,6 +332,12 @@ pgbench(
 \set n6 debug(:n IS NULL AND NOT :f AND :t)
 -- conditional truth
 \set cs debug(CASE WHEN 1 THEN TRUE END AND CASE WHEN 1.0 THEN TRUE END AND 
CASE WHEN :n THEN NULL ELSE TRUE END)
+-- hash functions
+\set h0 debug(hash(10, 5432))
+\set h1 debug(:h0 = hash_murmur2(10, 5432))
+\set h3 debug(hash_fnv1a(10, 5432))
+\set h4 debug(hash(10))
+\set h5 debug(hash(10) = hash(10, :hash_seed))
 -- lazy evaluation
 \set zy 0
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)

Reply via email to