On Wed, Mar 20, 2019 at 9:24 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Joel Jacobson <j...@trustly.com> writes:
> > I've seen a performance trick in other hash functions [1]
> > to instead read multiple bytes in each iteration,
> > and then handle the remaining bytes after the loop.
> > [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29
>
> I can't get very excited about this, seeing that we're only going to
> be hashing short strings.  I don't really believe your 30% number
> for short strings; and even if I did, there's no evidence that the
> hash functions are worth any further optimization in terms of our
> overall performance.
>

I went ahead and tested this approach anyway, since I need this algorithm
in a completely different project.
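
For anyone who wants the shape of the change without reading the patch, the two loops compare roughly as follows. This is a minimal sketch, not the code PerfectHash.pm actually generates: the multipliers are placeholders and the final perfect-hash table lookup is omitted.

#include <stdint.h>
#include <string.h>

/* Baseline: consume the key one byte per iteration. */
static uint32_t
hash_char_at_a_time(const char *key, size_t keylen)
{
    uint32_t    a = 0;

    while (keylen--)
        a = a * 31 + (unsigned char) *key++;
    return a;
}

/*
 * Word-at-a-time: consume four bytes per iteration, then finish the
 * remaining 0-3 bytes one at a time.  Note that the value of c4 depends
 * on the machine's byte order; see the endianness discussion below.
 */
static uint32_t
hash_word_at_a_time(const char *key, size_t keylen)
{
    uint32_t    a = 0;

    while (keylen >= 4)
    {
        uint32_t    c4;

        memcpy(&c4, key, 4);        /* unaligned-safe 4-byte load */
        a = a * 2654435761u + c4;   /* placeholder multiplier */
        key += 4;
        keylen -= 4;
    }
    while (keylen--)
        a = a * 31 + (unsigned char) *key++;
    return a;
}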

The benchmark below shows timings for three different keywords per length,
compiled with -O2. The diff column is the relative change,
(word - char) / char, so -0.21 means word-at-a-time is about 21% faster:

$ c++ -O2 -std=c++14 -o bench_perfect_hash bench_perfect_hash.cc
$ ./bench_perfect_hash

keyword              length char-a-time (ns) word-a-time (ns) diff
as                   2      3.30             2.62             -0.21
at                   2      3.54             2.66             -0.25
by                   2      3.30             2.59             -0.22
add                  3      4.01             3.15             -0.21
all                  3      4.04             3.11             -0.23
and                  3      3.84             3.11             -0.19
also                 4      4.50             3.17             -0.30
both                 4      4.49             3.06             -0.32
call                 4      4.95             3.42             -0.31
abort                5      6.09             4.02             -0.34
admin                5      5.26             3.65             -0.31
after                5      5.18             3.76             -0.27
access               6      5.97             3.91             -0.34
action               6      5.86             3.89             -0.34
always               6      6.10             3.77             -0.38
analyse              7      6.67             4.64             -0.30
analyze              7      7.09             4.87             -0.31
between              7      7.02             4.66             -0.34
absolute             8      7.49             3.82             -0.49
backward             8      7.13             3.88             -0.46
cascaded             8      7.23             4.17             -0.42
aggregate            9      8.04             4.49             -0.44
assertion            9      7.98             4.52             -0.43
attribute            9      8.03             4.44             -0.45
assignment           10     8.58             4.67             -0.46
asymmetric           10     9.07             4.57             -0.50
checkpoint           10     9.15             4.53             -0.51
constraints          11     9.58             5.14             -0.46
insensitive          11     9.62             5.30             -0.45
publication          11     10.30            5.60             -0.46
concurrently         12     10.36            4.81             -0.54
current_date         12     11.17            5.48             -0.51
current_role         12     11.15            5.10             -0.54
authorization        13     11.87            5.50             -0.54
configuration        13     11.50            5.51             -0.52
xmlattributes        13     11.72            5.66             -0.52
current_schema       14     12.17            5.58             -0.54
localtimestamp       14     11.78            5.46             -0.54
characteristics      15     12.77            5.97             -0.53
current_catalog      15     12.65            5.87             -0.54
current_timestamp    17     14.19            6.12             -0.57
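
(Each number above is an average over a tight loop of roughly the
following shape. This is a sketch of the measurement method using the
stand-in hash from the earlier sketch, not the attached
bench_perfect_hash.cc; the iteration count is arbitrary.)

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Returns the average ns per hash call for one keyword. */
static double
time_keyword(const char *kw, size_t len)
{
    struct timespec start, stop;
    volatile uint32_t sink = 0;     /* keep calls from being optimized away */
    const int   iters = 10000000;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iters; i++)
        sink += hash_word_at_a_time(kw, len);
    clock_gettime(CLOCK_MONOTONIC, &stop);
    (void) sink;
    return ((stop.tv_sec - start.tv_sec) * 1e9 +
            (stop.tv_nsec - start.tv_nsec)) / iters;
}

/* e.g. printf("as: %.2f ns\n", time_keyword("as", 2)); */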


> Also, as best I can tell, the approach you propose would result
> in an endianness dependence, meaning we'd have to have separate
> lookup tables for BE and LE machines.  That's not a dealbreaker
> perhaps, but it is certainly another point on the "it's not worth it"
> side of the argument.
>

I can see how the same problem has been worked around in e.g. pg_crc32.h:

#ifdef WORDS_BIGENDIAN
#define FIN_CRC32C(crc) ((crc) = pg_bswap32(crc) ^ 0xFFFFFFFF)
#else
#define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
#endif

So I used the same trick in PerfectHash.pm:

$f .= sprintf "#ifdef WORDS_BIGENDIAN\n";
$f .= sprintf "\t\tc4 = pg_bswap32(c4);\n";
$f .= sprintf "#endif\n";

I've also tried to measure the overall effect by hacking postgres.c:

+       struct timespec start, stop;
+
+       clock_gettime(CLOCK_REALTIME, &start);
+       for (int i = 0; i < 100000; i++)
+       {
+               List       *parsetree_list2;
+               MemoryContext oldcontext2;
+
+               oldcontext2 = MemoryContextSwitchTo(MessageContext);
+               parsetree_list2 = pg_parse_query(query_string);
+               MemoryContextSwitchTo(oldcontext2);
+//             MemoryContextReset(MessageContext);
+               CHECK_FOR_INTERRUPTS();
+       }
+       clock_gettime(CLOCK_REALTIME, &stop);
+       printf("Bench: %f\n",
+              (stop.tv_sec - start.tv_sec) +
+              (double) (stop.tv_nsec - start.tv_nsec) / 1000000000L);
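
(The printed Bench value is the total wall-clock seconds for the 100000
pg_parse_query() calls, so the per-call parse time is Bench / 100000;
comparing the printed value with and without the patch gives the overall
effect.)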

I measured the time for a big query found here:
https://wiki.postgresql.org/wiki/Index_Maintenance

I might be doing something wrong, but it looks like the overall effect is
a ~3% improvement.

Attachment: perfect-hash-read-word-at-a-time.patch

Attachment: bench_perfect_hash.cc

Attachment: pgkeywords.h
