On Wed, Mar 20, 2019 at 9:24 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Joel Jacobson <j...@trustly.com> writes:
> > I've seen a performance trick in other hash functions [1]
> > to instead read multiple bytes in each iteration,
> > and then handle the remaining bytes after the loop.
> > [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29
>
> I can't get very excited about this, seeing that we're only going to
> be hashing short strings.  I don't really believe your 30% number
> for short strings; and even if I did, there's no evidence that the
> hash functions are worth any further optimization in terms of our
> overall performance.
>
I went ahead and tested this approach anyway, since I need this
algorithm in a completely different project.

The benchmark below shows stats for three different keywords per
length, compiled with -O2:

$ c++ -O2 -std=c++14 -o bench_perfect_hash bench_perfect_hash.cc
$ ./bench_perfect_hash

keyword            length  char-a-time (ns)  word-a-time (ns)  diff (%)
as                      2              3.30              2.62     -0.21
at                      2              3.54              2.66     -0.25
by                      2              3.30              2.59     -0.22
add                     3              4.01              3.15     -0.21
all                     3              4.04              3.11     -0.23
and                     3              3.84              3.11     -0.19
also                    4              4.50              3.17     -0.30
both                    4              4.49              3.06     -0.32
call                    4              4.95              3.42     -0.31
abort                   5              6.09              4.02     -0.34
admin                   5              5.26              3.65     -0.31
after                   5              5.18              3.76     -0.27
access                  6              5.97              3.91     -0.34
action                  6              5.86              3.89     -0.34
always                  6              6.10              3.77     -0.38
analyse                 7              6.67              4.64     -0.30
analyze                 7              7.09              4.87     -0.31
between                 7              7.02              4.66     -0.34
absolute                8              7.49              3.82     -0.49
backward                8              7.13              3.88     -0.46
cascaded                8              7.23              4.17     -0.42
aggregate               9              8.04              4.49     -0.44
assertion               9              7.98              4.52     -0.43
attribute               9              8.03              4.44     -0.45
assignment             10              8.58              4.67     -0.46
asymmetric             10              9.07              4.57     -0.50
checkpoint             10              9.15              4.53     -0.51
constraints            11              9.58              5.14     -0.46
insensitive            11              9.62              5.30     -0.45
publication            11             10.30              5.60     -0.46
concurrently           12             10.36              4.81     -0.54
current_date           12             11.17              5.48     -0.51
current_role           12             11.15              5.10     -0.54
authorization          13             11.87              5.50     -0.54
configuration          13             11.50              5.51     -0.52
xmlattributes          13             11.72              5.66     -0.52
current_schema         14             12.17              5.58     -0.54
localtimestamp         14             11.78              5.46     -0.54
characteristics        15             12.77              5.97     -0.53
current_catalog        15             12.65              5.87     -0.54
current_timestamp      17             14.19              6.12     -0.57

> Also, as best I can tell, the approach you propose would result
> in an endianness dependence, meaning we'd have to have separate
> lookup tables for BE and LE machines.  That's not a dealbreaker
> perhaps, but it is certainly another point on the "it's not worth it"
> side of the argument.

I can see how the same problem has been worked around in, e.g.,
pg_crc32.h:

#ifdef WORDS_BIGENDIAN
#define FIN_CRC32C(crc) ((crc) = pg_bswap32(crc) ^ 0xFFFFFFFF)
#else
#define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
#endif

So I used the same trick in PerfectHash.pm:

$f .= sprintf "#ifdef WORDS_BIGENDIAN\n";
$f .= sprintf "\t\tc4 = pg_bswap32(c4);\n";
$f .= sprintf "#endif\n";

I've also tried to measure the overall effect by hacking postgres.c:

+	struct timespec start, stop;
+	clock_gettime(CLOCK_REALTIME, &start);
+	for (int i = 0; i < 100000; i++)
+	{
+		List	   *parsetree_list2;
+		MemoryContext oldcontext2;
+
+		oldcontext2 = MemoryContextSwitchTo(MessageContext);
+		parsetree_list2 = pg_parse_query(query_string);
+		MemoryContextSwitchTo(oldcontext2);
+//		MemoryContextReset(MessageContext);
+		CHECK_FOR_INTERRUPTS();
+	}
+	clock_gettime(CLOCK_REALTIME, &stop);
+	printf("Bench: %f\n", (stop.tv_sec - start.tv_sec)
+		   + (double) (stop.tv_nsec - start.tv_nsec) / 1000000000L);

I measured the time for a big query found here:
https://wiki.postgresql.org/wiki/Index_Maintenance

I might be doing something wrong, but it looks like the overall effect
is a ~3% improvement.
perfect-hash-read-word-at-a-time.patch
Description: Binary data
bench_perfect_hash.cc
Description: Binary data
pgkeywords.h
Description: Binary data