On 2017-08-03 17:57:37 -0400, Robert Haas wrote:
> On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund <and...@anarazel.de> wrote:
> > On 2017-08-03 17:43:44 -0400, Robert Haas wrote:
> >> For me, the basic point here is that we need a set of hash functions
> >> for hash partitioning that are different than what we use for hash
> >> indexes and hash joins -- otherwise when we hash partition a table and
> >> create hash indexes on each partition, those indexes will have nasty
> >> clustering. Partitionwise hash joins will have similar problems. So,
> >> a new set of hash functions specifically for hash partitioning is
> >> quite desirable.
> >
> > Couldn't that just as well solved by being a bit smarter with an IV? I
> > doubt we want to end up with different hashfunctions for sharding,
> > partitioning, hashjoins (which seems to form a hierarchy). Having a
> > working hash-combine function, or even better a hash API that can
> > continue to use the hash's internal state, seems a more scalable
> > solution.
>
> That's another way to go, but it requires inventing a way to thread
> the IV through the hash opclass interface.

Only if we really want to do it really well :P. Using a hash_combine()
like

/*
 * Combine two hash values, resulting in another hash value, with decent bit
 * mixing.
 *
 * Similar to boost's hash_combine().
 */
static inline uint32
hash_combine(uint32 a, uint32 b)
{
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}

between hash(IV) and the hashfunction should do the trick (the IV needs
to be hashed once, otherwise the bit mix is bad).

> That's actually sort of a
> problem anyway. Maybe I ought to have started with the question of
> how we're going to make that end of things work.

+1 for that plan.

> We could:
>
> - Invent a new hash_partition AM that doesn't really make indexes but
> supplies hash functions for hash partitioning.
> - Add a new, optional support function 2 to the hash AM that takes a
> value of the type *and* an IV as an argument.
> - Something else.

Not arguing for it, but one option could also be to have pg_type.hash*
function(s).

One thing that I think might be advisable to think about is that we're
atm stuck with a relatively bad hash function for hash indexes (and hash
joins/aggs), and we should probably evolve it at some point. At the same
time there are currently people out there relying on the current hash
functions remaining stable.

Greetings,

Andres Freund
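
For illustration only, here is a minimal standalone sketch of the scheme
described in the message: hash the IV once, then mix it with the unchanged
per-type hash via hash_combine(). Everything except hash_combine() itself is
an assumption made for the example. toy_hash_uint32() is a hypothetical
stand-in (32-bit FNV-1a), not PostgreSQL's hash function. hash_with_iv() and
partition_for_key() are made-up names. uint32_t from <stdint.h> replaces
PostgreSQL's uint32 typedef so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same mixing step as the hash_combine() quoted in the message. */
static inline uint32_t
hash_combine(uint32_t a, uint32_t b)
{
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}

/*
 * Hypothetical stand-in for a type's existing hash function: 32-bit FNV-1a
 * over the four bytes of the key.  This is NOT PostgreSQL's hash function.
 */
static uint32_t
toy_hash_uint32(uint32_t key)
{
	uint32_t	h = 2166136261u;	/* FNV offset basis */

	for (int i = 0; i < 4; i++)
	{
		h ^= (key >> (8 * i)) & 0xff;
		h *= 16777619u;				/* FNV prime */
	}
	return h;
}

/*
 * Hash a key under a given IV.  The IV is hashed once before combining, as
 * the message suggests, so that small IVs still contribute well-mixed bits.
 * The per-type hash of the key is used unmodified.
 */
static uint32_t
hash_with_iv(uint32_t iv, uint32_t key)
{
	return hash_combine(toy_hash_uint32(iv), toy_hash_uint32(key));
}

/* Hypothetical helper: map a key to one of nparts hash partitions. */
static uint32_t
partition_for_key(uint32_t iv, uint32_t key, uint32_t nparts)
{
	assert(nparts > 0);
	return hash_with_iv(iv, key) % nparts;
}
```

With this shape, partitioning could use one IV and a hash index or hash join
on the same column another, without the per-type hash function needing to
know about the IV at all. Whether the resulting bit mixing is good enough in
practice is not something this sketch demonstrates.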