Re: [HACKERS] Hash Functions

Andres Freund Thu, 03 Aug 2017 15:09:13 -0700

On 2017-08-03 17:57:37 -0400, Robert Haas wrote:
> On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund <and...@anarazel.de> wrote:
> > On 2017-08-03 17:43:44 -0400, Robert Haas wrote:
> >> For me, the basic point here is that we need a set of hash functions
> >> for hash partitioning that are different than what we use for hash
> >> indexes and hash joins -- otherwise when we hash partition a table and
> >> create hash indexes on each partition, those indexes will have nasty
> >> clustering.  Partitionwise hash joins will have similar problems.  So,
> >> a new set of hash functions specifically for hash partitioning is
> >> quite desirable.
> >
> > Couldn't that just as well solved by being a bit smarter with an IV? I
> > doubt we want to end up with different hashfunctions for sharding,
> > partitioning, hashjoins (which seems to form a hierarchy). Having a
> > working hash-combine function, or even better a hash API that can
> > continue to use the hash's internal state, seems a more scalable
> > solution.
> 
> That's another way to go, but it requires inventing a way to thread
> the IV through the hash opclass interface.


Only if we really want to do it really well :P. Using a hash_combine()
like

/*
 * Combine two hash values, resulting in another hash value, with decent bit
 * mixing.
 *
 * Similar to boost's hash_combine().
 */
static inline uint32
hash_combine(uint32 a, uint32 b)
{
        a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
        return a;
}

between hash(IV) and the hashfunction should do the trick (the IV needs
to hashed once, otherwise the bit mix is bad).


> That's actually sort of a
> problem anyway.  Maybe I ought to have started with the question of
> how we're going to make that end of things work.

+1 one for that plan.


> We could:
> 
> - Invent a new hash_partition AM that doesn't really make indexes but
> supplies hash functions for hash partitioning.
> - Add a new, optional support function 2 to the hash AM that takes a
> value of the type *and* an IV as an argument.
> - Something else.

Not arguing for it, but one option could also have pg_type.hash*
function(s).

One thing that I think might be advisable to think about is that we're
atm stuck with a relatively bad hash function for hash indexes (and hash
joins/aggs), and we should probably evolve it at some point. At the same
time there's currently people out there relying on the current hash
functions remaining stable.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hash Functions

Reply via email to