Peter Zijlstra <pet...@infradead.org> wrote:
> On Sat, May 28, 2016 at 03:57:19PM -0400, George Spelvin wrote:
>> +static inline unsigned int fold_hash(unsigned long x, unsigned long y)
>> {
>> +	y ^= x * GOLDEN_RATIO_64;
>> +	y *= GOLDEN_RATIO_64;
>> +	return y >> 32;
>> }
> So does it make sense to use that pattern here too?
>
> This code doesn't much care about performance, but wants a decent hash
> from the stack of class keys.
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 81f1a7107c0e..c8498efcd5d9 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -309,10 +309,12 @@ static struct hlist_head chainhash_table[CHAINHASH_SIZE];
>   * It's a 64-bit hash, because it's important for the keys to be
>   * unique.
>   */
> -#define iterate_chain_key(key1, key2) \
> -	(((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
> -	 ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> -	 (key2))
> +static inline u64 iterate_chain_key(u64 x, u64 y)
> +{
> +	y ^= x * GOLDEN_RATIO_64;
> +	y *= GOLDEN_RATIO_64;
> +	return y;
> +}
>
>  void lockdep_off(void)
>  {

Not quite.  The fold_hash() you quote is used only on 64-bit systems,
which can be assumed to have a reasonable 64-bit multiply.  On 32-bit
platforms, I avoid using GOLDEN_RATIO_64 at all, since 64x64-bit
multiplies are so expensive.

You actually have only 96 bits of input.  The correct prototype is:

	static inline u64 iterate_chain_key(u64 key, u32 idx)

If performance mattered, I'd be inclined to use one or two iterations
of the 32-bit HASH_MIX() function, which is specifically designed to
add 32 bits to a 64-bit hash value.  A more thorough mixing would be
achieved by __jhash_mix().  Basically:

	static inline u64 iterate_chain_key(u64 key, u32 idx)
	{
		u32 k0 = key, k1 = key >> 32;

		__jhash_mix(idx, k0, k1);	/* Macro that modifies its arguments! */

		return k0 | (u64)k1 << 32;
	}

(The order of arguments is chosen to preserve the two "most-hashed"
values.)

Also, I just heard from the hppa folks, who have brought to my attention
that it's an example of an out-of-order superscalar CPU that *doesn't*
have a good integer multiplier.  For general multiplies, you have to
move values to the FPU, and the code is a pain.
Instead, it has shift-and-add instructions designed to help the
compiler generate multiplies by constants, but a large one like
GOLDEN_RATIO_64 is still a pain.  Here's code to take x (in %r26) and
multiply it by GOLDEN_RATIO_64, with the result in %r28:

	shladd,l %r26,1,%r26,%r19
	depd,z %r19,46,47,%r31
	sub %r31,%r19,%r31
	shladd,l %r31,2,%r26,%r31
	shladd,l %r31,2,%r26,%r31
	shladd,l %r31,2,%r31,%r19
	depd,z %r19,60,61,%r31
	sub %r31,%r19,%r31
	depd,z %r31,54,55,%r28
	add,l %r31,%r28,%r31
	shladd,l %r31,3,%r26,%r31
	depd,z %r31,59,60,%r28
	add,l %r31,%r28,%r31
	shladd,l %r31,2,%r26,%r31
	depd,z %r31,60,61,%r28
	sub %r28,%r31,%r28
	depd,z %r28,60,61,%r31
	sub %r31,%r28,%r31
	depd,z %r31,57,58,%r28
	add,l %r31,%r28,%r28
	depd,z %r28,61,62,%r28
	sub %r28,%r26,%r28
	shladd,l %r28,3,%r28,%r28

We're going to work on that.  Either finding a better code sequence,
or using the 32-bit version of hash_64() with the nice code I've
already found for multiplies by GOLDEN_RATIO_32.

(But thanks, Peter, for looking at the code!)