Peter Zijlstra <pet...@infradead.org> wrote:
> On Sat, May 28, 2016 at 03:57:19PM -0400, George Spelvin wrote:
>> +static inline unsigned int fold_hash(unsigned long x, unsigned long y)
>> {
>> +	y ^= x * GOLDEN_RATIO_64;
>> +	y *= GOLDEN_RATIO_64;
>> +	return y >> 32;
>> }
> So does it make sense to use that pattern here too?
>
> This code doesn't much care about performance, but wants a decent hash
> from the stack of class keys.
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 81f1a7107c0e..c8498efcd5d9 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -309,10 +309,12 @@ static struct hlist_head chainhash_table[CHAINHASH_SIZE];
>   * It's a 64-bit hash, because it's important for the keys to be
>   * unique.
>   */
> -#define iterate_chain_key(key1, key2) \
> -	(((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
> -	 ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> -	 (key2))
> +static inline u64 iterate_chain_key(u64 x, u64 y)
> +{
> +	y ^= x * GOLDEN_RATIO_64;
> +	y *= GOLDEN_RATIO_64;
> +	return y;
> +}
>
>  void lockdep_off(void)
>  {

Not quite.  The fold_hash() you quote is used only on 64-bit systems,
which can be assumed to have a reasonable 64-bit multiply.  On 32-bit
platforms, I avoid using GOLDEN_RATIO_64 at all, since 64x64-bit
multiplies are so expensive.

You actually have only 96 bits of input.  The correct prototype is:

	static inline u64 iterate_chain_key(u64 key, u32 idx)

If performance mattered, I'd be inclined to use one or two iterations
of the 32-bit HASH_MIX() function, which is specifically designed to
add 32 bits to a 64-bit hash value.  A more thorough mixing would be
achieved by __jhash_mix().  Basically:

	static inline u64 iterate_chain_key(u64 key, u32 idx)
	{
		u32 k0 = key, k1 = key >> 32;

		__jhash_mix(idx, k0, k1);	/* Macro that modifies its arguments! */

		return k0 | (u64)k1 << 32;
	}

(The order of arguments is chosen to preserve the two "most-hashed"
values.)

Also, I just heard from the hppa folks, who have brought to my attention
that it's an example of an out-of-order superscalar CPU that *doesn't*
have a good integer multiplier.  For general multiplies, you have to
move values to the FPU, and the code is a pain.
Instead, it has shift-and-add instructions designed to help the
compiler generate multiplies by constants, but a large one like
GOLDEN_RATIO_64 is still a pain.  Here's code to take x (in %r26) and
multiply it by GOLDEN_RATIO_64, with the result in %r28:

	shladd,l %r26,1,%r26,%r19
	depd,z %r19,46,47,%r31
	sub %r31,%r19,%r31
	shladd,l %r31,2,%r26,%r31
	shladd,l %r31,2,%r26,%r31
	shladd,l %r31,2,%r31,%r19
	depd,z %r19,60,61,%r31
	sub %r31,%r19,%r31
	depd,z %r31,54,55,%r28
	add,l %r31,%r28,%r31
	shladd,l %r31,3,%r26,%r31
	depd,z %r31,59,60,%r28
	add,l %r31,%r28,%r31
	shladd,l %r31,2,%r26,%r31
	depd,z %r31,60,61,%r28
	sub %r28,%r31,%r28
	depd,z %r28,60,61,%r31
	sub %r31,%r28,%r31
	depd,z %r31,57,58,%r28
	add,l %r31,%r28,%r28
	depd,z %r28,61,62,%r28
	sub %r28,%r26,%r28
	shladd,l %r28,3,%r28,%r28

We're going to work on that.  Either finding a better code sequence,
or using the 32-bit version of hash_64() with the nice code I've
already found for multiplies by GOLDEN_RATIO_32.

(But thanks, Peter, for looking at the code!)