hash_futex() references two global variables: the base pointer futex_queues and the size of the array futex_hashsize. The latter is marked __read_mostly, while the former is not, so they are likely to end up very far from each other. This means that hash_futex() is likely to incur two cache misses.
We could mark futex_queues as __read_mostly as well, but that doesn't guarantee they'll end up next to each other. So put the two variables in a small singleton struct and mark that as __read_mostly. On 64-bit the struct is 16 bytes, and aligning it to 16 bytes means it can never straddle a cache line boundary, since cache line sizes are multiples of 16.

A diff of the disassembly shows what I'd expect:

@@ -213289,14 +213289,14 @@
 :	31 d1                	xor    %edx,%ecx
 :	c1 ca 12             	ror    $0x12,%edx
 :	29 d1                	sub    %edx,%ecx
-:	48 8b 15 95 61 e4 00 	mov    0xe46195(%rip),%rdx   # ffffffff81eff9e8 <futex_hashsize>
+:	48 8b 15 a5 61 e4 00 	mov    0xe461a5(%rip),%rdx   # ffffffff81eff9f8 <__futex_data+0x8>
 :	31 c8                	xor    %ecx,%eax
 :	c1 c9 08             	ror    $0x8,%ecx
 :	29 c8                	sub    %ecx,%eax
 :	48 83 ea 01          	sub    $0x1,%rdx
 :	48 21 d0             	and    %rdx,%rax
 :	48 c1 e0 06          	shl    $0x6,%rax
-:	48 03 05 74 dc 00 01 	add    0x100dc74(%rip),%rax  # ffffffff820c74e0 <futex_queues>
+:	48 03 05 84 61 e4 00 	add    0xe46184(%rip),%rax   # ffffffff81eff9f0 <__futex_data>
 :	c3                   	retq
 :	0f 1f 00             	nopl   (%rax)

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>
---
If this is worth applying, one could consider giving dentry_hashtable, inode_hashtable and their friends the same treatment. The variables are all __read_mostly, but may still end up in separate cache lines (even if the linker places them next to each other).

Even better would be to have some alternatives-like mechanism for replacing the code with instructions using immediates. But that's far more complicated...

 kernel/futex.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 2579e407ff67..c5f33bf78293 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -254,9 +254,18 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
-static unsigned long __read_mostly futex_hashsize;
+/*
+ * The base of the bucket array and its size are always used together
+ * (after initialization only in hash_futex()), so ensure that they
+ * reside in the same cacheline.
+ */
+static struct {
+	struct futex_hash_bucket *queues;
+	unsigned long hashsize;
+} __futex_data __read_mostly __aligned(16);
+#define futex_queues (__futex_data.queues)
+#define futex_hashsize (__futex_data.hashsize)
 
-static struct futex_hash_bucket *futex_queues;
 
 static inline void futex_get_mm(union futex_key *key)
 {
-- 
2.1.3