rhashtable_rehash_one() uses complex logic to update entry->next field, after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion:

	entry->next = 1 | ((base + off) << 1)

This can be compiled along the lines of:

	entry->next = base + off
	entry->next <<= 1
	entry->next |= 1

Which will break concurrent readers.

NULLS value recomputation is not needed here, so just remove
the complex logic.

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov <dvyu...@google.com>
---
v2: Remove NULLS values recomputation as it is not needed.
    Update commit description to clarify that the problem is not
    with racy reads/writes per se but rather with the complex
    update logic.

KTSAN report for the record:

ThreadSanitizer: data-race in netlink_lookup

Atomic read at 0xffff880480443bd0 of size 8 by thread 2747 on CPU 11:
 [<     inline     >] rhashtable_lookup_fast include/linux/rhashtable.h:543
 [<     inline     >] __netlink_lookup net/netlink/af_netlink.c:1026
 [<ffffffff81bd9a84>] netlink_lookup+0x134/0x1c0 net/netlink/af_netlink.c:1046
 [<     inline     >] netlink_getsockbyportid net/netlink/af_netlink.c:1616
 [<ffffffff81bdc701>] netlink_unicast+0x111/0x300 net/netlink/af_netlink.c:1812
 [<ffffffff81bdcdb9>] netlink_sendmsg+0x4c9/0x5f0 net/netlink/af_netlink.c:2443
 [<     inline     >] sock_sendmsg_nosec net/socket.c:610
 [<ffffffff81b5d6f3>] sock_sendmsg+0x83/0x90 net/socket.c:620
 [<ffffffff81b5e59f>] ___sys_sendmsg+0x3cf/0x3e0 net/socket.c:1952
 [<ffffffff81b5f6ac>] __sys_sendmsg+0x4c/0xb0 net/socket.c:1986
 [<     inline     >] SYSC_sendmsg net/socket.c:1997
 [<ffffffff81b5f740>] SyS_sendmsg+0x30/0x50 net/socket.c:1993
 [<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188

Previous write at 0xffff880480443bd0 of size 8 by thread 213 on CPU 4:
 [<     inline     >] rhashtable_rehash_one lib/rhashtable.c:193
 [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:213
 [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
 [<ffffffff8156f7e0>] rht_deferred_worker+0x3b0/0x6d0 lib/rhashtable.c:373
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutexes locked by thread 213:

Mutex 217217 is locked here:
 [<ffffffff81ee0407>] mutex_lock+0x57/0x70 kernel/locking/mutex.c:108
 [<ffffffff8156f475>] rht_deferred_worker+0x45/0x6d0 lib/rhashtable.c:363
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutex 431216 is locked here:
 [<     inline     >] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:149
 [<ffffffff81ee3195>] _raw_spin_lock_bh+0x65/0x80 kernel/locking/spinlock.c:175
 [<     inline     >] spin_lock_bh include/linux/spinlock.h:317
 [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:212
 [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
 [<ffffffff8156f616>] rht_deferred_worker+0x1e6/0x6d0 lib/rhashtable.c:373
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutex 432766 is locked here:
 [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
 [<ffffffff81ee37d0>] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
 [<     inline     >] rhashtable_rehash_one lib/rhashtable.c:186
 [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:213
 [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
 [<ffffffff8156f79b>] rht_deferred_worker+0x36b/0x6d0 lib/rhashtable.c:373
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
---
 lib/rhashtable.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c697..a54ff89 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
 				      new_tbl, new_hash);
 
-	if (rht_is_a_nulls(head))
-		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-	else
-		RCU_INIT_POINTER(entry->next, head);
+	RCU_INIT_POINTER(entry->next, head);
 
 	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
 	spin_unlock(new_bucket_lock);
-- 
2.6.0.rc0.131.gf624c3d
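
Illustration only, not part of the patch: the multi-step update described in the commit message can be modelled in user space roughly as follows. This is a minimal sketch under stated assumptions. The names nulls_value() and multi_step_store() are hypothetical, the base and offset values are made up, and the code only records which values the field would hold between the individual stores. It does not use or reproduce the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the expanded expression from the commit
 * message: 1 | ((base + off) << 1). Not the kernel macro itself. */
static uintptr_t nulls_value(uintptr_t base, uintptr_t off)
{
	return (uintptr_t)1 | ((base + off) << 1);
}

/* Models the three separate stores a compiler may emit for the
 * expression above. seen[] records every value the shared field
 * holds along the way, i.e. what a concurrent reader could observe. */
static void multi_step_store(volatile uintptr_t *field, uintptr_t base,
			     uintptr_t off, uintptr_t seen[3])
{
	*field = base + off;	/* entry->next = base + off */
	seen[0] = *field;
	*field = *field << 1;	/* entry->next <<= 1 */
	seen[1] = *field;
	*field = *field | 1;	/* entry->next |= 1 */
	seen[2] = *field;
}

/* Self-check with made-up inputs: only the last value is the
 * intended marker, the first two have the low bit clear. */
static void check_example(void)
{
	volatile uintptr_t field = 0;
	uintptr_t seen[3];

	multi_step_store(&field, 0x100, 4, seen);
	assert(seen[2] == nulls_value(0x100, 4));
	assert((seen[0] & 1) == 0);
	assert((seen[1] & 1) == 0);
}
```

With these inputs the two intermediate values have the low bit clear, so a reader that tests the low bit to distinguish a NULLS marker from a pointer could take them for an ordinary pointer. The patch sidesteps the issue by not recomputing the value at all and storing head directly.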