On 07/17/2018 06:47 PM, Alexei Starovoitov wrote:
> On Tue, Jul 17, 2018 at 06:10:38PM +0300, Tariq Toukan wrote:
>> Fix the warning below by calling rhashtable_lookup under
>> RCU read lock.
>>
>> [ 342.450870] WARNING: suspicious RCU usage
>> [ 342.455856] 4.18.0-rc2+ #17 Tainted: G O
>> [ 342.462210] -----------------------------
>> [ 342.467202] ./include/linux/rhashtable.h:481 suspicious
>> rcu_dereference_check() usage!
>> [ 342.476568]
>> [ 342.476568] other info that might help us debug this:
>> [ 342.476568]
>> [ 342.486978]
>> [ 342.486978] rcu_scheduler_active = 2, debug_locks = 1
>> [ 342.495211] 4 locks held by modprobe/3934:
>> [ 342.500265] #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at:
>> mlx5_unregister_interface+0x18/0x90 [mlx5_core]
>> [ 342.511953] #1: 00000000ca16db96 (rtnl_mutex){+.+.}, at:
>> unregister_netdev+0xe/0x20
>> [ 342.521109] #2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at:
>> mlx5e_close+0x29/0x60
>> [mlx5_core]
>> [ 342.531642] #3: 0000000060c5bde3 (mem_id_lock){+.+.}, at:
>> xdp_rxq_info_unreg+0x93/0x6b0
>> [ 342.541206]
>> [ 342.541206] stack backtrace:
>> [ 342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G O
>> 4.18.0-rc2+ #17
>> [ 342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4
>> 10/002/2015
>> [ 342.565606] Call Trace:
>> [ 342.568861] dump_stack+0x78/0xb3
>> [ 342.573086] xdp_rxq_info_unreg+0x3f5/0x6b0
>> [ 342.578285] ? __call_rcu+0x220/0x300
>> [ 342.582911] mlx5e_free_rq+0x38/0xc0 [mlx5_core]
>> [ 342.588602] mlx5e_close_channel+0x20/0x120 [mlx5_core]
>> [ 342.594976] mlx5e_close_channels+0x26/0x40 [mlx5_core]
>> [ 342.601345] mlx5e_close_locked+0x44/0x50 [mlx5_core]
>> [ 342.607519] mlx5e_close+0x42/0x60 [mlx5_core]
>> [ 342.613005] __dev_close_many+0xb1/0x120
>> [ 342.617911] dev_close_many+0xa2/0x170
>> [ 342.622622] rollback_registered_many+0x148/0x460
>> [ 342.628401] ? __lock_acquire+0x48d/0x11b0
>> [ 342.633498] ? unregister_netdev+0xe/0x20
>> [ 342.638495] rollback_registered+0x56/0x90
>> [ 342.643588] unregister_netdevice_queue+0x7e/0x100
>> [ 342.649461] unregister_netdev+0x18/0x20
>> [ 342.654362] mlx5e_remove+0x2a/0x50 [mlx5_core]
>> [ 342.659944] mlx5_remove_device+0xe5/0x110 [mlx5_core]
>> [ 342.666208] mlx5_unregister_interface+0x39/0x90 [mlx5_core]
>> [ 342.673038] cleanup+0x5/0xbfc [mlx5_core]
>> [ 342.678094] __x64_sys_delete_module+0x16b/0x240
>> [ 342.683725] ? do_syscall_64+0x1c/0x210
>> [ 342.688476] do_syscall_64+0x5a/0x210
>> [ 342.693025] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Fixes: 8d5d88527587 ("xdp: rhashtable with allocator ID to pointer mapping")
>> Signed-off-by: Tariq Toukan <tar...@mellanox.com>
>> Cc: Jesper Dangaard Brouer <bro...@redhat.com>
>> ---
>>  net/core/xdp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 9d1f22072d5d..c20fefbfb76c 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -102,7 +102,9 @@ static void __xdp_rxq_info_unreg_mem_model(struct
>> xdp_rxq_info *xdp_rxq)
>>
>>          mutex_lock(&mem_id_lock);
>>
>> +        rcu_read_lock();
>>          xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params);
>> +        rcu_read_unlock();
>>          if (!xa) {
>
> if it's an actual bug rcu_read_unlock seems to be misplaced.
> It silences the warn, but rcu section looks wrong.

I think that whole piece in __xdp_rxq_info_unreg_mem_model() should be:

  mutex_lock(&mem_id_lock);
  xa = rhashtable_lookup_fast(mem_id_ht, &id, mem_id_rht_params);
  if (xa && rhashtable_remove_fast(mem_id_ht, &xa->node, mem_id_rht_params) == 0)
          call_rcu(&xa->rcu, __xdp_mem_allocator_rcu_free);
  mutex_unlock(&mem_id_lock);

Technically the RCU read side plus rhashtable_lookup() is the same, but let's
use the proper API. In the doc (https://lwn.net/Articles/751374/), object
removal is additionally wrapped in an RCU read-side section, but in our case
we're behind mem_id_lock for insertion/removal serialization.

Cheers,
Daniel
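P.S. For anyone who wants to poke at the locking argument outside the kernel, here is a rough userspace sketch of the same shape: lookup and removal in one critical section, with the free deferred. Everything in it is a stand-in and an assumption on my side, not kernel API: a fixed array replaces the rhashtable, a pthread mutex plays the role of mem_id_lock, a pending list plus run_deferred_frees() loosely mimics call_rcu() and the end of a grace period, and all function and struct names are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of this is kernel API. */
#define TABLE_SLOTS 16

struct allocator {
	int id;
	struct allocator *next_pending;	/* link on the deferred-free list */
};

static struct allocator *table[TABLE_SLOTS];	/* stands in for mem_id_ht */
static struct allocator *pending_free;		/* stands in for call_rcu() queue */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int freed_count;

/* Insert under the lock; returns 0 on success, -1 on failure. */
static int allocator_register(int id)
{
	struct allocator *a = malloc(sizeof(*a));
	int i, ret = -1;

	if (!a)
		return -1;
	a->id = id;
	a->next_pending = NULL;

	pthread_mutex_lock(&table_lock);
	for (i = 0; i < TABLE_SLOTS; i++) {
		if (!table[i]) {
			table[i] = a;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);

	if (ret)
		free(a);	/* table full */
	return ret;
}

/*
 * Lookup and unlink happen inside one critical section, so the pointer
 * found by the lookup cannot be removed by someone else before we unlink
 * it. The free itself is only queued, not performed here.
 */
static int allocator_unregister(int id)
{
	int i, ret = -1;

	pthread_mutex_lock(&table_lock);
	for (i = 0; i < TABLE_SLOTS; i++) {
		struct allocator *a = table[i];

		if (a && a->id == id) {
			table[i] = NULL;
			a->next_pending = pending_free;
			pending_free = a;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return ret;
}

/* Loosely mimics the end of a grace period: free everything queued. */
static void run_deferred_frees(void)
{
	struct allocator *a, *next;

	pthread_mutex_lock(&table_lock);
	a = pending_free;
	pending_free = NULL;
	pthread_mutex_unlock(&table_lock);

	for (; a; a = next) {
		next = a->next_pending;
		free(a);
		freed_count++;
	}
}

/* Small usage walk-through. */
static void demo(void)
{
	int rc;

	rc = allocator_register(7);
	assert(rc == 0);
	rc = allocator_unregister(7);	/* unlinked, free only queued */
	assert(rc == 0);
	rc = allocator_unregister(7);	/* second attempt finds nothing */
	assert(rc == -1);
	assert(freed_count == 0);
	run_deferred_frees();
	assert(freed_count == 1);
}
```

The point of the sketch is only the ordering: the entry is found and unlinked while the lock is held, and the memory goes away later. It says nothing about what concurrent lockless readers need, which is what the RCU read side is for in the real code.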