Paolo Abeni <pab...@redhat.com> writes:

> On 3/31/25 11:23 PM, davem...@linux.ibm.com wrote:
>> From: Dave Marquardt <davem...@linux.ibm.com>
>>
>> Use rtnl_mutex to synchronize veth_pool_store with itself,
>> ibmveth_close and ibmveth_open, preventing multiple calls in a row to
>> napi_disable.
>>
>> Signed-off-by: Dave Marquardt <davem...@linux.ibm.com>
>> Fixes: 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically")
>> Reviewed-by: Nick Child <nnac...@linux.ibm.com>
>> ---
>> In working on removing BUG_ON calls from ibmveth, I realized that 2
>> threads could call veth_pool_store through writing to
>> /sys/devices/vio/30000002/pool*/*. You can do this easily with a little
>> shell script.
>>
>> Running on a 6.14 kernel, I saw a hang:
>>
>> [ 243.683282][ T108] INFO: task stress.sh:5829 blocked for more than 122 seconds.
>> [ 243.683300][ T108] Not tainted 6.14.0-01103-g2df0c02dab82 #3
>> [ 243.683303][ T108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 366.563278][ T108] INFO: task stress.sh:5829 blocked for more than 245 seconds.
>> [ 366.563297][ T108] Not tainted 6.14.0-01103-g2df0c02dab82 #3
>> [ 366.563301][ T108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>
>> I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new
>> kernel. I ran the test again and saw:
>>
>> Setting pool0/active to 0
>> Setting pool1/active to 1
>> [ 73.911067][ T4365] ibmveth 30000002 eth0: close starting
>> Setting pool1/active to 1
>> Setting pool1/active to 0
>> [ 73.911367][ T4366] ibmveth 30000002 eth0: close starting
>> [ 73.916056][ T4365] ibmveth 30000002 eth0: close complete
>> [ 73.916064][ T4365] ibmveth 30000002 eth0: open starting
>> [ 110.808564][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
>> [ 230.808495][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
>> [ 243.683786][ T123] INFO: task stress.sh:4365 blocked for more than 122 seconds.
>> [ 243.683827][ T123] Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8
>> [ 243.683833][ T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 243.683838][ T123] task:stress.sh state:D stack:28096 pid:4365 tgid:4365 ppid:4364 task_flags:0x400040 flags:0x00042000
>> [ 243.683852][ T123] Call Trace:
>> [ 243.683857][ T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable)
>> [ 243.683868][ T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0
>> [ 243.683878][ T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0
>> [ 243.683888][ T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210
>> [ 243.683896][ T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50
>> [ 243.683904][ T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0
>> [ 243.683913][ T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60
>> [ 243.683921][ T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc
>> [ 243.683928][ T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270
>> [ 243.683936][ T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0
>> [ 243.683944][ T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0
>> [ 243.683951][ T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650
>> [ 243.683958][ T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150
>> [ 243.683966][ T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340
>> [ 243.683973][ T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>> ...
>> [ 243.684087][ T123] Showing all locks held in the system:
>> [ 243.684095][ T123] 1 lock held by khungtaskd/123:
>> [ 243.684099][ T123] #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248
>> [ 243.684114][ T123] 4 locks held by stress.sh/4365:
>> [ 243.684119][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
>> [ 243.684132][ T123] #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
>> [ 243.684143][ T123] #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
>> [ 243.684155][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60
>> [ 243.684166][ T123] 5 locks held by stress.sh/4366:
>> [ 243.684170][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
>> [ 243.684183][ T123] #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
>> [ 243.684194][ T123] #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
>> [ 243.684205][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60
>> [ 243.684216][ T123] #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0
>>
>> From the ibmveth debug, two threads are calling veth_pool_store, which
>> calls ibmveth_close and ibmveth_open. Here's the sequence:
>>
>>   T4365             T4366
>>   ----------------- ----------------- ---------
>>   veth_pool_store   veth_pool_store
>>     ibmveth_close
>>                       ibmveth_close
>>     napi_disable
>>                       napi_disable
>>   ibmveth_open
>>     napi_enable <- HANG
>>
>> ibmveth_close calls napi_disable at the top and ibmveth_open calls
>> napi_enable at the top.
>>
>> https://docs.kernel.org/networking/napi.html says
>>
>> The control APIs are not idempotent.
>> Control API calls are safe against concurrent use of datapath APIs,
>> but an incorrect sequence of control API calls may result in crashes,
>> deadlocks, or race conditions. For example, calling napi_disable()
>> multiple times in a row will deadlock.
>>
>> In the normal open and close paths, rtnl_mutex is acquired to prevent
>> other callers. This is missing from veth_pool_store. Using rtnl_mutex
>> in veth_pool_store fixes these hangs.
>
> Some/most of the above should actually land into the commit message,
> please rewrite it accordingly.
Okay, will do for v2.

>>  drivers/net/ethernet/ibm/ibmveth.c | 27 +++++++++++++++++++++++----
>>  1 file changed, 23 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index b619a3ec245b..77ef19a53e72 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1802,18 +1802,24 @@ static ssize_t veth_pool_store(struct kobject *kobj, struct attribute *attr,
>>  	long value = simple_strtol(buf, NULL, 10);
>>  	long rc;
>>  
>> +	rtnl_lock();
>> +
>>  	if (attr == &veth_active_attr) {
>>  		if (value && !pool->active) {
>>  			if (netif_running(netdev)) {
>>  				if (ibmveth_alloc_buffer_pool(pool)) {
>>  					netdev_err(netdev,
>>  						   "unable to alloc pool\n");
>> +					rtnl_unlock();
>>  					return -ENOMEM;
>>  				}
>>  				pool->active = 1;
>>  				ibmveth_close(netdev);
>> -				if ((rc = ibmveth_open(netdev)))
>> +				rc = ibmveth_open(netdev);
>> +				if (rc) {
>> +					rtnl_unlock();
>>  					return rc;
>
> You could avoid a bit of duplicate code with:
>
> 	goto unlock_err;
>
> // at the end of the function
> unlock_err:
> 	rtnl_unlock();
> 	return rc;

Yes, I'll do this for v2.

Thanks.
-Dave
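
P.S. For anyone following along, here is roughly how the error paths in
veth_pool_store would collapse into a single unlock label. This is only a
sketch of the pattern for v2, not the actual patch: the unlock_err label
comes from Paolo's suggestion, the elided middle ("...") stands for the
unchanged body of the function, and the success path returning count is my
reading of the existing code.

	long rc;

	rtnl_lock();
	...
				if (ibmveth_alloc_buffer_pool(pool)) {
					netdev_err(netdev,
						   "unable to alloc pool\n");
					/* no per-site rtnl_unlock() needed */
					rc = -ENOMEM;
					goto unlock_err;
				}
				pool->active = 1;
				ibmveth_close(netdev);
				rc = ibmveth_open(netdev);
				if (rc)
					goto unlock_err;
	...
	/* single success exit */
	rtnl_unlock();
	return count;

unlock_err:
	/* every failure above funnels through one unlock + return */
	rtnl_unlock();
	return rc;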