This bug is awaiting verification that the linux-bluefield/5.15.0-1029.31 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy-linux-bluefield

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2039869

Title:
  Devlink reload hangs: fix race and lock issue

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  Summary:
  The machine hangs when running devlink reload.

  How to reproduce:
  Host:
  [root@bu-lab24v ~]# echo '2' > /sys/class/net/ens2f0np0/device/sriov_numvfs

  Arm:
  root@bu-lab24v-oob:~# uname -r
  5.15.0-1027-bluefield
  root@bu-lab24v-oob:~# devlink dev eswitch set pci/0000:03:00.0 mode switchdev
  root@bu-lab24v-oob:~# devlink dev reload pci/0000:03:00.0
  *Hangs*

  Arm dmesg:
  [ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
  [ 1089.760560] Tainted: G OE 5.15.0-1027-bluefield #29-Ubuntu
  [ 1089.775086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1089.790829] task:devlink state:D stack: 0 pid: 8753 ppid: 5090 flags:0x00000004
  [ 1089.790838] Call trace:
  [ 1089.790840] __switch_to+0xf8/0x150
  [ 1089.790857] __schedule+0x2b8/0x790
  [ 1089.790865] schedule+0x64/0x140
  [ 1089.790870] schedule_preempt_disabled+0x18/0x24
  [ 1089.790874] __mutex_lock.constprop.0+0x1a0/0x680
  [ 1089.790878] __mutex_lock_slowpath+0x40/0x90
  [ 1089.790883] mutex_lock+0x64/0x70
  [ 1089.790887] devl_lock+0x1c/0x30
  [ 1089.790893] mlx5_detach_device+0x58/0x190 [mlx5_core]
  [ 1089.791055] mlx5_unload_one+0x40/0xe4 [mlx5_core]
  [ 1089.791177] mlx5_devlink_reload_down+0x184/0x270 [mlx5_core]
  [ 1089.791318] devlink_reload+0x214/0x290

  Fixes:
  Checking the OFED source code, we found that the missing devl trap group support (HAVE_DEVL_TRAP_GROUPS_REGISTER) also needs to be backported to avoid the deadlock:

  void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend)
  {
  ...
  #ifdef HAVE_DEVL_PORT_REGISTER
  #ifdef HAVE_DEVL_TRAP_GROUPS_REGISTER
          devl_assert_locked(priv_to_devlink(dev));
  #else
          devl_lock(devlink);
  #endif /* HAVE_DEVL_TRAP_GROUPS_REGISTER */
  #endif /* HAVE_DEVL_PORT_REGISTER */
          mutex_lock(&mlx5_intf_mutex);
  #ifdef HAVE_DEVL_PORT_REGISTER

  Related issue: #2032378 Devlink backport: fix race and lock issue

  So cherry-pick the patch below:

  commit 852e85a704c2e11c050bdea286bc438aba4f4a22
  Author: Jiri Pirko <j...@resnulli.us>
  Date:   Sat Jul 16 13:02:34 2022 +0200

      net: devlink: add unlocked variants of devling_trap*() functions

      Add unlocked variants of devl_trap*() functions to be used in drivers
      called-in with devlink->lock held.
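The call trace suggests a self-deadlock. devlink_reload() appears to already hold the devlink instance lock when it calls mlx5_devlink_reload_down(), so the devl_lock() reached through mlx5_unload_one() and mlx5_detach_device() waits on a lock that the same task holds. The minimal userspace sketch below illustrates the locked/unlocked variant pattern described by the cherry-picked commit. It is an illustration under stated assumptions, not the kernel code: struct fake_devlink and all fake_*() functions are hypothetical stand-ins for the devlink API. A pthread error-checking mutex replaces the kernel mutex, so re-locking from the same thread returns EDEADLK instead of hanging.

```c
/*
 * Userspace sketch of the devlink locked/unlocked variant pattern.
 * ASSUMPTION: every fake_* name is hypothetical, not the kernel API.
 * An error-checking mutex reports EDEADLK on a same-thread re-lock,
 * where the kernel mutex in the bug simply blocks forever.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct fake_devlink {
	pthread_mutex_t lock;
	bool locked;     /* simplified ownership flag for the assert helper */
	int trap_groups; /* number of registered trap groups */
};

static void fake_devlink_init(struct fake_devlink *dl)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&dl->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	dl->locked = false;
	dl->trap_groups = 0;
}

/* Stand-in for devl_lock(): returns 0, or EDEADLK if this thread holds it. */
static int fake_devl_lock(struct fake_devlink *dl)
{
	int err = pthread_mutex_lock(&dl->lock);

	if (!err)
		dl->locked = true;
	return err;
}

static void fake_devl_unlock(struct fake_devlink *dl)
{
	dl->locked = false;
	pthread_mutex_unlock(&dl->lock);
}

/* Unlocked variant (cf. devl_trap_groups_register()): caller holds the lock. */
static void fake_devl_trap_groups_register(struct fake_devlink *dl, int count)
{
	assert(dl->locked); /* simplified stand-in for devl_assert_locked() */
	dl->trap_groups += count;
}

/* Locked wrapper (cf. devlink_trap_groups_register()): takes the lock itself. */
static int fake_devlink_trap_groups_register(struct fake_devlink *dl, int count)
{
	int err = fake_devl_lock(dl);

	if (err)
		return err;
	fake_devl_trap_groups_register(dl, count);
	fake_devl_unlock(dl);
	return 0;
}

/* Reload path: the core holds the lock while calling into the driver. */
static int fake_reload(struct fake_devlink *dl, bool driver_uses_unlocked)
{
	int err = fake_devl_lock(dl);

	if (err)
		return err;
	if (driver_uses_unlocked)
		fake_devl_trap_groups_register(dl, 1);          /* fixed path */
	else
		err = fake_devlink_trap_groups_register(dl, 1); /* re-locks */
	fake_devl_unlock(dl);
	return err;
}
```

In this sketch, fake_reload(&dl, false) returns EDEADLK, which mirrors the hang, while fake_reload(&dl, true) succeeds because the driver path only asserts that the lock is already held. The locked wrapper remains usable from callers that do not hold the lock. On older glibc, build with -pthread.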