Hi Shixiong / Shuang, according to the 6.8.0-58.log in #59, you are not experiencing the issue anymore, am I correct?

As mentioned in #56, our bisect was at this point:

  4d60b13f267d workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active()
  adc1b642f72f workqueue: Implement system-wide nr_active enforcement for unbound workqueues
  929b7fbecbcc workqueue: Introduce struct wq_node_nr_active
  afd774d513f5 workqueue: RCU protect wq->dfl_pwq and implement accessors for it
  31a8e16645d7 workqueue: Make wq_adjust_max_active() round-robin pwqs while activating
  e4bbec8ce062 workqueue: Move nr_active handling into helpers
  865f7641cf47 workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work()
  a88074533304 workqueue: Factor out pwq_is_empty()
  5d378b3d47e1 workqueue: Move pwq->max_active to wq->max_active
  eb182ba1f6cb workqueue.c: Increase workqueue name length
  ...
  7fdb45c9bbbc (tag: Ubuntu-6.8.0-31.31, refs/bisect/good-7fdb45c9bbbc95a3300b4d8de3f751f4c05c98e2) UBUNTU: Ubuntu-6.8.0-31.31

In particular, all those workqueue patches were reverted upstream in v6.8.4:

https://github.com/gregkh/linux/commits/v6.8.4/

because they were causing several regressions, so any kernel that has those reverts should be good.

Can you confirm that with 6.8.0-58 you are not experiencing this issue anymore?

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2081685

Title:
  [Ubuntu 24.04-generic Kernel-6.8]Hard lockup on 8 Socket System, ThinkSystem SR950 V3.

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  In Progress
Status in linux source package in Oracular:
  Confirmed

Bug description:
  A CPU hard lockup is detected under Ubuntu 24.04 LTS (kernel 6.8.0-38).
  See attachment "dmesg0723-Lockup-Ubuntu24.04.log".

  ubuntu@SR950V3:~$ cat /var/log/dmesg | grep -i lockup
  [ 15.241164] kernel: watchdog: Watchdog detected hard LOCKUP on cpu 124
  [ 15.241164] kernel: ? watchdog_hardlockup_check+0x1cb/0x3b0

  The issue does not occur on upstream kernels 6.8, 6.9, 6.10, or 6.11-rc*, so it appears to be an Ubuntu kernel issue only.
  See attachment "dmesg0923-No-Lockup-Kernel 6-10.log".

  According to the dmesg log, the "hard lockup" is not a real lockup. Many CPUs try to take the cache_disable_lock spinlock at the same time during kernel boot, so they contend for it. Every CPU's TLB is flushed inside the critical section, and flushing the TLB is a time-consuming operation. With so many CPUs, the kernel reports a false "hard lockup".

  To avoid confusing customers, when will Canonical provide a fix?

  HW Config:
  ThinkSystem SR950 V3
  CPU: 8* Intel(R) Xeon(R) Platinum 8490H 60 Core 3.5GHz
  MEM: 2TB = SK Hynix 356GB DDR5 4800MHz 3DS (2015.1GB)
  Raid: ThinkSystem RAID 940-8i 4GB Flash PCIe Gen4 12Gb Adapter
  Storage: Micron_7450_MTFDKBA960TFR *1 Samsung 30.7TB 24Gbps SAS 2.5" SSD
  NIC: ThinkSystem Intel X710-T4L 10GBASE-T 4-Port OCP Ethernet Adapter
  OS: Ubuntu 24.04 LTS (kernel 6.8.0-38-generic)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081685/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp