verification Jammy:

nproc: 24

Before using -proposed:

1. $ uname -r
5.15.0-118-generic

2. $ sudo dmesg | grep bnx2x
[ 2.656536] bnx2x 0000:04:00.0: msix capability found
[ 2.669166] bnx2x 0000:04:00.0: part number 0-0-0-0
[ 3.133782] bnx2x 0000:04:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 3.201230] bnx2x 0000:04:00.1: msix capability found
[ 3.201815] bnx2x 0000:04:00.1: part number 0-0-0-0
[ 3.402127] bnx2x 0000:04:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 3.407664] bnx2x 0000:04:00.0 eno49: renamed from eth0
[ 3.492325] bnx2x 0000:04:00.1 eno50: renamed from eth1
[ 56.145698] bnx2x 0000:04:00.1 eno50: using MSI-X IRQs: sp 78 fp[0] 80 ... fp[7] 87
[ 57.381769] bnx2x 0000:04:00.0 eno49: using MSI-X IRQs: sp 68 fp[0] 70 ... fp[7] 77
[ 64.772106] bnx2x 0000:04:00.0 eno49: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 65.732116] bnx2x 0000:04:00.1 eno50: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
3. We can see that there are two interfaces on this machine using the bnx2x driver. As noted in the description, increasing the num_queues value does not produce any UBSAN warnings here, since 5.15 has them disabled:

$ sudo modprobe -r bnx2x
$ sudo modprobe bnx2x num_queues=20
$ sudo dmesg | grep bnx2x
...<cut output>...
[ 621.562054] bnx2x 0000:04:00.0: msix capability found
[ 621.581949] bnx2x 0000:04:00.0: part number 0-0-0-0
[ 621.754136] bnx2x 0000:04:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 621.754254] bnx2x 0000:04:00.1: msix capability found
[ 621.758926] bnx2x 0000:04:00.0 eno49: renamed from eth0
[ 621.773993] bnx2x 0000:04:00.1: part number 0-0-0-0
[ 622.513115] bnx2x 0000:04:00.0 eno49: using MSI-X IRQs: sp 68 fp[0] 70 ... fp[19] 119
[ 623.282738] bnx2x 0000:04:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 623.284540] bnx2x 0000:04:00.1 eno50: renamed from eth0

$ sudo dmesg | grep UBSAN
<no result>

4. We know that the machine is accessing data out of bounds, but the kernel is not reporting it. Let's upgrade to -proposed and see if the machine remains stable.

After upgrading to -proposed:

1. $ uname -r
5.15.0-120-generic

2. $ sudo dmesg | grep bnx2x
[ 4.303867] bnx2x 0000:04:00.0: msix capability found
[ 4.317050] bnx2x 0000:04:00.0: part number 0-0-0-0
[ 4.883254] bnx2x 0000:04:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 4.951123] bnx2x 0000:04:00.1: msix capability found
[ 4.951779] bnx2x 0000:04:00.1: part number 0-0-0-0
[ 5.200782] bnx2x 0000:04:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 5.206714] bnx2x 0000:04:00.0 eno49: renamed from eth0
[ 5.293926] bnx2x 0000:04:00.1 eno50: renamed from eth1
[ 19.194753] bnx2x 0000:04:00.1 eno50: using MSI-X IRQs: sp 78 fp[0] 80 ... fp[7] 87
[ 20.430462] bnx2x 0000:04:00.0 eno49: using MSI-X IRQs: sp 68 fp[0] 70 ... fp[7] 77
[ 27.457468] bnx2x 0000:04:00.1 eno50: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 27.637478] bnx2x 0000:04:00.0 eno49: NIC Link is Up, 10000 Mbps full duplex, Flow control: none

3. $ sudo modprobe -r bnx2x

4. $ sudo modprobe bnx2x num_queues=20
...<cut output>...
[ 414.537575] bnx2x 0000:04:00.0: msix capability found
[ 414.556346] bnx2x 0000:04:00.0: part number 0-0-0-0
[ 414.722808] bnx2x 0000:04:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 414.722901] bnx2x 0000:04:00.1: msix capability found
[ 414.725755] bnx2x 0000:04:00.0 eno49: renamed from eth0
[ 414.740419] bnx2x 0000:04:00.1: part number 0-0-0-0
[ 415.452007] bnx2x 0000:04:00.0 eno49: using MSI-X IRQs: sp 68 fp[0] 70 ... fp[19] 119
[ 416.220670] bnx2x 0000:04:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 416.223125] bnx2x 0000:04:00.1 eno50: renamed from eth0
[ 422.644115] bnx2x 0000:04:00.0 eno49: NIC Link is Up, 10000 Mbps full duplex, Flow control: none

5. We can see that the network driver still loads correctly after the patch is applied.

6. Running iperf3 -s on the server, with my machine as the client, shows that the driver is also working as expected (the low upload speed in this test is unrelated to the driver and is due to limitations of my connection to the server):

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   268 KBytes  2.19 Mbits/sec
[  5]   1.00-2.00   sec   589 KBytes  4.83 Mbits/sec
[  5]   2.00-3.00   sec   849 KBytes  6.95 Mbits/sec
[  5]   3.00-4.00   sec   490 KBytes  4.01 Mbits/sec
[  5]   4.00-5.00   sec   609 KBytes  4.99 Mbits/sec
[  5]   5.00-6.00   sec   648 KBytes  5.31 Mbits/sec
[  5]   6.00-7.00   sec   648 KBytes  5.31 Mbits/sec
[  5]   7.00-8.00   sec   692 KBytes  5.67 Mbits/sec
[  5]   8.00-9.00   sec   631 KBytes  5.17 Mbits/sec
[  5]   9.00-10.00  sec   653 KBytes  5.35 Mbits/sec
[  5]  10.00-10.13  sec   105 KBytes  6.78 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.13  sec  6.04 MBytes  5.00 Mbits/sec  receiver

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2074215

Title:
  [SRU] UBSAN warnings in bnx2x kernel driver

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Focal: Fix Committed
Status in linux source package in Jammy: Fix Committed
Status in linux source package in Noble: Fix Committed
Status in linux source package in Oracular: Fix Released

Bug description:

[impact]

Currently in the bnx2x kernel driver there are out-of-bounds reads and writes that could cause kernel crashes. No meaningful impact has been observed yet other than UBSAN stack traces. I have posted a patch upstream to resolve this issue (134061163ee5 bnx2x: Fix multiple UBSAN array-index-out-of-bounds) and it has been accepted and merged.
Although these traces appear only on Linux 6.5 and later, the bug also affects earlier 6.x and 5.x kernels. No UBSAN warnings are printed on those kernels because the check was not enforced there.

[Test Plan]

There are multiple ways to reproduce the issue. The most hands-free way is to use a QLogic NIC that makes use of the E2 controller on a system with more than 32 cores. Both ways of reproducing it are described below. Please note that both require a NIC that uses the bnx2x driver.

* Normal Reproduction:

1. Start a machine running kernel 6.5 or higher with more than 32 cores. Please note that these need to be physical cores, not threads. The machine also needs to be using a NIC that utilizes an E2 controller.

2. In dmesg the following UBSAN warnings can be seen:

UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11
index 20 is out of range for type 'stats_query_entry [19]'
CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
Call Trace:
 <TASK>
 dump_stack_lvl+0x76/0xa0
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xcb/0x110
 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x]
 bnx2x_stats_init+0x156/0x320 [bnx2x]
 bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
 bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
 bnx2x_open+0x16b/0x290 [bnx2x]
 __dev_open+0x10e/0x1d0
RIP: 0033:0x736223927a0a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
 </TASK>
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11
index 28 is out of range for type 'stats_query_entry [19]'
CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
Call Trace:
 <TASK>
 dump_stack_lvl+0x76/0xa0
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xcb/0x110
 bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x]
 bnx2x_stats_init+0x156/0x320 [bnx2x]
 bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
 bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
 bnx2x_open+0x16b/0x290 [bnx2x]
 __dev_open+0x10e/0x1d0
RIP: 0033:0x736223927a0a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
 </TASK>
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8
index 29 is out of range for type 'stats_query_entry [19]'
CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic #202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
Workqueue: bnx2x bnx2x_sp_task [bnx2x]
Call Trace:
 <TASK>
 dump_stack_lvl+0x76/0xa0
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xcb/0x110
 bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x]
 bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x]
 ? bnx2x_hw_stats_post+0x231/0x250 [bnx2x]
 bnx2x_stats_start+0x44/0x70 [bnx2x]
 bnx2x_stats_handle+0x149/0x350 [bnx2x]
 bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x]
 bnx2x_sp_task+0x491/0x5c0 [bnx2x]
 process_one_work+0x18d/0x3f0
 </TASK>
---[ end trace ]---

* Forced reproducer:

1. Make sure you have a machine running kernel 6.5 or higher with any NIC that makes use of the bnx2x driver (no need for a NIC that utilizes the E2 controller). The number of cores the machine has is not important.

2. Once the machine is booted, unload the bnx2x module from the kernel:
$ sudo modprobe -r bnx2x

3. Then load the driver again while setting the number of Ethernet queues to a value above 16:
$ sudo modprobe bnx2x num_queues=20

4. The same stack traces shown above will show up in dmesg.

[Fix]

The fix is already upstream and is provided by:
* 134061163ee5 bnx2x: Fix multiple UBSAN array-index-out-of-bounds

[where problems could occur]

* Since the patch increases the firmware stats array size, the driver will use slightly more memory; however, this is still an insignificant amount.
* Since no logic change has been made to the driver, the regression risk is minimal.

[workaround]

As stated earlier, I have already written a patch to solve the issue, but in the meantime one way to avoid this problem is to unload the driver and then load it again with a value for num_queues below 16:
$ sudo modprobe -r bnx2x
$ sudo modprobe bnx2x num_queues=15
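The dmesg checks used in the verification and in the forced reproducer could be scripted roughly as follows. This is only a sketch: the function names bnx2x_max_fp and bnx2x_ubsan_count are hypothetical, and the patterns assume the "using MSI-X IRQs: sp N fp[0] N ... fp[K] N" and "UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/..." line formats shown in this report. Both functions read a log on stdin, so they do not touch the driver themselves.

```shell
# Sketch only: helper names are hypothetical, log formats are assumed
# to match the dmesg excerpts quoted in this report.

bnx2x_max_fp() {
    # For each "using MSI-X IRQs" line on stdin, print
    # "<interface> <highest fastpath queue index>", e.g. "eno49 19".
    sed -n 's/.*bnx2x [^ ]* \([^ :]*\): using MSI-X IRQs:.*fp\[\([0-9][0-9]*\)] [0-9][0-9]*[[:space:]]*$/\1 \2/p'
}

bnx2x_ubsan_count() {
    # Print the number of UBSAN out-of-bounds reports on stdin that
    # point into the bnx2x sources. grep -c exits non-zero when the
    # count is 0, so "|| true" keeps the exit status at 0.
    grep -c 'UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/' || true
}
```

After reloading the driver with num_queues=20, something like "sudo dmesg | bnx2x_max_fp" should list an index of 19 for an interface that came up with 20 queues, and "sudo dmesg | bnx2x_ubsan_count" gives the number of bnx2x UBSAN reports. As noted in the description, a count of 0 on kernels older than 6.5 only means the warning is not printed, not that the out-of-bounds access is absent.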