Public bug reported:

* introduction
Found a regression in 5.15.0-1053; 5.15.0-1050 works correctly.

On the DPU, running the 5.15.0-1053-bluefield kernel, a user-space process uses
the OFED driver to create 2000 SF devices in batch mode. On the host side, the
Ubuntu kernel then prevents a user-space process (fwupd in the log below) from
being scheduled for a long time, so that process is stuck for an extended period.
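A minimal sketch of the batch step, assuming the SFs are created through the upstream `devlink port add ... flavour pcisf` interface. The report does not say which OFED command was used, so this is an assumption. The PF handle `pci/0000:03:00.0` and the helper names `sf_add_commands`/`create_sfs` are hypothetical. Only the port-add step is shown, not SF activation.

```python
import shlex
import subprocess

# Hypothetical PF devlink handle; the real PCI address is not given in the report.
PF_DEV = "pci/0000:03:00.0"


def sf_add_commands(pf_dev: str, count: int, pfnum: int = 0,
                    first_sfnum: int = 1) -> list[list[str]]:
    """Build one `devlink port add` argv per SF (assumed creation path)."""
    return [
        ["devlink", "port", "add", pf_dev, "flavour", "pcisf",
         "pfnum", str(pfnum), "sfnum", str(first_sfnum + i)]
        for i in range(count)
    ]


def create_sfs(pf_dev: str, count: int, dry_run: bool = True) -> None:
    """Create `count` SFs back to back; with dry_run, only print the commands."""
    for argv in sf_add_commands(pf_dev, count):
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops the batch at the first failing devlink call
            subprocess.run(argv, check=True)
```

On the DPU, `create_sfs(PF_DEV, 2000, dry_run=False)` would issue the 2000 creations in one burst. Meanwhile the host is watched for hung-task messages.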

* log on host side

INFO: task fwupd:7067 blocked for more than 368 seconds.
Tainted: G OE 6.8.0-45-generic #45-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:fwupd state:D stack:0 pid:7067 tgid:7067 ppid:1 flags:0x00000006
Call Trace:
__schedule+0x27c/0x6b0
schedule+0x33/0x110
schedule_preempt_disabled+0x15/0x30
__mutex_lock.constprop.0+0x42f/0x740
? __memcg_slab_post_alloc_hook+0x18e/0x230
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x3c/0x50
uevent_show+0xc4/0x170
dev_attr_show+0x1a/0x70
sysfs_kf_seq_show+0xa4/0x120
kernfs_seq_show+0x24/0x40
seq_read_iter+0x12f/0x4b0
kernfs_fop_read_iter+0x34/0x40
vfs_read+0x255/0x390
ksys_read+0x73/0x100
__x64_sys_read+0x19/0x30
x64_sys_call+0x1ada/0x25c0
do_syscall_64+0x7f/0x180
? handle_pte_fault+0x1cb/0x1d0
? __handle_mm_fault+0x653/0x790
? __count_memcg_events+0x6b/0x120
? count_memcg_events.constprop.0+0x2a/0x50
? handle_mm_fault+0xad/0x380
? do_user_addr_fault+0x32c/0x670
? irqentry_exit_to_user_mode+0x7e/0x260
? irqentry_exit+0x43/0x50
? clear_bhb_loop+0x15/0x70
? clear_bhb_loop+0x15/0x70
? clear_bhb_loop+0x15/0x70
entry_SYSCALL_64_after_hwframe+0x78/0x80
RIP: 0033:0x78c3f511ba9a
RSP: 002b:00007ffd44147480 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00005cb70c378cc0 RCX: 000078c3f511ba9a
RDX: 0000000000001008 RSI: 00005cb70c378cc0 RDI: 000000000000000e
RBP: 00007ffd441474a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000000e
R13: 0000000000001008 R14: 0000000000001008 R15: 0000000000001007
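A small parser can help compare hung-task reports between 1050 and 1053. It extracts the task, pid, blocked time and the reliable (non-`?`) call chain. In the trace above, that chain runs from `uevent_show` into `mutex_lock`: the read of a sysfs `uevent` file is waiting on a mutex. This is a sketch only. It assumes standard dmesg formatting with `0x` offsets and an optional `[timestamp]` prefix. `HungTask`, `parse_hung_tasks` and `waits_on_mutex` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\.")
# Optional dmesg timestamp, optional "? " marker for unreliable frames, then func+off/len.
FRAME_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?(?P<unreliable>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


@dataclass
class HungTask:
    comm: str
    pid: int
    blocked_secs: int
    frames: list[str] = field(default_factory=list)  # reliable frames, innermost first


def parse_hung_tasks(log: str) -> list[HungTask]:
    """Collect every hung-task report and its reliable call-trace frames."""
    tasks: list[HungTask] = []
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        m = HUNG_RE.search(line)
        if m:
            current = HungTask(m["comm"], int(m["pid"]), int(m["secs"]))
            tasks.append(current)
            continue
        f = FRAME_RE.match(line)
        # Skip "? " frames: they are stale stack entries, not part of the real chain.
        if current is not None and f and not f["unreliable"]:
            current.frames.append(f["func"])
    return tasks


def waits_on_mutex(task: HungTask) -> bool:
    """True if the reliable chain contains a mutex_lock frame."""
    return "mutex_lock" in task.frames
```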

* possible solution
Checked the changes between the two tags (1050 and 1053) and found nothing obviously related.
From the changelog, I suspect the regression was introduced in 1052:

linux-bluefield (5.15.0-1052.54) jammy; urgency=medium

  * jammy/linux-bluefield: 5.15.0-1052.54 -proposed tracker (LP:
#2075859)

  * Jammy update: v5.15.163 upstream stable release (LP: #2075170)
    - SAUCE: wireguard: allowedips: include <asm/unaligned.h> to fix build error

  [ Ubuntu: 5.15.0-121.131 ]

  * jammy/linux: 5.15.0-121.131 -proposed tracker (LP: #2076347)
  * jammy:linux bpf selftest do not build (LP: #2076334)
    - SAUCE: Revert "bpf: Allow reads from uninit stack"

  [ Ubuntu: 5.15.0-120.130 ]

  * jammy/linux: 5.15.0-120.130 -proposed tracker (LP: #2075903)
  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2024.08.05)
  * Jammy update: v5.15.163 upstream stable release (LP: #2075170)
    - Compiler Attributes: Add __uninitialized macro
    - locking/mutex: Introduce devm_mutex_init()
    - drm/lima: fix shared irq handling on driver remove
    - media: dvb: as102-fe: Fix as10x_register_addr packing
    - media: dvb-usb: dib0700_devices: Add missing release_firmware()
    - IB/core: Implement a limit on UMAD receive List
    - scsi: qedf: Make qedf_execute_tmf() non-preemptible
    - crypto: aead,cipher - zeroize key buffer after use
    - drm/amdgpu: Initialize timestamp for some legacy SOCs
    - drm/amd/display: Check index msg_id before read or write
    - drm/amd/display: Check pipe offset before setting vblank
    - drm/amd/display: Skip finding free audio for unknown engine_id
    - media: dw2102: Don't translate i2c read into write

... about 800 commits.

One option is to bisect between them.
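`git bisect` performs a binary search over the range, so about 800 commits need roughly log2(800) ≈ 10 test kernels. The sketch below shows the same search in Python under the usual bisect assumptions:

- the first commit is known good (1050),
- the last commit is known bad (1053),
- every commit after the culprit also shows the hang.

`first_bad` and `is_bad` are hypothetical names. In practice, `is_bad` means building and booting that kernel and repeating the 2000-SF test.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Binary-search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and badness is monotone
    (once bad, every later commit is bad). Returns the first bad commit and
    the number of is_bad() calls, i.e. how many kernels had to be tested.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return commits[bad], tests
```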

** Affects: linux-bluefield (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2084479

Title:
  Create 2K VNET VFs cause call trace on host side

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2084479/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
