[Kernel-packages] [Bug 2033122] Re: Request backport of xen timekeeping performance improvements
** Tags removed: verification-needed-jammy-linux-aws
** Tags added: verification-done-jammy-linux-aws

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2033122

Title:
  Request backport of xen timekeeping performance improvements

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Released
Status in linux source package in Lunar:
  Fix Released

Bug description:

Users, especially those on EC2, are encouraged to select tsc as their default clocksource. However, this requires manual tuning of the operating system. KVM can determine if it is safe to use the tsc, and will default to that instead of its pvclock when appropriate. This requests a backport of a patch that does the same for Xen instances.

If appropriate, it's fine if this is applied to only the linux-aws branches. Not all Xen EC2 instances advertise explicit nomigrate support; however, on those that do, we'll select tsc by default. On the subset of hosts where this is advertised, users will safely default to the more performant clocksource.

[Impact]

Xen instances default to the xen clocksource, which has been documented to be slower. This is required for instances where the tsc is not safe to use, or the guest is subject to migration. On some platforms the performance impact can be high, and users are encouraged to select the tsc when appropriate. Instead of leaving it up to users to figure this out by reading a variety of different documents, pick the fast clocksource when it can be determined to be safe to do so.

[Backport]

Clean cherry pick. No conflicts applying to 5.15 or 6.2.

[Test]

Booted EC2 xen instances with and without this patch and validated that, on those that properly advertised the required criteria via cpuid, the clocksource defaulted to tsc instead of xen.

[Potential Regression]

Potential is low, since only absurd configurations could lead to a problem. If this is considered risky, it can be applied to only linux-aws where the documented guidance is for users to enable tsc as the clocksource on Xen.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2033122/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
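For anyone who wants to confirm which clocksource an instance ended up with, a minimal Python sketch follows. It assumes the standard Linux sysfs files current_clocksource and available_clocksource under /sys/devices/system/clocksource/clocksource0. The helper names read_clocksources and describe are illustrative only; they are not part of the patch or of any Ubuntu tooling.

```python
from pathlib import Path

# Standard sysfs location on Linux. It is a parameter below so the helper
# can also be pointed at a test directory.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def describe(current, available):
    """Summarise whether tsc is active, merely available, or not offered."""
    if current == "tsc":
        return "tsc is the active clocksource"
    if "tsc" in available:
        return f"{current} is active; tsc is available but not selected"
    return f"{current} is active; tsc is not offered"


if __name__ == "__main__":
    print(describe(*read_clocksources()))
```

Running this on a guest before and after the patched kernel is installed is one way to repeat the check described in [Test].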
[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range
For posterity, LTS 5.15 picked up this fix in 5.15.154

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

--
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:

[Impact]

Tearing down kvm VMs on arm64 can cause softlockups to appear on console. When terminating VMs with > 100Gb of memory and 4k pages, the memory unmap times often exceed 20 seconds, which can trigger the softlockup detector. Portions of the unmap path also have interrupts disabled while tlb invalidation instructions run, which can further contribute to latency problems. My team has observed networking latency problems if the cpu where the teardown is occurring is also mapped to handle a NIC interrupt.

Fortunately, a solution has been in place since Linux 6.1. A small pair of patches modify stage2_apply_range to operate on smaller memory ranges before performing a cond_resched. With these patches applied, softlockups are no longer observed when tearing down VMs with large amounts of memory.

Although I also submitted the patches to 5.15 LTS (link to LTS submission in "Backport" section), I'd appreciate it if Ubuntu were willing to take this submission in parallel since the impact has left us unable to utilize arm64 for kvm until we can either migrate our hypervisors to hugepages, pick up this fix, or some combination of the two.

[Backport]

Backport the following fixes from linux 6.1:

3b5c082bbf KVM: arm64: Work out supported block level at compile time
5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block

The fix is in 5994bc9e05 and 3b5c082bbf is a dependency that was submitted as part of the series. The original submission is here:

https://lore.kernel.org/all/20221007234151.461779-1-oliver.up...@linux.dev/

I've also submitted the patches to 5.15 LTS here:

https://lore.kernel.org/stable/cover.1709665227.git.k...@templeofstupid.com/

Both fixes cherry picked cleanly and there were no conflicts.

[Test]

Executed the test from 5994bc9e05 as well as my own run of kvm_page_table_test on a VM with 4k pages and a memory size > 100Gb. Without the patches, softlockups were observed in both tests. With the patches applied, the tests ran without incident. This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.

[Potential Regression]

Regression potential is low. These patches have been present in Linux since 6.1 and appear to have needed no further maintenance.
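The batching idea described under [Impact] can be illustrated in user space. The Python sketch below is not the kernel code: apply_range_batched, fn, and resched are hypothetical names, and splitting on batch-aligned boundaries is an assumption made for illustration. It only shows the general shape of the approach, which is to walk a large range in bounded chunks and offer a reschedule point between chunks instead of after the whole range.

```python
def apply_range_batched(start, end, batch_size, fn, resched):
    """Call fn(lo, hi) over [start, end) in chunks that stop at multiples of
    batch_size, calling resched() between chunks (not after the last one)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    addr = start
    while addr < end:
        # Next batch-aligned boundary strictly above addr, clamped to end.
        boundary = (addr // batch_size + 1) * batch_size
        nxt = min(boundary, end)
        fn(addr, nxt)
        addr = nxt
        if addr < end:
            # Stand-in for a voluntary reschedule point such as cond_resched.
            resched()


if __name__ == "__main__":
    chunks = []
    yields = []
    apply_range_batched(
        3, 10, 4,
        lambda lo, hi: chunks.append((lo, hi)),
        lambda: yields.append(1),
    )
    print(chunks, len(yields))
```

The smaller the batch, the more often the loop offers to yield, at the cost of more calls to fn; the patches named in [Backport] tie the batch size to the largest block size.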
[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range
I've tested linux/5.15.0-104.114 and it passes my tests. Marking verification-done-jammy-linux.

--
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed
[Kernel-packages] [Bug 2056227] [NEW] KVM: arm64: softlockups in stage2_apply_range
Public bug reported:

[Impact]

Tearing down kvm VMs on arm64 can cause softlockups to appear on console. When terminating VMs with > 100Gb of memory and 4k pages, the memory unmap times often exceed 20 seconds, which can trigger the softlockup detector. Portions of the unmap path also have interrupts disabled while tlb invalidation instructions run, which can further contribute to latency problems. My team has observed networking latency problems if the cpu where the teardown is occurring is also mapped to handle a NIC interrupt.

Fortunately, a solution has been in place since Linux 6.1. A small pair of patches modify stage2_apply_range to operate on smaller memory ranges before performing a cond_resched. With these patches applied, softlockups are no longer observed when tearing down VMs with large amounts of memory.

Although I also submitted the patches to 5.15 LTS (link to LTS submission in "Backport" section), I'd appreciate it if Ubuntu were willing to take this submission in parallel since the impact has left us unable to utilize arm64 for kvm until we can either migrate our hypervisors to hugepages, pick up this fix, or some combination of the two.

[Backport]

Backport the following fixes from linux 6.1:

3b5c082bbf KVM: arm64: Work out supported block level at compile time
5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block

The fix is in 5994bc9e05 and 3b5c082bbf is a dependency that was submitted as part of the series. The original submission is here:

https://lore.kernel.org/all/20221007234151.461779-1-oliver.up...@linux.dev/

I've also submitted the patches to 5.15 LTS here:

https://lore.kernel.org/stable/cover.1709665227.git.k...@templeofstupid.com/

Both fixes cherry picked cleanly and there were no conflicts.

[Test]

Executed the test from 5994bc9e05 as well as my own run of kvm_page_table_test on a VM with 4k pages and a memory size > 100Gb. Without the patches, softlockups were observed in both tests. With the patches applied, the tests ran without incident. This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.

[Potential Regression]

Regression potential is low. These patches have been present in Linux since 6.1 and appear to have needed no further maintenance.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: patch

--
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range
This specifically affects Jammy and the 5.15 series. I have the necessary patches prepared and will e-mail those to the kernel team's mailing list.

--
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range
Patches to mailing list here:

https://lists.ubuntu.com/archives/kernel-team/2024-March/149383.html

--
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 1987232] [NEW] WARN in trace_event_dyn_put_ref
Public bug reported:

I have systems that are regularly hitting a WARN in trace_event_dyn_put_ref. The exact message is:

WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20

With the following stacktrace:

perf_trace_init+0x8f/0xd0
perf_tp_event_init+0x1f/0x40
perf_try_init_event+0x4a/0x130
perf_event_alloc+0x497/0xf40
__do_sys_perf_event_open+0x1d4/0xf70
__x64_sys_perf_event_open+0x20/0x30
do_syscall_64+0x5c/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae

I've debugged this and worked with upstream to get a fix into Linux. It was recently merged in 6.0-rc2. See here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

The problem started appearing as soon as our systems picked up the linux-aws-5.15 branch for Focal. (That was 5.15.0-1015-aws, if memory serves.) Could you please cherry-pick this fix and pull it back to the linux and linux-aws kernels for Focal? There's a test here:

https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/

that reproduces the problem very reliably for me. With the patch applied, I no longer get the WARNs.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

--
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  New
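To check collected kernel logs for this warning, something like the following Python sketch could be used. It assumes the WARNING line has the shape quoted in the report (CPU, PID, source file and line, then symbol+offset). The function find_warns and the regular expression are illustrative and do not correspond to an existing tool.

```python
import re

# Matches lines shaped like the WARNING quoted in the report. An optional "+"
# before the symbol is tolerated because the quoted message shows one.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) \+?(?P<symbol>[\w.]+)\+"
)


def find_warns(log_text, symbol="trace_event_dyn_put_ref"):
    """Return one dict per WARNING in log_text raised in the given symbol."""
    hits = []
    for m in WARN_RE.finditer(log_text):
        if m.group("symbol") != symbol:
            continue
        hits.append({
            "cpu": int(m.group("cpu")),
            "pid": int(m.group("pid")),
            "file": m.group("file"),
            "line": int(m.group("line")),
        })
    return hits


if __name__ == "__main__":
    import sys
    # Example use: dmesg | python3 this_script.py
    for hit in find_warns(sys.stdin.read()):
        print(hit)
```

Counting hits before and after installing a kernel with the fix gives a quick signal on whether the WARN still fires.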
[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref
apport information

** Tags added: apport-collected focal uec-images

** Description changed:

+ ---
+ ProblemType: Bug
+ AlsaDevices:
+  total 0
+  crw-rw 1 root audio 116, 1 Aug 22 17:32 seq
+  crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
+ AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
+ ApportVersion: 2.20.11-0ubuntu27.24
+ Architecture: amd64
+ ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ CRDA: N/A
+ CasperMD5CheckResult: skip
+ DistroRelease: Ubuntu 20.04
+ IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
+ Lsusb: Error: command ['lsusb'] failed with exit code 1:
+ Lsusb-t:
+
+ Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
+ MachineType: Amazon EC2 c5d.12xlarge
+ Package: linux (not installed)
+ PciMultimedia:
+
+ ProcEnviron:
+  TERM=xterm-256color
+  PATH=(custom, no user)
+  LANG=C.UTF-8
+  SHELL=/bin/bash
+ ProcFB:
+
+ ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
+ ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
+ RelatedPackageVersions:
+  linux-restricted-modules-5.15.0-1015-aws N/A
+  linux-backports-modules-5.15.0-1015-aws N/A
+  linux-firmware N/A
+ RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
+ Tags: focal uec-images
+ Uname: Linux 5.15.0-1015-aws x86_64
+ UnreportableReason: This report is about a package that is not installed.
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups: N/A
+ _MarkForUpload: False
+ dmi.bios.date: 10/16/2017
+ dmi.bios.release: 1.0
+ dmi.bios.vendor: Amazon EC2
+ dmi.bios.version: 1.0
+ dmi.board.asset.tag: i-03f5d8581c7ad94aa
+ dmi.board.vendor: Amazon EC2
+ dmi.chassis.asset.tag: Amazon EC2
+ dmi.chassis.type: 1
+ dmi.chassis.vendor: Amazon EC2
+ dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
+ dmi.product.name: c5d.12xlarge
+ dmi.sys.vendor: Amazon EC2

** Attachment added: "CurrentDmesg.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610807/+files/CurrentDmesg.txt

--
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed
[Kernel-packages] [Bug 1987232] Lspci-vt.txt
apport information

** Attachment added: "Lspci-vt.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610809/+files/Lspci-vt.txt

--
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed
[Kernel-packages] [Bug 1987232] Lspci.txt
apport information ** Attachment added: "Lspci.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610808/+files/Lspci.txt -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1987232 Title: WARN in trace_event_dyn_put_ref Status in linux package in Ubuntu: Confirmed Bug description: I have systems that are regularly hitting a WARN in trace_event_dyn_put_ref. The exact message is: WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 +trace_event_dyn_put_ref+0x15/0x20 With the following stacktrace: perf_trace_init+0x8f/0xd0 perf_tp_event_init+0x1f/0x40 perf_try_init_event+0x4a/0x130 perf_event_alloc+0x497/0xf40 __do_sys_perf_event_open+0x1d4/0xf70 __x64_sys_perf_event_open+0x20/0x30 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae I've debugged this and worked with upstream to get a fix into Linux. It was recently merged in 6.0-rc2. See here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d The problem started appearing as soon as our systems picked up the linux-aws-5.15 branch for Focal. (That was 5.15.0-1015-aws, if memory serves). Could you please cherry pick this fix and pull it back to the the linux and linux-aws kernels for Focal? There's test here: https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/ that reproduces the problem very reliably for me. With the patch applied, I no longer get the WARNs. 
--- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Aug 22 17:32 seq crw-rw 1 root audio 116, 33 Aug 22 17:32 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay' ApportVersion: 2.20.11-0ubuntu27.24 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: N/A CasperMD5CheckResult: skip DistroRelease: Ubuntu 20.04 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: Lsusb-t: Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1: MachineType: Amazon EC2 c5d.12xlarge Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=C.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1 ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39 RelatedPackageVersions: linux-restricted-modules-5.15.0-1015-aws N/A linux-backports-modules-5.15.0-1015-aws N/A linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill' Tags: focal uec-images Uname: Linux 5.15.0-1015-aws x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: N/A _MarkForUpload: False dmi.bios.date: 10/16/2017 dmi.bios.release: 1.0 dmi.bios.vendor: Amazon EC2 dmi.bios.version: 1.0 dmi.board.asset.tag: i-03f5d8581c7ad94aa dmi.board.vendor: Amazon EC2 dmi.chassis.asset.tag: Amazon EC2 dmi.chassis.type: 1 dmi.chassis.vendor: Amazon EC2 dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku: dmi.product.name: c5d.12xlarge dmi.sys.vendor: Amazon EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
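For anyone checking whether their own hosts are hitting the WARN quoted in the bug description, a rough sketch of a log scan follows. It is only an illustration: the function name find_dyn_put_ref_warns and the regular expression are assumptions made here, not part of the upstream fix or its test, and the pattern assumes the kernel log line has the same layout as the message in the report.

```python
import re

# Matches the WARN header layout quoted in the report, e.g.
#   WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20
# The optional "+" before the symbol tolerates the wrapped form seen in some mails.
# Assumption: other kernels print the header in this same layout.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+)\s+\+?(?P<symbol>[\w.]+)\+"
)


def find_dyn_put_ref_warns(log_text):
    """Return one dict per WARN raised from trace_event_dyn_put_ref in log_text."""
    hits = []
    for match in WARN_RE.finditer(log_text):
        if match.group("symbol") != "trace_event_dyn_put_ref":
            continue  # some other WARN; not the one from this bug
        hits.append(
            {
                "cpu": int(match.group("cpu")),
                "pid": int(match.group("pid")),
                "file": match.group("file"),
                "line": int(match.group("line")),
            }
        )
    return hits
```

The function takes the log text as a string, so it can be fed the output of dmesg or the contents of a saved kernel log; how that text is collected is left to the caller.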
[Kernel-packages] [Bug 1987232] ProcModules.txt
apport information ** Attachment added: "ProcModules.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610812/+files/ProcModules.txt
[Kernel-packages] [Bug 1987232] ProcInterrupts.txt
apport information ** Attachment added: "ProcInterrupts.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610811/+files/ProcInterrupts.txt
[Kernel-packages] [Bug 1987232] ProcCpuinfoMinimal.txt
apport information ** Attachment added: "ProcCpuinfoMinimal.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610810/+files/ProcCpuinfoMinimal.txt
[Kernel-packages] [Bug 1987232] UdevDb.txt
apport information ** Attachment added: "UdevDb.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610813/+files/UdevDb.txt
[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref
The fix has also been added to the stable queue for 5.15 and 5.19 as of this morning: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.19/tracing-perf-fix-double-put-of-trace-event-when-init-fails.patch https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.15/tracing-perf-fix-double-put-of-trace-event-when-init-fails.patch
[Kernel-packages] [Bug 1987232] WifiSyslog.txt
apport information ** Attachment added: "WifiSyslog.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610814/+files/WifiSyslog.txt
[Kernel-packages] [Bug 1987232] acpidump.txt
apport information ** Attachment added: "acpidump.txt" https://bugs.launchpad.net/bugs/1987232/+attachment/5610815/+files/acpidump.txt ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref
Should this also get nominated as affecting Focal? I hit this on the 5.15 kernel that was attached to linux-aws for Focal. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1987232 Title: WARN in trace_event_dyn_put_ref Status in linux package in Ubuntu: Confirmed Status in linux source package in Jammy: In Progress Status in linux source package in Kinetic: Confirmed Bug description: I have systems that are regularly hitting a WARN in trace_event_dyn_put_ref. The exact message is: WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 +trace_event_dyn_put_ref+0x15/0x20 With the following stacktrace: perf_trace_init+0x8f/0xd0 perf_tp_event_init+0x1f/0x40 perf_try_init_event+0x4a/0x130 perf_event_alloc+0x497/0xf40 __do_sys_perf_event_open+0x1d4/0xf70 __x64_sys_perf_event_open+0x20/0x30 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae I've debugged this and worked with upstream to get a fix into Linux. It was recently merged in 6.0-rc2. See here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d The problem started appearing as soon as our systems picked up the linux-aws-5.15 branch for Focal. (That was 5.15.0-1015-aws, if memory serves). Could you please cherry pick this fix and pull it back to the the linux and linux-aws kernels for Focal? There's test here: https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/ that reproduces the problem very reliably for me. With the patch applied, I no longer get the WARNs. 
--- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Aug 22 17:32 seq crw-rw 1 root audio 116, 33 Aug 22 17:32 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay' ApportVersion: 2.20.11-0ubuntu27.24 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: N/A CasperMD5CheckResult: skip DistroRelease: Ubuntu 20.04 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: Lsusb-t: Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1: MachineType: Amazon EC2 c5d.12xlarge Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=C.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1 ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39 RelatedPackageVersions: linux-restricted-modules-5.15.0-1015-aws N/A linux-backports-modules-5.15.0-1015-aws N/A linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill' Tags: focal uec-images Uname: Linux 5.15.0-1015-aws x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: N/A _MarkForUpload: False dmi.bios.date: 10/16/2017 dmi.bios.release: 1.0 dmi.bios.vendor: Amazon EC2 dmi.bios.version: 1.0 dmi.board.asset.tag: i-03f5d8581c7ad94aa dmi.board.vendor: Amazon EC2 dmi.chassis.asset.tag: Amazon EC2 dmi.chassis.type: 1 dmi.chassis.vendor: Amazon EC2 dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku: dmi.product.name: c5d.12xlarge dmi.sys.vendor: Amazon EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
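For the WARN in trace_event_dyn_put_ref reported above, one way to tell whether a given boot has hit it is to count matching lines in a saved kernel log. This is only a sketch: the function name is made up for illustration, and the pattern assumes the log line contains both "WARNING:" and the symbol name, as in the message quoted in the report.

```shell
#!/bin/sh
# count_dyn_put_ref_warns FILE (hypothetical helper name)
# Print how many lines of FILE look like the WARN from the report.
count_dyn_put_ref_warns() {
    # grep -c prints the count but exits non-zero when the count is 0,
    # so the exit status is deliberately ignored here.
    grep -c 'WARNING:.*trace_event_dyn_put_ref' "$1" || true
}
```

A possible use is to save the kernel log to a file after running the reproducer (for example with dmesg, which may need root) and pass that file to the function. A count of 0 is what would be expected on a kernel that carries the fix.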
[Kernel-packages] [Bug 2033122] Re: Request backport of xen timekeeping performance improvements
Thanks, booted both kernels on i3 instances that reported support for invariant tsc and had nomigrate set and was able to validate that both selected the tsc instead of xen as the clocksource.

** Tags removed: verification-needed-jammy-linux verification-needed-lunar-linux
** Tags added: verification-done-jammy-linux verification-done-lunar-linux

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2033122

Title:
  Request backport of xen timekeeping performance improvements

Status in linux package in Ubuntu: Invalid
Status in linux source package in Jammy: Fix Committed
Status in linux source package in Lunar: Fix Committed

Bug description:
  Users, especially those on EC2, are encouraged to select tsc as their default clocksource. However, this requires manual tuning of the operating system. KVM can determine if it is safe to use the tsc, and will default to that instead of its pvclock when appropriate. This requests a backport of a patch that does the same for Xen instances. If appropriate, it's fine if this is applied to only the linux-aws branches.

  Not all Xen EC2 instances advertise explicit nomigrate support; however, on those that do we'll select tsc by default. On the subset of hosts where this is advertised, users will safely default to the more performant clocksource.

  [Impact]
  Xen instances default to the xen clocksource, which has been documented to be slower. This is required for instances where the tsc is not safe to use, or the guest is subject to migration. On some platforms the performance impact can be high, and users are encouraged to select the tsc when appropriate. Instead of leaving it up to users to figure this out by reading a variety of different documents, pick the fast clocksource when it can be determined to be safe to do so.

  [Backport]
  Clean cherry pick. No conflicts applying to 5.15 or 6.2.
[Test] Booted EC2 xen instances with and without this patch and validated that on those that properly advertised the required criteria via cpuid, that the clocksource defaulted to tsc instead of xen. [Potential Regression] Potential is low, since only absurd configurations could lead to a problem. If this is considered risky, it can be applied to only linux-aws where the documented guidance is for users to enable tsc as the clocksource on Xen. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2033122/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
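The validation described in the [Test] section above can be approximated from a shell on the booted instance. This is a minimal sketch, assuming the usual sysfs clocksource directory; the function names are invented for illustration, and the optional directory argument exists only so the functions can be pointed at a test fixture.

```shell
#!/bin/sh
# Default location of the clocksource controls in sysfs (assumed here).
CLOCKSOURCE_DIR=/sys/devices/system/clocksource/clocksource0

# current_clocksource [DIR] (hypothetical helper name)
# Print the clocksource the kernel is currently using.
current_clocksource() {
    cat "${1:-$CLOCKSOURCE_DIR}/current_clocksource"
}

# clocksource_is_tsc [DIR] (hypothetical helper name)
# Succeed only when the selected clocksource is tsc.
clocksource_is_tsc() {
    [ "$(current_clocksource "$1")" = "tsc" ]
}
```

On a Xen instance that advertises the required criteria, clocksource_is_tsc would be expected to succeed on a patched kernel and fail (reporting xen) on an unpatched one. The available_clocksource file in the same directory lists the choices the kernel considers usable.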
[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref
@Stefan thanks for explaining how the process works. I appreciate your willingness to take this patch ahead of its arrival in the stable pull for the Jammy train. One of your updates mentioned TBD on a test. I have a reproducer in the original cover letter to Steven here, if it helps: https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/ -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1987232 Title: WARN in trace_event_dyn_put_ref Status in linux package in Ubuntu: Confirmed Status in linux source package in Jammy: Fix Committed Status in linux source package in Kinetic: Confirmed Bug description: [SRU Justification] Impact: Some imbalanced ref-counting produces kernel warnings regularly. Since it is a warning level, this triggers system monitoring on servers which in turn causes unnecessary work for inspecting the logs. Fix: There is a fix upstream and also backported to the upstream stable branch. However we are still a bit behind catching up with the latest versions. Since this is having quite an impact and the fix is rather straight forward, we pull this in from upstream stable ahead of time. Test case: tbd Regression potential: Regressions would manifest as different errors related to ref-counting. --- I have systems that are regularly hitting a WARN in trace_event_dyn_put_ref. The exact message is: WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 +trace_event_dyn_put_ref+0x15/0x20 With the following stacktrace: perf_trace_init+0x8f/0xd0 perf_tp_event_init+0x1f/0x40 perf_try_init_event+0x4a/0x130 perf_event_alloc+0x497/0xf40 __do_sys_perf_event_open+0x1d4/0xf70 __x64_sys_perf_event_open+0x20/0x30 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae I've debugged this and worked with upstream to get a fix into Linux. It was recently merged in 6.0-rc2. 
See here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d The problem started appearing as soon as our systems picked up the linux-aws-5.15 branch for Focal. (That was 5.15.0-1015-aws, if memory serves). Could you please cherry pick this fix and pull it back to the the linux and linux-aws kernels for Focal? There's test here: https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/ that reproduces the problem very reliably for me. With the patch applied, I no longer get the WARNs. --- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Aug 22 17:32 seq crw-rw 1 root audio 116, 33 Aug 22 17:32 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay' ApportVersion: 2.20.11-0ubuntu27.24 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: N/A CasperMD5CheckResult: skip DistroRelease: Ubuntu 20.04 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: Lsusb-t: Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1: MachineType: Amazon EC2 c5d.12xlarge Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=C.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1 ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39 RelatedPackageVersions: linux-restricted-modules-5.15.0-1015-aws N/A linux-backports-modules-5.15.0-1015-aws N/A linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill' Tags: focal uec-images Uname: Linux 5.15.0-1015-aws x86_64 UnreportableReason: This report is about a 
package that is not installed. UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: N/A _MarkForUpload: False dmi.bios.date: 10/16/2017 dmi.bios.release: 1.0 dmi.bios.vendor: Amazon EC2 dmi.bios.version: 1.0 dmi.board.asset.tag: i-03f5d8581c7ad94aa dmi.board.vendor: Amazon EC2 dmi.chassis.asset.tag: Amazon EC2 dmi.chassis.type: 1 dmi.chassis.vendor: Amazon EC2 dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku: dmi.product.name: c5d.12xlarge dmi.sys.vendor: Amazon EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@li
[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref
I ran the original reproducer on a VM that was running linux/5.15.0-50.56 and linux/linux/5.15.0-46.49. On the former the problem did not reproduce, but on the latter it did. Marking this as verified via testing and setting 'verification-done-jammy'. ** Tags added: verification-done-jammy -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1987232 Title: WARN in trace_event_dyn_put_ref Status in linux package in Ubuntu: Confirmed Status in linux source package in Jammy: Fix Committed Status in linux source package in Kinetic: Confirmed
[Kernel-packages] [Bug 2089373] Re: WARN in trc_wait_for_one_reader about failed IPIs
I've re-run the tests against the proposed kernel and no longer see these warnings. Thanks for taking the patches to fix this! ** Tags removed: verification-needed-jammy-linux ** Tags added: verification-done-jammy-linux -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2089373 Title: WARN in trc_wait_for_one_reader about failed IPIs Status in linux package in Ubuntu: Invalid Status in linux source package in Jammy: Fix Committed Bug description: [Impact] When ending bpf tracing, 5.15 kernels now report a warning in trc_wait_for_one_reader() on platforms that support hot-plugging CPUs, but that do not have all of their hotplug slots populated. In this submitter's environment, it reproduces on Xen EC2 instances, but not Nitro ones. The warning looks like this: kernel: [ 6416.920266] [ cut here ] kernel: [ 6416.920272] trc_wait_for_one_reader(): smp_call_function_single() failed for CPU: 64 kernel: [ 6416.920289] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:1044 trc_wait_for_one_reader+0x2b8/0x300 kernel: [ 6416.920299] Modules linked in: xt_state xt_connmark nf_conntrack_netlink nfnetlink xt_addrtype xt_statistic xt_nat xt_tcpudp ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nvidia_uvm(POE) nvidia_drm(POE) drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt nvidia_modeset(POE) nvidia(POE) iptable_mangle ip6table_mangle ip6table_filter ip6table_nat ip6_tables xt_MASQUERADE xt_conntrack xt_comment iptable_filter xt_mark iptable_nat nf_nat bpfilter aufs overlay udp_diag tcp_diag inet_diag binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel input_leds psmouse crypto_simd cryptd serio_raw floppy sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ena drm efi_pstore ip_tables x_tables autofs4 kernel: [ 6416.920368] CPU: 0 PID: 13 Comm: 
rcu_tasks_trace Tainted: P OE 5.15.0-1071-aws #77~20.04.1-Ubuntu kernel: [ 6416.920372] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006 kernel: [ 6416.920374] RIP: 0010:trc_wait_for_one_reader+0x2b8/0x300 kernel: [ 6416.920376] Code: 00 00 00 4c 89 ef e8 37 ac 4e 00 eb 9f 44 89 fa 48 c7 c6 00 63 e2 b8 48 c7 c7 a0 9a 1e b9 c6 05 2f 2e 09 02 01 e8 15 2e b9 00 <0f> 0b e9 31 ff ff ff 4c 89 ee 48 c7 c7 20 df b7 b9 e8 a2 99 52 00 kernel: [ 6416.920380] RSP: 0018:9e048c4efe00 EFLAGS: 00010286 kernel: [ 6416.920382] RAX: RBX: RCX: 0027 kernel: [ 6416.920384] RDX: 0027 RSI: 0003 RDI: 93074ae20588 kernel: [ 6416.920385] RBP: 9e048c4efe28 R08: 93074ae20580 R09: 0001 kernel: [ 6416.920387] R10: 000a R11: 93463feb2c7f R12: 92cbc6a1e600 kernel: [ 6416.920389] R13: 0040 R14: 000205a4 R15: 0040 kernel: [ 6416.920390] FS: () GS:93074ae0() knlGS: kernel: [ 6416.920393] CS: 0010 DS: ES: CR0: 80050033 kernel: [ 6416.920394] CR2: 7f4a72b04098 CR3: 0046c8964001 CR4: 001706f0 kernel: [ 6416.920399] Call Trace: kernel: [ 6416.920401] kernel: [ 6416.920404] ? show_regs.cold+0x1a/0x1f kernel: [ 6416.920410] ? trc_wait_for_one_reader+0x2b8/0x300 kernel: [ 6416.920412] ? __warn+0x8b/0xe0 kernel: [ 6416.920418] ? trc_wait_for_one_reader+0x2b8/0x300 kernel: [ 6416.920421] ? report_bug+0xd5/0x110 kernel: [ 6416.920427] ? handle_bug+0x39/0x90 kernel: [ 6416.920431] ? exc_invalid_op+0x19/0x70 kernel: [ 6416.920434] ? asm_exc_invalid_op+0x1b/0x20 kernel: [ 6416.920442] ? trc_wait_for_one_reader+0x2b8/0x300 kernel: [ 6416.920446] rcu_tasks_trace_postscan+0x47/0x80 kernel: [ 6416.920449] rcu_tasks_wait_gp+0x108/0x210 kernel: [ 6416.920453] rcu_tasks_kthread+0x10f/0x1c0 kernel: [ 6416.920456] ? wait_woken+0x60/0x60 kernel: [ 6416.920462] ? show_rcu_tasks_trace_gp_kthread+0x80/0x80 kernel: [ 6416.920464] kthread+0x12a/0x150 kernel: [ 6416.920471] ? 
set_kthread_struct+0x50/0x50 kernel: [ 6416.920476] ret_from_fork+0x22/0x30 kernel: [ 6416.920485] kernel: [ 6416.920486] ---[ end trace 0500611ddaff33a7 ]---

The problem appears when:
- The system is performing a rcu_tasks_trace grace period wait
- The system has more hot plug CPU slots available than are populated
- The rcu tasks postscan detects a holdout

The problem is actually caused by a mismerge of 9b3c4ab304 ("sched,rcu: Rework try_invoke_on_locked_down_task()"). When that patch was applied, a conflict around task nesting was improperly resolved and led to quiescent tasks getting flagged as holdouts. This in turn results in more IPIs than necessary to idle CPUs, as well as WARNs about failing to send IPIs to CPUs tha
[Kernel-packages] [Bug 2104210] Re: uprobe-related panics during profiling
Patches sent to list: https://lists.ubuntu.com/archives/kernel-team/2025-March/158376.html

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2104210

Title:
  uprobe-related panics during profiling

Status in linux package in Ubuntu: Confirmed

Bug description:
  [Impact]
  On systems that utilize both uprobes and perf_events style profiling, it is possible to hit a panic in the uprobe_free_utask code. This occurs during process exit. If the profiler fires while uprobe_free_utask is in the process of cleaning up the utask, the NMI may read freed memory because the cleanup code frees the utask before setting its pointer to NULL.

  This submitter has encountered the problem on systems running workloads without intentionally trying to trigger the problem. The stacks look something like this:

    RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
    ...
    ? die_addr+0x36/0x90
    ? exc_general_protection+0x217/0x420
    ? asm_exc_general_protection+0x26/0x30
    ? is_uprobe_at_func_entry+0x28/0x80
    perf_callchain_user+0x20a/0x360
    get_perf_callchain+0x147/0x1d0
    bpf_get_stackid+0x60/0x90
    bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
    ? __smp_call_single_queue+0xad/0x120
    bpf_overflow_handler+0x75/0x110
    ...
    asm_sysvec_apic_timer_interrupt+0x1a/0x20
    RIP: 0010:__kmem_cache_free+0x1cb/0x350
    ...
    ? uprobe_free_utask+0x62/0x80
    ? acct_collect+0x4c/0x220
    uprobe_free_utask+0x62/0x80
    mm_release+0x12/0xb0
    do_exit+0x26b/0xaa0
    __x64_sys_exit+0x1b/0x20
    do_syscall_64+0x5a/0x80

  The person who reported the issue upstream provided this reproducer. (Run each command in a separate terminal):

    # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
    # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  However, since the binutils are stripped on some of the releases where I tested this, I ran the following instead:

    # while :; do bpftrace -e 'uprobe:libc:malloc { printf("hit\n"); }' -c ls; done
    # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  [Backport]
  The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in uprobe_free_utask"). However, this patch was massaged by stable for its inclusion in 6.12, 6.6, and 6.1. Instead of re-doing stable's conflict resolution, take the patch directly from 6.6.x instead, at commit eff00c5e29ab. This patch is in stable as of 6.12.19, 6.6.83, and 6.1.131.

  [Test]
  I've run the provided reproducer and validated that I can reproduce the problem without the patch applied and that I cannot reproduce it again once I have applied the patch.

  [Potential Regression]
  The regression potential here seems quite low. The fix has been upstream for a couple releases and no subsequent issues have been reported. It makes no functional change beyond ensuring that the utask pointer is set to NULL before the utask structure itself is freed. The dereference and free occur on the same cpu.
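The stable versions named in the [Backport] section above can be turned into a quick check of whether a given upstream stable release should already contain the uprobe fix. This is a rough sketch: the helper names are invented for illustration, the comparison relies on GNU sort's -V option, and it only covers the three series the report mentions. A distribution kernel may carry the fix as a separate cherry-pick, so a negative answer here says nothing definite about an Ubuntu kernel.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is the same as or newer than B.
# Uses version sort (sort -V, a GNU coreutils option).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_stable_uprobe_fix X.Y.Z (hypothetical helper name)
# Returns 0 if that upstream stable release is at or past the release
# the report lists for its series, 1 if it predates it, and 2 if the
# series is not one of the three named in the report.
has_stable_uprobe_fix() {
    series=$(printf '%s\n' "$1" | cut -d. -f1,2)
    case "$series" in
        6.12) fixed=6.12.19 ;;
        6.6)  fixed=6.6.83 ;;
        6.1)  fixed=6.1.131 ;;
        *)    return 2 ;;
    esac
    version_ge "$1" "$fixed"
}
```

For example, has_stable_uprobe_fix 6.6.83 succeeds while has_stable_uprobe_fix 6.6.82 fails. Returning a distinct status for unknown series keeps "not fixed" separate from "cannot tell".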
[Kernel-packages] [Bug 2104210] [NEW] uprobe-related panics during profiling
Public bug reported:

[Impact]
On systems that utilize both uprobes and perf_events style profiling, it is possible to hit a panic in the uprobe_free_utask code. This occurs during process exit. If the profiler fires while uprobe_free_utask is in the process of cleaning up the utask, the NMI may read freed memory because the cleanup code frees the utask before setting its pointer to NULL.

This submitter has encountered the problem on systems running workloads without intentionally trying to trigger the problem. The stacks look something like this:

  RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
  ...
  ? die_addr+0x36/0x90
  ? exc_general_protection+0x217/0x420
  ? asm_exc_general_protection+0x26/0x30
  ? is_uprobe_at_func_entry+0x28/0x80
  perf_callchain_user+0x20a/0x360
  get_perf_callchain+0x147/0x1d0
  bpf_get_stackid+0x60/0x90
  bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
  ? __smp_call_single_queue+0xad/0x120
  bpf_overflow_handler+0x75/0x110
  ...
  asm_sysvec_apic_timer_interrupt+0x1a/0x20
  RIP: 0010:__kmem_cache_free+0x1cb/0x350
  ...
  ? uprobe_free_utask+0x62/0x80
  ? acct_collect+0x4c/0x220
  uprobe_free_utask+0x62/0x80
  mm_release+0x12/0xb0
  do_exit+0x26b/0xaa0
  __x64_sys_exit+0x1b/0x20
  do_syscall_64+0x5a/0x80

The person who reported the issue upstream provided this reproducer. (Run each command in a separate terminal):

  # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

However, since the binutils are stripped on some of the releases where I tested this, I ran the following instead:

  # while :; do bpftrace -e 'uprobe:libc:malloc { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

[Backport]
The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in uprobe_free_utask"). However, this patch was massaged by stable for its inclusion in 6.12, 6.6, and 6.1. Instead of re-doing stable's conflict resolution, take the patch directly from 6.6.x instead, at commit eff00c5e29ab. This patch is in stable as of 6.12.19, 6.6.83, and 6.1.131.

[Test]
I've run the provided reproducer and validated that I can reproduce the problem without the patch applied and that I cannot reproduce it again once I have applied the patch.

[Potential Regression]
The regression potential here seems quite low. The fix has been upstream for a couple releases and no subsequent issues have been reported. It makes no functional change beyond ensuring that the utask pointer is set to NULL before the utask structure itself is freed. The dereference and free occur on the same cpu.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: patch patch-accepted-upstream

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2104210

Title:
  uprobe-related panics during profiling

Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 2101120] Re: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
I have tested the noble proposed and validated that it fixes this bug. ** Tags removed: verification-needed-noble-linux ** Tags added: verification-done-noble-linux -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2101120 Title: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr Status in linux package in Ubuntu: Confirmed Status in linux source package in Noble: Fix Committed Status in linux source package in Oracular: Fix Committed Bug description: [Impact] If mptcp endpoints are configured on a host using an address that is external to the host, then the kernel will create an implicit endpoint with the host's local address when mptcp receives its first flow. If multiple packets for these local interfaces arrive in parallel, more than one caller may end up in mptcp_pm_nl_append_new_local_addr because none found the address in local_addr_list during their call to mptcp_pm_nl_get_local_id. In this case, the concurrent new_local_addr calls may delete the address entry created by the previous caller. These deletes use synchronize_rcu, but this is not permitted in some of the contexts where this function may be called. During packet recv, the caller may be in a rcu read critical section and have preemption disabled. This can lead to a BUG / panic because synchronize_rcu is called in softint context. 
An example stack: BUG: scheduling while atomic: swapper/2/0/0x0302 Call Trace: dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1)) dump_stack (lib/dump_stack.c:124) __schedule_bug (kernel/sched/core.c:5943) schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970) __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621) schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818) schedule_timeout (kernel/time/timer.c:2160) wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148) __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444) synchronize_rcu (kernel/rcu/tree.c:3609) mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061) mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164) mptcp_pm_get_local_id (net/mptcp/pm.c:420) subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213) subflow_v4_route_req (net/mptcp/subflow.c:305) tcp_conn_request (net/ipv4/tcp_input.c:7216) subflow_v4_conn_request (net/mptcp/subflow.c:651) tcp_rcv_state_process (net/ipv4/tcp_input.c:6709) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1)) ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254) ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580) ip_sublist_rcv (net/ipv4/ip_input.c:640) ip_list_rcv (net/ipv4/ip_input.c:675) __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631) netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774) napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114) igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb __napi_poll 
(net/core/dev.c:6582) net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787) handle_softirqs (kernel/softirq.c:553) __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636) irq_exit_rcu (kernel/softirq.c:651) common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14)) [Backport] Cherry-pick the following patch from upstream: 022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr") This patch fixes the problem by deleting the duplicate prior to its insertion in local_addr_list by skipping the replacement operation in mptcp_pm_nl_append_new_local_addr. Instead of the last implicit endpoint replacing the previous, it is discarded without a synchronize_rcu and the old copy is kept. This mode is only selected in mptcp_pm_nl_get_local_id. [Test] This patch has passed the upstream mptcp test suites and has also been tested against the reproducer that triggered the panic. (Add and remove mptcp endpoints with an external address that differs from the internal address). Prior to this patch the problem would trigger in less than a minute. With this patch applied, the test has run for hours without incident. [Potential Regression] The regression potential is low since
[Kernel-packages] [Bug 2104210] Re: uprobe-related panics during profiling
I have verified this in noble proposed and validated that it fixes the
bug.

** Description changed:

- Impact]
+ [Impact]
  On systems that utilize both uprobes and perf_events style profiling,
  it is possible to hit a panic in the uprobe_free_utask code. This
  occurs during process exit. If the profiler fires while
  uprobe_free_utask is in the process of cleaning up the utask, the NMI
  may read freed memory because the cleanup code frees the utask before
  setting its pointer to NULL.

  This submitter has encountered the problem on systems running
  workloads without intentionally trying to trigger the problem. The
  stacks look something like this:

- RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
- ...
- ? die_addr+0x36/0x90
- ? exc_general_protection+0x217/0x420
- ? asm_exc_general_protection+0x26/0x30
- ? is_uprobe_at_func_entry+0x28/0x80
- perf_callchain_user+0x20a/0x360
- get_perf_callchain+0x147/0x1d0
- bpf_get_stackid+0x60/0x90
- bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
- ? __smp_call_single_queue+0xad/0x120
- bpf_overflow_handler+0x75/0x110
- ...
- asm_sysvec_apic_timer_interrupt+0x1a/0x20
- RIP: 0010:__kmem_cache_free+0x1cb/0x350
- ...
- ? uprobe_free_utask+0x62/0x80
- ? acct_collect+0x4c/0x220
- uprobe_free_utask+0x62/0x80
- mm_release+0x12/0xb0
- do_exit+0x26b/0xaa0
- __x64_sys_exit+0x1b/0x20
- do_syscall_64+0x5a/0x80
+ RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
+ ...
+ ? die_addr+0x36/0x90
+ ? exc_general_protection+0x217/0x420
+ ? asm_exc_general_protection+0x26/0x30
+ ? is_uprobe_at_func_entry+0x28/0x80
+ perf_callchain_user+0x20a/0x360
+ get_perf_callchain+0x147/0x1d0
+ bpf_get_stackid+0x60/0x90
+ bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
+ ? __smp_call_single_queue+0xad/0x120
+ bpf_overflow_handler+0x75/0x110
+ ...
+ asm_sysvec_apic_timer_interrupt+0x1a/0x20
+ RIP: 0010:__kmem_cache_free+0x1cb/0x350
+ ...
+ ? uprobe_free_utask+0x62/0x80
+ ? acct_collect+0x4c/0x220
+ uprobe_free_utask+0x62/0x80
+ mm_release+0x12/0xb0
+ do_exit+0x26b/0xaa0
+ __x64_sys_exit+0x1b/0x20
+ do_syscall_64+0x5a/0x80

  The person who reported the issue upstream provided this reproducer.
  (Run each command in a separate terminal):

- # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
- # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
+ # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
+ # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  However, since the binutils are stripped on some of the releases where
  I tested this, I ran the following instead:

- # while :; do bpftrace -e 'uprobe:libc:malloc { printf("hit\n"); }' -c ls; done
- # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
+ # while :; do bpftrace -e 'uprobe:libc:malloc { printf("hit\n"); }' -c ls; done
+ # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  [Backport]
  The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in
  uprobe_free_utask")

  However this patch was massaged by stable for its inclusion in 6.12,
  6.6, and 6.1. Instead of re-doing stable's conflict resolution, take
  the patch directly from 6.6.x instead, at commit eff00c5e29ab. This
  patch is in stable as of 6.12.19, 6.6.83, and 6.1.131.

  [Test]
  I've run the provided reproducer and validated that I can reproduce
  the problem without the patch applied and that I cannot reproduce it
  again once I have applied the patch.

  [Potential Regression]
  The regression potential here seems quite low. The fix has been
  upstream for a couple releases and no subsequent issues have been
  reported. It makes no functional change beyond ensuring that the utask
  pointer is set to NULL before the utask structure itself is freed. The
  dereference and free occur on the same cpu.
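
The ordering change described under [Potential Regression] can be illustrated with a small user-space sketch. This is not the kernel code: `struct task`, `struct utask`, `release_utask`, `reader_hits_stale` and both `free_utask_*` functions are hypothetical stand-ins, and the `probe` callback only models a profiler NMI firing between the two steps on the same CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct utask { int alive; };            /* alive == 0 models freed memory */
struct task { struct utask *utask; };

/* Models kfree(): marks the object as gone instead of really freeing it,
 * so the sketch can inspect the stale state without undefined behaviour. */
static void release_utask(struct utask *u) { u->alive = 0; }

/* Models the profiler's check: returns 1 if it would follow t->utask
 * into an object that has already been released. */
static int reader_hits_stale(const struct task *t)
{
    return t->utask != NULL && !t->utask->alive;
}

typedef int (*probe_fn)(const struct task *);

/* Old order: release first, clear the pointer second.
 * Returns what a probe firing between the two steps observed. */
static int free_utask_old_order(struct task *t, probe_fn probe)
{
    struct utask *u;
    int seen;

    assert(t != NULL && probe != NULL);
    u = t->utask;
    if (u == NULL)
        return 0;
    release_utask(u);
    seen = probe(t);        /* pointer still refers to released memory */
    t->utask = NULL;
    return seen;
}

/* New order: clear the pointer first, release second. */
static int free_utask_new_order(struct task *t, probe_fn probe)
{
    struct utask *u;
    int seen;

    assert(t != NULL && probe != NULL);
    u = t->utask;
    if (u == NULL)
        return 0;
    t->utask = NULL;
    seen = probe(t);        /* reader sees NULL and backs off */
    release_utask(u);
    return seen;
}
```

Calling `free_utask_old_order(&t, reader_hits_stale)` reports that the probe saw a stale pointer, while `free_utask_new_order` does not. Because the sketch, like the report, assumes the reader runs on the same CPU as the free, no memory barriers are modelled.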
** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2104210

Title:
  uprobe-related panics during profiling

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Oracular:
  Fix Committed

Bug description:
  [Impact]
  On systems that utilize both uprobes and perf_events style profiling,
  it is possible to hit a panic in the uprobe_free_utask code. This
  occurs during process exit. If the profiler fires while
  uprobe_free_utask is in the process of cleaning up the utask, the NMI
  may read freed memory because the cleanup code frees the utask before
  setting its pointer to NULL.

  This submitter has encountered the problem on systems running
  workloads without intentionally trying to trigger the problem. The
  stacks look something like this:
[Kernel-packages] [Bug 2101120] Re: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
Patches sent to kernel team's list:

https://lists.ubuntu.com/archives/kernel-team/2025-March/157856.html

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2101120

Title:
  mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

Status in linux package in Ubuntu:
  New

Bug description:
  [Impact]
  If mptcp endpoints are configured on a host using an address that is
  external to the host, then the kernel will create an implicit endpoint
  with the host's local address when mptcp receives its first flow. If
  multiple packets for these local interfaces arrive in parallel, more
  than one caller may end up in mptcp_pm_nl_append_new_local_addr
  because none found the address in local_addr_list during their call to
  mptcp_pm_nl_get_local_id. In this case, the concurrent new_local_addr
  calls may delete the address entry created by the previous caller.
  These deletes use synchronize_rcu, but this is not permitted in some
  of the contexts where this function may be called. During packet recv,
  the caller may be in an RCU read-side critical section and have
  preemption disabled. This can lead to a BUG / panic because
  synchronize_rcu is called in softirq context.

  An example stack:

  BUG: scheduling while atomic: swapper/2/0/0x0302

  Call Trace:
  dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
  dump_stack (lib/dump_stack.c:124)
  __schedule_bug (kernel/sched/core.c:5943)
  schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
  __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
  schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
  schedule_timeout (kernel/time/timer.c:2160)
  wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
  __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
  synchronize_rcu (kernel/rcu/tree.c:3609)
  mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
  mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
  mptcp_pm_get_local_id (net/mptcp/pm.c:420)
  subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
  subflow_v4_route_req (net/mptcp/subflow.c:305)
  tcp_conn_request (net/ipv4/tcp_input.c:7216)
  subflow_v4_conn_request (net/mptcp/subflow.c:651)
  tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
  tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
  tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
  ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
  ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
  ip_sublist_rcv (net/ipv4/ip_input.c:640)
  ip_list_rcv (net/ipv4/ip_input.c:675)
  __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
  netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
  napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
  igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
  __napi_poll (net/core/dev.c:6582)
  net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
  handle_softirqs (kernel/softirq.c:553)
  __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
  irq_exit_rcu (kernel/softirq.c:651)
  common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))

  [Backport]
  Cherry-pick the following patch from upstream:

  022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr")

  This patch fixes the problem by skipping the replacement operation in
  mptcp_pm_nl_append_new_local_addr, so the duplicate is dropped before
  it is ever inserted into local_addr_list. Instead of the last implicit
  endpoint replacing the previous, it is discarded without a
  synchronize_rcu and the old copy is kept. This mode is only selected
  in mptcp_pm_nl_get_local_id.

  [Test]
  This patch has passed the upstream mptcp test suites and has also been
  tested against the reproducer that triggered the panic. (Add and
  remove mptcp endpoints with an external address that differs from the
  internal address). Prior to this patch the problem would trigger in
  less than a minute. With this patch applied, the test has run for
  hours without incident.

  [Potential Regression]
  The regression potential is low since the behavior change is small.
  Implicit endpoints still get created and deleted, but they are only
  replaced when a user adds an endpoint with the same local address as
  an existin
[Kernel-packages] [Bug 2101120] [NEW] mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
Public bug reported:

[Impact]
If mptcp endpoints are configured on a host using an address that is
external to the host, then the kernel will create an implicit endpoint
with the host's local address when mptcp receives its first flow. If
multiple packets for these local interfaces arrive in parallel, more
than one caller may end up in mptcp_pm_nl_append_new_local_addr because
none found the address in local_addr_list during their call to
mptcp_pm_nl_get_local_id. In this case, the concurrent new_local_addr
calls may delete the address entry created by the previous caller.
These deletes use synchronize_rcu, but this is not permitted in some of
the contexts where this function may be called. During packet recv, the
caller may be in an RCU read-side critical section and have preemption
disabled. This can lead to a BUG / panic because synchronize_rcu is
called in softirq context.

An example stack:

BUG: scheduling while atomic: swapper/2/0/0x0302

Call Trace:
dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
dump_stack (lib/dump_stack.c:124)
__schedule_bug (kernel/sched/core.c:5943)
schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
__schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
schedule_timeout (kernel/time/timer.c:2160)
wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
__wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
synchronize_rcu (kernel/rcu/tree.c:3609)
mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
mptcp_pm_get_local_id (net/mptcp/pm.c:420)
subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
subflow_v4_route_req (net/mptcp/subflow.c:305)
tcp_conn_request (net/ipv4/tcp_input.c:7216)
subflow_v4_conn_request (net/mptcp/subflow.c:651)
tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
ip_sublist_rcv (net/ipv4/ip_input.c:640)
ip_list_rcv (net/ipv4/ip_input.c:675)
__netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
__napi_poll (net/core/dev.c:6582)
net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
handle_softirqs (kernel/softirq.c:553)
__irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
irq_exit_rcu (kernel/softirq.c:651)
common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))

[Backport]
Cherry-pick the following patch from upstream:

022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr")

This patch fixes the problem by skipping the replacement operation in
mptcp_pm_nl_append_new_local_addr, so the duplicate is dropped before it
is ever inserted into local_addr_list. Instead of the last implicit
endpoint replacing the previous, it is discarded without a
synchronize_rcu and the old copy is kept. This mode is only selected in
mptcp_pm_nl_get_local_id.

[Test]
This patch has passed the upstream mptcp test suites and has also been
tested against the reproducer that triggered the panic. (Add and remove
mptcp endpoints with an external address that differs from the internal
address). Prior to this patch the problem would trigger in less than a
minute. With this patch applied, the test has run for hours without
incident.

[Potential Regression]
The regression potential is low since the behavior change is small.
Implicit endpoints still get created and deleted, but they are only
replaced when a user adds an endpoint with the same local address as an
existing implicit address. No replacements via mptcp_pm_nl_get_local_id
will occur anymore.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: patch patch-accepted-upstream

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2101120

Title:
  mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

Status in linux package in Ubuntu:
  New

Bug description:
  [Impact]
  If mptcp endpoints ar
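
The behavior described under [Backport] in the report above (keep the existing entry and discard the duplicate when the caller is only looking up an id) can be sketched in user-space C. This is an illustration under stated assumptions, not the upstream patch: `struct addr_entry`, `struct addr_list`, `append_local_addr` and the `blocking_waits` counter are hypothetical, a plain singly linked list stands in for the RCU-protected local_addr_list, and the counter merely marks where the kernel would have to wait in synchronize_rcu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified endpoint entry; not the kernel structure. */
struct addr_entry {
    char addr[46];
    int id;
    struct addr_entry *next;
};

struct addr_list {
    struct addr_entry *head;
    int next_id;
    int blocking_waits;   /* counts points where synchronize_rcu() would run */
};

/* Returns the id of the entry that represents addr after the call,
 * or -1 on allocation failure.
 * replace == false models the lookup path used during packet receive;
 * replace == true models an explicit add that may replace an old entry. */
static int append_local_addr(struct addr_list *list, const char *addr,
                             bool replace)
{
    struct addr_entry **link;
    struct addr_entry *entry;

    assert(list != NULL && addr != NULL);

    /* Walk to the matching entry, or to the tail if there is none. */
    link = &list->head;
    while (*link != NULL && strcmp((*link)->addr, addr) != 0)
        link = &(*link)->next;

    /* Duplicate found and caller only needs the id: keep the old copy,
     * never create the new one, and never block. */
    if (*link != NULL && !replace)
        return (*link)->id;

    entry = calloc(1, sizeof(*entry));
    if (entry == NULL)
        return -1;
    strncpy(entry->addr, addr, sizeof(entry->addr) - 1);

    if (*link != NULL) {
        /* Replacement path: the old entry must be retired, which in the
         * kernel requires waiting for readers before it can be freed. */
        struct addr_entry *old = *link;

        entry->id = old->id;
        entry->next = old->next;
        *link = entry;
        list->blocking_waits++;
        free(old);
    } else {
        entry->id = list->next_id++;
        *link = entry;    /* append at the tail */
    }
    return entry->id;
}
```

Two back-to-back calls with `replace == false` for the same address return the same id and leave `blocking_waits` at zero, which is the property the receive path needs. Only a call with `replace == true` reaches the step that would block.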
[Kernel-packages] [Bug 2101120] Re: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
I have a patch for this accepted upstream that I'll send to the Ubuntu
kernel team in short order.

This has been merged to Linus's tree but has yet to be picked up by
Stable. It's tagged to go there; it just hasn't been picked up by the
robots yet. It affects all releases from 5.17 onward, which should put
it in scope for Noble, Oracular, and Plucky.

** Description changed:

  [Impact]
  If mptcp endpoints are configured on a host using an address that is
  external to the host, then the kernel will create an implicit endpoint
  with the host's local address when mptcp receives its first flow. If
  multiple packets for these local interfaces arrive in parallel, more
  than one caller may end up in mptcp_pm_nl_append_new_local_addr
  because none found the address in local_addr_list during their call to
  mptcp_pm_nl_get_local_id. In this case, the concurrent new_local_addr
  calls may delete the address entry created by the previous caller.
  These deletes use synchronize_rcu, but this is not permitted in some
  of the contexts where this function may be called. During packet recv,
  the caller may be in an RCU read-side critical section and have
  preemption disabled. This can lead to a BUG / panic because
  synchronize_rcu is called in softirq context.

  An example stack:

-BUG: scheduling while atomic: swapper/2/0/0x0302
+ BUG: scheduling while atomic: swapper/2/0/0x0302

-Call Trace:
-
-dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
-dump_stack (lib/dump_stack.c:124)
-__schedule_bug (kernel/sched/core.c:5943)
-schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
-__schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
-schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
-schedule_timeout (kernel/time/timer.c:2160)
-wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
-__wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
-synchronize_rcu (kernel/rcu/tree.c:3609)
-mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
-mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
-mptcp_pm_get_local_id (net/mptcp/pm.c:420)
-subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
-subflow_v4_route_req (net/mptcp/subflow.c:305)
-tcp_conn_request (net/ipv4/tcp_input.c:7216)
-subflow_v4_conn_request (net/mptcp/subflow.c:651)
-tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
-tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
-tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
-ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
-ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
-ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
-ip_sublist_rcv (net/ipv4/ip_input.c:640)
-ip_list_rcv (net/ipv4/ip_input.c:675)
-__netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
-netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
-napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
-igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
-__napi_poll (net/core/dev.c:6582)
-net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
-handle_softirqs (kernel/softirq.c:553)
-__irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
-irq_exit_rcu (kernel/softirq.c:651)
-common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
-
+ Call Trace:
+
+ dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
+ dump_stack (lib/dump_stack.c:124)
+ __schedule_bug (kernel/sched/core.c:5943)
+ schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
+ __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
+ schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
+ schedule_timeout (kernel/time/timer.c:2160)
+ wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
+ __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
+ synchronize_rcu (kernel/rcu/tree.c:3609)
+ mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
+ mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
+ mptcp_pm_get_local_id (net/mptcp/pm.c:420)
+ subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
+ subflow_v4_route_req (net/mptcp/subf