[Kernel-packages] [Bug 2033122] Re: Request backport of xen timekeeping performance improvements

2023-10-06 Thread Krister Johansen
** Tags removed: verification-needed-jammy-linux-aws
** Tags added: verification-done-jammy-linux-aws

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2033122

Title:
  Request backport of xen timekeeping performance improvements

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Released
Status in linux source package in Lunar:
  Fix Released

Bug description:
  Users, especially those on EC2, are encouraged to select tsc as their
  default clocksource.  However, this requires manual tuning of the
  operating system.  KVM can determine whether it is safe to use the tsc,
  and will default to it instead of its pvclock when appropriate.  This
  requests a backport of a patch that does the same for Xen instances.

  If appropriate, it's fine if this is applied only to the linux-aws
  branches.

  Not all Xen EC2 instances advertise explicit nomigrate support;
  however, on those that do, we'll select tsc by default.  On the subset
  of hosts where this is advertised, users will safely default to the
  more performant clocksource.
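A rough user-space sketch of the selection rule described above. It assumes the guest looks for an invariant TSC (the `constant_tsc` and `nonstop_tsc` flags shown in `/proc/cpuinfo`) plus a host-advertised promise not to migrate the guest. The `host_advertises_nomigrate` argument is a hypothetical stand-in for the Xen CPUID information the patch actually reads, and all function names are illustrative; this is not the kernel's implementation.

```python
from __future__ import annotations


def cpuinfo_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsc_is_invariant(flags: set[str]) -> bool:
    """Invariant TSC: ticks at a constant rate and keeps running in deep C-states."""
    return "constant_tsc" in flags and "nonstop_tsc" in flags


def prefer_tsc(flags: set[str], host_advertises_nomigrate: bool) -> bool:
    """Hypothetical rule: pick tsc only if it is invariant AND the guest will not migrate."""
    return tsc_is_invariant(flags) and host_advertises_nomigrate
```

For example, `prefer_tsc(cpuinfo_flags(open("/proc/cpuinfo").read()), host_advertises_nomigrate=False)` stays conservative and returns False whenever the migration guarantee is unknown.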

  [Impact]
  Xen instances default to the xen clocksource, which has been documented
  to be slower.  The xen clocksource is required for instances where the
  tsc is not safe to use or where the guest is subject to migration.  On
  some platforms the performance impact can be high, and users are
  encouraged to select the tsc when appropriate.  Instead of leaving it up
  to users to figure this out by reading a variety of different documents,
  pick the fast clocksource when it can be determined to be safe to do so.

  [Backport]
  Clean cherry-pick.  No conflicts applying to 5.15 or 6.2.

  [Test]
  Booted EC2 Xen instances with and without this patch and validated
  that, on those that properly advertised the required criteria via
  cpuid, the clocksource defaulted to tsc instead of xen.
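The check in the test above can be scripted by reading the standard sysfs clocksource files. A minimal sketch, assuming the usual `/sys/devices/system/clocksource/clocksource0` layout; the helper names are hypothetical, and the `base` parameter exists only so the functions can be pointed at a copy of those files.

```python
from __future__ import annotations

from pathlib import Path

# Standard sysfs directory describing the active system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR) -> tuple[str, list[str]]:
    """Return (current clocksource, available clocksources) as reported by sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def defaulted_to_tsc(base: Path = CLOCKSOURCE_DIR) -> bool:
    """True if tsc is the active clocksource (meaningful only before any manual override)."""
    current, _ = read_clocksources(base)
    return current == "tsc"
```

Run it right after boot on an instance without a `clocksource=` kernel parameter: an unpatched Xen guest would be expected to report `xen`, and a patched one on a host that advertises the criteria `tsc`. The manual tuning mentioned earlier is typically either `clocksource=tsc` on the kernel command line or, as root, writing `tsc` to `current_clocksource`.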

  [Potential Regression]
  Regression potential is low, since only absurd configurations could
  lead to a problem.  If this is considered risky, it can be applied only
  to linux-aws, where the documented guidance is for users to enable tsc
  as the clocksource on Xen.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2033122/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range

2024-04-11 Thread Krister Johansen
For posterity, LTS 5.15 picked up this fix in 5.15.154

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

  Tearing down KVM VMs on arm64 can cause softlockups to appear on the
  console.  When terminating VMs with > 100 GB of memory and 4k pages, the
  memory unmap times often exceed 20 seconds, which can trigger the
  softlockup detector.  Portions of the unmap path also have interrupts
  disabled while TLB invalidation instructions run, which can further
  contribute to latency problems.  My team has observed networking latency
  problems if the CPU where the teardown is occurring is also mapped to
  handle a NIC interrupt.
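Some back-of-the-envelope arithmetic for the figures above, assuming a 100 GiB guest mapped entirely with 4 KiB stage-2 pages and the default `kernel.watchdog_thresh` of 10 seconds. The soft-lockup detector reports after twice that threshold, which matches the 20 seconds mentioned above. The helper names are illustrative only.

```python
GIB = 1 << 30
PAGE_4K = 4 << 10


def leaf_entries(guest_bytes: int, page_size: int = PAGE_4K) -> int:
    """Last-level page-table entries needed to map guest_bytes with page_size pages."""
    return guest_bytes // page_size


def softlockup_report_s(watchdog_thresh_s: int = 10) -> int:
    """Soft lockups are reported after 2 * watchdog_thresh seconds without rescheduling."""
    return 2 * watchdog_thresh_s


def breakeven_ns_per_entry(guest_bytes: int, budget_s: float) -> float:
    """Average cost per entry at which an unbroken teardown exceeds budget_s."""
    return budget_s * 1e9 / leaf_entries(guest_bytes)


entries = leaf_entries(100 * GIB)                       # 26,214,400 entries
budget = softlockup_report_s()                          # 20 seconds
per_entry = breakeven_ns_per_entry(100 * GIB, budget)   # ~763 ns
print(f"{entries:,} entries; >{per_entry:.0f} ns each trips a {budget} s detector")
```

Put differently, if tearing down each 4 KiB entry (TLB maintenance included) averages more than roughly 760 ns and the walk never yields, the detector should fire; periodic rescheduling removes that dependence on guest size.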

  Fortunately, a solution has been in place since Linux 6.1.  A pair of
  small patches modifies stage2_apply_range() to operate on smaller
  memory ranges, performing a cond_resched() between them.  With these
  patches applied, softlockups are no longer observed when tearing down
  VMs with large amounts of memory.
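The batching idea can be modelled roughly as follows. This is an illustrative Python sketch, not the kernel code: `apply_in_batches`, `apply_fn` and `resched` are hypothetical names (`resched` stands in for cond_resched()). The table of largest block sizes is an assumption for arm64 stage-2 tables without 52-bit/LPA2 support (1 GiB blocks with 4 KiB pages, 32 MiB with 16 KiB, 512 MiB with 64 KiB), presumably the level that 3b5c082bbf works out at compile time.

```python
from __future__ import annotations

from typing import Callable

MIB = 1 << 20
GIB = 1 << 30

# Assumed largest stage-2 block mapping per host page size (arm64, no LPA2):
# 4 KiB pages -> level-1 blocks, 16 KiB and 64 KiB pages -> level-2 blocks.
LARGEST_BLOCK = {4 << 10: 1 * GIB, 16 << 10: 32 * MIB, 64 << 10: 512 * MIB}


def batch_end(addr: int, end: int, block: int) -> int:
    """Next block-aligned boundary after addr, capped at end."""
    boundary = (addr + block) // block * block  # round (addr + block) down to a block multiple
    return min(boundary, end)


def apply_in_batches(
    start: int,
    end: int,
    page_size: int,
    apply_fn: Callable[[int, int], None],
    resched: Callable[[], None],
) -> int:
    """Walk [start, end) at most one largest block per batch, yielding between batches.

    Returns the number of batches processed.
    """
    block = LARGEST_BLOCK[page_size]
    batches = 0
    addr = start
    while addr < end:
        next_addr = batch_end(addr, end, block)
        apply_fn(addr, next_addr)  # e.g. unmap [addr, next_addr)
        batches += 1
        addr = next_addr
        if addr < end:
            resched()  # stand-in for cond_resched()
    return batches
```

Capping each batch at the next block-aligned boundary bounds the work done between yields to at most one largest block, regardless of guest size, and never splits a block-aligned block mapping across two batches.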

  Although I also submitted the patches to 5.15 LTS (link to the LTS
  submission in the "Backport" section), I'd appreciate it if Ubuntu were
  willing to take this submission in parallel, since the impact has left
  us unable to use arm64 for KVM until we can either migrate our
  hypervisors to hugepages, pick up this fix, or some combination of the
  two.

  [Backport]

  Backport the following fixes from Linux 6.1:

  3b5c082bbf KVM: arm64: Work out supported block level at compile time
  5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block

  The fix is in 5994bc9e05 and 3b5c082bbf is a dependency that was submitted as
  part of the series.  The original submission is here:

  https://lore.kernel.org/all/20221007234151.461779-1-oliver.up...@linux.dev/

  I've also submitted the patches to 5.15 LTS here:

  https://lore.kernel.org/stable/cover.1709665227.git.k...@templeofstupid.com/

  Both fixes cherry picked cleanly and there were no conflicts.

  [Test]
 
  Executed the test from 5994bc9e05 as well as my own run of
  kvm_page_table_test on a VM with 4k pages and a memory size > 100 GB.
  Without the patches, softlockups were observed in both tests.  With the
  patches applied, the tests ran without incident.

  This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.
   
  [Potential Regression]

  Regression potential is low.  These patches have been present in Linux
  since 6.1 and appear to have needed no further maintenance.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056227/+subscriptions


[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range

2024-04-11 Thread Krister Johansen
I've tested linux/5.15.0-104.114 and it passes my tests.  Marking
verification-done-jammy-linux.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056227/+subscriptions


[Kernel-packages] [Bug 2056227] [NEW] KVM: arm64: softlockups in stage2_apply_range

2024-03-05 Thread Krister Johansen
Public bug reported:

[Impact]

Tearing down KVM VMs on arm64 can cause softlockups to appear on the console.
When terminating VMs with > 100 GB of memory and 4k pages, the memory unmap
times often exceed 20 seconds, which can trigger the softlockup detector.
Portions of the unmap path also have interrupts disabled while TLB
invalidation instructions run, which can further contribute to latency
problems.  My team has observed networking latency problems if the CPU where
the teardown is occurring is also mapped to handle a NIC interrupt.

Fortunately, a solution has been in place since Linux 6.1.  A pair of small
patches modifies stage2_apply_range() to operate on smaller memory ranges,
performing a cond_resched() between them.  With these patches applied,
softlockups are no longer observed when tearing down VMs with large amounts
of memory.

Although I also submitted the patches to 5.15 LTS (link to the LTS submission
in the "Backport" section), I'd appreciate it if Ubuntu were willing to take
this submission in parallel, since the impact has left us unable to use arm64
for KVM until we can either migrate our hypervisors to hugepages, pick up this
fix, or some combination of the two.

[Backport]

Backport the following fixes from Linux 6.1:
  
3b5c082bbf KVM: arm64: Work out supported block level at compile time
5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block
  
The fix is in 5994bc9e05 and 3b5c082bbf is a dependency that was submitted as
part of the series.  The original submission is here:
  
https://lore.kernel.org/all/20221007234151.461779-1-oliver.up...@linux.dev/
  
I've also submitted the patches to 5.15 LTS here:
  
https://lore.kernel.org/stable/cover.1709665227.git.k...@templeofstupid.com/
  
Both fixes cherry picked cleanly and there were no conflicts.
  
[Test]
   
Executed the test from 5994bc9e05 as well as my own run of kvm_page_table_test
on a VM with 4k pages and a memory size > 100 GB.  Without the patches,
softlockups were observed in both tests.  With the patches applied, the tests
ran without incident.

This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.
 
[Potential Regression]
   
Regression potential is low.  These patches have been present in Linux since 6.1
and appear to have needed no further maintenance.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New


[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range

2024-03-05 Thread Krister Johansen
This specifically affects Jammy and the 5.15 series.  I have the
necessary patches prepared and will e-mail those to the kernel team's
mailing list.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056227/+subscriptions


[Kernel-packages] [Bug 2056227] Re: KVM: arm64: softlockups in stage2_apply_range

2024-03-05 Thread Krister Johansen
Patches to mailing list here:

https://lists.ubuntu.com/archives/kernel-team/2024-March/149383.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056227/+subscriptions


[Kernel-packages] [Bug 1987232] [NEW] WARN in trace_event_dyn_put_ref

2022-08-21 Thread Krister Johansen
Public bug reported:

I have systems that are regularly hitting a WARN in
trace_event_dyn_put_ref.

The exact message is:

WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
+trace_event_dyn_put_ref+0x15/0x20

With the following stacktrace:

 perf_trace_init+0x8f/0xd0
 perf_tp_event_init+0x1f/0x40
 perf_try_init_event+0x4a/0x130
 perf_event_alloc+0x497/0xf40
 __do_sys_perf_event_open+0x1d4/0xf70
 __x64_sys_perf_event_open+0x20/0x30
 do_syscall_64+0x5c/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

I've debugged this and worked with upstream to get a fix into Linux.  It
was recently merged in 6.0-rc2.  See here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

The problem started appearing as soon as our systems picked up the
linux-aws-5.15 branch for Focal.  (That was 5.15.0-1015-aws, if memory
serves.)  Could you please cherry-pick this fix and backport it to the
linux and linux-aws kernels for Focal?  There's a test here:
https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/
that reproduces the problem very reliably for me.  With the patch
applied, I no longer get the WARNs.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions


[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref

2022-08-22 Thread Krister Johansen
apport information

** Tags added: apport-collected focal uec-images

** Description changed:

  I have systems that are regularly hitting a WARN in
  trace_event_dyn_put_ref.
  
  The exact message is:
  
  WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
  +trace_event_dyn_put_ref+0x15/0x20
  
  With the following stacktrace:
  
   perf_trace_init+0x8f/0xd0
   perf_tp_event_init+0x1f/0x40
   perf_try_init_event+0x4a/0x130
   perf_event_alloc+0x497/0xf40
   __do_sys_perf_event_open+0x1d4/0xf70
   __x64_sys_perf_event_open+0x20/0x30
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  
  I've debugged this and worked with upstream to get a fix into Linux.  It
  was recently merged in 6.0-rc2.  See here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d
  
- The problem started appearing as soon as our systems picked up the
- linux-aws-5.15 branch for Focal.  (That was 5.15.0-1015-aws, if memory
- serves).  Could you please cherry pick this fix and pull it back to the
- the linux and linux-aws kernels for Focal?  There's test here:
- https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/
- that reproduces the problem very reliably for me.  With the patch
- applied, I no longer get the WARNs.
+ The problem started appearing as soon as our systems picked up the linux-aws-5.15 branch for Focal.  (That was 5.15.0-1015-aws, if memory serves).  Could you please cherry pick this fix and pull it back to the the linux and linux-aws kernels for Focal?  There's test here: https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/ that reproduces the problem very reliably for me.  With the patch applied, I no longer get the WARNs.
+ --- 
+ ProblemType: Bug
+ AlsaDevices:
+  total 0
+  crw-rw 1 root audio 116,  1 Aug 22 17:32 seq
+  crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
+ AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
+ ApportVersion: 2.20.11-0ubuntu27.24
+ Architecture: amd64
+ ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ CRDA: N/A
+ CasperMD5CheckResult: skip
+ DistroRelease: Ubuntu 20.04
+ IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
+ Lsusb: Error: command ['lsusb'] failed with exit code 1:
+ Lsusb-t:
+  
+ Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
+ MachineType: Amazon EC2 c5d.12xlarge
+ Package: linux (not installed)
+ PciMultimedia:
+  
+ ProcEnviron:
+  TERM=xterm-256color
+  PATH=(custom, no user)
+  LANG=C.UTF-8
+  SHELL=/bin/bash
+ ProcFB:
+  
+ ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
+ ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
+ RelatedPackageVersions:
+  linux-restricted-modules-5.15.0-1015-aws N/A
+  linux-backports-modules-5.15.0-1015-aws  N/A
+  linux-firmware   N/A
+ RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
+ Tags:  focal uec-images
+ Uname: Linux 5.15.0-1015-aws x86_64
+ UnreportableReason: This report is about a package that is not installed.
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups: N/A
+ _MarkForUpload: False
+ dmi.bios.date: 10/16/2017
+ dmi.bios.release: 1.0
+ dmi.bios.vendor: Amazon EC2
+ dmi.bios.version: 1.0
+ dmi.board.asset.tag: i-03f5d8581c7ad94aa
+ dmi.board.vendor: Amazon EC2
+ dmi.chassis.asset.tag: Amazon EC2
+ dmi.chassis.type: 1
+ dmi.chassis.vendor: Amazon EC2
+ dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
+ dmi.product.name: c5d.12xlarge
+ dmi.sys.vendor: Amazon EC2

** Attachment added: "CurrentDmesg.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610807/+files/CurrentDmesg.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed

[Kernel-packages] [Bug 1987232] Lspci-vt.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "Lspci-vt.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610809/+files/Lspci-vt.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have systems that are regularly hitting a WARN in
  trace_event_dyn_put_ref.

  The exact message is:

  WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
  +trace_event_dyn_put_ref+0x15/0x20

  With the following stacktrace:

   perf_trace_init+0x8f/0xd0
   perf_tp_event_init+0x1f/0x40
   perf_try_init_event+0x4a/0x130
   perf_event_alloc+0x497/0xf40
   __do_sys_perf_event_open+0x1d4/0xf70
   __x64_sys_perf_event_open+0x20/0x30
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  I've debugged this and worked with upstream to get a fix into Linux.
  It was recently merged in 6.0-rc2.  See here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

  The problem started appearing as soon as our systems picked up the 
linux-aws-5.15 branch for Focal.  (That was 5.15.0-1015-aws, if memory serves). 
 Could you please cherry pick this fix and pull it back to the the linux and 
linux-aws kernels for Focal?  There's test here: 
https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/ that 
reproduces the problem very reliably for me.  With the patch applied, I no 
longer get the WARNs.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 22 17:32 seq
   crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.24
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: skip
  DistroRelease: Ubuntu 20.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:
   
  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Amazon EC2 c5d.12xlarge
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws 
root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 
console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
  ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-1015-aws N/A
   linux-backports-modules-5.15.0-1015-aws  N/A
   linux-firmware   N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  focal uec-images
  Uname: Linux 5.15.0-1015-aws x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: 10/16/2017
  dmi.bios.release: 1.0
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-03f5d8581c7ad94aa
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
  dmi.product.name: c5d.12xlarge
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1987232] Lspci.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "Lspci.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610808/+files/Lspci.txt


[Kernel-packages] [Bug 1987232] ProcModules.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "ProcModules.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610812/+files/ProcModules.txt


[Kernel-packages] [Bug 1987232] ProcInterrupts.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "ProcInterrupts.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610811/+files/ProcInterrupts.txt


[Kernel-packages] [Bug 1987232] ProcCpuinfoMinimal.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "ProcCpuinfoMinimal.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610810/+files/ProcCpuinfoMinimal.txt


[Kernel-packages] [Bug 1987232] UdevDb.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "UdevDb.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610813/+files/UdevDb.txt


[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref

2022-08-22 Thread Krister Johansen
The fix has also been added to the Stable queue for 5.15 and 5.19 as of
this morning:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.19/tracing-perf-fix-double-put-of-trace-event-when-init-fails.patch

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.15/tracing-perf-fix-double-put-of-trace-event-when-init-fails.patch
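
The stable-queue patch name, tracing-perf-fix-double-put-of-trace-event-when-init-fails, points at a reference-counting bug: when init fails, the trace event's reference is dropped twice, so the put helper presumably finds the count already at zero and fires the WARN. Below is a minimal Python model of that bug class, assuming only what the patch name says. `Event`, `init_that_fails`, `open_event_buggy` and `open_event_fixed` are hypothetical names, not kernel functions, and the real fix may place the single put on a different side of the call than this sketch does.

```python
class Event:
    """Hypothetical model of a reference-counted trace event (not kernel code)."""

    def __init__(self) -> None:
        self.refcnt = 0
        self.warnings = 0  # how many times a put hit an already-zero count

    def get(self) -> None:
        self.refcnt += 1

    def put(self) -> None:
        # Defensive put: warn and clamp instead of letting the count go negative.
        if self.refcnt <= 0:
            self.warnings += 1
            print(f"WARNING: put on event with refcnt {self.refcnt}")
            self.refcnt = 0
            return
        self.refcnt -= 1


def init_that_fails(ev: Event) -> bool:
    """Hypothetical init step that fails and drops the reference on its error path."""
    ev.put()
    return False


def open_event_buggy(ev: Event) -> bool:
    ev.get()
    if not init_that_fails(ev):
        ev.put()  # second put of the same reference: the double put
        return False
    return True


def open_event_fixed(ev: Event) -> bool:
    ev.get()
    if not init_that_fails(ev):
        return False  # the callee already dropped the reference
    return True


if __name__ == "__main__":
    buggy, fixed = Event(), Event()
    open_event_buggy(buggy)  # the second put hits refcnt 0 and warns
    open_event_fixed(fixed)
    print("buggy:", buggy.refcnt, buggy.warnings)  # buggy: 0 1
    print("fixed:", fixed.refcnt, fixed.warnings)  # fixed: 0 0
```

Only the buggy path prints a WARNING line. In the fixed path, the one get is balanced by exactly one put, so the count never underflows.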


[Kernel-packages] [Bug 1987232] WifiSyslog.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "WifiSyslog.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610814/+files/WifiSyslog.txt


[Kernel-packages] [Bug 1987232] acpidump.txt

2022-08-22 Thread Krister Johansen
apport information

** Attachment added: "acpidump.txt"
   https://bugs.launchpad.net/bugs/1987232/+attachment/5610815/+files/acpidump.txt

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have systems that are regularly hitting a WARN in
  trace_event_dyn_put_ref.

  The exact message is:

  WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
  +trace_event_dyn_put_ref+0x15/0x20

  With the following stacktrace:

   perf_trace_init+0x8f/0xd0
   perf_tp_event_init+0x1f/0x40
   perf_try_init_event+0x4a/0x130
   perf_event_alloc+0x497/0xf40
   __do_sys_perf_event_open+0x1d4/0xf70
   __x64_sys_perf_event_open+0x20/0x30
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  I've debugged this and worked with upstream to get a fix into Linux.
  It was recently merged in 6.0-rc2.  See here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

  The problem started appearing as soon as our systems picked up the
  linux-aws-5.15 branch for Focal (5.15.0-1015-aws, if memory serves).
  Could you please cherry-pick this fix and backport it to the linux and
  linux-aws kernels for Focal?  There's a test here:
  https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/
  that reproduces the problem very reliably for me.  With the patch
  applied, I no longer get the WARNs.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 22 17:32 seq
   crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.24
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: skip
  DistroRelease: Ubuntu 20.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:
   
  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Amazon EC2 c5d.12xlarge
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws 
root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 
console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
  ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-1015-aws N/A
   linux-backports-modules-5.15.0-1015-aws  N/A
   linux-firmware   N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  focal uec-images
  Uname: Linux 5.15.0-1015-aws x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: 10/16/2017
  dmi.bios.release: 1.0
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-03f5d8581c7ad94aa
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: 
dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
  dmi.product.name: c5d.12xlarge
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions




[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref

2022-08-29 Thread Krister Johansen
Should this also get nominated as affecting Focal?  I hit this on the
5.15 kernel that was attached to linux-aws for Focal.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Kinetic:
  Confirmed

Bug description:
  I have systems that are regularly hitting a WARN in
  trace_event_dyn_put_ref.

  The exact message is:

  WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
  +trace_event_dyn_put_ref+0x15/0x20

  With the following stacktrace:

   perf_trace_init+0x8f/0xd0
   perf_tp_event_init+0x1f/0x40
   perf_try_init_event+0x4a/0x130
   perf_event_alloc+0x497/0xf40
   __do_sys_perf_event_open+0x1d4/0xf70
   __x64_sys_perf_event_open+0x20/0x30
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  I've debugged this and worked with upstream to get a fix into Linux.
  It was recently merged in 6.0-rc2.  See here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

  The problem started appearing as soon as our systems picked up the
  linux-aws-5.15 branch for Focal (5.15.0-1015-aws, if memory serves).
  Could you please cherry-pick this fix and backport it to the linux and
  linux-aws kernels for Focal?  There's a test here:
  https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/
  that reproduces the problem very reliably for me.  With the patch
  applied, I no longer get the WARNs.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 22 17:32 seq
   crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.24
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: skip
  DistroRelease: Ubuntu 20.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:
   
  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Amazon EC2 c5d.12xlarge
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws 
root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 
console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
  ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-1015-aws N/A
   linux-backports-modules-5.15.0-1015-aws  N/A
   linux-firmware   N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  focal uec-images
  Uname: Linux 5.15.0-1015-aws x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: 10/16/2017
  dmi.bios.release: 1.0
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-03f5d8581c7ad94aa
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: 
dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
  dmi.product.name: c5d.12xlarge
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions




[Kernel-packages] [Bug 2033122] Re: Request backport of xen timekeeping performance improvements

2023-09-08 Thread Krister Johansen
Thanks. I booted both kernels on i3 instances that reported support for
invariant TSC and had nomigrate set, and was able to validate that both
selected tsc instead of xen as the clocksource.

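One way to repeat this check on a running instance is to read the kernel's
sysfs clocksource file. The sketch below assumes the standard path
/sys/devices/system/clocksource/clocksource0/current_clocksource; the helper
names are hypothetical and not part of any tool used in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sysfs file that names the clocksource the kernel is currently using. */
#define CURRENT_CS_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Return 1 if the file contents (e.g. "tsc\n") name the tsc clocksource. */
static int names_tsc(const char *contents)
{
    assert(contents != NULL);
    size_t len = strcspn(contents, "\n"); /* ignore the trailing newline */
    return len == 3 && strncmp(contents, "tsc", 3) == 0;
}

/*
 * Read the current clocksource name into buf.  Returns 0 on success and
 * -1 if the file cannot be opened or read (e.g. not running on Linux).
 */
static int read_current_clocksource(char *buf, size_t buflen)
{
    FILE *f = fopen(CURRENT_CS_PATH, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, (int)buflen, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}
```

On an instance where the new default applies, the file would be expected to
read "tsc"; otherwise it reads "xen". The sibling file available_clocksource
in the same directory lists the alternatives.
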
** Tags removed: verification-needed-jammy-linux verification-needed-lunar-linux
** Tags added: verification-done-jammy-linux verification-done-lunar-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2033122

Title:
  Request backport of xen timekeeping performance improvements

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Lunar:
  Fix Committed

Bug description:
  Users, especially those on EC2, are encouraged to select tsc as their
  default clocksource.  However, this requires manual tuning of the
  operating system.  KVM can determine whether it is safe to use the tsc,
  and will default to that instead of its pvclock when appropriate.  This
  requests a backport of the patch that does the same for Xen instances.

  If appropriate, it's fine if this is applied to only the linux-aws
  branches.

  Not all Xen EC2 instances advertise explicit nomigrate support;
  however, on those that do we'll select tsc by default.  On the subset
  of hosts where this is advertised, users will safely default to the
  more performant clocksource.

  [Impact]
  Xen instances default to the xen clocksource, which has been documented
  to be slower.  The xen clocksource is required for instances where the
  tsc is not safe to use, or where the guest is subject to migration.  On
  some platforms the performance impact can be high, and users are
  encouraged to select the tsc when appropriate.  Instead of leaving it up
  to users to figure this out by reading a variety of different documents,
  pick the fast clocksource when it can be determined to be safe to do so.

  [Backport]
  Clean cherry pick.  No conflicts applying to 5.15 or 6.2.

  [Test]
  Booted EC2 xen instances with and without this patch and validated that on 
those that properly advertised the required criteria via cpuid, that the 
clocksource defaulted to tsc instead of xen.

  [Potential Regression]
  Potential is low, since only absurd configurations could lead to a problem.  
If this is considered risky, it can be applied to only linux-aws where the 
documented guidance is for users to enable tsc as the clocksource on Xen.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2033122/+subscriptions




[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref

2022-09-18 Thread Krister Johansen
@Stefan thanks for explaining how the process works.  I appreciate your
willingness to take this patch ahead of its arrival in the stable pull
for the Jammy train.  One of your updates mentioned TBD on a test.  I
have a reproducer in the original cover letter to Steven here, if it
helps:

https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Kinetic:
  Confirmed

Bug description:
  [SRU Justification]

  Impact: Some imbalanced ref-counting produces kernel warnings
  regularly.  Since they are logged at warning level, they trigger system
  monitoring on servers, which in turn causes unnecessary work
  inspecting the logs.

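To illustrate the impact statement, here is a toy model of how an imbalanced
get/put pair ends up tripping a warning: one put too many (for example, an
error path that drops the same reference twice) finds the count already at
zero. All names are hypothetical; this is not the kernel's trace_event
refcounting code.

```c
#include <assert.h>
#include <stdio.h>

/* Toy reference count standing in for a trace event's refcount. */
struct toy_event {
    int ref;
    int warnings; /* number of times a put found the count already at zero */
};

static void toy_get(struct toy_event *ev)
{
    assert(ev != NULL);
    ev->ref++;
}

/* Drop a reference; an extra put is reported instead of going negative. */
static void toy_put(struct toy_event *ev)
{
    if (ev->ref <= 0) {
        ev->warnings++; /* analogous to a WARN firing */
        fprintf(stderr, "warning: put with refcount already zero\n");
        return;
    }
    ev->ref--;
}
```

A balanced get/put leaves the count at zero with no warning; a second put on
the same reference is what produces the warning.
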
  Fix: There is a fix upstream and also backported to the upstream
  stable branch. However we are still a bit behind catching up with the
  latest versions. Since this is having quite an impact and the fix is
  rather straightforward, we pull this in from upstream stable ahead of
  time.

  Test case: tbd

  Regression potential: Regressions would manifest as different errors
  related to ref-counting.

  ---

  I have systems that are regularly hitting a WARN in
  trace_event_dyn_put_ref.

  The exact message is:

  WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
  +trace_event_dyn_put_ref+0x15/0x20

  With the following stacktrace:

   perf_trace_init+0x8f/0xd0
   perf_tp_event_init+0x1f/0x40
   perf_try_init_event+0x4a/0x130
   perf_event_alloc+0x497/0xf40
   __do_sys_perf_event_open+0x1d4/0xf70
   __x64_sys_perf_event_open+0x20/0x30
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  I've debugged this and worked with upstream to get a fix into Linux.
  It was recently merged in 6.0-rc2.  See here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

  The problem started appearing as soon as our systems picked up the
  linux-aws-5.15 branch for Focal (5.15.0-1015-aws, if memory serves).
  Could you please cherry-pick this fix and backport it to the linux and
  linux-aws kernels for Focal?  There's a test here:
  https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/
  that reproduces the problem very reliably for me.  With the patch
  applied, I no longer get the WARNs.
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 22 17:32 seq
   crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.24
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: skip
  DistroRelease: Ubuntu 20.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:

  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Amazon EC2 c5d.12xlarge
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws 
root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 
console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
  ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-1015-aws N/A
   linux-backports-modules-5.15.0-1015-aws  N/A
   linux-firmware   N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  focal uec-images
  Uname: Linux 5.15.0-1015-aws x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: 10/16/2017
  dmi.bios.release: 1.0
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-03f5d8581c7ad94aa
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: 
dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
  dmi.product.name: c5d.12xlarge
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions



[Kernel-packages] [Bug 1987232] Re: WARN in trace_event_dyn_put_ref

2022-09-22 Thread Krister Johansen
I ran the original reproducer on a VM that was running
linux/5.15.0-50.56 and linux/5.15.0-46.49.  On the former the
problem did not reproduce, but on the latter it did.  Marking this as
verified via testing and setting 'verification-done-jammy'.

** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987232

Title:
  WARN in trace_event_dyn_put_ref

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Kinetic:
  Confirmed

Bug description:
  [SRU Justification]

  Impact: Some imbalanced ref-counting produces kernel warnings
  regularly.  Since they are logged at warning level, they trigger system
  monitoring on servers, which in turn causes unnecessary work
  inspecting the logs.

  Fix: There is a fix upstream and also backported to the upstream
  stable branch. However we are still a bit behind catching up with the
  latest versions. Since this is having quite an impact and the fix is
  rather straightforward, we pull this in from upstream stable ahead of
  time.

  Test case: tbd

  Regression potential: Regressions would manifest as different errors
  related to ref-counting.

  ---

  I have systems that are regularly hitting a WARN in
  trace_event_dyn_put_ref.

  The exact message is:

  WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46
  +trace_event_dyn_put_ref+0x15/0x20

  With the following stacktrace:

   perf_trace_init+0x8f/0xd0
   perf_tp_event_init+0x1f/0x40
   perf_try_init_event+0x4a/0x130
   perf_event_alloc+0x497/0xf40
   __do_sys_perf_event_open+0x1d4/0xf70
   __x64_sys_perf_event_open+0x20/0x30
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  I've debugged this and worked with upstream to get a fix into Linux.
  It was recently merged in 6.0-rc2.  See here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

  The problem started appearing as soon as our systems picked up the
  linux-aws-5.15 branch for Focal (5.15.0-1015-aws, if memory serves).
  Could you please cherry-pick this fix and backport it to the linux and
  linux-aws kernels for Focal?  There's a test here:
  https://lore.kernel.org/all/cover.1660347763.git.k...@templeofstupid.com/
  that reproduces the problem very reliably for me.  With the patch
  applied, I no longer get the WARNs.
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 22 17:32 seq
   crw-rw 1 root audio 116, 33 Aug 22 17:32 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.24
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: skip
  DistroRelease: Ubuntu 20.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:

  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Amazon EC2 c5d.12xlarge
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws 
root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 
console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
  ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-1015-aws N/A
   linux-backports-modules-5.15.0-1015-aws  N/A
   linux-firmware   N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  focal uec-images
  Uname: Linux 5.15.0-1015-aws x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: 10/16/2017
  dmi.bios.release: 1.0
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-03f5d8581c7ad94aa
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: 
dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
  dmi.product.name: c5d.12xlarge
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987232/+subscriptions


[Kernel-packages] [Bug 2089373] Re: WARN in trc_wait_for_one_reader about failed IPIs

2025-01-22 Thread Krister Johansen
I've re-run the tests against the proposed kernel and no longer see
these warnings.  Thanks for taking the patches to fix this!

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089373

Title:
  WARN in trc_wait_for_one_reader about failed IPIs

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

  When ending bpf tracing, 5.15 kernels now report a warning in
  trc_wait_for_one_reader() on platforms that support hot-plugging CPUs,
  but that do not have all of their hotplug slots populated.  In this
  submitter's environment, it reproduces on Xen EC2 instances, but not
  Nitro ones.

  The warning looks like this:

  kernel: [ 6416.920266] [ cut here ]
  kernel: [ 6416.920272] trc_wait_for_one_reader(): smp_call_function_single() 
failed for CPU: 64
  kernel: [ 6416.920289] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:1044 
trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920299] Modules linked in: xt_state xt_connmark 
nf_conntrack_netlink nfnetlink xt_addrtype xt_statistic xt_nat xt_tcpudp 
ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nvidia_uvm(POE) nvidia_drm(POE) 
drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt 
nvidia_modeset(POE) nvidia(POE) iptable_mangle ip6table_mangle ip6table_filter 
ip6table_nat ip6_tables xt_MASQUERADE xt_conntrack xt_comment iptable_filter 
xt_mark iptable_nat nf_nat bpfilter aufs overlay udp_diag tcp_diag inet_diag 
binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha1_ssse3 
aesni_intel input_leds psmouse crypto_simd cryptd serio_raw floppy sch_fq_codel 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ena drm efi_pstore 
ip_tables x_tables autofs4
  kernel: [ 6416.920368] CPU: 0 PID: 13 Comm: rcu_tasks_trace Tainted: P OE 
5.15.0-1071-aws #77~20.04.1-Ubuntu
  kernel: [ 6416.920372] Hardware name: Xen HVM domU, BIOS 4.11.amazon 
08/24/2006
  kernel: [ 6416.920374] RIP: 0010:trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920376] Code: 00 00 00 4c 89 ef e8 37 ac 4e 00 eb 9f 44 89 fa 
48 c7 c6 00 63 e2 b8 48 c7 c7 a0 9a 1e b9 c6 05 2f 2e 09 02 01 e8 15 2e b9 00 
<0f> 0b e9 31 ff ff ff 4c 89 ee 48 c7 c7 20 df b7 b9 e8 a2 99 52 00
  kernel: [ 6416.920380] RSP: 0018:9e048c4efe00 EFLAGS: 00010286
  kernel: [ 6416.920382] RAX:  RBX:  RCX: 
0027
  kernel: [ 6416.920384] RDX: 0027 RSI: 0003 RDI: 
93074ae20588
  kernel: [ 6416.920385] RBP: 9e048c4efe28 R08: 93074ae20580 R09: 
0001
  kernel: [ 6416.920387] R10: 000a R11: 93463feb2c7f R12: 
92cbc6a1e600
  kernel: [ 6416.920389] R13: 0040 R14: 000205a4 R15: 
0040
  kernel: [ 6416.920390] FS: () GS:93074ae0() 
knlGS:
  kernel: [ 6416.920393] CS: 0010 DS:  ES:  CR0: 80050033
  kernel: [ 6416.920394] CR2: 7f4a72b04098 CR3: 0046c8964001 CR4: 
001706f0
  kernel: [ 6416.920399] Call Trace:
  kernel: [ 6416.920401] 
  kernel: [ 6416.920404] ? show_regs.cold+0x1a/0x1f
  kernel: [ 6416.920410] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920412] ? __warn+0x8b/0xe0
  kernel: [ 6416.920418] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920421] ? report_bug+0xd5/0x110
  kernel: [ 6416.920427] ? handle_bug+0x39/0x90
  kernel: [ 6416.920431] ? exc_invalid_op+0x19/0x70
  kernel: [ 6416.920434] ? asm_exc_invalid_op+0x1b/0x20
  kernel: [ 6416.920442] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920446] rcu_tasks_trace_postscan+0x47/0x80
  kernel: [ 6416.920449] rcu_tasks_wait_gp+0x108/0x210
  kernel: [ 6416.920453] rcu_tasks_kthread+0x10f/0x1c0
  kernel: [ 6416.920456] ? wait_woken+0x60/0x60
  kernel: [ 6416.920462] ? show_rcu_tasks_trace_gp_kthread+0x80/0x80
  kernel: [ 6416.920464] kthread+0x12a/0x150
  kernel: [ 6416.920471] ? set_kthread_struct+0x50/0x50
  kernel: [ 6416.920476] ret_from_fork+0x22/0x30
  kernel: [ 6416.920485] 
  kernel: [ 6416.920486] ---[ end trace 0500611ddaff33a7 ]---

  The problem appears when:

  - The system is performing an rcu_tasks_trace grace-period wait
  - The system has more hotplug CPU slots available than are populated
  - The RCU tasks postscan detects a holdout

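A toy model of the last two conditions: if tasks are wrongly flagged as
holdouts and some of them are associated with hotplug slots that have no CPU
behind them (as the "failed for CPU: 64" message suggests), every IPI aimed
at those slots fails. The slot count, the cpu_present[] table, and
toy_send_ipi() are hypothetical; returning -ENXIO for an absent CPU mirrors
what smp_call_function_single() is assumed to do for an offline CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPU_SLOTS 8 /* hypothetical: hotplug slots the platform advertises */

/* Only the first four slots are populated in this toy configuration. */
static bool cpu_present[NR_CPU_SLOTS] = { true, true, true, true };

/* Stand-in for an IPI send: fails for a slot with no CPU behind it. */
static int toy_send_ipi(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPU_SLOTS);
    return cpu_present[cpu] ? 0 : -ENXIO;
}

/*
 * IPI the CPU of every task flagged as a holdout and count the failures.
 * holdout_cpu[i] is the CPU slot associated with the i-th flagged task.
 */
static int count_failed_ipis(const int *holdout_cpu, int n)
{
    int failed = 0;
    for (int i = 0; i < n; i++)
        if (toy_send_ipi(holdout_cpu[i]) != 0)
            failed++;
    return failed;
}
```

With only genuine holdouts on populated CPUs, nothing fails; once tasks tied
to empty slots are flagged too, each one yields a failed IPI and, in the
kernel, a WARN.
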
  The problem is actually caused by a mismerge of 9b3c4ab304 ("sched,rcu:
  Rework try_invoke_on_locked_down_task()").  When that patch was
  applied, a conflict around task nesting was improperly resolved and
  led to quiescent tasks getting flagged as holdouts.  This in turn
  results in more IPIs than necessary to idle CPUs, as well as WARNs
  about failing to send IPIs to CPUs tha

[Kernel-packages] [Bug 2104210] Re: uprobe-related panics during profiling

2025-03-25 Thread Krister Johansen
Patches sent to list:
https://lists.ubuntu.com/archives/kernel-team/2025-March/158376.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2104210

Title:
  uprobe-related panics during profiling

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]

  On systems that utilize both uprobes and perf_events style profiling, it
  is possible to hit a panic in the uprobe_free_utask code.  This occurs
  during process exit.  If the profiler fires while uprobe_free_utask is
  in the process of cleaning up the utask, the NMI may read freed memory
  because the cleanup code frees the utask before setting its pointer to
  NULL.  This submitter has encountered the problem on systems running
  workloads without intentionally trying to trigger the problem.

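The ordering problem and its fix can be shown in a single-threaded sketch.
The structs and functions below are hypothetical stand-ins for the task and
its utask; the point is only that clearing the pointer before freeing the
object means a reader running in between (like a profiling NMI on the same
CPU) sees NULL rather than freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a task and its per-task uprobe state. */
struct toy_utask { int active; };
struct toy_task  { struct toy_utask *utask; };

/* Reader that may run at any point, like a profiler interrupt. */
static int toy_reader_sees_utask(const struct toy_task *t)
{
    return t->utask != NULL; /* only safe to dereference when non-NULL */
}

/* Buggy order: the pointer refers to freed memory between the two steps. */
static void toy_free_utask_buggy(struct toy_task *t)
{
    free(t->utask);
    /* a reader arriving here would follow a dangling pointer */
    t->utask = NULL;
}

/* Fixed order: unpublish the pointer first, then free the object. */
static void toy_free_utask_fixed(struct toy_task *t)
{
    assert(t != NULL);
    struct toy_utask *u = t->utask;
    t->utask = NULL;
    /* a reader arriving here sees NULL and skips the utask */
    free(u);
}
```

Both functions end in the same state; they differ only in what a reader can
observe between the two statements. The sketch does not model interrupts or
compiler ordering.
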
  The stacks look something like this:

   RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
   ...
    ? die_addr+0x36/0x90
    ? exc_general_protection+0x217/0x420
    ? asm_exc_general_protection+0x26/0x30
    ? is_uprobe_at_func_entry+0x28/0x80
    perf_callchain_user+0x20a/0x360
    get_perf_callchain+0x147/0x1d0
    bpf_get_stackid+0x60/0x90
    bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
    ? __smp_call_single_queue+0xad/0x120
    bpf_overflow_handler+0x75/0x110
    ...
    asm_sysvec_apic_timer_interrupt+0x1a/0x20
   RIP: 0010:__kmem_cache_free+0x1cb/0x350
   ...
    ? uprobe_free_utask+0x62/0x80
    ? acct_collect+0x4c/0x220
    uprobe_free_utask+0x62/0x80
    mm_release+0x12/0xb0
    do_exit+0x26b/0xaa0
    __x64_sys_exit+0x1b/0x20
    do_syscall_64+0x5a/0x80

  The person who reported the issue upstream provided this reproducer.
  (Run each command in a separate terminal):

  # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  However, since the binutils are stripped on some of the releases where I
  tested this, I ran the following instead:

  # while :; do bpftrace -e 'uprobe:libc:malloc  { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  [Backport]
  The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in
  uprobe_free_utask")

  However, this patch was massaged by stable for its inclusion in 6.12,
  6.6, and 6.1.  Instead of re-doing stable's conflict resolution, take
  the patch directly from 6.6.x, at commit eff00c5e29ab.

  This patch is in stable as of 6.12.19, 6.6.83, and 6.1.131.

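For upstream stable kernels, whether a given release carries the fix can be
checked against the three versions listed above. The helper below is a
hypothetical sketch: it only knows those three series, and it does not apply
to Ubuntu package versions such as 5.15.0-1071-aws, which use their own
numbering.

```c
#include <assert.h>
#include <stdio.h>

/* First stable releases said to carry the fix, per the report above. */
static const struct { int major, minor, patch; } fixed_in[] = {
    { 6, 12, 19 },
    { 6,  6, 83 },
    { 6,  1, 131 },
};

/*
 * Return 1 if a stable version string such as "6.6.90" is at or past the
 * fixed release for its series, 0 if it predates it, and -1 if the string
 * does not parse or the series is not one of those listed.
 */
static int stable_has_fix(const char *version)
{
    int major, minor, patch;
    assert(version != NULL);
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;
    for (size_t i = 0; i < sizeof(fixed_in) / sizeof(fixed_in[0]); i++) {
        if (fixed_in[i].major == major && fixed_in[i].minor == minor)
            return patch >= fixed_in[i].patch;
    }
    return -1;
}
```

Returning -1 for unknown series keeps the helper from guessing about
releases the report does not mention.
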
  [Test]

  I've run the provided reproducer and validated that I can reproduce the
  problem without the patch applied and that I cannot reproduce it again
  once I have applied the patch.

  [Potential Regression]

  The regression potential here seems quite low.  The fix has been
  upstream for a couple of releases and no subsequent issues have been
  reported.  It makes no functional change beyond ensuring that the utask
  pointer is set to NULL before the utask structure itself is freed.  The
  dereference and free occur on the same CPU.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2104210/+subscriptions




[Kernel-packages] [Bug 2104210] [NEW] uprobe-related panics during profiling

2025-03-25 Thread Krister Johansen
Public bug reported:

[Impact]

On systems that utilize both uprobes and perf_events style profiling, it
is possible to hit a panic in the uprobe_free_utask code.  This occurs
during process exit.  If the profiler fires while uprobe_free_utask is
in the process of cleaning up the utask, the NMI may read freed memory
because the cleanup code frees the utask before setting its pointer to
NULL.  This submitter has encountered the problem on systems running
workloads without intentionally trying to trigger the problem.

The stacks look something like this:

 RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
 ...
  ? die_addr+0x36/0x90
  ? exc_general_protection+0x217/0x420
  ? asm_exc_general_protection+0x26/0x30
  ? is_uprobe_at_func_entry+0x28/0x80
  perf_callchain_user+0x20a/0x360
  get_perf_callchain+0x147/0x1d0
  bpf_get_stackid+0x60/0x90
  bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
  ? __smp_call_single_queue+0xad/0x120
  bpf_overflow_handler+0x75/0x110
  ...
  asm_sysvec_apic_timer_interrupt+0x1a/0x20
 RIP: 0010:__kmem_cache_free+0x1cb/0x350
 ...
  ? uprobe_free_utask+0x62/0x80
  ? acct_collect+0x4c/0x220
  uprobe_free_utask+0x62/0x80
  mm_release+0x12/0xb0
  do_exit+0x26b/0xaa0
  __x64_sys_exit+0x1b/0x20
  do_syscall_64+0x5a/0x80

The person who reported the issue upstream provided this reproducer.
(Run each command in a separate terminal):

  # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

However, since the binutils are stripped on some of the releases where I
tested this, I ran the following instead:

  # while :; do bpftrace -e 'uprobe:libc:malloc  { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

[Backport]
The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in
uprobe_free_utask")

However, this patch was massaged by stable for its inclusion in 6.12,
6.6, and 6.1.  Instead of re-doing stable's conflict resolution, take
the patch directly from 6.6.x, at commit eff00c5e29ab.

This patch is in stable as of 6.12.19, 6.6.83, and 6.1.131.

[Test]

I've run the provided reproducer and validated that I can reproduce the
problem without the patch applied and that I cannot reproduce it again
once I have applied the patch.

[Potential Regression]

The regression potential here seems quite low.  The fix has been
upstream for a couple of releases and no subsequent issues have been
reported.  It makes no functional change beyond ensuring that the utask
pointer is set to NULL before the utask structure itself is freed.  The
dereference and free occur on the same CPU.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: patch patch-accepted-upstream

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2104210

Title:
  uprobe-related panics during profiling

Status in linux package in Ubuntu:
  New

Bug description:
  [Impact]

  On systems that utilize both uprobes and perf_events style profiling, it
  is possible to hit a panic in the uprobe_free_utask code.  This occurs
  during process exit.  If the profiler fires while uprobe_free_utask is
  in the process of cleaning up the utask, the NMI may read freed memory
  because the cleanup code frees the utask before setting its pointer to
  NULL.  This submitter has encountered the problem on systems running
  workloads without intentionally trying to trigger the problem.

  The stacks look something like this:

   RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
   ...
     ? die_addr+0x36/0x90
     ? exc_general_protection+0x217/0x420
     ? asm_exc_general_protection+0x26/0x30
     ? is_uprobe_at_func_entry+0x28/0x80
     perf_callchain_user+0x20a/0x360
     get_perf_callchain+0x147/0x1d0
     bpf_get_stackid+0x60/0x90
     bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
     ? __smp_call_single_queue+0xad/0x120
     bpf_overflow_handler+0x75/0x110
     ...
     asm_sysvec_apic_timer_interrupt+0x1a/0x20
   RIP: 0010:__kmem_cache_free+0x1cb/0x350
   ...
     ? uprobe_free_utask+0x62/0x80
     ? acct_collect+0x4c/0x220
     uprobe_free_utask+0x62/0x80
     mm_release+0x12/0xb0
     do_exit+0x26b/0xaa0
     __x64_sys_exit+0x1b/0x20
     do_syscall_64+0x5a/0x80

  The person who reported the issue upstream provided this reproducer.
  (Run each command in a separate terminal):

    # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
    # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  However, since the binaries are stripped on some of the releases where I
  tested this, I ran the following instead:

    # while :; do bpftrace -e 'uprobe:libc:malloc  { printf("hit\n"); }' -c ls; done
    # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'

  [Backport]
  The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in
  uprobe_fr

[Kernel-packages] [Bug 2101120] Re: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

2025-04-23 Thread Krister Johansen
I have tested the kernel in noble-proposed and validated that it fixes this bug.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2101120

Title:
  mptcp BUG 'scheduling while atomic' in
  mptcp_pm_nl_append_new_local_addr

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Oracular:
  Fix Committed

Bug description:
  [Impact]

  If mptcp endpoints are configured on a host using an address that is
  external to the host, then the kernel will create an implicit endpoint
  with the host's local address when mptcp receives its first flow.  If
  multiple packets for these local interfaces arrive in parallel, more
  than one caller may end up in mptcp_pm_nl_append_new_local_addr
  because none found the address in local_addr_list during their call to
  mptcp_pm_nl_get_local_id.  In this case, the concurrent new_local_addr
  calls may delete the address entry created by the previous caller.
  These deletes use synchronize_rcu, but this is not permitted in some
  of the contexts where this function may be called.  During packet
  receive, the caller may be in an RCU read-side critical section and
  have preemption disabled.

  This can lead to a BUG / panic because synchronize_rcu is called in
  softirq context.

  An example stack:

     BUG: scheduling while atomic: swapper/2/0/0x0302

     Call Trace:
     
     dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
     dump_stack (lib/dump_stack.c:124)
     __schedule_bug (kernel/sched/core.c:5943)
     schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
     __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
     schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
     schedule_timeout (kernel/time/timer.c:2160)
     wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
     __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
     synchronize_rcu (kernel/rcu/tree.c:3609)
     mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
     mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
     mptcp_pm_get_local_id (net/mptcp/pm.c:420)
     subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
     subflow_v4_route_req (net/mptcp/subflow.c:305)
     tcp_conn_request (net/ipv4/tcp_input.c:7216)
     subflow_v4_conn_request (net/mptcp/subflow.c:651)
     tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
     tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
     tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
     ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
     ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
     ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
     ip_sublist_rcv (net/ipv4/ip_input.c:640)
     ip_list_rcv (net/ipv4/ip_input.c:675)
     __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
     netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
     napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
     igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
     __napi_poll (net/core/dev.c:6582)
     net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
     handle_softirqs (kernel/softirq.c:553)
     __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
     irq_exit_rcu (kernel/softirq.c:651)
     common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
     

  [Backport]

  Cherry-pick the following patch from upstream:

  022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in
  mptcp_pm_nl_append_new_local_addr")

  This patch fixes the problem by skipping the replacement operation in
  mptcp_pm_nl_append_new_local_addr, so that the duplicate is deleted
  before it is ever inserted into local_addr_list.  Instead of the last
  implicit endpoint replacing the previous one, the new entry is
  discarded without a synchronize_rcu and the old copy is kept.  This
  mode is only selected in mptcp_pm_nl_get_local_id.

  [Test]

  This patch has passed the upstream mptcp test suites and has also been
  tested against the reproducer that triggered the panic, which adds and
  removes mptcp endpoints with an external address that differs from the
  internal address.  Prior to this patch the problem would trigger in
  less than a minute.  With this patch applied, the test has run for
  hours without incident.

  [Potential Regression]

  The regression potential is low since 

[Kernel-packages] [Bug 2104210] Re: uprobe-related panics during profiling

2025-04-23 Thread Krister Johansen
I have verified this in noble-proposed and confirmed that it fixes the
bug.

** Description changed:

- Impact]
+ [Impact]
  
  On systems that utilize both uprobes and perf_events style profiling, it
  is possible to hit a panic in the uprobe_free_utask code.  This occurs
  during process exit.  If the profiler fires while uprobe_free_utask is
  in the process of cleaning up the utask, the NMI may read freed memory
  because the cleanup code frees the utask before setting its pointer to
  NULL.  This submitter has encountered the problem on systems running
  workloads without intentionally trying to trigger the problem.
  
  The stacks look something like this:
  
-  RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
-  ...
-   ? die_addr+0x36/0x90
-   ? exc_general_protection+0x217/0x420
-   ? asm_exc_general_protection+0x26/0x30
-   ? is_uprobe_at_func_entry+0x28/0x80
-   perf_callchain_user+0x20a/0x360
-   get_perf_callchain+0x147/0x1d0
-   bpf_get_stackid+0x60/0x90
-   bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
-   ? __smp_call_single_queue+0xad/0x120
-   bpf_overflow_handler+0x75/0x110
-   ...
-   asm_sysvec_apic_timer_interrupt+0x1a/0x20
-  RIP: 0010:__kmem_cache_free+0x1cb/0x350
-  ...
-   ? uprobe_free_utask+0x62/0x80
-   ? acct_collect+0x4c/0x220
-   uprobe_free_utask+0x62/0x80
-   mm_release+0x12/0xb0
-   do_exit+0x26b/0xaa0
-   __x64_sys_exit+0x1b/0x20
-   do_syscall_64+0x5a/0x80
+  RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
+  ...
+   ? die_addr+0x36/0x90
+   ? exc_general_protection+0x217/0x420
+   ? asm_exc_general_protection+0x26/0x30
+   ? is_uprobe_at_func_entry+0x28/0x80
+   perf_callchain_user+0x20a/0x360
+   get_perf_callchain+0x147/0x1d0
+   bpf_get_stackid+0x60/0x90
+   bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
+   ? __smp_call_single_queue+0xad/0x120
+   bpf_overflow_handler+0x75/0x110
+   ...
+   asm_sysvec_apic_timer_interrupt+0x1a/0x20
+  RIP: 0010:__kmem_cache_free+0x1cb/0x350
+  ...
+   ? uprobe_free_utask+0x62/0x80
+   ? acct_collect+0x4c/0x220
+   uprobe_free_utask+0x62/0x80
+   mm_release+0x12/0xb0
+   do_exit+0x26b/0xaa0
+   __x64_sys_exit+0x1b/0x20
+   do_syscall_64+0x5a/0x80
  
  The person who reported the issue upstream provided this reproducer.
  (Run each command in a separate terminal):
  
-   # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
-   # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
+   # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
+   # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
+   # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
  
  However, since the binaries are stripped on some of the releases where I
  tested this, I ran the following instead:
  
-   # while :; do bpftrace -e 'uprobe:libc:malloc  { printf("hit\n"); }' -c ls; done
-   # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
+   # while :; do bpftrace -e 'uprobe:libc:malloc  { printf("hit\n"); }' -c ls; done
+   # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
+   # bpftrace -e 'profile:hz:10 { @[ustack()] = count(); }'
  
  [Backport]
  The fix is upstream as commit b583ef82b671 ("uprobes: Fix race in
  uprobe_free_utask")
  
  However this patch was massaged by stable for its inclusion in 6.12,
  6.6, and 6.1.  Instead of re-doing stable's conflict resolution, take
  the patch directly from 6.6.x instead, at commit eff00c5e29ab.
  
  This patch is in stable as of 6.12.19, 6.6.83, and 6.1.131.
  
  [Test]
  
  I've run the provided reproducer and validated that I can reproduce the
  problem without the patch applied and that I cannot reproduce it again
  once I have applied the patch.
  
  [Potential Regression]
  
  The regression potential here seems quite low.  The fix has been
  upstream for a couple releases and no subsequent issues have been
  reported.  It makes no functional change beyond ensuring that the utask
  pointer is set to NULL before the utask structure itself is freed.  The
  dereference and free occur on the same cpu.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2104210

Title:
  uprobe-related panics during profiling

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Oracular:
  Fix Committed

Bug description:
  [Impact]

  On systems that utilize both uprobes and perf_events style profiling, it
  is possible to hit a panic in the uprobe_free_utask code.  This occurs
  during process exit.  If the profiler fires while uprobe_free_utask is
  in the process of cleaning up the utask, the NMI may read freed memory
  because the cleanup code frees the utask before setting its pointer to
  NULL.  This submitter has encountered the problem on systems running
  workloads without intentionally trying to trigger the problem.

  The stacks look something like this:

[Kernel-packages] [Bug 2101120] Re: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

2025-03-07 Thread Krister Johansen
Patches sent to kernel team's list:

https://lists.ubuntu.com/archives/kernel-team/2025-March/157856.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2101120

Title:
  mptcp BUG 'scheduling while atomic' in
  mptcp_pm_nl_append_new_local_addr

Status in linux package in Ubuntu:
  New

Bug description:
  [Impact]

  If mptcp endpoints are configured on a host using an address that is
  external to the host, then the kernel will create an implicit endpoint
  with the host's local address when mptcp receives its first flow.  If
  multiple packets for these local interfaces arrive in parallel, more
  than one caller may end up in mptcp_pm_nl_append_new_local_addr
  because none found the address in local_addr_list during their call to
  mptcp_pm_nl_get_local_id.  In this case, the concurrent new_local_addr
  calls may delete the address entry created by the previous caller.
  These deletes use synchronize_rcu, but this is not permitted in some
  of the contexts where this function may be called.  During packet
  receive, the caller may be in an RCU read-side critical section and
  have preemption disabled.

  This can lead to a BUG / panic because synchronize_rcu is called in
  softirq context.

  An example stack:

     BUG: scheduling while atomic: swapper/2/0/0x0302

     Call Trace:
     
     dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
     dump_stack (lib/dump_stack.c:124)
     __schedule_bug (kernel/sched/core.c:5943)
     schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
     __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
     schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
     schedule_timeout (kernel/time/timer.c:2160)
     wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
     __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
     synchronize_rcu (kernel/rcu/tree.c:3609)
     mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
     mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
     mptcp_pm_get_local_id (net/mptcp/pm.c:420)
     subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
     subflow_v4_route_req (net/mptcp/subflow.c:305)
     tcp_conn_request (net/ipv4/tcp_input.c:7216)
     subflow_v4_conn_request (net/mptcp/subflow.c:651)
     tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
     tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
     tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
     ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
     ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
     ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
     ip_sublist_rcv (net/ipv4/ip_input.c:640)
     ip_list_rcv (net/ipv4/ip_input.c:675)
     __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
     netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
     napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
     igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
     __napi_poll (net/core/dev.c:6582)
     net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
     handle_softirqs (kernel/softirq.c:553)
     __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
     irq_exit_rcu (kernel/softirq.c:651)
     common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
     

  [Backport]

  Cherry-pick the following patch from upstream:

  022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in
  mptcp_pm_nl_append_new_local_addr")

  This patch fixes the problem by skipping the replacement operation in
  mptcp_pm_nl_append_new_local_addr, so that the duplicate is deleted
  before it is ever inserted into local_addr_list.  Instead of the last
  implicit endpoint replacing the previous one, the new entry is
  discarded without a synchronize_rcu and the old copy is kept.  This
  mode is only selected in mptcp_pm_nl_get_local_id.

  [Test]

  This patch has passed the upstream mptcp test suites and has also been
  tested against the reproducer that triggered the panic, which adds and
  removes mptcp endpoints with an external address that differs from the
  internal address.  Prior to this patch the problem would trigger in
  less than a minute.  With this patch applied, the test has run for
  hours without incident.

  [Potential Regression]

  The regression potential is low since the behavior change is small.
  Implicit endpoints still get created and deleted, but they are only
  replaced when a user adds an endpoint with the same local address as
  an existin

[Kernel-packages] [Bug 2101120] [NEW] mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

2025-03-07 Thread Krister Johansen
Public bug reported:

[Impact]

If mptcp endpoints are configured on a host using an address that is
external to the host, then the kernel will create an implicit endpoint
with the host's local address when mptcp receives its first flow.  If
multiple packets for these local interfaces arrive in parallel, more
than one caller may end up in mptcp_pm_nl_append_new_local_addr because
none found the address in local_addr_list during their call to
mptcp_pm_nl_get_local_id.  In this case, the concurrent new_local_addr
calls may delete the address entry created by the previous caller. These
deletes use synchronize_rcu, but this is not permitted in some of the
contexts where this function may be called.  During packet receive, the
caller may be in an RCU read-side critical section and have preemption
disabled.

This can lead to a BUG / panic because synchronize_rcu is called in
softirq context.

An example stack:

   BUG: scheduling while atomic: swapper/2/0/0x0302

   Call Trace:
   
   dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
   dump_stack (lib/dump_stack.c:124)
   __schedule_bug (kernel/sched/core.c:5943)
   schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
   __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
   schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
   schedule_timeout (kernel/time/timer.c:2160)
   wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
   __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
   synchronize_rcu (kernel/rcu/tree.c:3609)
   mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
   mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
   mptcp_pm_get_local_id (net/mptcp/pm.c:420)
   subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
   subflow_v4_route_req (net/mptcp/subflow.c:305)
   tcp_conn_request (net/ipv4/tcp_input.c:7216)
   subflow_v4_conn_request (net/mptcp/subflow.c:651)
   tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
   tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
   ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
   ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
   ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
   ip_sublist_rcv (net/ipv4/ip_input.c:640)
   ip_list_rcv (net/ipv4/ip_input.c:675)
   __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
   netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
   napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
   igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
   __napi_poll (net/core/dev.c:6582)
   net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
   handle_softirqs (kernel/softirq.c:553)
   __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
   irq_exit_rcu (kernel/softirq.c:651)
   common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
   

[Backport]

Cherry-pick the following patch from upstream:

022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in
mptcp_pm_nl_append_new_local_addr")

This patch fixes the problem by skipping the replacement operation in
mptcp_pm_nl_append_new_local_addr, so that the duplicate is deleted
before it is ever inserted into local_addr_list.  Instead of the last
implicit endpoint replacing the previous one, the new entry is discarded
without a synchronize_rcu and the old copy is kept.  This mode is only
selected in mptcp_pm_nl_get_local_id.

[Test]

This patch has passed the upstream mptcp test suites and has also been
tested against the reproducer that triggered the panic, which adds and
removes mptcp endpoints with an external address that differs from the
internal address.  Prior to this patch the problem would trigger in less than a
minute.  With this patch applied, the test has run for hours without
incident.

[Potential Regression]

The regression potential is low since the behavior change is small.
Implicit endpoints still get created and deleted, but they are only
replaced when a user adds an endpoint with the same local address as an
existing implicit address.  No replacements via mptcp_pm_nl_get_local_id
will occur anymore.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: patch patch-accepted-upstream

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2101120

Title:
  mptcp BUG 'scheduling while atomic' in
  mptcp_pm_nl_append_new_local_addr

Status in linux package in Ubuntu:
  New

Bug description:
  [Impact]

  If mptcp endpoints ar

[Kernel-packages] [Bug 2101120] Re: mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

2025-03-07 Thread Krister Johansen
I have a patch for this accepted upstream that I'll send to the Ubuntu
kernel team in short order.  This has been merged to Linus's tree but
has yet to be picked up by Stable.  It's tagged to go there; it just
hasn't been picked up by the robots yet.  It affects all releases from
5.17 onward, which should put it in scope for Noble, Oracular, and
Plucky.

** Description changed:

  [Impact]
  
  If mptcp endpoints are configured on a host using an address that is
  external to the host, then the kernel will create an implicit endpoint
  with the host's local address when mptcp receives its first flow.  If
  multiple packets for these local interfaces arrive in parallel, more
  than one caller may end up in mptcp_pm_nl_append_new_local_addr because
  none found the address in local_addr_list during their call to
  mptcp_pm_nl_get_local_id.  In this case, the concurrent new_local_addr
  calls may delete the address entry created by the previous caller. These
  deletes use synchronize_rcu, but this is not permitted in some of the
  contexts where this function may be called.  During packet receive, the
  caller may be in an RCU read-side critical section and have preemption
  disabled.
  
  This can lead to a BUG / panic because synchronize_rcu is called in
  softirq context.
  
  An example stack:
  
-BUG: scheduling while atomic: swapper/2/0/0x0302
+    BUG: scheduling while atomic: swapper/2/0/0x0302
  
-Call Trace:
-
-dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
-dump_stack (lib/dump_stack.c:124)
-__schedule_bug (kernel/sched/core.c:5943)
-schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
-__schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
-schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
-schedule_timeout (kernel/time/timer.c:2160)
-wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
-__wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
-synchronize_rcu (kernel/rcu/tree.c:3609)
-mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
-mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
-mptcp_pm_get_local_id (net/mptcp/pm.c:420)
-subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
-subflow_v4_route_req (net/mptcp/subflow.c:305)
-tcp_conn_request (net/ipv4/tcp_input.c:7216)
-subflow_v4_conn_request (net/mptcp/subflow.c:651)
-tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
-tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
-tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
-ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
-ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
-ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
-ip_sublist_rcv (net/ipv4/ip_input.c:640)
-ip_list_rcv (net/ipv4/ip_input.c:675)
-__netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
-netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
-napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
-igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
-__napi_poll (net/core/dev.c:6582)
-net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
-handle_softirqs (kernel/softirq.c:553)
-__irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
-irq_exit_rcu (kernel/softirq.c:651)
-common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
-
+    Call Trace:
+    
+    dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
+    dump_stack (lib/dump_stack.c:124)
+    __schedule_bug (kernel/sched/core.c:5943)
+    schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
+    __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
+    schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
+    schedule_timeout (kernel/time/timer.c:2160)
+    wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
+    __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
+    synchronize_rcu (kernel/rcu/tree.c:3609)
+    mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
+    mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
+    mptcp_pm_get_local_id (net/mptcp/pm.c:420)
+    subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
+    subflow_v4_route_req (net/mptcp/subf