On Wed, Jul 23, 2025 at 04:30:18PM +0530, Madhavan Srinivasan wrote:
>
>
> On 7/23/25 2:17 AM, Bjorn Helgaas wrote:
> > On Thu, Jul 17, 2025 at 06:27:52PM -0500, Bjorn Helgaas wrote:
> >> On Tue, Jul 15, 2025 at 04:31:49PM -0500, Timothy Pearson wrote:
> >>>
[-> to: Madhavan, Michael, Mahesh; seeking acks]
On Thu, Jul 17, 2025 at 06:27:52PM -0500, Bjorn Helgaas wrote:
> On Tue, Jul 15, 2025 at 04:31:49PM -0500, Timothy Pearson wrote:
> > Hello all,
> >
> > This series includes several fixes for bugs in the PowerNV PCIe hotp
tree, but would need acks from the
powerpc folks for the arch/powerpc parts.
Alternatively it could be merged via powerpc with my ack on the
drivers/pci patches:
Acked-by: Bjorn Helgaas
If you do merge via powerpc, I made some comment formatting and commit
log tweaks that I would like reflected
On Tue, Jul 15, 2025 at 04:39:06PM -0500, Timothy Pearson wrote:
> The existing PowerNV hotplug code did not handle surprise plug events
> correctly, leading to a complete failure of the hotplug system after
> device removal and a required reboot to detect new devices.
> +++ b/drivers/pci/hotplug/
On Fri, Jul 11, 2025 at 01:18:07PM -0500, Timothy Pearson wrote:
> - Original Message -
> > From: "Krishna Kumar"
> > To: "Bjorn Helgaas" , "Timothy Pearson"
> >
> > Cc: "linuxppc-dev" , "linux-kernel"
On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote:
> - Original Message -
> > From: "Bjorn Helgaas"
> > To: "Timothy Pearson"
> > Cc: "linuxppc-dev" , "linux-kernel"
> > , "linux-pci"
> > ,
On Wed, Jun 18, 2025 at 02:50:04PM -0500, Timothy Pearson wrote:
> - Original Message -
> > From: "Bjorn Helgaas"
> > To: "Timothy Pearson"
> > Cc: "linuxppc-dev" , "linux-kernel"
> > , "linux-pci"
> > ,
[+cc Lukas, pciehp expert]
On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote:
> presence detection
(subject/commit wrapping seems to be on all of these patches)
> The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system
> was observed to incorrectly assert the Prese
On Wed, Jun 18, 2025 at 11:58:23AM -0500, Timothy Pearson wrote:
> recovery
Same weird subject/commit wrapping.
> The existing PowerNV hotplug code did not handle surprise plug events
> correctly, leading to a complete failure of the hotplug system after
> device removal and a required reboot to
On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
> state
Weird wrapping of last word of subject to here.
> The PCIe specification allows three attention indicator states,
> on, off, and blink. Enable all three states instead of basic
> on / off control.
>
> Signed-off-by: Timot
On Fri, May 30, 2025 at 09:39:28PM +0530, Manivannan Sadhasivam wrote:
> On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote:
> > On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote:
> > > On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wro
On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> The PCI link, when down, needs to be recovered to bring it back. But that
> cannot be done in a generic way as link recovery procedure is specific to
> host bridges. So add a new API pci_host_handle_link_down() that could be
>
On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote:
> On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> > > The PCI link, when down, needs to be recovered to bring
On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> The PCI link, when down, needs to be recovered to bring it back. But that
> cannot be done in a generic way as link recovery procedure is specific to
> host bridges. So add a new API pci_host_handle_link_down() that could be
>
On Thu, May 22, 2025 at 06:21:06PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas
>
> This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased
> this to v6.15-rc1, factored out some of the trace and statistics updates,
> and added some minor cleanups.
>
>
On Fri, May 23, 2025 at 02:13:52PM +0300, Ilpo Järvinen wrote:
> On Thu, 22 May 2025, Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > Previously aer_get_device_error_info() and aer_print_error() took a pointer
> > to struct aer_err_info and a pointer to a p
On Thu, May 22, 2025 at 04:56:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/22/25 4:21 PM, Bjorn Helgaas wrote:
> > From: Jon Pan-Doh
> >
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratelimits for AE
git
[bhelgaas: note fatal errors are not ratelimited, "aer_report" ->
"aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]
Signed-off-by: Karolina Stolarek
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Link: https://patch.msgid.link/2025
-by: Bjorn Helgaas
Link: https://patch.msgid.link/20250520215047.1350603-17-helg...@kernel.org
---
Documentation/PCI/pcieaer-howto.rst | 12
1 file changed, 12 insertions(+)
diff --git a/Documentation/PCI/pcieaer-howto.rst
b/Documentation/PCI/pcieaer-howto.rst
index f013f3b27c82
"aer_report" ->
"aer_info", "cor_log_ratelimit" -> "correctable_ratelimit",
"uncor_log_ratelimit" -> "nonfatal_ratelimit"]
Reported-by: Sargun Dhillon
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Link: https://patch
From: Bjorn Helgaas
Return -ENOSPC error early so the usual path through add_error_device() is
the straightline code.
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers
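A minimal userspace sketch of the early-return shape this commit log describes. The names `dev_list`, `MAX_DEVS`, and `add_device()` are hypothetical stand-ins and are not the kernel's add_error_device(); only the control flow is the point.

```c
#include <errno.h>

#define MAX_DEVS 5	/* hypothetical capacity of the device list */

struct dev_list {	/* hypothetical stand-in for the per-error device array */
	int count;
	int ids[MAX_DEVS];
};

/* Reject the full-list case first so the common path reads straight down. */
int add_device(struct dev_list *list, int id)
{
	if (list->count >= MAX_DEVS)
		return -ENOSPC;

	list->ids[list->count++] = id;
	return 0;
}
```

With the error case handled up front, the success path needs no else branch or extra nesting.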
From: Karolina Stolarek
Update name to reflect the broader definition of structs/variables that are
stored (e.g. ratelimits). This is a preparatory patch for adding rate limit
support.
[bhelgaas: "aer_report" -> "aer_info"]
Signed-off-by: Karolina Stolarek
Signed-off-by
From: Bjorn Helgaas
Previously aer_get_device_error_info() and aer_print_error() took a pointer
to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev
was one of the elements of the aer_err_info.dev[] array (DPC was an
exception, where the dev[] array was unused).
Convert
From: Karolina Stolarek
Some existing logs in pci_print_aer() log with error severity by default.
Convert them to use KERN_WARNING for correctable errors and KERN_ERR for
uncorrectable errors.
[bhelgaas: commit log]
Signed-off-by: Karolina Stolarek
Signed-off-by: Bjorn Helgaas
Tested-by
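As a rough illustration of choosing the log level from the error type, here is a userspace sketch. The enum and the returned strings are assumptions made for the example; the kernel uses the KERN_WARNING and KERN_ERR printk prefixes rather than these strings.

```c
/* Hypothetical severity values, for illustration only. */
enum err_severity {
	SEV_CORRECTABLE,
	SEV_NONFATAL,
	SEV_FATAL,
};

/* Correctable errors log at warning level, uncorrectable ones at error level. */
const char *log_level(enum err_severity sev)
{
	return sev == SEV_CORRECTABLE ? "warning" : "err";
}
```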
From: Bjorn Helgaas
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/
etc) that depends on the kind of error, and it calls pcie_print_tlp_log(),
which previously always produced output at KERN_ERR.
Add a "level" parameter so aer_print_error() can control th
From: Bjorn Helgaas
There are two AER logging entry points:
- aer_print_error() is used by DPC (dpc_process_error()) and native AER
handling (aer_process_err_devices()).
- pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL
(cxl_handle_rdport_errors())
Both use
aer_err_info instead of passing it
as a parameter]
Signed-off-by: Karolina Stolarek
Tested-by: Krzysztof Wilczyński
Reviewed-by: Ilpo Järvinen
Reviewed-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Jonathan Cameron
Signed-off-by: Bjorn Helgaas
Link: https://patch.msgid.link
From: Bjorn Helgaas
As with the AER statistics, we always want to emit trace events, even if
the actual dmesg logging is rate limited.
Call trace_aer_event() immediately after pci_dev_aer_stats_incr() so both
happen before ratelimiting.
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof
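The ordering described here (statistics and trace events first, rate limiting only on the log output) can be sketched in userspace C. Everything below is an assumption made for illustration: the fixed-window limiter, the struct names, and `report_error()` are not the kernel's ratelimit implementation or the AER code.

```c
#include <stdbool.h>
#include <stdint.h>

struct ratelimit {		/* hypothetical fixed-window limiter */
	uint64_t interval_ms;	/* window length */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start_ms;
	unsigned int printed;
};

struct err_counters {
	unsigned int total;	/* every error, limited or not */
	unsigned int logged;	/* errors that reached the log */
};

bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;	/* start a new window */
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

void report_error(struct err_counters *c, struct ratelimit *rl, uint64_t now_ms)
{
	c->total++;	/* statistics (and the trace event) come first */

	if (!ratelimit_allow(rl, now_ms))
		return;	/* only the log output is suppressed */

	c->logged++;	/* stands in for the dmesg message */
}
```

Because the counter is updated before the limiter is consulted, a suppressed message still shows up in the totals.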
From: Bjorn Helgaas
Simplify pci_print_aer() by initializing the struct aer_err_info "info"
with a designated initializer list (it was previously initialized with
memset()) and using pci_name().
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Ilpo Järvinen
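For readers unfamiliar with the idiom, a small userspace sketch comparing the two initialization styles. `struct err_info` and its fields are a hypothetical stand-in, not the real struct aer_err_info.

```c
#include <string.h>

struct err_info {	/* hypothetical stand-in for struct aer_err_info */
	int severity;
	int first_error;
	unsigned int status;
	unsigned int mask;
};

/* Older style: zero the whole struct, then assign the fields of interest. */
struct err_info init_with_memset(int severity)
{
	struct err_info info;

	memset(&info, 0, sizeof(info));
	info.severity = severity;
	return info;
}

/* Designated initializer: members not named are implicitly zeroed. */
struct err_info init_designated(int severity)
{
	struct err_info info = {
		.severity = severity,
	};

	return info;
}
```

Both functions return the same member values; the second keeps the declaration and the initial values in one place.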
From: Bjorn Helgaas
Move aer_print_source() earlier in the file so a future change can use it
from aer_print_error(), where it's easier to rate limit it.
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Ilpo Järvinen
Rev
From: Bjorn Helgaas
Previously the struct aer_err_info "e_info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "e_info" at declaration with a designated initializer list,
which initializes
rce()]
Signed-off-by: Jon Pan-Doh
Tested-by: Krzysztof Wilczyński
Reviewed-by: Ilpo Järvinen
Reviewed-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Jonathan Cameron
Signed-off-by: Bjorn Helgaas
Link: https://patch.msgid.link/20250520215047.1350603-7-helg...@kernel.org
---
drivers/pci/pcie/aer.c
From: Bjorn Helgaas
aer_isr_one_error() duplicates the Error Source ID logging and AER error
processing for Correctable Errors and Uncorrectable Errors. Factor out the
duplicated code to aer_isr_one_error_type().
aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the
From: Bjorn Helgaas
DPC Error Source ID is only valid when the DPC Trigger Reason indicates
that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
Message (PCIe r6.0, sec 7.9.14.5).
When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE)
or ERR_FATAL
From: Bjorn Helgaas
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number,
device, and function number directly from the Error Source ID. There's no
need to shift and mask it explicitly.
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Kuppu
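A userspace sketch of the decoding this commit log describes. The three macros are copied from the kernel headers so the example compiles on its own; `struct bdf` and `decode_source_id()` are hypothetical helpers for illustration and do not appear in the patch.

```c
#include <stdint.h>

/* Userspace copies of the kernel macros, repeated here for self-containment. */
#define PCI_BUS_NUM(x)	(((x) >> 8) & 0xff)
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

struct bdf {		/* hypothetical holder for the decoded fields */
	unsigned int bus;
	unsigned int dev;
	unsigned int fn;
};

/* The 16-bit Error Source ID is bus[15:8], device[7:3], function[2:0]. */
struct bdf decode_source_id(uint16_t id)
{
	struct bdf r = {
		.bus = PCI_BUS_NUM(id),
		.dev = PCI_SLOT(id),
		.fn  = PCI_FUNC(id),
	};

	return r;
}
```

For example, an ID of 0x1a0b decodes to bus 0x1a, device 1, function 3. The macros already mask off the unrelated bits, so the full ID can be passed to each of them.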
From: Bjorn Helgaas
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(),
then again in find_source_device() if we didn't find any devices with
errors logged in their AER Capabilities.
Consolidate this so we only decode and log the Error Source ID on
From: Bjorn Helgaas
This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased
this to v6.15-rc1, factored out some of the trace and statistics updates,
and added some minor cleanups.
I pushed this to pci/aer at
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h
From: Bjorn Helgaas
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "info" at declaration so it starts as all zeros.
Fixes: 8aefa9b0d910 ("PCI
On Tue, May 20, 2025 at 04:50:34PM -0500, Bjorn Helgaas wrote:
> From: Jon Pan-Doh
>
> Allow userspace to read/write log ratelimits per device (including
> enable/disable). Create aer/ sysfs directory to store them and any
> future aer configs.
> +P
On Tue, May 20, 2025 at 04:50:17PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas
>
> This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased
> this to v6.15-rc1, factored out some of the trace and statistics updates,
> and added some minor cleanups.
>
>
On Tue, May 20, 2025 at 03:33:45PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/20/25 2:50 PM, Bjorn Helgaas wrote:
> > From: Jon Pan-Doh
> >
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratelimits for AE
On Wed, May 21, 2025 at 11:46:00AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:34 -0500
> Bjorn Helgaas wrote:
>
> > From: Jon Pan-Doh
> >
> > Allow userspace to read/write log ratelimits per device (including
> > enable/disable). Create aer/
On Wed, May 21, 2025 at 11:31:21AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:32 -0500
> Bjorn Helgaas wrote:
>
> > From: Jon Pan-Doh
> >
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratel
On Wed, May 21, 2025 at 10:00:35AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:19 -0500
> Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > DPC Error Source ID is only valid when the DPC Trigger Reason indicates
> > that DPC was triggered du
On Wed, May 21, 2025 at 10:20:41AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:21 -0500
> Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > Previously we decoded the AER Error Source ID in aer_isr_one_error_type(),
> > then again in find_
On Wed, May 21, 2025 at 09:52:18AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:18 -0500
> Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > Previously the struct aer_err_info "info" was allocated on the stack
> > without being i
On Wed, May 21, 2025 at 10:56:59AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:30 -0500
> Bjorn Helgaas wrote:
>
> > From: Karolina Stolarek
> >
> > Some existing logs in pci_print_aer() log with error severity by default.
> > Convert them to
On Wed, May 21, 2025 at 10:46:42AM +0100, Jonathan Cameron wrote:
> On Tue, 20 May 2025 16:50:28 -0500
> Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > As with the AER statistics, we always want to emit trace events, even if
> > the actual dmesg loggi
olina Stolarek
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
---
...es-aer_stats => sysfs-bus-pci-devices-aer} | 34 +++
Documentation/PCI/pcieaer-howto.rst | 5 +-
drivers/pci/pci-sysfs.c
From: Karolina Stolarek
Some existing logs in pci_print_aer() log with error severity by default.
Convert them to depend on error type (consistent with rest of AER logging).
Signed-off-by: Karolina Stolarek
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Kuppuswamy
From: Bjorn Helgaas
As with the AER statistics, we always want to emit trace events, even if
the actual dmesg logging is rate limited.
Call trace_aer_event() directly from pci_dev_aer_stats_incr(), where we
update the statistics.
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
rce()]
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Ilpo Järvinen
Reviewed-by: Kuppuswamy Sathyanarayanan
---
drivers/pci/pcie/aer.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/
From: Karolina Stolarek
Update name to reflect the broader definition of structs/variables that are
stored (e.g. ratelimits). This is a preparatory patch for adding rate limit
support.
[bhelgaas: "aer_report" -> "aer_info"]
Signed-off-by: Karolina Stolarek
Signed-off-by
From: Jon Pan-Doh
Add ratelimits section for rationale and defaults.
[bhelgaas: note fatal errors are not ratelimited]
Signed-off-by: Karolina Stolarek
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Kuppuswamy Sathyanarayanan
Acked-by
"aer_report" -> "aer_info"]
Reported-by: Sargun Dhillon
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pci.h | 3 +-
drivers/pci/pcie/aer.c | 66 ++
drivers/pci/pcie/dpc.c | 1 +
3 files changed, 64 insertions
aer_err_info instead of passing it
as a parameter]
Signed-off-by: Karolina Stolarek
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Ilpo Järvinen
Reviewed-by: Kuppuswamy Sathyanarayanan
---
drivers/pci/pci.h | 1 +
drivers/pci/pcie/aer.c | 21
From: Bjorn Helgaas
Previously the struct aer_err_info "e_info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "e_info" at declaration with a designated initializer list,
which initializes
From: Bjorn Helgaas
There are two AER logging entry points:
- aer_print_error() is used by DPC (dpc_process_error()) and native AER
handling (aer_process_err_devices()).
- pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL
(cxl_handle_rdport_errors())
Both use
From: Bjorn Helgaas
Move aer_print_source() earlier in the file so a future change can use it
from aer_print_error(), where it's easier to rate limit it.
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Ilpo Jär
From: Bjorn Helgaas
Simplify pci_print_aer() by initializing the struct aer_err_info "info"
with a designated initializer list (it was previously initialized with
memset()) and using pci_name().
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Ilp
From: Bjorn Helgaas
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number,
device, and function number directly from the Error Source ID. There's no
need to shift and mask it explicitly.
Signed-off-by: Bjorn Helgaas
Tested-by: Krzysztof Wilczyński
Reviewed-by: Kuppu
From: Bjorn Helgaas
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(),
then again in find_source_device() if we didn't find any devices with
errors logged in their AER Capabilities.
Consolidate this so we only decode and log the Error Source ID on
From: Bjorn Helgaas
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "info" at declaration so it starts as all zeros.
Signed-off-by: Bjorn Helgaas
From: Bjorn Helgaas
aer_isr_one_error() duplicates the Error Source ID logging and AER error
processing for Correctable Errors and Uncorrectable Errors. Factor out the
duplicated code to aer_isr_one_error_type().
aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the
From: Bjorn Helgaas
DPC Error Source ID is only valid when the DPC Trigger Reason indicates
that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
Message (PCIe r6.0, sec 7.9.14.5).
When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE)
or ERR_FATAL
From: Bjorn Helgaas
This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased
this to v6.15-rc1, factored out some of the trace and statistics updates,
and added some minor cleanups.
I'm sorry to post a v7 so soon after v6, but I really want to get this in
v6.16 so it nee
On Mon, May 19, 2025 at 08:30:09PM -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Karolina Stolarek
> >
> > Update name to reflect the broader definition of structs/variables that are
> > stored (e.g. ratelimits). T
On Mon, May 19, 2025 at 10:01:09PM -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Jon Pan-Doh
> >
> > Add ratelimits section for rationale and defaults.
> > +AER Ratelimits
> > +--
> > +
>
On Tue, May 20, 2025 at 02:55:32PM +0300, Ilpo Järvinen wrote:
> On Mon, 19 May 2025, Bjorn Helgaas wrote:
>
> > From: Jon Pan-Doh
> >
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratelimits for AER
On Mon, May 19, 2025 at 09:59:29PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Jon Pan-Doh
> >
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratelimits for AER co
On Tue, May 20, 2025 at 03:02:06PM +0300, Ilpo Järvinen wrote:
> On Mon, 19 May 2025, Bjorn Helgaas wrote:
>
> > From: Jon Pan-Doh
> >
> > Allow userspace to read/write log ratelimits per device (including
> > enable/disable). Create aer/ sysfs directory to sto
On Tue, May 20, 2025 at 02:37:33PM +0300, Ilpo Järvinen wrote:
> On Mon, 19 May 2025, Bjorn Helgaas wrote:
>
> > From: Karolina Stolarek
> >
> > Some existing logs in pci_print_aer() log with error severity by default.
> > Convert them to depend on error typ
On Mon, May 19, 2025 at 11:17:28PM +, Weinan Liu wrote:
> > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > index 315bf2bfd570..34af0ea45c0d 100644
> > --- a/drivers/pci/pcie/dpc.c
> > +++ b/drivers/pci/pcie/dpc.c
> > @@ -252,6 +252,7 @@ static int dpc_get_aer_uncorrect_severit
On Mon, May 19, 2025 at 05:02:28PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas
> >
> > Simplify pci_print_aer() by initializing the struct aer_err_info "info"
> > with a designated initialize
On Tue, May 20, 2025 at 01:39:06PM +0300, Ilpo Järvinen wrote:
> On Mon, 19 May 2025, Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > Previously the struct aer_err_info "e_info" was allocated on the stack
> > without being initialized, so it
On Mon, May 19, 2025 at 04:39:19PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas
> >
> > Previously we decoded the AER Error Source ID in two places. Consolidate
> > them so both places use aer_print_p
On Tue, May 20, 2025 at 01:28:02PM +0300, Ilpo Järvinen wrote:
> On Mon, 19 May 2025, Bjorn Helgaas wrote:
> > DPC Error Source ID is only valid when the DPC Trigger Reason indicates
> > that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
> > Message (PC
On Tue, May 20, 2025 at 12:39:18PM +0300, Ilpo Järvinen wrote:
> On Mon, 19 May 2025, Bjorn Helgaas wrote:
>
> > From: Bjorn Helgaas
> >
> > Previously the struct aer_err_info "info" was allocated on the stack
> > without being initialized, so it
On Mon, May 19, 2025 at 04:15:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas
> >
> > DPC Error Source ID is only valid when the DPC Trigger Reason indicates
> > that DPC was triggered due to rece
On Mon, May 19, 2025 at 03:41:50PM -0700, Sathyanarayanan Kuppuswamy wrote:
> Hi,
>
> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas
> >
> > Previously the struct aer_err_info "info" was allocated on the stack
>
> /s/Previously/Curr
On Sat, May 17, 2025 at 12:55:14AM +0800, Hans Zhang wrote:
> The following series introduces a new kernel command-line option aer_panic
> to enhance error handling for PCIe Advanced Error Reporting (AER) in
> mission-critical environments. This feature ensures deterministic recover
> from fatal PC
From: Bjorn Helgaas
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number,
device, and function number directly from the Error Source ID. There's no
need to shift and mask it explicitly.
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 7 +++
1 file chang
From: Bjorn Helgaas
As with the AER statistics, we always want to emit trace events, even if
the actual dmesg logging is rate limited.
Call trace_aer_event() directly from pci_dev_aer_stats_incr(), where we
update the statistics.
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 12
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pci.h | 3 ++-
drivers/pci/pcie/aer.c | 49 --
drivers/pci/pcie/dpc.c | 1 +
3 files changed, 46 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci
From: Bjorn Helgaas
DPC Error Source ID is only valid when the DPC Trigger Reason indicates
that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
Message (PCIe r6.0, sec 7.9.14.5).
When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE)
or ERR_FATAL
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 50 +-
include/linux/pci.h| 2 +-
2 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 06a7dda20846..da62032bf024 100644
--- a
and sent 6 more AER errors. Observed all 6 errors
logged and accounted in AER stats (12 total errors).
[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
Signed-off-by: Karolina Stolarek
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Acked-by: Paul E. McKen
From: Jon Pan-Doh
Add ratelimits section for rationale and defaults.
Signed-off-by: Karolina Stolarek
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
Reviewed-by: Kuppuswamy Sathyanarayanan
Acked-by: Paul E. McKenney
---
Documentation/PCI/pcieaer-howto.rst | 11 +++
1
: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 16 +++-
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 73b03a195b14..06a7dda20846 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -788,15 +788,21
From: Bjorn Helgaas
Simplify pci_print_aer() by initializing the struct aer_err_info "info"
with a designated initializer list (it was previously initialized with
memset()) and using pci_name().
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 16
1 file
aer_err_info instead of passing it
as a parameter]
Link: https://lore.kernel.org/r/20250321015806.954866-2-pan...@google.com
Signed-off-by: Karolina Stolarek
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pci.h | 1 +
drivers/pci/pcie/aer.c | 21 ++---
drivers/pci/pcie/dpc.c
From: Bjorn Helgaas
There are two AER logging entry points:
- aer_print_error() is used by DPC (dpc_process_error()) and native AER
handling (aer_process_err_devices()).
- pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL
(cxl_handle_rdport_errors())
Both use
From: Bjorn Helgaas
Move aer_print_source() earlier in the file so a future change can use it
from aer_print_error(), where it's easier to rate limit it.
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 24
1 file changed, 12 insertions(+), 12 dele
From: Bjorn Helgaas
Previously the struct aer_err_info "e_info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "e_info" at declaration with a designated initializer list,
which initializes
rce()]
Link: https://lore.kernel.org/r/20250321015806.954866-5-pan...@google.com
Signed-off-by: Jon Pan-Doh
Signed-off-by: Bjorn Helgaas
---
drivers/pci/pcie/aer.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
in
From: Bjorn Helgaas
Previously we decoded the AER Error Source ID in two places. Consolidate
them so both places use aer_print_port_info(). Add a "details" parameter
so we can add a note when we didn't find any downstream devices with errors
logged in their AER Capability.
When
From: Bjorn Helgaas
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "info" at declaration so it starts as all zeroes.
Signed-off-by: Bjorn Helga
From: Bjorn Helgaas
This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased
this to v6.15-rc1, factored out some of the trace and statistics updates,
and added some minor cleanups.
Proposal
When using native AER, spammy devices can flood kernel logs with AER errors
[+cc Jon, Karolina]
On Wed, Jan 08, 2025 at 03:57:03PM +0800, Bijie Xu wrote:
> Sometimes certain PCIE devices installed on some servers occasionally
> produce large number of AER correctable error logs, which is quite
> annoying. Add this sysctl parameter kernel.aer_print_skip_mask to
> skip prin
On Fri, Feb 07, 2025 at 06:18:34PM +0200, Ilpo Järvinen wrote:
> This series adds support for Flit Mode (PCIe6).
>
> v2:
> - Rebased
>
> Ilpo Järvinen (2):
> PCI: Track Flit Mode Status & print it with link status
> PCI: Handle TLP Log in Flit mode
>
> drivers/pci/hotplug/pciehp_hpc.c | 5