Re: [PATCH v3 0/6] PowerNV PCIe Hotplug Driver Fixes

2025-07-23 Thread Bjorn Helgaas
On Wed, Jul 23, 2025 at 04:30:18PM +0530, Madhavan Srinivasan wrote: > > > On 7/23/25 2:17 AM, Bjorn Helgaas wrote: > > On Thu, Jul 17, 2025 at 06:27:52PM -0500, Bjorn Helgaas wrote: > >> On Tue, Jul 15, 2025 at 04:31:49PM -0500, Timothy Pearson wrote: > >>>

Re: [PATCH v3 0/6] PowerNV PCIe Hotplug Driver Fixes

2025-07-22 Thread Bjorn Helgaas
[-> to: Madhavan, Michael, Mahesh; seeking acks] On Thu, Jul 17, 2025 at 06:27:52PM -0500, Bjorn Helgaas wrote: > On Tue, Jul 15, 2025 at 04:31:49PM -0500, Timothy Pearson wrote: > > Hello all, > > > > This series includes several fixes for bugs in the PowerNV PCIe hotp

Re: [PATCH v3 0/6] PowerNV PCIe Hotplug Driver Fixes

2025-07-17 Thread Bjorn Helgaas
tree, but would need acks from the powerpc folks for the arch/powerpc parts. Alternatively it could be merged via powerpc with my ack on the drivers/pci patches: Acked-by: Bjorn Helgaas If you do merge via powerpc, I made some comment formatting and commit log tweaks that I would like reflected

Re: [PATCH v3 5/6] PCI: pnv_php: Fix surprise plug detection and recovery

2025-07-17 Thread Bjorn Helgaas
On Tue, Jul 15, 2025 at 04:39:06PM -0500, Timothy Pearson wrote: > The existing PowerNV hotplug code did not handle surprise plug events > correctly, leading to a complete failure of the hotplug system after > device removal and a required reboot to detect new devices. > +++ b/drivers/pci/hotplug/

Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

2025-07-11 Thread Bjorn Helgaas
On Fri, Jul 11, 2025 at 01:18:07PM -0500, Timothy Pearson wrote: > - Original Message - > > From: "Krishna Kumar" > > To: "Bjorn Helgaas" , "Timothy Pearson" > > > > Cc: "linuxppc-dev" , "linux-kernel"

Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

2025-06-24 Thread Bjorn Helgaas
On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote: > - Original Message - > > From: "Bjorn Helgaas" > > To: "Timothy Pearson" > > Cc: "linuxppc-dev" , "linux-kernel" > > , "linux-pci" > > ,

Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken

2025-06-18 Thread Bjorn Helgaas
On Wed, Jun 18, 2025 at 02:50:04PM -0500, Timothy Pearson wrote: > - Original Message - > > From: "Bjorn Helgaas" > > To: "Timothy Pearson" > > Cc: "linuxppc-dev" , "linux-kernel" > > , "linux-pci" > > ,

Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken

2025-06-18 Thread Bjorn Helgaas
[+cc Lukas, pciehp expert] On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote: > presence detection (subject/commit wrapping seems to be on all of these patches) > The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system > was observed to incorrectly assert the Prese

Re: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and

2025-06-18 Thread Bjorn Helgaas
On Wed, Jun 18, 2025 at 11:58:23AM -0500, Timothy Pearson wrote: > recovery Same weird subject/commit wrapping. > The existing PowerNV hotplug code did not handle surprise plug events > correctly, leading to a complete failure of the hotplug system after > device removal and a required reboot to

Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

2025-06-18 Thread Bjorn Helgaas
On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote: > state Weird wrapping of last word of subject to here. > The PCIe specification allows three attention indicator states, > on, off, and blink. Enable all three states instead of basic > on / off control. > > Signed-off-by: Timot
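To make the three attention indicator states concrete, here is a minimal user-space sketch of how on, off, and blink could be encoded into a 16-bit slot control value. All names are hypothetical (DEMO_ prefix), and the bit values are assumed to follow the PCIe Slot Control Attention Indicator Control field (bits 7:6: 01b on, 10b blink, 11b off). It does not reflect how pnv_php itself talks to firmware.

```c
#include <stdint.h>

/*
 * Hypothetical constants, assumed to mirror the PCIe Slot Control
 * Attention Indicator Control field (bits 7:6).
 */
#define DEMO_SLTCTL_ATTN_MASK	0x00c0u
#define DEMO_SLTCTL_ATTN_ON	0x0040u
#define DEMO_SLTCTL_ATTN_BLINK	0x0080u
#define DEMO_SLTCTL_ATTN_OFF	0x00c0u

/* The three states named in the patch description. */
enum demo_attn_state { DEMO_ATTN_OFF, DEMO_ATTN_ON, DEMO_ATTN_BLINK };

/* Map a requested state to its field encoding; unknown values fall back to off. */
static uint16_t demo_attn_encode(enum demo_attn_state state)
{
	switch (state) {
	case DEMO_ATTN_ON:
		return DEMO_SLTCTL_ATTN_ON;
	case DEMO_ATTN_BLINK:
		return DEMO_SLTCTL_ATTN_BLINK;
	case DEMO_ATTN_OFF:
	default:
		return DEMO_SLTCTL_ATTN_OFF;
	}
}

/* Replace only the indicator field, leaving all other bits untouched. */
static uint16_t demo_attn_apply(uint16_t slot_ctl, enum demo_attn_state state)
{
	return (uint16_t)((slot_ctl & ~DEMO_SLTCTL_ATTN_MASK) |
			  demo_attn_encode(state));
}
```

A driver limited to basic on/off control would only ever produce two of these encodings; the point of the patch, as described, is to expose the third.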

Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges

2025-06-02 Thread Bjorn Helgaas
On Fri, May 30, 2025 at 09:39:28PM +0530, Manivannan Sadhasivam wrote: > On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote: > > On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote: > > > On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wro

Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges

2025-06-02 Thread Bjorn Helgaas
On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote: > The PCI link, when down, needs to be recovered to bring it back. But that > cannot be done in a generic way as link recovery procedure is specific to > host bridges. So add a new API pci_host_handle_link_down() that could be >

Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges

2025-05-30 Thread Bjorn Helgaas
On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote: > On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wrote: > > On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote: > > > The PCI link, when down, needs to be recovered to bring

Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges

2025-05-28 Thread Bjorn Helgaas
On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote: > The PCI link, when down, needs to be recovered to bring it back. But that > cannot be done in a generic way as link recovery procedure is specific to > host bridges. So add a new API pci_host_handle_link_down() that could be >

Re: [PATCH v8 00/20] Rate limit AER logs

2025-05-23 Thread Bjorn Helgaas
On Thu, May 22, 2025 at 06:21:06PM -0500, Bjorn Helgaas wrote: > From: Bjorn Helgaas > > This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased > this to v6.15-rc1, factored out some of the trace and statistics updates, > and added some minor cleanups. > >

Re: [PATCH v8 16/20] PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index

2025-05-23 Thread Bjorn Helgaas
On Fri, May 23, 2025 at 02:13:52PM +0300, Ilpo Järvinen wrote: > On Thu, 22 May 2025, Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > Previously aer_get_device_error_info() and aer_print_error() took a pointer > > to struct aer_err_info and a pointer to a p

Re: [PATCH v8 18/20] PCI/AER: Ratelimit correctable and non-fatal error logging

2025-05-23 Thread Bjorn Helgaas
On Thu, May 22, 2025 at 04:56:56PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 5/22/25 4:21 PM, Bjorn Helgaas wrote: > > From: Jon Pan-Doh > > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > execution. Add per-device ratelimits for AE

[PATCH v8 20/20] PCI/AER: Add sysfs attributes for log ratelimits

2025-05-22 Thread Bjorn Helgaas
git [bhelgaas: note fatal errors are not ratelimited, "aer_report" -> "aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms] Signed-off-by: Karolina Stolarek Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Link: https://patch.msgid.link/2025

[PATCH v8 19/20] PCI/AER: Add ratelimits to PCI AER Documentation

2025-05-22 Thread Bjorn Helgaas
-by: Bjorn Helgaas Link: https://patch.msgid.link/20250520215047.1350603-17-helg...@kernel.org --- Documentation/PCI/pcieaer-howto.rst | 12 1 file changed, 12 insertions(+) diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst index f013f3b27c82

[PATCH v8 18/20] PCI/AER: Ratelimit correctable and non-fatal error logging

2025-05-22 Thread Bjorn Helgaas
"aer_report" -> "aer_info", "cor_log_ratelimit" -> "correctable_ratelimit", "uncor_log_ratelimit" -> "nonfatal_ratelimit"] Reported-by: Sargun Dhillon Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Link: https://patch

[PATCH v8 17/20] PCI/AER: Simplify add_error_device()

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Return -ENOSPC error early so the usual path through add_error_device() is the straightline code. Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers
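The "return the error early" shape described here can be sketched in plain C as follows. This is not the kernel's add_error_device(); the struct, the capacity, and the function name are hypothetical stand-ins chosen only to show the control flow.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_DEVICES 5	/* hypothetical capacity */

/* Hypothetical stand-in for a fixed-size list of erroring devices. */
struct demo_dev_list {
	size_t count;
	int dev[DEMO_MAX_DEVICES];
};

static int demo_add_error_device(struct demo_dev_list *list, int dev_id)
{
	/* Handle the full-list case first and bail out... */
	if (list->count >= DEMO_MAX_DEVICES)
		return -ENOSPC;

	/* ...so the usual path is unindented, straight-line code. */
	list->dev[list->count] = dev_id;
	list->count++;
	return 0;
}
```

With the error case out of the way up front, the common path needs no else branch and no extra indentation.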

[PATCH v8 15/20] PCI/AER: Rename struct aer_stats to aer_info

2025-05-22 Thread Bjorn Helgaas
From: Karolina Stolarek Update name to reflect the broader definition of structs/variables that are stored (e.g. ratelimits). This is a preparatory patch for adding rate limit support. [bhelgaas: "aer_report" -> "aer_info"] Signed-off-by: Karolina Stolarek Signed-off-by

[PATCH v8 16/20] PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously aer_get_device_error_info() and aer_print_error() took a pointer to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev was one of the elements of the aer_err_info.dev[] array (DPC was an exception, where the dev[] array was unused). Convert

[PATCH v8 14/20] PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNING

2025-05-22 Thread Bjorn Helgaas
From: Karolina Stolarek Some existing logs in pci_print_aer() log with error severity by default. Convert them to use KERN_WARNING for correctable errors and KERN_ERR for uncorrectable errors. [bhelgaas: commit log] Signed-off-by: Karolina Stolarek Signed-off-by: Bjorn Helgaas Tested-by

[PATCH v8 13/20] PCI/ERR: Add printk level to pcie_print_tlp_log()

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/ etc) that depends on the kind of error, and it calls pcie_print_tlp_log(), which previously always produced output at KERN_ERR. Add a "level" parameter so aer_print_error() can control th

[PATCH v8 10/20] PCI/AER: Update statistics before ratelimiting

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas There are two AER logging entry points: - aer_print_error() is used by DPC (dpc_process_error()) and native AER handling (aer_process_err_devices()). - pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL (cxl_handle_rdport_errors()) Both use

[PATCH v8 12/20] PCI/AER: Check log level once and remember it

2025-05-22 Thread Bjorn Helgaas
aer_err_info instead of passing it as a parameter] Signed-off-by: Karolina Stolarek Tested-by: Krzysztof Wilczyński Reviewed-by: Ilpo Järvinen Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Jonathan Cameron Signed-off-by: Bjorn Helgaas Link: https://patch.msgid.link

[PATCH v8 11/20] PCI/AER: Trace error event before ratelimiting

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas As with the AER statistics, we always want to emit trace events, even if the actual dmesg logging is rate limited. Call trace_aer_event() immediately after pci_dev_aer_stats_incr() so both happen before ratelimiting. Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof
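The ordering described above (statistics and trace first, ratelimit check afterwards) can be illustrated with a small user-space sketch. It uses a simple fixed-window limiter with hypothetical names, not the kernel's ratelimit implementation, and takes the current time as a parameter so the behavior is deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window limiter: at most `burst` messages per interval. */
struct demo_ratelimit {
	uint64_t interval_ms;
	unsigned int burst;
	uint64_t window_start_ms;
	unsigned int printed;
};

/* Hypothetical per-device state: counters plus the device's own limiter. */
struct demo_dev {
	unsigned long error_count;	/* every error seen */
	unsigned long logged;		/* errors that reached the log */
	struct demo_ratelimit rl;
};

static bool demo_ratelimit_ok(struct demo_ratelimit *rl, uint64_t now_ms)
{
	/* Start a new window once the interval has elapsed. */
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Returns true if the error was logged, false if the log line was suppressed. */
static bool demo_report_error(struct demo_dev *dev, uint64_t now_ms)
{
	/* Statistics (and, in the real code, the trace event) come first. */
	dev->error_count++;

	/* Only the log output is subject to the ratelimit. */
	if (!demo_ratelimit_ok(&dev->rl, now_ms))
		return false;

	dev->logged++;
	return true;
}
```

Because the counter is bumped before the limiter is consulted, a noisy device still shows its true error count even when most of its log lines are dropped.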

[PATCH v8 09/20] PCI/AER: Simplify pci_print_aer()

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Simplify pci_print_aer() by initializing the struct aer_err_info "info" with a designated initializer list (it was previously initialized with memset()) and using pci_name(). Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Ilpo Järvinen

[PATCH v8 07/20] PCI/AER: Move aer_print_source() earlier in file

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Move aer_print_source() earlier in the file so a future change can use it from aer_print_error(), where it's easier to rate limit it. Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Ilpo Järvinen Rev

[PATCH v8 08/20] PCI/AER: Initialize aer_err_info before using it

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously the struct aer_err_info "e_info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "e_info" at declaration with a designated initializer list, which initializes
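As a reminder of the C rule this patch relies on: when a struct is initialized with a designated initializer list, every member not named in the list is zero-initialized. The sketch below uses a hypothetical struct, not the kernel's struct aer_err_info.

```c
#include <stdint.h>

/* Hypothetical stand-in; the field names are illustrative only. */
struct demo_err_info {
	int severity;
	int level;
	uint32_t status;
	uint32_t mask;
	int tlp_header_valid;
};

static struct demo_err_info demo_make_info(int severity)
{
	/*
	 * Only .severity is named; C guarantees the remaining members
	 * start as zero, so no stack junk is left behind.
	 */
	struct demo_err_info info = {
		.severity = severity,
	};

	return info;
}
```

Compared with declaring the struct bare and assigning fields one by one, this leaves no window in which a forgotten field holds an indeterminate value.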

[PATCH v8 06/20] PCI/AER: Rename aer_print_port_info() to aer_print_source()

2025-05-22 Thread Bjorn Helgaas
rce()] Signed-off-by: Jon Pan-Doh Tested-by: Krzysztof Wilczyński Reviewed-by: Ilpo Järvinen Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Jonathan Cameron Signed-off-by: Bjorn Helgaas Link: https://patch.msgid.link/20250520215047.1350603-7-helg...@kernel.org --- drivers/pci/pcie/aer.c

[PATCH v8 03/20] PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas aer_isr_one_error() duplicates the Error Source ID logging and AER error processing for Correctable Errors and Uncorrectable Errors. Factor out the duplicated code to aer_isr_one_error_type(). aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the

[PATCH v8 02/20] PCI/DPC: Log Error Source ID only when valid

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas DPC Error Source ID is only valid when the DPC Trigger Reason indicates that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL Message (PCIe r6.0, sec 7.9.14.5). When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE) or ERR_FATAL
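The validity rule described here can be sketched as a small predicate on the DPC Status value. The constants below are hypothetical DEMO_ names; their values are assumed to match the Trigger Reason field layout in the PCIe spec (bits 2:1, with 01b for ERR_NONFATAL and 10b for ERR_FATAL).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, assumed to mirror the DPC Status Trigger Reason field. */
#define DEMO_DPC_TRIGGER_RSN_MASK	0x0006u
#define DEMO_DPC_TRIGGER_RSN_UNCOR	0x0000u	/* unmasked uncorrectable error */
#define DEMO_DPC_TRIGGER_RSN_NFE	0x0002u	/* ERR_NONFATAL received */
#define DEMO_DPC_TRIGGER_RSN_FE		0x0004u	/* ERR_FATAL received */
#define DEMO_DPC_TRIGGER_RSN_IN_EXT	0x0006u	/* see Trigger Reason Extension */

/* The Error Source ID is meaningful only if an ERR_* Message triggered DPC. */
static bool demo_dpc_source_id_valid(uint16_t dpc_status)
{
	uint16_t reason = dpc_status & DEMO_DPC_TRIGGER_RSN_MASK;

	return reason == DEMO_DPC_TRIGGER_RSN_NFE ||
	       reason == DEMO_DPC_TRIGGER_RSN_FE;
}
```

A caller would check this predicate before decoding or printing the Source ID, and skip the Source ID otherwise.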

[PATCH v8 05/20] PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number, device, and function number directly from the Error Source ID. There's no need to shift and mask it explicitly. Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Kuppu
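For readers outside the kernel tree, the decoding can be sketched in user space as below. The DEMO_ macros are local re-definitions that I assume match the kernel's PCI_BUS_NUM(), PCI_SLOT(), and PCI_FUNC(); the struct and function names are hypothetical. The Error Source ID is treated as a 16-bit value with the bus number in bits 15:8, the device in bits 7:3, and the function in bits 2:0.

```c
#include <stdint.h>

/* Local stand-ins, assumed equivalent to the kernel macros of the same base name. */
#define DEMO_PCI_BUS_NUM(x)	(((x) >> 8) & 0xff)
#define DEMO_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define DEMO_PCI_FUNC(devfn)	((devfn) & 0x07)

/* Hypothetical container for the decoded bus/device/function triple. */
struct demo_bdf {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
};

/* Apply the macros directly to the ID; no hand-written shifts at the call site. */
static struct demo_bdf demo_decode_source_id(uint16_t id)
{
	struct demo_bdf bdf = {
		.bus = DEMO_PCI_BUS_NUM(id),
		.dev = DEMO_PCI_SLOT(id),
		.fn  = DEMO_PCI_FUNC(id),
	};

	return bdf;
}
```

Because the slot and function macros mask their result, they can be applied to the full 16-bit ID without first stripping the bus byte.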

[PATCH v8 04/20] PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously we decoded the AER Error Source ID in aer_isr_one_error_type(), then again in find_source_device() if we didn't find any devices with errors logged in their AER Capabilities. Consolidate this so we only decode and log the Error Source ID on

[PATCH v8 00/20] Rate limit AER logs

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased this to v6.15-rc1, factored out some of the trace and statistics updates, and added some minor cleanups. I pushed this to pci/aer at https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h

[PATCH v8 01/20] PCI/DPC: Initialize aer_err_info before using it

2025-05-22 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously the struct aer_err_info "info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "info" at declaration so it starts as all zeros. Fixes: 8aefa9b0d910 ("PCI

Re: [PATCH v7 17/17] PCI/AER: Add sysfs attributes for log ratelimits

2025-05-22 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 04:50:34PM -0500, Bjorn Helgaas wrote: > From: Jon Pan-Doh > > Allow userspace to read/write log ratelimits per device (including > enable/disable). Create aer/ sysfs directory to store them and any > future aer configs. > +P

Re: [PATCH v7 00/17] Rate limit AER logs

2025-05-21 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 04:50:17PM -0500, Bjorn Helgaas wrote: > From: Bjorn Helgaas > > This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased > this to v6.15-rc1, factored out some of the trace and statistics updates, > and added some minor cleanups. > >

Re: [PATCH v7 15/17] PCI/AER: Ratelimit correctable and non-fatal error logging

2025-05-21 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 03:33:45PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 5/20/25 2:50 PM, Bjorn Helgaas wrote: > > From: Jon Pan-Doh > > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > execution. Add per-device ratelimits for AE

Re: [PATCH v7 17/17] PCI/AER: Add sysfs attributes for log ratelimits

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 11:46:00AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:34 -0500 > Bjorn Helgaas wrote: > > > From: Jon Pan-Doh > > > > Allow userspace to read/write log ratelimits per device (including > > enable/disable). Create aer/

Re: [PATCH v7 15/17] PCI/AER: Ratelimit correctable and non-fatal error logging

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 11:31:21AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:32 -0500 > Bjorn Helgaas wrote: > > > From: Jon Pan-Doh > > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > execution. Add per-device ratel

Re: [PATCH v7 02/17] PCI/DPC: Log Error Source ID only when valid

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 10:00:35AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:19 -0500 > Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > DPC Error Source ID is only valid when the DPC Trigger Reason indicates > > that DPC was triggered du

Re: [PATCH v7 04/17] PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 10:20:41AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:21 -0500 > Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > Previously we decoded the AER Error Source ID in aer_isr_one_error_type(), > > then again in find_

Re: [PATCH v7 01/17] PCI/DPC: Initialize aer_err_info before using it

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 09:52:18AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:18 -0500 > Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > Previously the struct aer_err_info "info" was allocated on the stack > > without being i

Re: [PATCH v7 13/17] PCI/AER: Make all pci_print_aer() log levels depend on error type

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 10:56:59AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:30 -0500 > Bjorn Helgaas wrote: > > > From: Karolina Stolarek > > > > Some existing logs in pci_print_aer() log with error severity by default. > > Convert them to

Re: [PATCH v7 11/17] PCI/AER: Combine trace_aer_event() with statistics updates

2025-05-21 Thread Bjorn Helgaas
On Wed, May 21, 2025 at 10:46:42AM +0100, Jonathan Cameron wrote: > On Tue, 20 May 2025 16:50:28 -0500 > Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > As with the AER statistics, we always want to emit trace events, even if > > the actual dmesg loggi

[PATCH v7 17/17] PCI/AER: Add sysfs attributes for log ratelimits

2025-05-20 Thread Bjorn Helgaas
olina Stolarek Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński --- ...es-aer_stats => sysfs-bus-pci-devices-aer} | 34 +++ Documentation/PCI/pcieaer-howto.rst | 5 +- drivers/pci/pci-sysfs.c

[PATCH v7 13/17] PCI/AER: Make all pci_print_aer() log levels depend on error type

2025-05-20 Thread Bjorn Helgaas
From: Karolina Stolarek Some existing logs in pci_print_aer() log with error severity by default. Convert them to depend on error type (consistent with rest of AER logging). Signed-off-by: Karolina Stolarek Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Kuppuswamy

[PATCH v7 11/17] PCI/AER: Combine trace_aer_event() with statistics updates

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas As with the AER statistics, we always want to emit trace events, even if the actual dmesg logging is rate limited. Call trace_aer_event() directly from pci_dev_aer_stats_incr(), where we update the statistics. Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński

[PATCH v7 06/17] PCI/AER: Rename aer_print_port_info() to aer_print_source()

2025-05-20 Thread Bjorn Helgaas
rce()] Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Ilpo Järvinen Reviewed-by: Kuppuswamy Sathyanarayanan --- drivers/pci/pcie/aer.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/

[PATCH v7 14/17] PCI/AER: Rename struct aer_stats to aer_info

2025-05-20 Thread Bjorn Helgaas
From: Karolina Stolarek Update name to reflect the broader definition of structs/variables that are stored (e.g. ratelimits). This is a preparatory patch for adding rate limit support. [bhelgaas: "aer_report" -> "aer_info"] Signed-off-by: Karolina Stolarek Signed-off-by

[PATCH v7 16/17] PCI/AER: Add ratelimits to PCI AER Documentation

2025-05-20 Thread Bjorn Helgaas
From: Jon Pan-Doh Add ratelimits section for rationale and defaults. [bhelgaas: note fatal errors are not ratelimited] Signed-off-by: Karolina Stolarek Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Kuppuswamy Sathyanarayanan Acked-by

[PATCH v7 15/17] PCI/AER: Ratelimit correctable and non-fatal error logging

2025-05-20 Thread Bjorn Helgaas
"aer_report" -> "aer_info"] Reported-by: Sargun Dhillon Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas --- drivers/pci/pci.h | 3 +- drivers/pci/pcie/aer.c | 66 ++ drivers/pci/pcie/dpc.c | 1 + 3 files changed, 64 insertions

[PATCH v7 12/17] PCI/AER: Check log level once and remember it

2025-05-20 Thread Bjorn Helgaas
aer_err_info instead of passing it as a parameter] Signed-off-by: Karolina Stolarek Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Ilpo Järvinen Reviewed-by: Kuppuswamy Sathyanarayanan --- drivers/pci/pci.h | 1 + drivers/pci/pcie/aer.c | 21

[PATCH v7 08/17] PCI/AER: Initialize aer_err_info before using it

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously the struct aer_err_info "e_info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "e_info" at declaration with a designated initializer list, which initializes

[PATCH v7 10/17] PCI/AER: Update statistics early in logging

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas There are two AER logging entry points: - aer_print_error() is used by DPC (dpc_process_error()) and native AER handling (aer_process_err_devices()). - pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL (cxl_handle_rdport_errors()) Both use

[PATCH v7 07/17] PCI/AER: Move aer_print_source() earlier in file

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas Move aer_print_source() earlier in the file so a future change can use it from aer_print_error(), where it's easier to rate limit it. Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Ilpo Jär

[PATCH v7 09/17] PCI/AER: Simplify pci_print_aer()

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas Simplify pci_print_aer() by initializing the struct aer_err_info "info" with a designated initializer list (it was previously initialized with memset()) and using pci_name(). Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Ilp

[PATCH v7 05/17] PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number, device, and function number directly from the Error Source ID. There's no need to shift and mask it explicitly. Signed-off-by: Bjorn Helgaas Tested-by: Krzysztof Wilczyński Reviewed-by: Kuppu

[PATCH v7 04/17] PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously we decoded the AER Error Source ID in aer_isr_one_error_type(), then again in find_source_device() if we didn't find any devices with errors logged in their AER Capabilities. Consolidate this so we only decode and log the Error Source ID on

[PATCH v7 01/17] PCI/DPC: Initialize aer_err_info before using it

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously the struct aer_err_info "info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "info" at declaration so it starts as all zeros. Signed-off-by: Bjorn Helgaas

[PATCH v7 03/17] PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas aer_isr_one_error() duplicates the Error Source ID logging and AER error processing for Correctable Errors and Uncorrectable Errors. Factor out the duplicated code to aer_isr_one_error_type(). aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the

[PATCH v7 02/17] PCI/DPC: Log Error Source ID only when valid

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas DPC Error Source ID is only valid when the DPC Trigger Reason indicates that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL Message (PCIe r6.0, sec 7.9.14.5). When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE) or ERR_FATAL

[PATCH v7 00/17] Rate limit AER logs

2025-05-20 Thread Bjorn Helgaas
From: Bjorn Helgaas This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased this to v6.15-rc1, factored out some of the trace and statistics updates, and added some minor cleanups. I'm sorry to post a v7 so soon after v6, but I really want to get this in v6.16 so it nee

Re: [PATCH v6 13/16] PCI/AER: Rename struct aer_stats to aer_report

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 08:30:09PM -0700, Sathyanarayanan Kuppuswamy wrote: > > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Karolina Stolarek > > > > Update name to reflect the broader definition of structs/variables that are > > stored (e.g. ratelimits). T

Re: [PATCH v6 15/16] PCI/AER: Add ratelimits to PCI AER Documentation

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 10:01:09PM -0700, Sathyanarayanan Kuppuswamy wrote: > > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Jon Pan-Doh > > > > Add ratelimits section for rationale and defaults. > > +AER Ratelimits > > +-- > > + >

Re: [PATCH v6 14/16] PCI/AER: Introduce ratelimit for error logs

2025-05-20 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 02:55:32PM +0300, Ilpo Järvinen wrote: > On Mon, 19 May 2025, Bjorn Helgaas wrote: > > > From: Jon Pan-Doh > > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > execution. Add per-device ratelimits for AER

Re: [PATCH v6 14/16] PCI/AER: Introduce ratelimit for error logs

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 09:59:29PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Jon Pan-Doh > > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > execution. Add per-device ratelimits for AER co

Re: [PATCH v6 16/16] PCI/AER: Add sysfs attributes for log ratelimits

2025-05-20 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 03:02:06PM +0300, Ilpo Järvinen wrote: > On Mon, 19 May 2025, Bjorn Helgaas wrote: > > > From: Jon Pan-Doh > > > > Allow userspace to read/write log ratelimits per device (including > > enable/disable). Create aer/ sysfs directory to sto

Re: [PATCH v6 12/16] PCI/AER: Make all pci_print_aer() log levels depend on error type

2025-05-20 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 02:37:33PM +0300, Ilpo Järvinen wrote: > On Mon, 19 May 2025, Bjorn Helgaas wrote: > > > From: Karolina Stolarek > > > > Some existing logs in pci_print_aer() log with error severity by default. > > Convert them to depend on error typ

Re: [PATCH v6 11/16] PCI/AER: Check log level once and remember it

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 11:17:28PM +, Weinan Liu wrote: > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > > index 315bf2bfd570..34af0ea45c0d 100644 > > --- a/drivers/pci/pcie/dpc.c > > +++ b/drivers/pci/pcie/dpc.c > > @@ -252,6 +252,7 @@ static int dpc_get_aer_uncorrect_severit

Re: [PATCH v6 08/16] PCI/AER: Simplify pci_print_aer()

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 05:02:28PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Bjorn Helgaas > > > > Simplify pci_print_aer() by initializing the struct aer_err_info "info" > > with a designated initialize

Re: [PATCH v6 07/16] PCI/AER: Initialize aer_err_info before using it

2025-05-20 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 01:39:06PM +0300, Ilpo Järvinen wrote: > On Mon, 19 May 2025, Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > Previously the struct aer_err_info "e_info" was allocated on the stack > > without being initialized, so it

Re: [PATCH v6 03/16] PCI/AER: Consolidate Error Source ID logging in aer_print_port_info()

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 04:39:19PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Bjorn Helgaas > > > > Previously we decoded the AER Error Source ID in two places. Consolidate > > them so both places use aer_print_p

Re: [PATCH v6 02/16] PCI/DPC: Log Error Source ID only when valid

2025-05-20 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 01:28:02PM +0300, Ilpo Järvinen wrote: > On Mon, 19 May 2025, Bjorn Helgaas wrote: > > DPC Error Source ID is only valid when the DPC Trigger Reason indicates > > that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL > > Message (PC

Re: [PATCH v6 01/16] PCI/DPC: Initialize aer_err_info before using it

2025-05-20 Thread Bjorn Helgaas
On Tue, May 20, 2025 at 12:39:18PM +0300, Ilpo Järvinen wrote: > On Mon, 19 May 2025, Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > Previously the struct aer_err_info "info" was allocated on the stack > > without being initialized, so it

Re: [PATCH v6 02/16] PCI/DPC: Log Error Source ID only when valid

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 04:15:56PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Bjorn Helgaas > > > > DPC Error Source ID is only valid when the DPC Trigger Reason indicates > > that DPC was triggered due to rece

Re: [PATCH v6 01/16] PCI/DPC: Initialize aer_err_info before using it

2025-05-20 Thread Bjorn Helgaas
On Mon, May 19, 2025 at 03:41:50PM -0700, Sathyanarayanan Kuppuswamy wrote: > Hi, > > On 5/19/25 2:35 PM, Bjorn Helgaas wrote: > > From: Bjorn Helgaas > > > > Previously the struct aer_err_info "info" was allocated on the stack > > /s/Previously/Curr

Re: [PATCH 0/4] pci: implement "pci=aer_panic"

2025-05-19 Thread Bjorn Helgaas
On Sat, May 17, 2025 at 12:55:14AM +0800, Hans Zhang wrote: > The following series introduces a new kernel command-line option aer_panic > to enhance error handling for PCIe Advanced Error Reporting (AER) in > mission-critical environments. This feature ensures deterministic recovery > from fatal PC

[PATCH v6 04/16] PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number, device, and function number directly from the Error Source ID. There's no need to shift and mask it explicitly. Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 7 +++ 1 file chang

[PATCH v6 10/16] PCI/AER: Combine trace_aer_event() with statistics updates

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas As with the AER statistics, we always want to emit trace events, even if the actual dmesg logging is rate limited. Call trace_aer_event() directly from pci_dev_aer_stats_incr(), where we update the statistics. Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 12

[PATCH v6 14/16] PCI/AER: Introduce ratelimit for error logs

2025-05-19 Thread Bjorn Helgaas
Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas --- drivers/pci/pci.h | 3 ++- drivers/pci/pcie/aer.c | 49 -- drivers/pci/pcie/dpc.c | 1 + 3 files changed, 46 insertions(+), 7 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci

[PATCH v6 02/16] PCI/DPC: Log Error Source ID only when valid

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas DPC Error Source ID is only valid when the DPC Trigger Reason indicates that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL Message (PCIe r6.0, sec 7.9.14.5). When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE) or ERR_FATAL

[PATCH v6 13/16] PCI/AER: Rename struct aer_stats to aer_report

2025-05-19 Thread Bjorn Helgaas
Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 50 +- include/linux/pci.h| 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 06a7dda20846..da62032bf024 100644 --- a

[PATCH v6 16/16] PCI/AER: Add sysfs attributes for log ratelimits

2025-05-19 Thread Bjorn Helgaas
and sent 6 more AER errors. Observed all 6 errors logged and accounted in AER stats (12 total errors). [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git Signed-off-by: Karolina Stolarek Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Acked-by: Paul E. McKen

[PATCH v6 15/16] PCI/AER: Add ratelimits to PCI AER Documentation

2025-05-19 Thread Bjorn Helgaas
From: Jon Pan-Doh Add ratelimits section for rationale and defaults. Signed-off-by: Karolina Stolarek Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas Reviewed-by: Kuppuswamy Sathyanarayanan Acked-by: Paul E. McKenney --- Documentation/PCI/pcieaer-howto.rst | 11 +++ 1

[PATCH v6 12/16] PCI/AER: Make all pci_print_aer() log levels depend on error type

2025-05-19 Thread Bjorn Helgaas
: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 73b03a195b14..06a7dda20846 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -788,15 +788,21

[PATCH v6 08/16] PCI/AER: Simplify pci_print_aer()

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas Simplify pci_print_aer() by initializing the struct aer_err_info "info" with a designated initializer list (it was previously initialized with memset()) and using pci_name(). Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 16 1 file

[PATCH v6 11/16] PCI/AER: Check log level once and remember it

2025-05-19 Thread Bjorn Helgaas
aer_err_info instead of passing it as a parameter] Link: https://lore.kernel.org/r/20250321015806.954866-2-pan...@google.com Signed-off-by: Karolina Stolarek Signed-off-by: Bjorn Helgaas --- drivers/pci/pci.h | 1 + drivers/pci/pcie/aer.c | 21 ++--- drivers/pci/pcie/dpc.c

[PATCH v6 09/16] PCI/AER: Update statistics early in logging

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas There are two AER logging entry points: - aer_print_error() is used by DPC (dpc_process_error()) and native AER handling (aer_process_err_devices()). - pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL (cxl_handle_rdport_errors()) Both use

[PATCH v6 06/16] PCI/AER: Move aer_print_source() earlier in file

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas Move aer_print_source() earlier in the file so a future change can use it from aer_print_error(), where it's easier to rate limit it. Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 24 1 file changed, 12 insertions(+), 12 dele

[PATCH v6 07/16] PCI/AER: Initialize aer_err_info before using it

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously the struct aer_err_info "e_info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "e_info" at declaration with a designated initializer list, which initializes
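
As a reminder of the C rule the 07/16 snippet relies on: when an automatic struct is given a designated initializer list, every member not named in the list is zero-initialized, whereas a bare declaration leaves the contents indeterminate. A minimal sketch, using a made-up struct demo_err_info rather than the real struct aer_err_info:

```c
/* Hypothetical stand-in for an error-info struct (not the kernel's layout). */
struct demo_err_info {
	int severity;
	unsigned int id;
	unsigned int status;
	unsigned int mask;
};

static struct demo_err_info demo_make_info(int severity)
{
	/*
	 * Only .severity is named; .id, .status and .mask are implicitly
	 * zero-initialized, so no stack junk leaks into later code.
	 */
	struct demo_err_info info = {
		.severity = severity,
	};

	return info;
}
```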

[PATCH v6 05/16] PCI/AER: Rename aer_print_port_info() to aer_print_source()

2025-05-19 Thread Bjorn Helgaas
rce()] Link: https://lore.kernel.org/r/20250321015806.954866-5-pan...@google.com Signed-off-by: Jon Pan-Doh Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c in

[PATCH v6 03/16] PCI/AER: Consolidate Error Source ID logging in aer_print_port_info()

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously we decoded the AER Error Source ID in two places. Consolidate them so both places use aer_print_port_info(). Add a "details" parameter so we can add a note when we didn't find any downstream devices with errors logged in their AER Capability. When

[PATCH v6 01/16] PCI/DPC: Initialize aer_err_info before using it

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas Previously the struct aer_err_info "info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "info" at declaration so it starts as all zeroes. Signed-off-by: Bjorn Helga

[PATCH v6 00/16] Rate limit AER logs

2025-05-19 Thread Bjorn Helgaas
From: Bjorn Helgaas This work is mostly due to Jon Pan-Doh and Karolina Stolarek. I rebased this to v6.15-rc1, factored out some of the trace and statistics updates, and added some minor cleanups. Proposal When using native AER, spammy devices can flood kernel logs with AER errors

Re: [PATCH] PCI/AER: Add kernel.aer_print_skip_mask to control aer log

2025-03-04 Thread Bjorn Helgaas
[+cc Jon, Karolina] On Wed, Jan 08, 2025 at 03:57:03PM +0800, Bijie Xu wrote: > Sometimes certain PCIE devices installed on some servers occasionally > produce large number of AER correctable error logs, which is quite > annoying. Add this sysctl parameter kernel.aer_print_skip_mask to > skip prin

Re: [PATCH v2 0/2] PCI: Add support for logging Flit Mode TLPs (PCIe6)

2025-02-21 Thread Bjorn Helgaas
On Fri, Feb 07, 2025 at 06:18:34PM +0200, Ilpo Järvinen wrote: > This series adds support for Flit Mode (PCIe6). > > v2: > - Rebased > > Ilpo Järvinen (2): > PCI: Track Flit Mode Status & print it with link status > PCI: Handle TLP Log in Flit mode > > drivers/pci/hotplug/pciehp_hpc.c | 5
