[PATCH v2 2/4] PCI/AER: Handle Advisory Non-Fatal properly

2024-01-25 Thread Wang, Qingshun
[8086:0db0] error status/mask=2000/
[13] NonFatalErr
Uncorrectable errors that may cause Advisory Non-Fatal:
[18] TLP
Signed-off-by: "Wang, Qingshun"
---
drivers/pci/pcie/aer.c | 61 +-
1 file changed, 60 insertions(+),

[PATCH v2 0/4] PCI/AER: Handle Advisory Non-Fatal properly

2024-01-25 Thread Wang, Qingshun
PATCH 1, as suggested by Bjorn Helgaas.
- Add more details of behavior changes in the commit message of PATCH 2, as suggested by Bjorn Helgaas.
v1: https://lore.kernel.org/linux-pci/20240111073227.31488-1-qingshun.w...@linux.intel.com/
Wang, Qingshun (4):
PCI/AER: Store more informa

[PATCH v2 4/4] RAS: Trace more information in aer_event

2024-01-25 Thread Wang, Qingshun
e" from "Device Control 2"
Signed-off-by: "Wang, Qingshun"
---
drivers/pci/pcie/aer.c | 17 +++--
include/ras/ras_event.h | 48 ---
include/uapi/linux/pci_regs.h | 1 +
3 files changed, 60 insertions(+), 6 deletions(

[PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-01-25 Thread Wang, Qingshun
and the values of the Device Status register are also recorded, which will be used to determine UEs that should be handled by the ANFE handler.
Refactor the rest of the code to use cor/uncor_status and cor/uncor_mask fields instead of status and mask fields.
Signed-off-by: "Wang, Qin

[PATCH v2 3/4] PCI/AER: Fetch information for FTrace

2024-01-25 Thread Wang, Qingshun
Fetch and store the data of 3 more registers: "Link Status", "Device Control 2", and "Advanced Error Capabilities and Control". This data is needed for external observation to better understand ANFE.
Signed-off-by: "Wang, Qingshun"
---
drivers/acpi/ape

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-01-31 Thread Wang, Qingshun
On Tue, Jan 30, 2024 at 06:26:39PM -0800, Kuppuswamy Sathyanarayanan wrote:
>
> On 1/24/24 10:27 PM, Wang, Qingshun wrote:
> > When Advisory Non-Fatal errors are raised, both correctable and
>
> Maybe you can start with some info about what Advisory Non-Fatal
> errors are

Re: [PATCH v2 3/4] PCI/AER: Fetch information for FTrace

2024-02-02 Thread Wang, Qingshun
On Fri, Feb 02, 2024 at 10:01:40AM -0800, Dan Williams wrote:
> Wang, Qingshun wrote:
> > Fetch and store the data of 3 more registers: "Link Status", "Device
> > Control 2", and "Advanced Error Capabilities and Control". This data is
> > needed

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-02-06 Thread Wang, Qingshun
On Mon, Feb 05, 2024 at 05:12:31PM -0600, Bjorn Helgaas wrote:
> On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> > When Advisory Non-Fatal errors are raised, both correctable and
> > uncorrectable error statuses will be set. The current kernel code cannot
> >

Re: [PATCH v2 2/4] PCI/AER: Handle Advisory Non-Fatal properly

2024-02-06 Thread Wang, Qingshun
On Mon, Feb 05, 2024 at 05:26:16PM -0600, Bjorn Helgaas wrote:
> In the subject, "properly" really doesn't convey information. I think
> this patch does two things:
>
> - Prints error bits that might be ANFE
> - Clears UNCOR_STATUS bits that were previously not cleared
>
> Maybe the subject
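The two effects listed in the review above (print the bits that might be ANFE, then clear them in UNCOR_STATUS) can be illustrated with a small user-space sketch. This is not the patch's code. The struct aer_snapshot and both helper functions are hypothetical names made up for illustration, and the selection rule (uncorrectable status bits whose severity is configured as non-fatal, considered only when the Advisory Non-Fatal bit is set in the correctable status) is an assumption. The only value taken from the kernel headers is PCI_ERR_COR_ADV_NFAT (bit 13), which matches the "[13] NonFatalErr" line in the sample output of PATCH 2/4.

```c
#include <stdint.h>
#include <stdio.h>

/* Same value as PCI_ERR_COR_ADV_NFAT in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_COR_ADV_NFAT 0x00002000u /* Advisory Non-Fatal, bit 13 */

/* Hypothetical snapshot of the AER registers of one device. */
struct aer_snapshot {
	uint32_t cor_status;     /* Correctable Error Status */
	uint32_t uncor_status;   /* Uncorrectable Error Status */
	uint32_t uncor_severity; /* Uncorrectable Error Severity (1 = fatal) */
};

/*
 * Assumed rule: when Advisory Non-Fatal is flagged, every uncorrectable
 * status bit configured as non-fatal is a candidate cause. Because the
 * AER status registers are write-1-to-clear, the returned value is also
 * the value one would write back to UNCOR_STATUS to clear those bits.
 */
static uint32_t anfe_candidates(const struct aer_snapshot *s)
{
	if (!(s->cor_status & PCI_ERR_COR_ADV_NFAT))
		return 0;
	return s->uncor_status & ~s->uncor_severity;
}

/* Print one "[bit]" line per candidate bit; returns the number printed. */
static unsigned int print_anfe_candidates(uint32_t bits)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < 32; i++) {
		if (bits & (UINT32_C(1) << i)) {
			printf("   [%2u]\n", i);
			count++;
		}
	}
	return count;
}
```

With cor_status = 0x2000, uncor_status = 0x00040000 and uncor_severity = 0, anfe_candidates() returns 0x00040000, i.e. bit 18, the same bit index as the "[18]" line in the PATCH 2/4 sample output.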

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-02-08 Thread Wang, Qingshun
On Tue, Feb 06, 2024 at 11:23:35AM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 07, 2024 at 12:41:41AM +0800, Wang, Qingshun wrote:
> > On Mon, Feb 05, 2024 at 05:12:31PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> > >