Re: Radeon regression in 6.6 kernel

2023-11-18 Thread Dave Airlie
> > On 12.11.23 01:46, Phillip Susi wrote: I had been testing some things on a post 6.6-rc5 kernel for a week or two and then when I pulled to a post 6.6 release kernel, I found that system suspend was broken. It seems that the radeon driver failed to suspend, leaving the display d

[PATCH 19/20] x86/mce/apei: Handle variable register array size

2023-11-18 Thread Yazen Ghannam
ACPI Boot Error Record Table (BERT) is being used by the kernel to report errors that occurred in a previous boot. On some modern AMD systems, these very errors within the BERT are reported through the x86 Common Platform Error Record (CPER) format which consists of one or more Processor Context In

[PATCH 17/20] x86/mce: Add wrapper for struct mce to export vendor specific info

2023-11-18 Thread Yazen Ghannam
From: Avadhut Naik Currently, exporting new additional machine check error information involves adding new fields for the same at the end of the struct mce. This additional information can then be consumed through mcelog or tracepoint. However, as new MSRs are being added (and will be added in t

[PATCH 11/20] x86/mce/amd: Simplify DFR handler setup

2023-11-18 Thread Yazen Ghannam
AMD systems with the SUCCOR feature can send an APIC LVT interrupt for deferred errors. The LVT offset is 0x2 by convention, i.e. this is the default as listed in hardware documentation. However, the MCA registers may list a different LVT offset for this interrupt. The kernel should honor the valu

[PATCH 20/20] EDAC/mce_amd: Add support for FRU Text in MCA

2023-11-18 Thread Yazen Ghannam
A new "FRU Text in MCA" feature is defined where the Field Replaceable Unit (FRU) Text for a device is represented by a string in the new MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]). The FRU Text is popu

[PATCH 16/20] x86/mce/amd: Support SMCA Corrected Error Interrupt

2023-11-18 Thread Yazen Ghannam
AMD systems optionally support MCA Thresholding which provides the ability for hardware to send an interrupt when a set error threshold is reached. This feature counts errors of all severities, but it is commonly used to report correctable errors with an interrupt rather than polling. Scalable MCA

[PATCH 18/20] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1, 2} registers

2023-11-18 Thread Yazen Ghannam
From: Avadhut Naik AMD's Scalable MCA systems viz. Genoa will include two new registers: MCA_SYND1 and MCA_SYND2. These registers will include supplemental error information in addition to the existing MCA_SYND register. The data within the registers is considered valid if MCA_STATUS[SyndV] is s

[PATCH 14/20] x86/mce/amd: Unify AMD DFR handler with MCA Polling

2023-11-18 Thread Yazen Ghannam
AMD systems optionally support a Deferred error interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how other MCA interrupts are handled. Deferred errors do not require any special handling related to the interrupt, e.g. resetting or rearming the in

[PATCH 10/20] x86/mce/amd: Prep DFR handler before enabling banks

2023-11-18 Thread Yazen Ghannam
Scalable MCA systems use the per-bank MCA_CONFIG register to enable deferred error interrupts. This is done as part of SMCA configuration. Currently, the deferred error interrupt handler is set up after SMCA configuration. Move the deferred error interrupt handler set up before SMCA configuration

[PATCH 04/20] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module

2023-11-18 Thread Yazen Ghannam
The "long names" for SMCA banks are only used by the MCE decoder module. Move them out of the arch code and into the decoder module. Signed-off-by: Yazen Ghannam --- arch/x86/include/asm/mce.h | 1 - arch/x86/kernel/cpu/mce/amd.c | 74 ++- drivers/edac/mce_am

[PATCH 12/20] x86/mce/amd: Clean up enable_deferred_error_interrupt()

2023-11-18 Thread Yazen Ghannam
Switch to bitops to help with clarity. Also, avoid an unnecessary wrmsr() for SMCA systems. Use the updated name for MSR 0xC000_0410 to match the documentation for Family 0x17 and later systems. This MSR is used for setting up both Deferred and MCA Thresholding interrupts on current systems. So r

[PATCH 09/20] x86/mce/amd: Clean up SMCA configuration

2023-11-18 Thread Yazen Ghannam
The current SMCA configuration function does more than just configure SMCA features. It also detects and caches the SMCA bank types. However, the bank type caching flow will be removed during the init path clean up. Define a new function that only configures SMCA features. This will operate on th

[PATCH 15/20] x86/mce: Skip AMD threshold init if no threshold banks found

2023-11-18 Thread Yazen Ghannam
AMD systems optionally support MCA Thresholding. This feature is discovered by checking capability bits in the MCA_MISC* registers. Currently, MCA Thresholding is set up in two passes. The first is during CPU init where available banks are detected, and the "bank_map" variable is updated. The seco

[PATCH 13/20] x86/mce: Unify AMD THR handler with MCA Polling

2023-11-18 Thread Yazen Ghannam
AMD systems optionally support an MCA Thresholding interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how the Intel Corrected Machine Check interrupt (CMCI) is handled. AMD MCA Thresholding is managed using the MCA_MISC registers within an MCA bank

[PATCH 08/20] x86/mce/amd: Look up bank type by IPID

2023-11-18 Thread Yazen Ghannam
Scalable MCA systems use values within the MCA_IPID register to describe a bank's type. Other information is not needed. Currently, the bank types are cached during boot and this information is used during boot and run time. The cached values are per-CPU and per-bank. The boot path needs the cache

[PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks

2023-11-18 Thread Yazen Ghannam
The type of a Scalable MCA bank should be determined solely using the values in its MCA_IPID register. Define and use a helper function to determine if a bank represents a GPU Unified Memory Controller (UMC), and where the exact bank type is not needed. Use bitops and rename old mask until remov

[PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check

2023-11-18 Thread Yazen Ghannam
Scalable MCA systems use values in the MCA_IPID register to describe the type of hardware for an MCA bank. This information is used when bank-specific actions or decoding are needed. Otherwise, microarchitectural information, like MCA_STATUS bits, should be used. Currently, the bank type informati

[PATCH 07/20] x86/mce/amd: Use fixed bank number for quirks

2023-11-18 Thread Yazen Ghannam
Quirks break micro-architectural definitions. Therefore, quirk conditions don't need to follow micro-architectural requirements. Currently, there is a quirk to filter some errors from the Instruction Fetch (IF) unit on specific models. The IF unit is represented by MCA bank 1 for these models. Rel

[PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()

2023-11-18 Thread Yazen Ghannam
Current AMD systems may report MCA errors using the ACPI Boot Error Record Table (BERT). The BERT entries for MCA errors will be an x86 Common Platform Error Record (CPER) with an MSR register context that matches the MCAX/SMCA register space. However, the BERT will not necessarily be processed on

[PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields

2023-11-18 Thread Yazen Ghannam
Generally, MCA information for an error is gathered on the CPU that reported the error. In this case, CPU-specific information from the running CPU will be correct. However, this will be incorrect if the MCA information is gathered while running on a CPU that didn't report the error. One example i

[PATCH 01/20] x86/mce/inject: Clear test status value

2023-11-18 Thread Yazen Ghannam
AMD systems generally allow MCA "simulation" where MCA registers can be written with valid data and the full MCA handling flow can be tested by software. However, the platform on Scalable MCA systems may prevent software from writing data to the MCA registers. There is no architectural way to det

[PATCH 00/20] MCA Updates

2023-11-18 Thread Yazen Ghannam
Hi all, This set is a collection of logically independent updates that make changes to common code. I've collected them to resolve conflicts and ordering. Furthermore, this is the first half of a larger set. The second half is focused on refactoring the AMD MCA Thresholding feature support. So I d

[PATCH] drm/amdgpu: Force order between a read and write to the same address

2023-11-18 Thread Alex Sierra
Setting register to force ordering to prevent read/write or write/read hazards for un-cached modes. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 22 +-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 8 +++ .../include/asic_reg/gc/gc_11_0_0