did
v2: Refine commit message
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 32 ++---
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
index d011e4678ca1
In the case of RAS err_event_athub, the VCPU buffers are corrupted and
cannot be restored in amdgpu_vcn_resume(), so the buffers are cleared to
0 for good. However, the firmware flags stored in the buffers need to be
reset, or the firmware cannot work properly.
Signed-off-by: Xiang Liu
---
drivers
redundant code like vcn_v4_0 did
v2: Refine commit message
v3: Drop the volatile
v3: Refine commit message
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 30 ++---
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
From: Tao Zhou
And initialize it, this is a pure software ring to store RAS CPER data.
v2: update the initialization of count_dw of cper ring, it's dword
variable.
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 39 +++---
d
From: Hawking Zhang
Introduce utility functions designed to assist
in populating CPER records.
v2: call cper_init/fini in device_ip_init/fini.
Signed-off-by: Hawking Zhang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/Makefile| 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu.h
ring
drm/amdgpu: add mutex lock for cper ring
Xiang Liu (3):
drm/amdgpu: Get timestamp from system time
drm/amdgpu: Commit CPER entry
drm/amdgpu: Generate bad page threshold cper records
drivers/gpu/drm/amd/amdgpu/Makefile| 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 4
From: Hawking Zhang
AMD is using Common Platform Error Record (CPER) format
to report all gpu hardware errors.
v2: add program attribute
Signed-off-by: Hawking Zhang
Signed-off-by: Xiang Liu
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/include/amd_cper.h | 269 +
1
From: Tao Zhou
Avoid conflicts between reads and writes of the ring buffer.
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 21 +++
From: Tao Zhou
Old CPER data will be overwritten if the ring buffer is full, and the read
pointer always points to a CPER header.
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 93
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h | 2
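
The overwrite-when-full behaviour described in this patch can be sketched as
follows. This is only an illustration, not the driver code: the demo_ring
structure, the RING_DW size and the one-dword length header are all
assumptions made for the example. Whole records are dropped from the read
side until the new record fits, so the read pointer always lands on a record
header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_DW 16u /* hypothetical ring size in dwords */

/* Hypothetical ring: not the amdgpu_ring structure. */
struct demo_ring {
	uint32_t buf[RING_DW];
	uint32_t rptr; /* index of the oldest record's header */
	uint32_t wptr; /* index where the next record starts */
	uint32_t used; /* dwords currently occupied */
};

/*
 * Assumed record layout: one header dword holding the total record length
 * in dwords (header included), followed by the payload.
 */
static void demo_ring_write(struct demo_ring *r, const uint32_t *payload,
			    uint32_t payload_dw)
{
	uint32_t rec_dw = payload_dw + 1;
	uint32_t i;

	assert(rec_dw <= RING_DW);

	/* Drop whole records from the read side until the new one fits. */
	while (RING_DW - r->used < rec_dw) {
		uint32_t old_dw = r->buf[r->rptr];

		r->rptr = (r->rptr + old_dw) % RING_DW;
		r->used -= old_dw;
	}

	/* Write the header, then the payload, wrapping at the ring end. */
	r->buf[r->wptr] = rec_dw;
	for (i = 0; i < payload_dw; i++)
		r->buf[(r->wptr + 1 + i) % RING_DW] = payload[i];
	r->wptr = (r->wptr + rec_dw) % RING_DW;
	r->used += rec_dw;
}
```

Because the loop skips by the length stored in each header, the read pointer
never ends up in the middle of a record, whatever size the new record has.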
Commit the CPER entry to the ring buffer.
Signed-off-by: Xiang Liu
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
From: Hawking Zhang
Introduce new functions that are used to generate
cper ue or ce records.
v2: return -ENOMEM instead of false
v2: check return value of fill section function
Signed-off-by: Hawking Zhang
Signed-off-by: Xiang Liu
Reviewed-by: Yang Wang
Reviewed-by: Tao Zhou
---
drivers
From: Hawking Zhang
ACA error types managed by the driver lack a direct 1:1
correspondence with those managed by firmware.
To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.
This addition is useful for creating CPER records.
Signed-off-by: Hawking Zhang
Reviewed-
From: Hawking Zhang
Encode the error information in CPER format and commit
to the cper ring
Signed-off-by: Hawking Zhang
Reviewed-by: Yang Wang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 32 +
1 file changed, 32 insertions(+)
diff --git a/dri
From: Tao Zhou
We read CPER data from read pointer to write pointer without changing
the pointers.
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 47 ++--
1 file changed, 36 insertions(+), 11 deletions(-)
diff --git a/dri
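
A non-destructive read of this kind can be sketched as below. The function
name, its parameters and the "rptr == wptr means empty" convention are
assumptions for illustration and are not taken from amdgpu_ring.c. A local
cursor walks from the read pointer to the write pointer, so neither pointer
is modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the dwords lying between rptr and wptr into out, up to max_dw
 * entries, without touching either pointer. Returns the number of dwords
 * copied. Hypothetical helper: rptr == wptr is treated as an empty ring.
 */
static size_t demo_ring_peek(const uint32_t *buf, uint32_t size_dw,
			     uint32_t rptr, uint32_t wptr,
			     uint32_t *out, size_t max_dw)
{
	uint32_t pos = rptr; /* local cursor; the caller's rptr is unchanged */
	size_t n = 0;

	assert(size_dw != 0);

	while (pos != wptr && n < max_dw) {
		out[n++] = buf[pos];
		pos = (pos + 1) % size_dw;
	}

	return n;
}
```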
Get system local time and encode it to timestamp for CPER.
Signed-off-by: Xiang Liu
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 19 ++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
b/drivers/gpu
Generate a CPER record when the bad page threshold is exceeded and
commit it to the CPER ring.
v2: return -ENOMEM instead of false
v2: check return value of fill section function
Signed-off-by: Xiang Liu
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 23 +++
drivers
When CPER is disabled, generating a CPER record without checking first
causes a kernel NULL pointer dereference.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 3 +++
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 5 +++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff
Set cper.enabled to true only after the cper ring is successfully
created.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
b/drivers/gpu/drm/amd
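
The ordering described here, together with the enabled check from the
previous patch, might look roughly like the sketch below. The demo_cper
structure and its functions are hypothetical stand-ins written in userspace
C, with calloc() in place of the driver's ring creation. They only show the
pattern of setting the flag last and testing it before touching the ring.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's cper state. */
struct demo_cper {
	bool enabled;
	uint32_t *ring;
};

static int demo_cper_init(struct demo_cper *c, size_t ring_dw)
{
	c->enabled = false;
	c->ring = calloc(ring_dw, sizeof(*c->ring));
	if (!c->ring)
		return -ENOMEM;

	/* Only mark CPER usable once the ring really exists. */
	c->enabled = true;
	return 0;
}

static int demo_cper_generate(struct demo_cper *c, uint32_t value)
{
	/* Without this check a disabled cper would dereference a NULL ring. */
	if (!c->enabled)
		return -ENODEV;

	c->ring[0] = value;
	return 0;
}

static void demo_cper_fini(struct demo_cper *c)
{
	c->enabled = false;
	free(c->ring);
	c->ring = NULL;
}
```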
In the case of poison consumption's inband log, the error type needs
to be specified by checking the poison bit of the status register.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
drivers/gpu/drm/amd/a
The fru_id field is disabled because of mismatched definitions
between the CPER spec and the driver.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
b/drivers/gpu/drm
In the case of poison inband log, the error type needs to be specified
by checking the deferred or poison bit of the status register.
v2: check both deferred and poison bit
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 6 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4
Change the DMESG reporting of unknown errors to "Boot Controller
Generic Error" to align with the RAS SPEC and provide more clarity
to customers.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +-
2 files
There is no need to check adev for sure.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index c0da9096a7fa..d11593cd1922
Move the ACA-enabled check into the cper init/fini functions
to keep the code clean.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 6 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a
Free CPER entry when it's committed to CPER ring to avoid memory leak.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
Encode the socket id into the CPER record id so that it is unique across devices.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
A miscalculation of the remaining size of the CPER ring causes an
endless while loop when the CPER ring overflows.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 15 ---
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd
Encode the socket id into the CPER record id so that it is unique across devices.
v2: add pointer check for adev->smuio.funcs->get_socket_id
v2: set 0 if adev->smuio.funcs->get_socket_id is NULL
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 18 +-
1 file
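
The NULL-pointer fallback mentioned in the v2 notes can be sketched like
this. Everything here is hypothetical: the demo_* types, the sequence
counter and the choice to put the socket id in the upper 32 bits of a 64-bit
id are assumptions for illustration, not the record id layout used by the
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dev;

/* Hypothetical callback table, loosely modelled on adev->smuio.funcs. */
struct demo_smuio_funcs {
	uint32_t (*get_socket_id)(struct demo_dev *dev);
};

struct demo_dev {
	const struct demo_smuio_funcs *smuio_funcs;
	uint32_t next_seq; /* per-device record counter (assumed) */
};

/* Example callback that reports a fixed socket id. */
static uint32_t demo_fixed_socket_id(struct demo_dev *dev)
{
	(void)dev;
	return 3;
}

/* Fall back to socket 0 when the callback (or its table) is missing. */
static uint32_t demo_socket_id(struct demo_dev *dev)
{
	if (dev->smuio_funcs && dev->smuio_funcs->get_socket_id)
		return dev->smuio_funcs->get_socket_id(dev);
	return 0;
}

/* Assumed layout: socket id in the high half, sequence in the low half. */
static uint64_t demo_record_id(struct demo_dev *dev)
{
	return ((uint64_t)demo_socket_id(dev) << 32) | dev->next_seq++;
}
```

Two devices with the same sequence counter but different socket ids then
yield different record ids, which is the property the patch is after.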
Enable ACA by default for psp v13_0_6/v13_0_14.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7cf8a3036828
The aca handle is introduced by the upper caller, so it's inappropriate to
poll aca handles to match and validate an aca bank, which will cause
unexpected ras error reports.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 122 ++--
drivers/gpu/drm/amd/a
DEs among UEs
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 25 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 16 +++-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8
3 files changed, 38 insertions(+), 11 deletions(-)
diff --git a
In the case of injecting an uncorrected error with a background workload,
the deferred errors among uncorrected errors need to be specified
by checking the deferred and poison bits of the status register.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 3 +++
drivers/gpu/drm/amd
Double-check the UC and PCC bits of the status register for GFX UE to
avoid unexpected GFX UE reports.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 10 +++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers
In the case of parsing GFX deferred error from SMU corrected error
channel, the error count should be set to 1 instead of parsing from
MISC0 register, which is 0.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions
We should only increase the deferred errors in UMC block.
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 8
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8