[AMD Official Use Only - AMD Internal Distribution Only]

Signed-off-by: Tony Yi <tony...@amd.com>

________________________________
From: Zhang, Hawking <hawking.zh...@amd.com>
Sent: Wednesday, April 2, 2025 12:23 AM
To: Skvortsov, Victor <victor.skvort...@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Luo, Zhigang <zhigang....@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Zhao, Victor <victor.z...@amd.com>; Yi, Tony <tony...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: Fix CPER error handling on VFs
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang <hawking.zh...@amd.com>

Regards,
Hawking

-----Original Message-----
From: Skvortsov, Victor <victor.skvort...@amd.com>
Sent: Wednesday, April 2, 2025 04:44
To: amd-gfx@lists.freedesktop.org
Cc: Luo, Zhigang <zhigang....@amd.com>; Zhang, Hawking <hawking.zh...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Zhao, Victor <victor.z...@amd.com>; Yi, Tony <tony...@amd.com>; Skvortsov, Victor <victor.skvort...@amd.com>
Subject: [PATCH] drm/amdgpu: Fix CPER error handling on VFs

From: Tony Yi <tony...@amd.com>

CPER read will loop infinitely if an error is encountered and the more
bit is set. Add error checks to break upon failure.

Suggested-by: Tony Yi <tony...@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvort...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 0bb8cbe0dcc0..8d2da3a27440 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -1378,14 +1378,16 @@ amdgpu_virt_write_cpers_to_ring(struct amdgpu_device *adev,
 	used_size = host_telemetry->header.used_size;
 
 	if (used_size > (AMD_SRIOV_RAS_TELEMETRY_SIZE_KB << 10))
-		return 0;
+		return -EINVAL;
 
 	cper_dump = kmemdup(&host_telemetry->body.cper_dump, used_size, GFP_KERNEL);
 	if (!cper_dump)
 		return -ENOMEM;
 
-	if (checksum != amd_sriov_msg_checksum(cper_dump, used_size, 0, 0))
+	if (checksum != amd_sriov_msg_checksum(cper_dump, used_size, 0, 0)) {
+		ret = -EINVAL;
 		goto out;
+	}
 
 	*more = cper_dump->more;
 
@@ -1434,7 +1436,7 @@ static int amdgpu_virt_req_ras_cper_dump_internal(struct amdgpu_device *adev)
 					adev, virt->fw_reserve.ras_telemetry, &more);
 		else
 			ret = 0;
-	} while (more);
+	} while (more && !ret);
 
 	return ret;
 }
-- 
2.34.1
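
For readers following the thread, the loop behavior the patch addresses can be sketched in user-space C. This is only an illustration under stated assumptions, not the driver code: `struct fake_host`, `fake_read_chunk()` and `dump_all()` are hypothetical stand-ins for the host telemetry buffer, amdgpu_virt_write_cpers_to_ring() and the do/while loop in amdgpu_virt_req_ras_cper_dump_internal(). The point it tries to show is that a failed read leaves `more` at its previous value, so the loop condition also has to look at `ret`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host side; not a driver structure. */
struct fake_host {
	int reads;        /* number of read attempts made so far */
	int fail_at;      /* 1-based attempt that fails, 0 = never fail */
	int total_chunks; /* number of chunks the host has to offer */
};

/*
 * Hypothetical stand-in for one chunk read. On failure it returns a
 * negative errno and does NOT touch *more, mirroring the error paths in
 * the patch, which bail out before "*more = cper_dump->more".
 */
static int fake_read_chunk(struct fake_host *h, bool *more)
{
	h->reads++;
	if (h->fail_at && h->reads == h->fail_at)
		return -EINVAL;

	*more = h->reads < h->total_chunks;
	return 0;
}

/*
 * Loop shaped like the patched code: stop when there is no more data
 * or as soon as a read reports an error. With "while (more)" alone, a
 * failure after a successful read that set more = true would spin
 * indefinitely.
 */
static int dump_all(struct fake_host *h)
{
	bool more = false;
	int ret;

	do {
		ret = fake_read_chunk(h, &more);
	} while (more && !ret);

	return ret;
}
```

In this sketch, a host with three chunks and no failures is read three times and `dump_all()` returns 0; if the second read fails, the loop exits after two attempts and the error is passed back to the caller.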