[AMD Official Use Only - AMD Internal Distribution Only]

Signed-off-by: Tony Yi <tony...@amd.com>
________________________________
From: Zhang, Hawking <hawking.zh...@amd.com>
Sent: Wednesday, April 2, 2025 12:23 AM
To: Skvortsov, Victor <victor.skvort...@amd.com>; amd-gfx@lists.freedesktop.org 
<amd-gfx@lists.freedesktop.org>
Cc: Luo, Zhigang <zhigang....@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Zhao, 
Victor <victor.z...@amd.com>; Yi, Tony <tony...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: Fix CPER error handling on VFs

Reviewed-by: Hawking Zhang <hawking.zh...@amd.com>

Regards,
Hawking
-----Original Message-----
From: Skvortsov, Victor <victor.skvort...@amd.com>
Sent: Wednesday, April 2, 2025 04:44
To: amd-gfx@lists.freedesktop.org
Cc: Luo, Zhigang <zhigang....@amd.com>; Zhang, Hawking <hawking.zh...@amd.com>; 
Zhou1, Tao <tao.zh...@amd.com>; Zhao, Victor <victor.z...@amd.com>; Yi, Tony 
<tony...@amd.com>; Skvortsov, Victor <victor.skvort...@amd.com>
Subject: [PATCH] drm/amdgpu: Fix CPER error handling on VFs

From: Tony Yi <tony...@amd.com>

The CPER read loop never terminates if a read fails while the more bit is
set, because the failure paths do not report an error to the caller. Return
an error code on those paths and stop looping when one is returned.

Suggested-by: Tony Yi <tony...@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvort...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 0bb8cbe0dcc0..8d2da3a27440 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -1378,14 +1378,16 @@ amdgpu_virt_write_cpers_to_ring(struct amdgpu_device *adev,
        used_size = host_telemetry->header.used_size;

        if (used_size > (AMD_SRIOV_RAS_TELEMETRY_SIZE_KB << 10))
-               return 0;
+               return -EINVAL;

        cper_dump = kmemdup(&host_telemetry->body.cper_dump, used_size, GFP_KERNEL);
        if (!cper_dump)
                return -ENOMEM;

-       if (checksum != amd_sriov_msg_checksum(cper_dump, used_size, 0, 0))
+       if (checksum != amd_sriov_msg_checksum(cper_dump, used_size, 0, 0)) {
+               ret = -EINVAL;
                goto out;
+       }

        *more = cper_dump->more;

@@ -1434,7 +1436,7 @@ static int amdgpu_virt_req_ras_cper_dump_internal(struct amdgpu_device *adev)
                                adev, virt->fw_reserve.ras_telemetry, &more);
                else
                        ret = 0;
-       } while (more);
+       } while (more && !ret);

        return ret;
 }
--
2.34.1
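
For readers following the thread, the loop condition changed by this patch
can be illustrated with a small user-space sketch. Everything in it is
hypothetical: struct fake_source, fake_read_chunk() and drain_chunks() are
stand-ins invented for illustration, not the driver's code. The only part
taken from the patch is the shape of the loop, "while (more && !ret)". The
sketch assumes a reader that leaves the more flag untouched when it fails,
which is the situation the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a telemetry source; not a driver structure. */
struct fake_source {
	int chunks_left;	/* chunks still to be delivered */
	int fail_at;		/* fail when chunks_left equals this; -1 = never */
};

/* Hypothetical reader: on failure it returns an error and leaves *more stale. */
static int fake_read_chunk(struct fake_source *src, bool *more)
{
	assert(src != NULL && more != NULL);

	if (src->chunks_left == src->fail_at)
		return -EINVAL;	/* *more keeps its previous value */

	src->chunks_left--;
	*more = src->chunks_left > 0;
	return 0;
}

/*
 * Loop shape after the patch: keep reading while more data is flagged
 * and the last read succeeded. With the old condition, "while (more)",
 * a failed read that left more == true would repeat forever.
 */
static int drain_chunks(struct fake_source *src, int *iterations)
{
	bool more = false;
	int ret;

	*iterations = 0;
	do {
		ret = fake_read_chunk(src, &more);
		(*iterations)++;
	} while (more && !ret);

	return ret;
}
```

Calling drain_chunks() on a source with three chunks and fail_at set to -1
reads all three chunks and returns 0. With fail_at set to 1, the third read
fails while more is still true, and the loop exits with -EINVAL instead of
retrying.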
