[Public]

-----Original Message-----
From: Jesse.Zhang <[email protected]>
Sent: Tuesday, September 16, 2025 11:54 AM
To: [email protected]
Cc: Deucher, Alexander <[email protected]>; Koenig, Christian 
<[email protected]>; Lazar, Lijo <[email protected]>; Zhang, Jesse(Jie) 
<[email protected]>
Subject: [PATCH V2] drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails

From: Lijo Lazar <[email protected]>

Add a fallback mechanism to attempt pipe reset when KCQ reset fails to recover 
the ring. After performing the KCQ reset and queue remapping, test the ring 
functionality. If the ring test fails, initiate a pipe reset as an additional 
recovery step.

Signed-off-by: Lijo Lazar <[email protected]>
Signed-off-by: Jesse Zhang <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 8ba66d4dfe86..8804c5844f48 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -3560,6 +3560,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
        struct amdgpu_device *adev = ring->adev;
        struct amdgpu_kiq *kiq = &adev->gfx.kiq[ring->xcc_id];
        struct amdgpu_ring *kiq_ring = &kiq->ring;
+       int reset_mode = AMDGPU_RESET_TYPE_PER_QUEUE;
        unsigned long flags;
        int r;

@@ -3597,6 +3598,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
                if (!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE))
                        return -EOPNOTSUPP;
                r = gfx_v9_4_3_reset_hw_pipe(ring);
+               reset_mode = AMDGPU_RESET_TYPE_PER_PIPE;
                dev_info(adev->dev, "ring: %s pipe reset :%s\n", ring->name,
                                r ? "failed" : "successfully");
                if (r)
@@ -3623,6 +3625,13 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
                return r;
        }

[lijo]
        I missed this earlier: should a pipe reset also be considered when 
kiq_map_queue fails?

Thanks,
Lijo

+       if (reset_mode == AMDGPU_RESET_TYPE_PER_QUEUE) {
+               r = amdgpu_ring_test_ring(ring);
+               if (r)
+                       goto pipe_reset;
+       }
+
+
        return amdgpu_ring_reset_helper_end(ring, timedout_fence);
 }

--
2.49.0
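
For anyone following the control flow, here is a rough userspace model of the
fallback described in the commit message: try the queue reset, remap, test the
ring, and fall back to a pipe reset only if that test fails. It is only a
sketch. Every name in it (sim_hw, sim_reset_kcq, the SIM_* codes) is
hypothetical and none of it is driver code. It also assumes that a remap
failure returns immediately without trying a pipe reset, which is how the
hunk above appears to behave and is the case the inline comment asks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes for this model only (not kernel errno values). */
enum { SIM_OK = 0, SIM_ENOTSUPP = -1, SIM_EFAIL = -2 };

/* Mirrors the two reset levels the patch tracks in reset_mode. */
enum sim_reset_mode { SIM_RESET_PER_QUEUE, SIM_RESET_PER_PIPE };

/* Hypothetical knobs describing how the simulated hardware behaves. */
struct sim_hw {
	bool queue_reset_ok;       /* does the per-queue reset succeed? */
	bool map_queue_ok;         /* does remapping the queue succeed? */
	bool ring_ok_after_queue;  /* does the ring test pass after a queue reset? */
	bool pipe_reset_supported; /* is a per-pipe reset available at all? */
	bool pipe_reset_ok;        /* does the per-pipe reset succeed? */
	int pipe_resets;           /* number of pipe resets attempted */
	int ring_tests;            /* number of ring tests run by this function */
};

static int sim_reset_kcq(struct sim_hw *hw)
{
	enum sim_reset_mode mode = SIM_RESET_PER_QUEUE;

	assert(hw != NULL);

	/* A failed queue reset goes straight to the pipe reset. */
	if (!hw->queue_reset_ok)
		goto pipe_reset;

remap:
	/* Assumption: a remap failure is returned as-is, with no pipe reset. */
	if (!hw->map_queue_ok)
		return SIM_EFAIL;

	/* Only a queue-level reset is verified here, so the fallback runs once. */
	if (mode == SIM_RESET_PER_QUEUE) {
		hw->ring_tests++;
		if (!hw->ring_ok_after_queue)
			goto pipe_reset;
	}
	return SIM_OK;

pipe_reset:
	if (!hw->pipe_reset_supported)
		return SIM_ENOTSUPP;
	hw->pipe_resets++;
	mode = SIM_RESET_PER_PIPE;
	if (!hw->pipe_reset_ok)
		return SIM_EFAIL;
	goto remap;
}
```

Recording the mode before jumping back to the remap step is what keeps the
model from looping: once a pipe reset has been tried, the ring test in this
function is skipped and the result is left to whatever runs afterwards.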
