On 11.03.25 at 09:32, jesse.zh...@amd.com wrote:
> From: "jesse.zh...@amd.com" <jesse.zh...@amd.com>
>
> This patch introduces two new callbacks, `stop_queue` and `start_queue`, to the
> `amdgpu_ring_funcs` structure. These callbacks are designed to handle the stopping
> and starting of SDMA queues during engine reset operations. The changes include:
>
> 1. **Addition of Callbacks**:
>    - Added `stop_queue` and `start_queue` function pointers to `amdgpu_ring_funcs`.
>    - These callbacks allow for modular and flexible management of SDMA queues during
>      reset operations.

Why do these need to be per-ring callbacks? Flexibility is usually a bad thing when it is not needed.

Regards,
Christian.

>
> 2. **Integration with SDMA v4.4.2**:
>    - Implemented `sdma_v4_4_2_stop_queue` and `sdma_v4_4_2_restore_queue` as the
>      respective callback functions for SDMA v4.4.2.
>    - These functions handle the stopping and starting of SDMA queues, ensuring that
>      the scheduler's work queue is properly managed during resets.
>
> 3. **Purpose**:
>    - The new callbacks provide a standardized way to stop and start SDMA queues,
>      which is essential for handling engine resets gracefully.
>    - This change simplifies the reset logic and improves maintainability by
>      centralizing queue management in the `amdgpu_ring_funcs` structure.
>
> 4. **Impact**:
>    - The addition of these callbacks ensures that SDMA queues are properly stopped
>      and started during reset operations, reducing the risk of race conditions and
>      improving the reliability of the reset process.
>    - This change is a prerequisite for future improvements to the SDMA reset logic,
>      including better coordination between the KGD and KFD during resets.
>
> Suggested-by: Jonathan Kim <jonathan....@amd.com>
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index b4fd1e17205e..1c52ff92ea26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -237,6 +237,8 @@ struct amdgpu_ring_funcs {
>  	void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
>  	void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
>  	int (*reset)(struct amdgpu_ring *ring, unsigned int vmid);
> +	int (*stop_queue)(struct amdgpu_device *adev, uint32_t instance_id);
> +	int (*start_queue)(struct amdgpu_device *adev, uint32_t instance_id);
>  	void (*emit_cleaner_shader)(struct amdgpu_ring *ring);
>  	bool (*is_guilty)(struct amdgpu_ring *ring);
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index fd34dc138081..c1f7ccff9c4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -2132,6 +2132,8 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {
>  	.emit_reg_wait = sdma_v4_4_2_ring_emit_reg_wait,
>  	.emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
>  	.reset = sdma_v4_4_2_reset_queue,
> +	.stop_queue = sdma_v4_4_2_stop_queue,
> +	.start_queue = sdma_v4_4_2_restore_queue,
>  	.is_guilty = sdma_v4_4_2_ring_is_guilty,
>  };
>
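
For reference, here is a minimal userspace sketch of how a reset path might drive an optional stop/start callback pair held in a per-ring function table. It is not the amdgpu code: every `mock_*` name, the `MOCK_NUM_INSTANCES` constant, and the reset sequence itself are hypothetical stand-ins. Only the shape of the two callbacks (device pointer plus `instance_id`) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_INSTANCES 4	/* hypothetical instance count */

/* Hypothetical stand-in for the device; not struct amdgpu_device. */
struct mock_device {
	int queue_running[MOCK_NUM_INSTANCES];	/* 1 = running, 0 = stopped */
	int reset_count;
};

/* Function table with the same callback shape as in the patch. */
struct mock_ring_funcs {
	int (*stop_queue)(struct mock_device *adev, uint32_t instance_id);
	int (*start_queue)(struct mock_device *adev, uint32_t instance_id);
};

struct mock_ring {
	struct mock_device *adev;
	const struct mock_ring_funcs *funcs;
	uint32_t instance_id;
};

static int mock_stop_queue(struct mock_device *adev, uint32_t instance_id)
{
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -1;
	adev->queue_running[instance_id] = 0;
	return 0;
}

static int mock_start_queue(struct mock_device *adev, uint32_t instance_id)
{
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -1;
	adev->queue_running[instance_id] = 1;
	return 0;
}

static const struct mock_ring_funcs mock_sdma_funcs = {
	.stop_queue = mock_stop_queue,
	.start_queue = mock_start_queue,
};

/*
 * Stop the queue, perform the (simulated) engine reset, restart the queue.
 * Both callbacks are optional: a ring type that leaves one NULL skips it.
 */
static int mock_engine_reset(struct mock_ring *ring)
{
	int r;

	assert(ring && ring->adev && ring->funcs);

	if (ring->funcs->stop_queue) {
		r = ring->funcs->stop_queue(ring->adev, ring->instance_id);
		if (r)
			return r;
	}

	ring->adev->reset_count++;	/* placeholder for the real reset */

	if (ring->funcs->start_queue)
		return ring->funcs->start_queue(ring->adev, ring->instance_id);

	return 0;
}
```

In this sketch the callbacks only ever see the device and an instance id; the ring is used solely to look up the function table.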