On 11.03.25 at 09:32, jesse.zh...@amd.com wrote:
> From: "jesse.zh...@amd.com" <jesse.zh...@amd.com>
>
> This patch introduces two new callbacks, `stop_queue` and `start_queue`, to the
> `amdgpu_ring_funcs` structure. These callbacks are designed to handle the stopping
> and starting of SDMA queues during engine reset operations. The changes include:
>
> 1. **Addition of Callbacks**:
>    - Added `stop_queue` and `start_queue` function pointers to `amdgpu_ring_funcs`.
>    - These callbacks allow for modular and flexible management of SDMA queues during
>      reset operations.

Why do these need to be per-ring callbacks?

Flexibility is usually a bad thing when it isn't needed.

Regards,
Christian.

>
> 2. **Integration with SDMA v4.4.2**:
>    - Implemented `sdma_v4_4_2_stop_queue` and `sdma_v4_4_2_restore_queue` as the
>      respective callback functions for SDMA v4.4.2.
>    - These functions handle the stopping and starting of SDMA queues, ensuring that
>      the scheduler's work queue is properly managed during resets.
>
> 3. **Purpose**:
>    - The new callbacks provide a standardized way to stop and start SDMA queues,
>      which is essential for handling engine resets gracefully.
>    - This change simplifies the reset logic and improves maintainability by
>      centralizing queue management in the `amdgpu_ring_funcs` structure.
>
> 4. **Impact**:
>    - The addition of these callbacks ensures that SDMA queues are properly stopped
>      and started during reset operations, reducing the risk of race conditions and
>      improving the reliability of the reset process.
>    - This change is a prerequisite for future improvements to the SDMA reset logic,
>      including better coordination between the KGD and KFD during resets.
>
> Suggested-by: Jonathan Kim <jonathan....@amd.com>
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index b4fd1e17205e..1c52ff92ea26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -237,6 +237,8 @@ struct amdgpu_ring_funcs {
>       void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
>       void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
>       int (*reset)(struct amdgpu_ring *ring, unsigned int vmid);
> +     int (*stop_queue)(struct amdgpu_device *adev, uint32_t instance_id);
> +     int (*start_queue)(struct amdgpu_device *adev, uint32_t instance_id);
>       void (*emit_cleaner_shader)(struct amdgpu_ring *ring);
>       bool (*is_guilty)(struct amdgpu_ring *ring);
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index fd34dc138081..c1f7ccff9c4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -2132,6 +2132,8 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {
>       .emit_reg_wait = sdma_v4_4_2_ring_emit_reg_wait,
>       .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
>       .reset = sdma_v4_4_2_reset_queue,
> +     .stop_queue = sdma_v4_4_2_stop_queue,
> +     .start_queue = sdma_v4_4_2_restore_queue,
>       .is_guilty = sdma_v4_4_2_ring_is_guilty,
>  };
>  
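
For reference, a minimal user-space sketch of how a reset path might dispatch through the two callbacks added in the hunks above. Everything prefixed `mock_` is a hypothetical stand-in, not the amdgpu definitions. Only the callback signatures are taken from the patch, and the stop/reset/start ordering is an assumption based on the commit message. Note that both callbacks take the device and an instance id rather than a ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_INSTANCES 4

/* Hypothetical stand-in for struct amdgpu_device: per-instance bookkeeping. */
struct mock_device {
	int stopped[MOCK_NUM_INSTANCES];
	int resets[MOCK_NUM_INSTANCES];
};

/* Only the two new callbacks, with the signatures used in the patch. */
struct mock_ring_funcs {
	int (*stop_queue)(struct mock_device *adev, uint32_t instance_id);
	int (*start_queue)(struct mock_device *adev, uint32_t instance_id);
};

/* Hypothetical stand-in for struct amdgpu_ring. */
struct mock_ring {
	struct mock_device *adev;
	const struct mock_ring_funcs *funcs;
	uint32_t me;	/* instance index this mock ring belongs to */
};

static int mock_stop_queue(struct mock_device *adev, uint32_t instance_id)
{
	assert(instance_id < MOCK_NUM_INSTANCES);
	adev->stopped[instance_id] = 1;
	return 0;
}

static int mock_start_queue(struct mock_device *adev, uint32_t instance_id)
{
	assert(instance_id < MOCK_NUM_INSTANCES);
	adev->stopped[instance_id] = 0;
	return 0;
}

static const struct mock_ring_funcs mock_sdma_funcs = {
	.stop_queue = mock_stop_queue,
	.start_queue = mock_start_queue,
};

/*
 * Hypothetical reset helper: stop the queue, reset, start the queue again.
 * Both callbacks are treated as optional, so rings without them skip the step.
 */
static int mock_reset_instance(struct mock_ring *ring)
{
	struct mock_device *adev = ring->adev;
	int r;

	if (ring->funcs->stop_queue) {
		r = ring->funcs->stop_queue(adev, ring->me);
		if (r)
			return r;
	}

	adev->resets[ring->me]++;	/* placeholder for the actual engine reset */

	if (ring->funcs->start_queue)
		return ring->funcs->start_queue(adev, ring->me);

	return 0;
}
```

Because the callbacks are reached through a ring but act on an instance, every ring sharing that instance ends up pointing at the same pair of functions in this sketch.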
