On 6/11/2025 11:26 AM, Jesse Zhang wrote:
> Adds logic to handle instance ID conversion during SDMA engine reset
> when harvest_config is active. This ensures correct physical engine
> addressing when some SDMA instances are harvested.
> 
> Changes include:
> 1. Added instance ID remapping using GET_INST macro when harvest_config
>    is non-zero
> 2. Conversion happens before engine reset procedure begins
> 3. Maintains existing reset flow for non-harvested configurations
> 
> This fixes hardware initialization issues on devices with harvested
> SDMA instances where the logical instance IDs don't match physical
> hardware mapping.
> 

This shouldn't be required. If the driver were not already harvest-aware,
it wouldn't load properly on MI308.

Thanks,
Lijo

> Suggested-by: Jonathan Kim <jonathan....@amd.com>
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      | 1 +
>  3 files changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index a0e9bf9b2710..4282f60a0cef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -759,6 +759,7 @@ static void amdgpu_discovery_read_from_harvest_table(struct amdgpu_device *adev,
>                               ~(1U << harvest_info->list[i].number_instance);
>                       break;
>               case SDMA0_HWID:
> +                     adev->sdma.harvest_config |= (1U << harvest_info->list[i].number_instance);
>                       adev->sdma.sdma_mask &=
>                               ~(1U << harvest_info->list[i].number_instance);
>                       break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> index 6716ac281c49..0bfd2c138d24 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> @@ -581,6 +581,9 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id)
>       bool gfx_sched_stopped = false, page_sched_stopped = false;
>  
>       mutex_lock(&sdma_instance->engine_reset_mutex);
> +
> +     if (adev->sdma.harvest_config)
> +             instance_id = GET_INST(SDMA0, instance_id);
>       /* Stop the scheduler's work queue for the GFX and page rings if they are running.
>       * This ensures that no new tasks are submitted to the queues while
>       * the reset is in progress.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> index e5f8951bbb6f..fed00854a1a2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> @@ -123,6 +123,7 @@ struct amdgpu_sdma {
>  
>       int                     num_instances;
>       uint32_t                sdma_mask;
> +     uint32_t                harvest_config;
>       int                     num_inst_per_aid;
>       uint32_t                    srbm_soft_reset;
>       bool                    has_page_queue;
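
For readers following the thread, the bookkeeping the patch describes can be sketched in isolation. This is a toy model, not the driver's code: struct sdma_model, mark_harvested(), logical_to_physical() and MAX_SDMA_INSTANCES are hypothetical names, and the rule "logical instance n is the n-th usable physical instance" is an assumption made for illustration. It does not claim to reproduce what GET_INST actually does in amdgpu.

```c
#include <stdint.h>

#define MAX_SDMA_INSTANCES 16u /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for the two mask fields touched by the patch. */
struct sdma_model {
	uint32_t sdma_mask;      /* bit n set: physical instance n is usable */
	uint32_t harvest_config; /* bit n set: physical instance n is harvested */
};

/* Mirror of the harvest-table hunk: record the harvested instance in
 * harvest_config and clear it from the usable mask. */
static void mark_harvested(struct sdma_model *s, unsigned int phys_inst)
{
	s->harvest_config |= (1U << phys_inst);
	s->sdma_mask &= ~(1U << phys_inst);
}

/* Assumed mapping: walk the usable mask and return the physical index of
 * the logical-th set bit, or -1 if there is no such instance. */
static int logical_to_physical(const struct sdma_model *s, unsigned int logical)
{
	unsigned int phys, seen = 0;

	for (phys = 0; phys < MAX_SDMA_INSTANCES; phys++) {
		if (!(s->sdma_mask & (1U << phys)))
			continue; /* harvested or absent: skip it */
		if (seen == logical)
			return (int)phys;
		seen++;
	}
	return -1;
}
```

Under these assumptions, starting from a full 16-bit mask and marking physical instances 1 and 2 as harvested, logical instances 0, 1 and 2 resolve to physical instances 0, 3 and 4. Whether the reset path needs such a conversion at all is exactly the point under review above.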
