[AMD Official Use Only - AMD Internal Distribution Only]

Thanks Lijo

As we discussed offline, we will remove the harvest_config check.
Regards
Jesse

-----Original Message-----
From: Lazar, Lijo <lijo.la...@amd.com>
Sent: Wednesday, June 11, 2025 2:15 PM
To: Zhang, Jesse(Jie) <jesse.zh...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; Kim, Jonathan <jonathan....@amd.com>; Zhu, Jiadong <jiadong....@amd.com>
Subject: Re: [PATCH 1/2] drm/amdgpu: Implement instance ID remapping for harvested SDMA engines

On 6/11/2025 11:26 AM, Jesse Zhang wrote:
> Adds logic to handle instance ID conversion during SDMA engine reset
> when harvest_config is active. This ensures correct physical engine
> addressing when some SDMA instances are harvested.
>
> Changes include:
> 1. Added instance ID remapping using GET_INST macro when harvest_config
>    is non-zero
> 2. Conversion happens before engine reset procedure begins
> 3. Maintains existing reset flow for non-harvested configurations
>
> This fixes hardware initialization issues on devices with harvested
> SDMA instances where the logical instance IDs don't match physical
> hardware mapping.
>

This shouldn't be required. Without harvest-awareness, driver won't load properly on MI308.

Thanks,
Lijo

> Suggested-by: Jonathan Kim <jonathan....@amd.com>
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      | 1 +
>  3 files changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index a0e9bf9b2710..4282f60a0cef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -759,6 +759,7 @@ static void
> amdgpu_discovery_read_from_harvest_table(struct amdgpu_device *adev,
>  				~(1U << harvest_info->list[i].number_instance);
>  			break;
>  		case SDMA0_HWID:
> +			adev->sdma.harvest_config |= (1U <<
> +harvest_info->list[i].number_instance);
>  			adev->sdma.sdma_mask &=
>  				~(1U << harvest_info->list[i].number_instance);
>  			break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> index 6716ac281c49..0bfd2c138d24 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> @@ -581,6 +581,9 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev,
> uint32_t instance_id)
>  	bool gfx_sched_stopped = false, page_sched_stopped = false;
>
>  	mutex_lock(&sdma_instance->engine_reset_mutex);
> +
> +	if (adev->sdma.harvest_config)
> +		instance_id = GET_INST(SDMA0, instance_id);
>  	/* Stop the scheduler's work queue for the GFX and page rings if they
> are running.
>  	 * This ensures that no new tasks are submitted to the queues while
>  	 * the reset is in progress.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> index e5f8951bbb6f..fed00854a1a2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> @@ -123,6 +123,7 @@ struct amdgpu_sdma {
>
>  	int num_instances;
>  	uint32_t sdma_mask;
> +	uint32_t harvest_config;
>  	int num_inst_per_aid;
>  	uint32_t srbm_soft_reset;
>  	bool has_page_queue;
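
For readers following the thread, the logical-to-physical mismatch described in the commit message can be illustrated with a small standalone sketch. This is not the driver's GET_INST implementation; logical_to_physical() and MAX_SDMA_INSTANCES are hypothetical names, and the sketch assumes that logical IDs simply enumerate the non-harvested physical instances in ascending order, with harvest_mask using the same bit layout the patch gives harvest_config (bit N set means physical instance N is harvested).

```c
#include <stdint.h>

/* Hypothetical upper bound on physical SDMA instances, for illustration only. */
#define MAX_SDMA_INSTANCES 16

/*
 * Hypothetical helper, not part of amdgpu: return the physical instance ID
 * that corresponds to logical_id, assuming logical IDs count only the
 * instances whose bit is clear in harvest_mask. Returns -1 if there is no
 * such instance.
 */
static int logical_to_physical(uint32_t harvest_mask, int logical_id)
{
	int seen = 0; /* number of usable instances passed so far */

	for (int phys = 0; phys < MAX_SDMA_INSTANCES; phys++) {
		if (harvest_mask & (1U << phys))
			continue; /* harvested: has no logical ID */
		if (seen == logical_id)
			return phys;
		seen++;
	}
	return -1;
}
```

Under these assumptions, with physical instances 1 and 2 harvested (harvest_mask = 0x6), logical instance 1 resolves to physical instance 3, while an empty mask leaves every ID unchanged.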