[AMD Official Use Only - AMD Internal Distribution Only]

Hi Harish,
     Operations within full access mode are hardware-related and require 
exclusive GPU ownership. Software-related operations, particularly those that 
are time-consuming, must not be placed inside full access mode, as this would 
impact other VFs by blocking their access to the GPU.
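
To illustrate the ordering rule, here is a minimal sketch in plain C. It is a toy model, not the amdgpu code: `struct vf_model`, `run_step`, `window_ms`, and the step costs (500 ms and 4000 ms) are all hypothetical and chosen only to show how a slow software step inflates the exclusive window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one VF's init sequence; not the amdgpu API. */
struct vf_model {
	bool full_access;     /* true while this VF owns the GPU exclusively */
	unsigned int held_ms; /* time accumulated inside the exclusive window */
};

static void request_full_access(struct vf_model *vf) { vf->full_access = true; }
static void release_full_access(struct vf_model *vf) { vf->full_access = false; }

/* Run one init step; its cost counts against the window only while it is held. */
static void run_step(struct vf_model *vf, unsigned int cost_ms, bool needs_hw)
{
	/* A hardware step must never run outside the exclusive window. */
	assert(!needs_hw || vf->full_access);
	if (vf->full_access)
		vf->held_ms += cost_ms;
}

/* Returns how long the exclusive window was held (illustrative costs). */
static unsigned int window_ms(bool defer_software_step)
{
	struct vf_model vf = { false, 0 };

	request_full_access(&vf);
	run_step(&vf, 500, true);           /* hardware init: needs ownership */
	if (!defer_software_step)
		run_step(&vf, 4000, false); /* slow software step inside window */
	release_full_access(&vf);

	if (defer_software_step)
		run_step(&vf, 4000, false); /* same step, after release */

	return vf.held_ms;
}
```

With these made-up costs, `window_ms(false)` holds the GPU for 4500 ms, while `window_ms(true)` holds it for 500 ms; the total work is the same, only the time other VFs are blocked changes.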

Emily Deng
Best Wishes

>-----Original Message-----
>From: Kasiviswanathan, Harish <[email protected]>
>Sent: Wednesday, January 7, 2026 4:47 AM
>To: Li, Chong(Alan) <[email protected]>; [email protected]
>Cc: Deng, Emily <[email protected]>; Zhao, Victor <[email protected]>;
>Yang, Philip <[email protected]>; Kuehling, Felix <[email protected]>
>Subject: Re: [PATCH] drm/amdgpu: reduce the full gpu access time in
>amdgpu_device_init.
>
>Hi Alan,
>
>Based on your older patches, I understand that this patch is required because
>the host (gim) driver assumes the guest driver is available within 3s. I am not
>sure how the 3s timeout was decided. I feel a better approach would be a more
>robust handshake between the guest and host drivers. You might be able to
>temporarily get away with rearranging the initialization code, but that could
>break easily if some other change in the future causes a delay.
>
>Best Regards,
>Harish
>
>
>On 2025-11-17 01:38, chong li wrote:
>> [Why]
>> The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
>> sometimes takes too long.
>>
>> [How]
>> Move the call to "kgd2kfd_init_zone_device" to after full GPU access is
>> released (amdgpu_virt_release_full_gpu).
>>
>> v2:
>> improve the coding style.
>>
>> Signed-off-by: chong li <[email protected]>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 +++++++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c  | 23 ++++++++++++++++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h  |  6 ++++++
>>  4 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 40c46e6c8898..6d204ba2c267 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -37,7 +37,7 @@
>>  #include "amdgpu_sync.h"
>>  #include "amdgpu_vm.h"
>>  #include "amdgpu_xcp.h"
>> -
>> +#include "kfd_topology.h"
>>  extern uint64_t amdgpu_amdkfd_total_mem_size;
>>
>>  enum TLB_FLUSH_TYPE {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 0b40ddcb8ba1..b4e1f258119c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3333,7 +3333,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>
>>      /* Don't init kfd if whole hive need to be reset during init */
>>      if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>> -            kgd2kfd_init_zone_device(adev);
>>              amdgpu_amdkfd_device_init(adev);
>>      }
>>
>> @@ -4931,6 +4930,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>
>>      if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
>>              amdgpu_xgmi_reset_on_init(adev);
>> +
>> +    /* Don't init kfd if whole hive need to be reset during init */
>> +    if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>> +            kgd2kfd_init_zone_device(adev);
>> +            kfd_update_svm_support_properties(adev);
>> +    }
>> +
>>      /*
>>       * Place those sysfs registering after `late_init`. As some of those
>>       * operations performed in `late_init` might affect the sysfs
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 8644039777b8..8511b00a7463 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -2475,3 +2475,26 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data)
>>  }
>>
>>  #endif
>> +
>> +void kfd_update_svm_support_properties(struct amdgpu_device *adev)
>> +{
>> +    struct kfd_topology_device *dev;
>> +    int ret;
>> +
>> +    down_write(&topology_lock);
>> +    list_for_each_entry(dev, &topology_device_list, list) {
>> +            if (!dev->gpu || dev->gpu->adev != adev)
>> +                    continue;
>> +
>> +            if (KFD_IS_SVM_API_SUPPORTED(adev)) {
>> +                    dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
>> +                    ret = kfd_topology_update_sysfs();
>> +                    if (!ret)
>> +                            sys_props.generation_count++;
>> +                    else
>> +                            dev_err(adev->dev, "Failed to update SVM support properties. ret=%d\n", ret);
>> +            } else
>> +                    dev->node_props.capability &= ~HSA_CAP_SVMAPI_SUPPORTED;
>> +    }
>> +    up_write(&topology_lock);
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index ab7a3bf1bdef..129b447fcf84 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -202,4 +202,10 @@ struct kfd_topology_device *kfd_create_topology_device(
>>              struct list_head *device_list);
>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>
>> +#if IS_ENABLED(CONFIG_HSA_AMD)
>> +void kfd_update_svm_support_properties(struct amdgpu_device *adev);
>> +#else
>> +static inline void kfd_update_svm_support_properties(struct amdgpu_device *adev) {}
>> +#endif
>> +
>>  #endif /* __KFD_TOPOLOGY_H__ */
