On 2/7/2025 4:56 AM, Kim, Jonathan wrote:
> [Public]
> 
>> -----Original Message-----
>> From: Lazar, Lijo <lijo.la...@amd.com>
>> Sent: Thursday, February 6, 2025 8:13 AM
>> To: amd-gfx@lists.freedesktop.org; Lazar, Lijo <lijo.la...@amd.com>
>> Cc: Zhang, Hawking <hawking.zh...@amd.com>; Kim, Jonathan
>> <jonathan....@amd.com>
>> Subject: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info
>>
>> Add APIs to initialize XGMI speed, width details and get to max
>> bandwidth supported. It is assumed that a device only supports same
>> generation of XGMI links with uniform width.
>>
>> Signed-off-by: Lijo Lazar <lijo.la...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 41 ++++++++++++++++++++++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 11 +++++++
>>  2 files changed, 52 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> index 03d909c7b14b..edef5763e2b8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> @@ -1679,3 +1679,44 @@ bool amdgpu_xgmi_same_hive(struct amdgpu_device
>> *adev,
>>               adev->gmc.xgmi.hive_id &&
>>               adev->gmc.xgmi.hive_id == bo_adev->gmc.xgmi.hive_id);
>>  }
>> +
>> +static inline uint32_t
>> +__amdgpu_xgmi_get_max_bandwidth(uint8_t link_width,
>> +                             enum amdgpu_xgmi_link_speed max_speed)
>> +{
>> +     /* Returns unidirectional bandwidth in Mbps */
>> +     return max_speed * 1000 * link_width;
> 
> Instead of wrapping this in a new static, would it work to declare a new 
> member adev->gmc.xgmi.max_bandwidth?
> Then in amdgpu_xgmi_init_info, you can define max_bandwidth directly after 
> the switch.
>

Link Width/Link speed is generally the standard way to represent link
info. Presently, tools like SMI show these for PCIe. Followed the same
here and kept a max bandwidth value calculation.

>> +}
>> +
>> +/**
>> + * amdgpu_xgmi_get_max_bandwidth - Get max possible bandwidth of a single
>> XGMI
>> + * link of the device in Mbps
>> + * @adev: Target device.
>> + *
>> + * Gets the max possible bandwidth of a single XGMI link of the device in 
>> Mbps.
>> + * Assumption is all links of the device have the same width and are of the
>> + * same XGMI generation.
>> + */
>> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev)
>> +{
>> +     if (!adev->gmc.xgmi.supported)
>> +             return 0;
>> +
>> +     return __amdgpu_xgmi_get_max_bandwidth(adev->gmc.xgmi.max_width,
>> adev->gmc.xgmi.max_speed);
> 
> If you do the suggested above, both tests can be simplified into 1 line as
> -> return = xgmi_supported ? max_bandwidth : 0;
> 

Yes, primarily width/speed is chosen to represent link info.

>> +}
>> +
>> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev)
>> +{
>> +     switch (amdgpu_ip_version(adev, XGMI_HWIP, 0)) {
>> +     case IP_VERSION(6, 1, 0):
>> +             adev->gmc.xgmi.max_speed = XGMI2_5_SPEED_GT;
>> +             adev->gmc.xgmi.max_width = 16;
>> +             break;
>> +     case IP_VERSION(6, 4, 0):
>> +             adev->gmc.xgmi.max_speed = XGMI3_SPEED_GT;
>> +             adev->gmc.xgmi.max_width = 16;
>> +             break;
>> +     default:
>> +             break;
>> +     }
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> index 044c4f6be44a..384c4ddef78a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> @@ -25,6 +25,12 @@
>>  #include <drm/task_barrier.h>
>>  #include "amdgpu_ras.h"
>>
>> +enum amdgpu_xgmi_link_speed {
>> +     XGMI2_SPEED_GT = 16,
>> +     XGMI2_5_SPEED_GT = 25,
>> +     XGMI3_SPEED_GT = 32
> 
> Why not declare and define them as units of Mbps to begin with so you don't 
> have to do the calculation later?

Actually, this is also something which is followed from PCIe to
represent as GT/s. If bandwidth field alone is kept, then this also
won't be required.

I'm not so sure if that is a good way to keep things.

Thanks,
Lijo

> 
> Jon
> 
>> +};
>> +
>>  struct amdgpu_hive_info {
>>       struct kobject kobj;
>>       uint64_t hive_id;
>> @@ -75,6 +81,8 @@ struct amdgpu_xgmi {
>>       struct ras_common_if *ras_if;
>>       bool connected_to_cpu;
>>       struct amdgpu_xgmi_ras *ras;
>> +     enum amdgpu_xgmi_link_speed max_speed;
>> +     uint8_t max_width;
>>  };
>>
>>  struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>> @@ -102,4 +110,7 @@ int amdgpu_xgmi_request_nps_change(struct
>> amdgpu_device *adev,
>>  int amdgpu_get_xgmi_link_status(struct amdgpu_device *adev,
>>                               int global_link_num);
>>
>> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev);
>> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev);
>> +
>>  #endif
>> --
>> 2.25.1
> 

Reply via email to