[Public]

> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Sunday
> Clement
> Sent: Friday, October 17, 2025 10:33 AM
> To: [email protected]
> Cc: Kasiviswanathan, Harish <[email protected]>; Kuehling,
> Felix <[email protected]>; Clement, Sunday <[email protected]>
> Subject: [PATCH] drm/amdkfd: Fix nullpointer dereference
>
> In the event no device is found with the given proximity domain and
> kfd_topology_device_by_proximity_domain_no_lock() returns a null device,
> immediately checking !peer_dev->gpu will result in a null pointer
> dereference.
>
> Signed-off-by: Sunday Clement <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 4a7180b46b71..6093d96c5892 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -2357,7 +2357,7 @@ static int kfd_create_vcrat_image_gpu(void
> *pcrat_image,
>       if (kdev->kfd->hive_id) {
>               for (nid = 0; nid < proximity_domain; ++nid) {
>                       peer_dev =
> kfd_topology_device_by_proximity_domain_no_lock(nid);
> -                     if (!peer_dev->gpu)
> +                     if (!peer_dev || !peer_dev->gpu)

Is this a real failure?
If so, we should figure out why our assumption that proximity domain IDs work
as a counter for valid devices doesn't actually hold.
Either way, it's probably better to return an error (something like -ENODEV)
rather than continue, since the IO link data has now been assigned garbage and
we probably don't want to keep building the hive at this point.
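
Roughly what I have in mind, as a standalone sketch rather than a tested
change to kfd_crat.c. The mock_* types and helpers below are hypothetical
stand-ins for the real kfd structures and for
kfd_topology_device_by_proximity_domain_no_lock(); only the control flow is
the point. It also assumes a node without a GPU is still a valid topology
entry that can be skipped.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kfd types; only the fields used here. */
struct mock_gpu { unsigned long hive_id; };
struct mock_topo_dev { struct mock_gpu *gpu; };

/*
 * Stand-in for kfd_topology_device_by_proximity_domain_no_lock():
 * returns NULL when no device exists for the given proximity domain.
 */
static struct mock_topo_dev *mock_dev_by_domain(struct mock_topo_dev **devs,
						size_t ndevs, size_t nid)
{
	return nid < ndevs ? devs[nid] : NULL;
}

/*
 * Count the hive peers below proximity_domain. A missing device is
 * treated as an error instead of being skipped.
 */
static int mock_count_hive_peers(struct mock_topo_dev **devs, size_t ndevs,
				 size_t proximity_domain,
				 unsigned long hive_id)
{
	int peers = 0;
	size_t nid;

	for (nid = 0; nid < proximity_domain; ++nid) {
		struct mock_topo_dev *peer_dev =
			mock_dev_by_domain(devs, ndevs, nid);

		if (!peer_dev)
			return -ENODEV;	/* topology is inconsistent, stop */
		if (!peer_dev->gpu)
			continue;	/* node without a GPU, nothing to link */
		if (peer_dev->gpu->hive_id != hive_id)
			continue;	/* GPU belongs to a different hive */
		++peers;
	}
	return peers;
}
```

The caller would then propagate the negative return value instead of
carrying on with a partially filled set of IO links.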

Jon

>                               continue;
>                       if (peer_dev->gpu->kfd->hive_id != kdev->kfd->hive_id)
>                               continue;
> --
> 2.43.0
