[Public]

Ping

> -----Original Message-----
> From: Kim, Jonathan <[email protected]>
> Sent: Friday, November 7, 2025 7:57 PM
> To: [email protected]
> Cc: Deucher, Alexander <[email protected]>; Kuehling, Felix
> <[email protected]>; Six, Lancelot <[email protected]>; Yang, Philip
> <[email protected]>; Kim, Jonathan <[email protected]>
> Subject: [PATCH] drm/amdkfd: relax checks for over allocation of save area
>
> Over allocation of save area is not fatal, only under allocation is.
> ROCm has various components that independently claim authority over save
> area size.
>
> Unless KFD decides to claim single authority, relax size checks.
>
> v2: remove warning
>
> Signed-off-by: Jonathan Kim <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> index a65c67cf56ff..f1e7583650c4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> @@ -297,16 +297,16 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
>               goto out_err_unreserve;
>       }
>
> -     if (properties->ctx_save_restore_area_size != topo_dev->node_props.cwsr_size) {
> -             pr_debug("queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
> +     if (properties->ctx_save_restore_area_size < topo_dev->node_props.cwsr_size) {
> +             pr_debug("queue cwsr size 0x%x not sufficient for node cwsr size 0x%x\n",
>                       properties->ctx_save_restore_area_size,
>                       topo_dev->node_props.cwsr_size);
>               err = -EINVAL;
>               goto out_err_unreserve;
>       }
>
> -     total_cwsr_size = (topo_dev->node_props.cwsr_size + topo_dev->node_props.debug_memory_size)
> -                       * NUM_XCC(pdd->dev->xcc_mask);
> +     total_cwsr_size = (properties->ctx_save_restore_area_size +
> +                        topo_dev->node_props.debug_memory_size) * NUM_XCC(pdd->dev->xcc_mask);
>       total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE);
>
>       err = kfd_queue_buffer_get(vm, (void *)properties->ctx_save_restore_area_address,
> @@ -352,8 +352,8 @@ int kfd_queue_release_buffers(struct kfd_process_device *pdd, struct queue_prope
>       topo_dev = kfd_topology_device_by_id(pdd->dev->id);
>       if (!topo_dev)
>               return -EINVAL;
> -     total_cwsr_size = (topo_dev->node_props.cwsr_size + topo_dev->node_props.debug_memory_size)
> -                       * NUM_XCC(pdd->dev->xcc_mask);
> +     total_cwsr_size = (properties->ctx_save_restore_area_size +
> +                        topo_dev->node_props.debug_memory_size) * NUM_XCC(pdd->dev->xcc_mask);
>       total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE);
>
>       kfd_queue_buffer_svm_put(pdd, properties->ctx_save_restore_area_address, total_cwsr_size);
> --
> 2.34.1
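
For anyone skimming the quoted patch, the relaxed check and the size
computation can be sketched as standalone userspace C. This is only an
illustration, not the kernel code: sketch_cwsr_total_size(),
SKETCH_PAGE_SIZE and SKETCH_ALIGN() are hypothetical stand-ins, and a
4 KiB page size is assumed.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's PAGE_SIZE and ALIGN(). */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/*
 * Returns 0 and stores the page-aligned total in *total when the
 * user-supplied save area is at least as large as the node requires.
 * Returns -EINVAL when it is smaller (under allocation).
 */
static int sketch_cwsr_total_size(uint32_t queue_cwsr_size,
                                  uint32_t node_cwsr_size,
                                  uint32_t debug_memory_size,
                                  uint32_t num_xcc,
                                  uint64_t *total)
{
    /* Only under allocation is rejected; over allocation passes. */
    if (queue_cwsr_size < node_cwsr_size)
        return -EINVAL;

    /* The total is derived from the queue's size, not the node's. */
    *total = ((uint64_t)queue_cwsr_size + debug_memory_size) * num_xcc;
    *total = SKETCH_ALIGN(*total, SKETCH_PAGE_SIZE);
    return 0;
}
```

Because the acquire and release paths both derive the total from the
queue's own size, the range released matches the range acquired even when
userspace over allocates.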
