[+Xinhui]

On 2021-06-15 at 1:50 p.m., Amber Lin wrote:
> Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a
> circular lock. destroy_queue_nocpsch_locked is called under a DQM lock,
> which is taken in MMU notifiers, potentially in FS reclaim context.
> Taking another lock, which is BO reservation lock from free_mqd, while
> causing an FS reclaim inside the DQM lock creates a problematic circular
> lock dependency. Therefore move free_mqd out of
> destroy_queue_nocpsch_locked and call it after unlocking DQM.
>
> Signed-off-by: Amber Lin <amber....@amd.com>
> Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>

Let's submit this patch as is. I'm making some comments inline for
things that Xinhui can address in his race condition patch.


> ---
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c  | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 72bea5278add..c069fa259b30 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -486,9 +486,6 @@ static int destroy_queue_nocpsch_locked(struct device_queue_manager *dqm,
>       if (retval == -ETIME)
>               qpd->reset_wavefronts = true;
>  
> -
> -     mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
> -
>       list_del(&q->list);
>       if (list_empty(&qpd->queues_list)) {
>               if (qpd->reset_wavefronts) {
> @@ -523,6 +520,8 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm,
>       int retval;
>       uint64_t sdma_val = 0;
>       struct kfd_process_device *pdd = qpd_to_pdd(qpd);
> +     struct mqd_manager *mqd_mgr =
> +             dqm->mqd_mgrs[get_mqd_type_from_queue_type(q->properties.type)];
>  
>       /* Get the SDMA queue stats */
>       if ((q->properties.type == KFD_QUEUE_TYPE_SDMA) ||
> @@ -540,6 +539,8 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm,
>               pdd->sdma_past_activity_counter += sdma_val;
>       dqm_unlock(dqm);
>  
> +     mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
> +
>       return retval;
>  }
>  
> @@ -1629,7 +1630,7 @@ static bool set_cache_memory_policy(struct device_queue_manager *dqm,
>  static int process_termination_nocpsch(struct device_queue_manager *dqm,
>               struct qcm_process_device *qpd)
>  {
> -     struct queue *q, *next;
> +     struct queue *q;
>       struct device_process_node *cur, *next_dpn;
>       int retval = 0;
>       bool found = false;
> @@ -1637,12 +1638,19 @@ static int process_termination_nocpsch(struct device_queue_manager *dqm,
>       dqm_lock(dqm);
>  
>       /* Clear all user mode queues */
> -     list_for_each_entry_safe(q, next, &qpd->queues_list, list) {
> +     while (!list_empty(&qpd->queues_list)) {
> +             struct mqd_manager *mqd_mgr;
>               int ret;
>  
> +             q = list_first_entry(&qpd->queues_list, struct queue, list);
> +             mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
> +                             q->properties.type)];
>               ret = destroy_queue_nocpsch_locked(dqm, qpd, q);
>               if (ret)
>                       retval = ret;
> +             dqm_unlock(dqm);
> +             mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
> +             dqm_lock(dqm);

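This is the correct way to clean up the list when dropping the DQM lock
in the middle of the loop. list_for_each_entry_safe is not safe in that
case, because the cached next entry can be removed from the list while
the lock is dropped. Xinhui, you can use the same method in
process_termination_cpsch.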

I believe the swapping of the q->mqd with a temporary variable is not
needed. When free_mqd is called, the queue is no longer on the
qpd->queues_list, so destroy_queue cannot race with it. If we ensure
that queues are always removed from the list before calling free_mqd,
and that list-removal happens under the dqm_lock, then there should be
no risk of a race condition that causes a double-free.
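
To illustrate the argument above, here is a minimal userspace sketch of
the pattern. It is not kernel code: every toy_* name is made up for the
example, the lock is modelled as a plain flag in a single-threaded
program so that assertions can check when it is held, and the queue
struct itself is freed in the loop only to keep the example leak-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct queue. */
struct toy_queue {
	struct toy_queue *next;
	void *mqd;			/* stands in for q->mqd / q->mqd_mem_obj */
};

/* Hypothetical stand-in for the DQM lock plus qpd->queues_list. */
struct toy_dqm {
	bool locked;			/* models the DQM lock as a flag */
	struct toy_queue *head;		/* models qpd->queues_list */
};

static void toy_lock(struct toy_dqm *dqm)
{
	assert(!dqm->locked);
	dqm->locked = true;
}

static void toy_unlock(struct toy_dqm *dqm)
{
	assert(dqm->locked);
	dqm->locked = false;
}

/* Stands in for free_mqd(): must run without the lock, and only once. */
static void toy_free_mqd(struct toy_dqm *dqm, struct toy_queue *q)
{
	assert(!dqm->locked);		/* never free under the lock */
	assert(q->mqd != NULL);		/* would trip on a double free */
	free(q->mqd);
	q->mqd = NULL;
}

/* Add a queue with a dummy payload; returns 0 on success. */
static int toy_add_queue(struct toy_dqm *dqm)
{
	struct toy_queue *q = malloc(sizeof(*q));

	if (!q)
		return -1;
	q->mqd = malloc(16);
	if (!q->mqd) {
		free(q);
		return -1;
	}
	toy_lock(dqm);
	q->next = dqm->head;
	dqm->head = q;
	toy_unlock(dqm);
	return 0;
}

/* Drain the list, dropping the lock around each free. */
static int toy_process_termination(struct toy_dqm *dqm)
{
	int destroyed = 0;

	toy_lock(dqm);
	while (dqm->head) {
		/* Re-read the head every iteration; no cached "next". */
		struct toy_queue *q = dqm->head;

		dqm->head = q->next;	/* unlink while holding the lock */
		q->next = NULL;
		toy_unlock(dqm);
		toy_free_mqd(dqm, q);	/* q is no longer reachable via the list */
		free(q);
		destroyed++;
		toy_lock(dqm);
	}
	toy_unlock(dqm);
	return destroyed;
}
```

Because each queue is unlinked while the lock is held, any other path
that walks the list under the same lock can no longer find it, so only
the caller that unlinked it ever reaches the free.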

Regards,
  Felix


>       }
>  
>       /* Unregister process */