From: Philip Yang <philip.y...@amd.com>

[ Upstream commit 1b9366c601039d60546794c63fbb83ce8e53b978 ]

If KFD release_work waits for a GPU reset to finish, there is a WARNING:
possible circular locking dependency detected

  #2  kfd_create_process
        kfd_process_mutex
          flush kfd release work

  #1  kfd release work
        wait for amdgpu reset work

  #0  amdgpu_device_gpu_reset
        kgd2kfd_pre_reset
          kfd_process_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock((work_completion)(&p->release_work));
                  lock((wq_completion)kfd_process_wq);
                  lock((work_completion)(&p->release_work));
   lock((wq_completion)amdgpu-reset-dev);

To fix this, kfd_create_process moves the flush of the release work outside
kfd_process_mutex.
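
The dependency chain above can be checked with a small user-space model. This
is only an illustrative sketch, not how lockdep is implemented and not part of
the patch: the enum names, struct lock_graph, add_dep(), has_cycle() and
ordering_has_cycle() are all hypothetical helpers invented for this example.
It records "A is held while B is acquired or waited on" edges and looks for a
cycle, once with the flush under kfd_process_mutex and once with it outside.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this sketch, named after the chain above. */
enum { KFD_PROCESS_MUTEX, RELEASE_WORK, RESET_WORK, NCLASS };

/* edge[a][b] is true when b is acquired (or waited on) while a is held. */
struct lock_graph {
	bool edge[NCLASS][NCLASS];
};

static void add_dep(struct lock_graph *g, int held, int wanted)
{
	assert(held >= 0 && held < NCLASS && wanted >= 0 && wanted < NCLASS);
	g->edge[held][wanted] = true;
}

/* Depth-first search: is `target` reachable from `from`? */
static bool reaches(const struct lock_graph *g, int from, int target,
		    bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NCLASS; next++)
		if (g->edge[from][next] && reaches(g, next, target, seen))
			return true;
	return false;
}

/* A cycle exists if some edge a -> b can be followed from b back to a. */
static bool has_cycle(const struct lock_graph *g)
{
	for (int a = 0; a < NCLASS; a++)
		for (int b = 0; b < NCLASS; b++) {
			bool seen[NCLASS] = { false };

			if (g->edge[a][b] && reaches(g, b, a, seen))
				return true;
		}
	return false;
}

/* Build the dependency graph for one ordering and test it for a cycle. */
static bool ordering_has_cycle(bool flush_under_mutex)
{
	struct lock_graph g = { 0 };

	/* #2: kfd_create_process flushes release work under the mutex. */
	if (flush_under_mutex)
		add_dep(&g, KFD_PROCESS_MUTEX, RELEASE_WORK);
	/* #1: release work waits for the amdgpu reset work. */
	add_dep(&g, RELEASE_WORK, RESET_WORK);
	/* #0: the reset path takes kfd_process_mutex in kgd2kfd_pre_reset. */
	add_dep(&g, RESET_WORK, KFD_PROCESS_MUTEX);

	return has_cycle(&g);
}
```

In this model, ordering_has_cycle(true) finds the mutex -> release work ->
reset work -> mutex loop, while ordering_has_cycle(false) does not, because
no edge leaves kfd_process_mutex once the flush happens before it is taken.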

Signed-off-by: Philip Yang <philip.y...@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
Signed-off-by: Sasha Levin <sas...@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 662e4d973f13a..b07deeb987475 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -277,6 +277,14 @@ struct kfd_process *kfd_create_process(struct file *filep)
        if (thread->group_leader->mm != thread->mm)
                return ERR_PTR(-EINVAL);
 
+       /* If the process just called exec(3), it is possible that the
+        * cleanup of the kfd_process (following the release of the mm
+        * of the old process image) is still in the cleanup work queue.
+        * Make sure to drain any job before trying to recreate any
+        * resource for this process.
+        */
+       flush_workqueue(kfd_process_wq);
+
        /*
         * take kfd processes mutex before starting of process creation
         * so there won't be a case where two threads of the same process
@@ -289,14 +297,6 @@ struct kfd_process *kfd_create_process(struct file *filep)
        if (process) {
                pr_debug("Process already found\n");
        } else {
-               /* If the process just called exec(3), it is possible that the
-                * cleanup of the kfd_process (following the release of the mm
-                * of the old process image) is still in the cleanup work queue.
-                * Make sure to drain any job before trying to recreate any
-                * resource for this process.
-                */
-               flush_workqueue(kfd_process_wq);
-
                process = create_process(thread);
                if (IS_ERR(process))
                        goto out;
-- 
2.39.5
