Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
When ring hang happens amdgpu_dm_commit_planes during flip is holding
the BO reserved and then stack waiting for fences to signal in
reservation_object_wait_timeout_rcu (which won't signal because there
was a hnag). Then when we try to shutdown display block during reset
recovery from drm_atomic_helper_suspend we also try to reserve the BO
from dm_plane_helper_cleanup_fb ending in deadlock.
Also remove useless WARN_ON

Well it is good that you pointed this out, but there are more problems than just waiting wile the BO is reserved here.


Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +++++++++++++------
  1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index acc4ff8..f8dec36 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
                         */
                        abo = gem_to_amdgpu_bo(fb->obj[0]);
                        r = amdgpu_bo_reserve(abo, true);

Well why do we reserve the BO in the first place? As the name indicates reservation_object_wait_timeout_rcu() just uses rcu to wait for the BO to be idle, no need to actually reserve it at all.

-                       if (unlikely(r != 0)) {
+                       if (unlikely(r != 0))
                                DRM_ERROR("failed to reserve buffer before 
flip\n");
-                               WARN_ON(1);
-                       }
- /* Wait for all fences on this FB */
-                       
WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
-                                                                                  
 MAX_SCHEDULE_TIMEOUT) < 0);
+                       /*
+                        * Wait for all fences on this FB. Do limited wait to 
avoid
+                        * deadlock during GPU reset when this fence will not 
signal
+                        * but we hold reservation lock for the BO.
+                        */
+                       r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
+                                                               true, false,

Does this waiting happen in a work item or process context? If it's process context we should actually try to wait interruptible here.

Regards,
Christian.

+                                                               
msecs_to_jiffies(5000));
+                       if (unlikely(r == 0))
+                               DRM_ERROR("Waiting for fences timed out.");
+
+
amdgpu_bo_get_tiling_flags(abo, &tiling_flags);

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to