On 2/13/19 2:10 PM, Grodzovsky, Andrey wrote:
> 
> On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote:
>> On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>> the BO reserved and then stack waiting for fences to signal in
>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>> was a hnag). Then when we try to shutdown display block during reset
>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>> Also remove useless WARN_ON
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 
>>> +++++++++++++------
>>>     1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index acc4ff8..f8dec36 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
>>> drm_atomic_state *state,
>>>                              */
>>>                             abo = gem_to_amdgpu_bo(fb->obj[0]);
>>>                             r = amdgpu_bo_reserve(abo, true);
>>> -                   if (unlikely(r != 0)) {
>>> +                   if (unlikely(r != 0))
>>>                                     DRM_ERROR("failed to reserve buffer 
>>> before flip\n");
>>> -                           WARN_ON(1);
>>> -                   }
>>>     
>>> -                   /* Wait for all fences on this FB */
>>> -                   
>>> WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
>>> -                                                                           
>>>     MAX_SCHEDULE_TIMEOUT) < 0);
>>> +                   /*
>>> +                    * Wait for all fences on this FB. Do limited wait to 
>>> avoid
>>> +                    * deadlock during GPU reset when this fence will not 
>>> signal
>>> +                    * but we hold reservation lock for the BO.
>>> +                    */
>>> +                   r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>>> +                                                           true, false,
>>> +                                                           
>>> msecs_to_jiffies(5000));
>>> +                   if (unlikely(r == 0))
>>> +                           DRM_ERROR("Waiting for fences timed out.");
>>> +
>>> +
>>>     
>>>                             amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
>>>     
>>>
>> Is it safe that we're just continuing like this? It's probably better to
>> just unreserve the buffer and continue to the next plane, no?
>>
>> Nicholas Kazlauskas
> 
> As far as I see it should be safe as you are simply flipping to a buffer
> for which rendering hasn't finished (or stack actually in this case) so
> you might see visual corruption but that the least of your problems if
> after 5s the BO still not finalized for presentation, the system is
> already probably in very bad shape. Also, in case we do want to  do
> error handling we should also take care of  amdgpu_bo_reserve failure
> just before that.
> 
> Andrey
> 
> 

Yeah, I guess this whole blocks needs to be cleaned up in that case. 
This is a good first step at least. Technically 
reservation_object_wait_timeout_rcu will return < 0 when it's been 
interrupted too as an error code but I guess that will just be silently 
ignored here.

If you want you can change the condition to:

if (unlikely(r >= 0))
      DRM_ERROR("Waiting for FB fence failed: id=%d res=%d\n", 
plane->id, r);

But with or without that change this patch is:

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlaus...@amd.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to