https://bugs.freedesktop.org/show_bug.cgi?id=106500
--- Comment #3 from Andrey Grodzovsky <andrey.grodzov...@amd.com> ---
(In reply to Bas Nieuwenhuizen from comment #2)
> Created attachment 139568 [details]
> dmesg after trying 139562
>
> I tried the patch and as expected we do not deadlock at the original places
> since we don't call those anymore. But I get garbage on my display (possibly
> expected due to loss of VRAM), can't switch VT and stopping X hangs X.
>
> Furthermore I eventually still get stuck fence waits in dmesg (attached).
>
> Furthermore, it seems the UVDF ringtest fails.
I think the garbage is indeed due to the loss of VRAM; maybe we don't create a
shadow BO for the display's BO. The GPU reset fails because UVD fails to resume
and because of the SMU failure, so I believe that is why any further fence
submission hangs. The pipe never recovers.
Harry, check the patch I attached. There is no reason to call
drm_atomic_helper_resume/suspend explicitly from amdgpu_device_gpu_recover.
First of all, they are already being called from the display code through the
amd_ip_funcs.suspend/resume hooks.
Second, the place where they are called in amdgpu_device_gpu_recover is
wrong for GPU stalls, since it comes BEFORE we cancel and force completion
of all in-flight jobs that are stuck on the GPU. So, as Bas explained, it will
try to wait for a fence in amdgpu_pm_compute_clocks, but the pipe is hung, so we
end up in a deadlock. If we do the mode set AFTER forceful completion (as the
patch makes happen), no deadlock will happen.
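To illustrate the ordering problem, here is a rough toy model in Python. It is
only a sketch and not the actual amdgpu code: ToyGpu, Fence, recover and the
method names are all made up for illustration, and a mode set that would wait
forever is modeled as returning False instead of really blocking.

```python
# Toy model only: every name here is hypothetical and does not mirror amdgpu.
from dataclasses import dataclass, field


@dataclass
class Fence:
    signaled: bool = False


@dataclass
class ToyGpu:
    # Fences of jobs that are stuck on the hung pipe.
    in_flight: list = field(default_factory=list)

    def submit(self):
        """Queue a job on the hung pipe; its fence never signals by itself."""
        fence = Fence()
        self.in_flight.append(fence)
        return fence

    def force_complete_jobs(self):
        """Cancel the stuck jobs and force their fences to signal."""
        for fence in self.in_flight:
            fence.signaled = True
        self.in_flight.clear()

    def modeset(self):
        """Return True if the mode set can proceed, False if it would
        wait forever on an unsignaled fence (the deadlock case)."""
        return all(fence.signaled for fence in self.in_flight)


def recover(gpu, modeset_first):
    """Run the two recovery steps in the requested order and report
    whether the mode set could complete."""
    if modeset_first:
        ok = gpu.modeset()          # waits on fences of the hung pipe
        gpu.force_complete_jobs()
    else:
        gpu.force_complete_jobs()   # stuck fences are signaled first
        ok = gpu.modeset()
    return ok
```

In this model, recover(gpu, modeset_first=True) reports a blocked mode set when
a job is still in flight, while modeset_first=False lets it go through.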
The UVD/SMU failures require further debugging, but I am on a different task at
the moment, so maybe someone can pick this up...
Do you remember why that code is there? I think it is a remnant of old code.
If you are OK with this patch, I will send it for review.
--
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel