Am 04.08.2014 um 10:55 schrieb Maarten Lankhorst: > op 04-08-14 10:36, Christian K?nig schreef: >> Hi Maarten, >> >> Sorry for the delay. I've got way to much todo recently. >> >> Am 01.08.2014 um 19:46 schrieb Maarten Lankhorst: >>> On 01-08-14 18:35, Christian K?nig wrote: >>>> Am 31.07.2014 um 17:33 schrieb Maarten Lankhorst: >>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com> >>>>> --- >>>>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not >>>>> allowing radeon_fence_driver_check_lockup to take exclusive_lock, >>>>> and kill it during lockup recovery instead. >>>> That looks like the delayed work starts running as soon as we submit a >>>> fence, and not when it's needed for waiting. >>>> >>>> Since it's a backup for failing IRQs I would rather put it into >>>> radeon_irq_kms.c and start/stop it when the IRQs are started/stoped. >>> The delayed work is not just for failing irq's, it's also the handler >>> that's used to detect lockups, which is why I trigger after processing >>> fences, and reset the timer after processing. >> The idea was turning the delayed work on and off when we turn the irq on and >> off as well, processing of the delayed work handler can still happen in >> radeon_fence.c >> >>> Specifically what happened was this scenario: >>> >>> - lock up occurs >>> - write lock taken by gpu_reset >>> - delayed work runs, tries to acquire read lock, blocks >>> - gpu_reset tries to cancel delayed work synchronously >>> - has to wait for delayed work to finish -> deadlock >> Why do you want to disable the work item from the lockup handler in the >> first place? >> >> Just take the exclusive lock in the work item, when it concurrently runs >> with the lockup handler it will just block for the lockup handler to >> complete. > With the delayed work radeon_fence_wait no longer handles unreliable > interrupts itself, so it has to run from the lockup handler. > But an alternative solution could be adding a radeon_fence_wait_timeout, > ignore the timeout and check if fence is signaled on timeout. > This would probably be a cleaner solution.
Yeah, agree. Manually specifying a timeout in the fence wait on lockup handling sounds like the best alternative to me. Christian. > > ~Maarten >