> It'a pain to deal with gpu reset. Yeah, well that's nothing new.
> I've now tried other solutions but that would mean reverting to the old style > during gpu lockup recovery, and only running the delayed work when !lockup. > But this meant that the timeout was useless to add. I think the cleanest is > keeping the v2 patch, because potentially any waiting code can be called > during lockup recovery. The lockup code itself should never call any waiting code and V2 doesn't seem to handle a couple of cases correctly either. How about moving the fence waiting out of the reset code? Christian. Am 04.08.2014 um 15:34 schrieb Maarten Lankhorst: > Hey, > > op 04-08-14 13:57, Christian K?nig schreef: >> Am 04.08.2014 um 10:55 schrieb Maarten Lankhorst: >>> op 04-08-14 10:36, Christian K?nig schreef: >>>> Hi Maarten, >>>> >>>> Sorry for the delay. I've got way to much todo recently. >>>> >>>> Am 01.08.2014 um 19:46 schrieb Maarten Lankhorst: >>>>> On 01-08-14 18:35, Christian K?nig wrote: >>>>>> Am 31.07.2014 um 17:33 schrieb Maarten Lankhorst: >>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com> >>>>>>> --- >>>>>>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not >>>>>>> allowing radeon_fence_driver_check_lockup to take exclusive_lock, >>>>>>> and kill it during lockup recovery instead. >>>>>> That looks like the delayed work starts running as soon as we submit a >>>>>> fence, and not when it's needed for waiting. >>>>>> >>>>>> Since it's a backup for failing IRQs I would rather put it into >>>>>> radeon_irq_kms.c and start/stop it when the IRQs are started/stoped. >>>>> The delayed work is not just for failing irq's, it's also the handler >>>>> that's used to detect lockups, which is why I trigger after processing >>>>> fences, and reset the timer after processing. >>>> The idea was turning the delayed work on and off when we turn the irq on >>>> and off as well, processing of the delayed work handler can still happen >>>> in radeon_fence.c >>>> >>>>> Specifically what happened was this scenario: >>>>> >>>>> - lock up occurs >>>>> - write lock taken by gpu_reset >>>>> - delayed work runs, tries to acquire read lock, blocks >>>>> - gpu_reset tries to cancel delayed work synchronously >>>>> - has to wait for delayed work to finish -> deadlock >>>> Why do you want to disable the work item from the lockup handler in the >>>> first place? >>>> >>>> Just take the exclusive lock in the work item, when it concurrently runs >>>> with the lockup handler it will just block for the lockup handler to >>>> complete. >>> With the delayed work radeon_fence_wait no longer handles unreliable >>> interrupts itself, so it has to run from the lockup handler. >>> But an alternative solution could be adding a radeon_fence_wait_timeout, >>> ignore the timeout and check if fence is signaled on timeout. >>> This would probably be a cleaner solution. >> Yeah, agree. Manually specifying a timeout in the fence wait on lockup >> handling sounds like the best alternative to me. > It'a pain to deal with gpu reset. > > I've now tried other solutions but that would mean reverting to the old style > during gpu lockup recovery, and only running the delayed work when !lockup. > But this meant that the timeout was useless to add. I think the cleanest is > keeping the v2 patch, because potentially any waiting code can be called > during lockup recovery. > > ~Maarten >