Op 18-01-2021 om 16:05 schreef Thomas Hellström:
>
> On 1/18/21 3:46 PM, Maarten Lankhorst wrote:
>> Op 18-01-2021 om 14:28 schreef Thomas Hellström:
>>> On 1/18/21 2:22 PM, Thomas Hellström wrote:
>>>> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>>>>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>>>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>>>>
>>>>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>>>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>>>>
>>>>>> If so, isn't the real problem that the mmap_sem is taken in the 
>>>>>> dma_fence critical path (where the reset code sits)?
>>>>> Hey,
>>>>>
>>>>> The gpu reset code specifically needs to revoke all gtt mappings, and the 
>>>>> fault handler uses intel_gt_reset_trylock(),
>>>>>
>>>>> so this change should be ok since all those mappings are invalidated 
>>>>> correctly and completed before this point.
>>>>>
>>>>> The reset mutex isn't actually taken inside fence code, but used for 
>>>>> lockdep validation, so this should be ok.
>>>>>
>>>>> ~Maarten
>>>> Hmm, OK but then we still have the following established locking order.
>>>>
>>>> lock(fence_signaling)
>>>> lock(i_mmap_lock)
>>>>
>>>> But in the notifier
>>>>
>>>> lock(i_mmap_lock)
>>>> fence_signaling(within notifier)
>>>>
>>>> So gt_revoke() is violating dma-fence rules.
>>>>
>>>> BTW it looks to me like the reset mutex notation is actually doing much 
>>>> the same as the dma-fence annotations; While we can move gt_revoke() out 
>>>> of the reset mutex, that only gives us false hopes since it moves it out 
>>>> of the equivalent dma-fence annotation. I figure the reason this was not 
>>>> seen before the new code is that the reset mutex lockdep isn't taken when 
>>>> waiting for active. Only when waiting for dma-fence, but IMO the root 
>>>> problem is pre-existing.
>>>>
>>>> /Thomas
>>>>
>>>>
>>> The interesting scenario is
>>>
>>> thread 1:
>>> take i_mmap_lock()
>>> enter_mmu_notifier()
>>> wait_fence()
>>>
>>> thread 2:
>>> need_to_reset_gpu_for_the_above_fence();
>>> take i_mmap_lock()
>>>
>>> Deadlock.
>>>
>>> /Thomas
>>>
>>>
>> Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 
>> 1 isn't doing anything wrong, gpu reset probably should stop revoking gt 
>> bindings, and allow some garbage during reset. I don't see another way out. 
>> :-/
>
> Me neither.
>
> But to silence lockdep until dma_fence annotation is widely added:
>
> Reviewed-by: Thomas Hellström <thomas.hellst...@linux.intel.com>
>
>
Ideally we'll add fence signaling annotations to gpu reset, to exactly detect 
these kind of things. Hopefully in the future. :)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to