On 1/18/21 3:46 PM, Maarten Lankhorst wrote:
Op 18-01-2021 om 14:28 schreef Thomas Hellström:
On 1/18/21 2:22 PM, Thomas Hellström wrote:
On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
Op 18-01-2021 om 12:11 schreef Thomas Hellström:
On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
We get a lockdep splat when the reset mutex is held, because it can be
taken from fence_wait. This conflicts with the mmu notifier we have,
because we recurse between reset mutex and mmap lock -> mmu notifier.
Remove this recursion by calling revoke_mmaps before taking the lock.
Hmm. Is the mmap se taken from gt_revoke()?
If so, isn't the real problem that the mmap_sem is taken in the dma_fence
critical path (where the reset code sits)?
Hey,
The gpu reset code specifically needs to revoke all gtt mappings, and the fault
handler uses intel_gt_reset_trylock(),
so this change should be ok since all those mappings are invalidated correctly
and completed before this point.
The reset mutex isn't actually taken inside fence code, but used for lockdep
validation, so this should be ok.
~Maarten
Hmm, OK but then we still have the following established locking order.
lock(fence_signaling)
lock(i_mmap_lock)
But in the notifier
lock(i_mmap_lock)
fence_signaling(within notifier)
So gt_revoke() is violating dma-fence rules.
BTW it looks to me like the reset mutex notation is actually doing much the
same as the dma-fence annotations; While we can move gt_revoke() out of the
reset mutex, that only gives us false hopes since it moves it out of the
equivalent dma-fence annotation. I figure the reason this was not seen before
the new code is that the reset mutex lockdep isn't taken when waiting for
active. Only when waiting for dma-fence, but IMO the root problem is
pre-existing.
/Thomas
The interesting scenario is
thread 1:
take i_mmap_lock()
enter_mmu_notifier()
wait_fence()
thread 2:
need_to_reset_gpu_for_the_above_fence();
take i_mmap_lock()
Deadlock.
/Thomas
Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 1
isn't doing anything wrong, gpu reset probably should stop revoking gt
bindings, and allow some garbage during reset. I don't see another way out. :-/
Me neither.
But to silence lockdep until dma_fence annotation is widely added:
Reviewed-by: Thomas Hellström <thomas.hellst...@linux.intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx