radeon: use common fence implementation for fences

Maarten Lankhorst Wed, 23 Jul 2014 08:40:59 +0200

op 22-07-14 17:59, Christian K?nig schreef:
> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>> On Tue, Jul 22, 2014 at 5:35 PM, Christian K?nig
>> <christian.koenig at amd.com> wrote:
>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>> function, everything else like fence->enable_signaling or calling
>>> fence_signaled() from the driver is optional.
>>>
>>> Drivers wanting to use exported fences don't call fence->signaled or
>>> fence->wait in atomic or interrupt context, and not with holding any global
>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>> to the driver is ok, as long as they don't conflict with anything possible
>>> used by their own fence implementation.
>> Well that's almost what we have right now with the exception that
>> drivers are allowed (actually must for correctness when updating
>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>
> In this case sorry for so much noise. I really haven't looked in so much 
> detail into anything but Maarten's Radeon patches.
>
> But how does that then work right now? My impression was that it's mandatory 
> for drivers to call fence_signaled()?
It's only mandatory to call fence_signal() if the .enable_signaling callback 
has been called, else you can get away with never calling signaling a fence at 
all before dropping the last refcount to it.
This allows you to keep interrupts disabled when you don't need them.
>> Locking
>> correctness is enforced with some extremely nasty lockdep annotations
>> + additional debugging infrastructure enabled with
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>> dma-buf ww_mutexes while updating fences or waiting for them. And
>> obviously for ->wait we need non-atomic context, not just
>> non-interrupt.
>
> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be 
> an RCU be more appropriate here? E.g. aren't we just interested that the 
> current assigned fence at some point is signaled?
You can wait with RCU, without holding the ww_mutex, by calling 
reservation_object_wait_timeout_rcu on ttm_bo->resv.
If you don't want to block you could test with 
reservation_object_test_signaled_rcu.
Or if you want a copy of all fences without taking locks, try 
reservation_object_get_fences_rcu. (Might be out of date by the time the 
function returns if you don't hold ww_mutex, if you hold ww_mutex you probably 
don't need to call this function.)


I didn't add non-rcu versions, but using the RCU functions would work with 
ww_mutex held too, probably with some small overhead.
> Something like grab ww_mutexes, grab a reference to the current fence object, 
> release ww_mutex, wait for fence, release reference to the fence object.
This is what I do currently. :-) The reservation_object that's embedded in TTM 
gets shared with the dma-buf, so there will be no special case needed for 
dma-buf at all, all objects can simply be shared and the synchronization is 
handled in the same way.

ttm_bo_reserve and friends automatically end up locking the dma-buf too, and 
lockdep works on it.

>
>> Agreed that any shared locks are out of the way (especially stuff like
>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>> really bad here still).
>
> Yeah that's also an point I've wanted to note on Maartens patch. Radeon grabs 
> the read side of it's exclusive semaphore while waiting for fences (because 
> it assumes that the fence it waits for is a Radeon fence).
>
> Assuming that we need to wait in both directions with Prime (e.g. Intel 
> driver needs to wait for Radeon to finish rendering and Radeon needs to wait 
> for Intel to finish displaying), this might become a perfect example of 
> locking inversion.
In the preliminary patches where I can sync radeon with other GPU's I've been 
very careful in all the places that call into fences, to make sure that radeon 
wouldn't try to handle lockups for a different (possibly also radeon) card.

This is also why fence_is_signaled should never block, and why it trylocks the 
exclusive_lock. :-) I think lockdep would complain if I grabbed exclusive_lock 
while blocking in is_signaled.

>> So from the core fence framework I think we already have exactly this,
>> and we only need to adjust the radeon implementation a bit to make it
>> less risky and invasive to the radeon driver logic.
>
> Agree. Well the biggest problem I see is that exclusive semaphore I need to 
> take when anything calls into the driver. For the fence code I need to move 
> that down into the fence->signaled handler, cause that now can be called from 
> outside the driver.
>
> Maarten solved this by telling the driver in the lockup handler (where we 
> grab the write side of the exclusive lock) that all interrupts are already 
> enabled, so that fence->signaled hopefully wouldn't mess with the hardware at 
> all. While this probably works, it just leaves me with a feeling that we are 
> doing something wrong here.
There is unfortunately no global mechanism to say 'this card is locked up, 
please don't call into any of my fences', and I don't associate fences with 
devices, and radeon doesn't keep a global list of fences.
If all of that existed, it would complicate the interface and its callers a 
lot, while I like to keep things simple.
So I did the best I could, and simply prevented the fence calls from fiddling 
with the hardware. Fortunately gpu lockup is not a common operation. :-)

~Maarten

[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Reply via email to