On 11/28/2012 03:46 PM, Maarten Lankhorst wrote:
> Op 28-11-12 15:23, Thomas Hellstrom schreef:
>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote:
>>> Op 28-11-12 14:21, Thomas Hellstrom schreef:
>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote:
>>>>> Op 28-11-12 12:54, Thomas Hellstrom schreef:
>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote:
>>>>>>> By no longer unlocking the lru lock and retaking it immediately, a race is removed where the bo is taken off the swap list or the lru list between the unlock and relock. As such the cleanup_refs code can be simplified: it will attempt to call ttm_bo_wait non-blockingly, and if that fails it will drop the locks and perform a blocking wait, or return an error if no_wait_gpu was set.
>>>>>>>
>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first will always follow the destruction path, so no new fence is allowed to be attached. As far as I can see this may already have been the case, but the unlocking / relocking required a complicated loop to deal with re-reservation.
>>>>>>>
>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with the reservation held, so drivers must be aware that move_notify with a null parameter doesn't require a reservation.
>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not immediately clear from this patch.
>>>>> Because we would hold the reservation while waiting, with the object still on the swap and lru lists. That would defeat the whole purpose of keeping the object on multiple lists, plus break current code that assumes bo's on those lists can always be reserved.
>>>>>
>>>>> The if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but I'm sure that would not be the case if the reservations were shared across devices.
>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation, and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error.
>>> If you really want to, we could hang on to the reservation in the !no_wait_gpu path; the wait shouldn't ever fail there, so I suppose leaving the bo off the lru lists and not re-adding it to any list in case of a wait failure is fine. It's still on the ddestroy list in that case, so not adding it back to the other lists is harmless.
>>>
>> Well, I'm a bit afraid that theoretically, other callers may have a bo reserved while cleanup_refs_and_unlock more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers, but it may come back and bite us.
> That's why I initially moved all the destruction to ttm_bo_release_list, to have all destruction in only one place. But even now it's serialized with the lru lock; while the destruction may not happen right away, it still happens before the last list ref to the bo is dropped.
>
> But it's your call, just choose the approach you want and I'll resubmit this. :-)
>
>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process.
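As an illustration of the evict-style pattern described above (take the bo off the LRU lists, drop the lru lock, keep the reservation across the wait, and re-add the bo if the wait fails), here is a minimal sketch. wait_for_bo_idle(), del_from_lru_locked() and add_to_lru_locked() are hypothetical placeholders rather than real TTM helpers (the wait stands in for ttm_bo_wait() plus its fence_lock handling), and the actual code in ttm_bo.c differs in detail.

/*
 * Illustrative sketch only, not the actual ttm_bo.c code.
 * wait_for_bo_idle(), del_from_lru_locked() and add_to_lru_locked()
 * are placeholders for the real fence wait and LRU bookkeeping.
 */
static int evict_style_wait(struct ttm_buffer_object *bo,
			    bool interruptible, bool no_wait_gpu)
{
	struct ttm_bo_global *glob = bo->glob;
	int ret;

	/* Entered with the bo reserved and glob->lru_lock held. */
	del_from_lru_locked(bo);
	spin_unlock(&glob->lru_lock);

	/* The reservation is kept across the (possibly blocking) wait. */
	ret = wait_for_bo_idle(bo, interruptible, no_wait_gpu);

	if (unlikely(ret != 0)) {
		/* Wait interrupted or busy: put the bo back and report it. */
		spin_lock(&glob->lru_lock);
		add_to_lru_locked(bo);
		spin_unlock(&glob->lru_lock);
		return ret;
	}

	/* Idle, still reserved, off the LRU lists: safe to tear down. */
	return 0;
}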
>>
>> Anyway, I prefer if we could try to keep the reservation across the ttm_bo_cleanup_memtype_use function, and as far as I can tell, the only thing preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter that to do the same as the evict path, I think, without looking too deeply into the consequences, that we should be safe.
> I think returning success early, without removing the bo from the ddestroy list, if re-reserving with the lru lock held fails would be better.
>
> We completed the wait and attempted to reserve the bo, which failed. Without the lru lock atomicity, this can't happen, since the only places that would do it call this with the lru lock held.
>
> With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete with remove_all set to true. And even if that happened, the destruction code would run *anyway* since we completed the waiting part already; it would just not necessarily be run from this thread, but that guarantee didn't exist anyway.
>> Then we should be able to skip patch 2 as well.
> If my tryreserve approach sounds sane, the second patch should still be skippable. :-)
Sure, let's go for that approach.
>
> ~Maarten

/Thomas
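For reference, a minimal sketch of the tryreserve control flow discussed above, not the actual patch: try a non-blocking wait under the lru lock; if that fails and blocking is allowed, drop the reservation and lru lock, wait, then try to re-reserve, returning success early and leaving the bo on the ddestroy list if re-reservation fails. wait_for_bo_idle(), try_reserve_locked(), unreserve_locked(), remove_from_lists_locked() and unreserve() are hypothetical placeholders for the corresponding TTM operations.

/*
 * Illustrative sketch of the agreed control flow only -- not the real
 * ttm_bo_cleanup_refs code.  Called with the bo reserved and
 * glob->lru_lock held; the bo is already on the ddestroy list.
 * Returns with the lru lock dropped.
 */
static int cleanup_refs_sketch(struct ttm_buffer_object *bo,
			       bool interruptible, bool no_wait_gpu)
{
	struct ttm_bo_global *glob = bo->glob;
	int ret;

	/* First try to wait without blocking. */
	ret = wait_for_bo_idle(bo, interruptible, true);

	if (ret && no_wait_gpu) {
		unreserve_locked(bo);
		spin_unlock(&glob->lru_lock);
		return ret;			/* typically -EBUSY */
	}

	if (ret) {
		/* Must block: drop the reservation and the lru lock first. */
		unreserve_locked(bo);
		spin_unlock(&glob->lru_lock);

		ret = wait_for_bo_idle(bo, interruptible, false);
		if (ret)
			return ret;		/* e.g. interrupted by a signal */

		spin_lock(&glob->lru_lock);
		if (!try_reserve_locked(bo)) {
			/*
			 * Someone else holds the reservation.  The wait has
			 * completed, so whoever that is (or the delayed-delete
			 * worker) will finish the destruction; leave the bo on
			 * the ddestroy list and report success.
			 */
			spin_unlock(&glob->lru_lock);
			return 0;
		}
	}

	/* Idle and reserved: take the bo off the lru, swap and ddestroy lists. */
	remove_from_lists_locked(bo);
	spin_unlock(&glob->lru_lock);

	/* Release the memory with the reservation still held. */
	ttm_bo_cleanup_memtype_use(bo);
	unreserve(bo);

	return 0;
}

The key property, as discussed above, is that the blocking wait has already completed before the failed re-reserve, so returning 0 and leaving the bo on the ddestroy list only defers the actual teardown to whichever thread does hold the reservation, or to ttm_bo_delayed_delete.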