On 22-01-14 10:40, Thomas Hellstrom wrote:
> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote:
>> On 21-01-14 18:44, Thomas Hellstrom wrote:
>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> On 21-01-14 16:17, Thomas Hellstrom wrote:
>>>>> Maarten, for this and the other patches in this series,
>>>>>
>>>>> I seem to recall we had this discussion before?
>>>>> IIRC I stated that reservation was too heavy-weight a lock to hold
>>>>> to determine whether a buffer was idle? It's a pretty nasty thing
>>>>> to build in.
>>>>>
>>>> I sent this patch after determining that this didn't actually end
>>>> up being heavyweight.
>>>> Most places were already using the fence_lock and reservation; I
>>>> just fixed up the few places that didn't hold a reservation while
>>>> waiting. Converting the few places that didn't ended up being
>>>> trivial, so I thought I'd submit it.
>>> Actually the only *valid* reasons for holding a reservation when
>>> waiting for idle are:
>>> 1) You want to block further command submission on the buffer.
>>> 2) You want to switch GPU engine and don't have access to gpu
>>> semaphores / barriers.
>>>
>>> Reservation has the nasty side effect that it blocks command
>>> submission and pins the buffer (and in addition now makes the evict
>>> list traversals skip the buffer), which in general is *not*
>>> necessary for most wait cases, so we should actually convert the
>>> wait cases that don't fulfill 1) and 2) above in the other direction
>>> instead, if we have performance and latency reduction in mind. I
>>> can't see how a spinlock protecting a fence pointer or fence list
>>> stops you from using RW fences, as long as the spinlock is held
>>> while manipulating the fence list?
>>>
>> You wish. Fine, I'll enumerate all the cases of ttm_bo_wait (with the
>> patchset applied, though) and note whether or not they can be changed
>> to work without reservation.
>>
>> ttm/ttm_bo.c
>> ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to
>> finish for the direct destroy fastpath; if either fails it needs to
>> be queued. Cannot work without reservation.
> Doesn't block, and no significant reservation contention expected.
>
>> ttm_bo_cleanup_refs_and_unlock: already drops the reservation to
>> wait, and doesn't need to re-acquire it. Simply reordering
>> ttm_bo_wait until after the re-reserve is enough.
> Currently follows the above rules.
>
>> ttm_bo_evict: already holds the reservation, which cannot be dropped
>> since only trylock is allowed. Dropping the reservation would cause
>> badness; cannot be converted.
> Follows rule 2 above. We're about to move the buffer, and if that
> can't be pipelined using the GPU (which TTM currently doesn't allow),
> we need to wait. Although eviction should be low priority compared to
> new command submission, so I can't really see why we couldn't wait
> before trying to reserve here?
>
>> ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop the
>> reservation for the same reason as ttm_bo_evict. It might be part of
>> a ticketed reservation, so really don't drop the lock here.
> Part of command submission, and as such follows rule 2 above. If we
> can pipeline the move with the GPU, there's no need to wait (but that
> needs to be implemented, of course).
>
>> ttm_bo_synccpu_write_grab: the wait could be converted to be done
>> afterwards, without fence_lock. But in this case reservation could
>> take the role of fence_lock too, so no separate fence_lock would be
>> needed.
> With the exception that reservation is more likely to be contended.

True, but rule 1.
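To make the synccpu case concrete, this is roughly what the function
would look like after the series. A sketch only (error handling and
the cpu_writers release path abbreviated), but note how the
reservation alone protects the fence pointer:

static int ttm_bo_synccpu_write_grab(struct ttm_buffer_object *bo,
				     bool no_wait)
{
	int ret;

	/*
	 * Rule 1: blocking further command submission is the point
	 * here, so taking the reservation is not a side effect.
	 */
	ret = ttm_bo_reserve(bo, true, no_wait, false, NULL);
	if (unlikely(ret != 0))
		return ret;

	/*
	 * The reservation now protects bo->sync_obj: no new fence can
	 * be attached while we hold it, so no fence_lock is needed
	 * around the wait.
	 */
	ret = ttm_bo_wait(bo, false, true, no_wait);
	if (likely(ret == 0))
		atomic_inc(&bo->cpu_writers);

	ttm_bo_unreserve(bo);
	return ret;
}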
>> ttm_bo_swapout: see ttm_bo_evict.
>>
>> ttm/ttm_bo_util.c:
>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop the lock;
>> see ttm_bo_move_buffer, it can be called from that function.
> Rule 2.
>
>> ttm/ttm_bo_vm.c
>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here,
>> but you already had the reservation, so a similar optimization to
>> ttm_bo_synccpu_write_grab could be done without requiring fence_lock.
>> If you wrote it like that, you would end up with a patch similar to
>> drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep. I think
>> we should do this, an
>>
>> Ok, so the core does NOT need fence_lock, because we can never drop
>> reservations except in synccpu_write_grab and maybe
>> ttm_bo_vm_fault_idle, but even in those cases a reservation is taken.
>> So that could be used instead of fence_lock.
>>
>> nouveau_gem_ioctl_cpu_prep:
>> Either block on a global spinlock or a local reservation lock. It
>> doesn't matter much which; I don't see the need to keep a global lock
>> for this function...
>> 2 cases can happen when trylocking the reservation: the buffer is not
>> reserved, so it's not in the process of being evicted; or the buffer
>> is reserved, which means it's being used in command submission right
>> now, or in one of the functions described above (i.e. not idle).
>>
>> nouveau_gem_pushbuf_reloc_apply:
>> has to call ttm_bo_wait with the reservation held; it cannot be
>> dropped.
>>
>> So for core ttm and nouveau the fence_lock is never needed; radeon
>> has only 1 function that calls ttm_bo_wait, and it uses a reservation
>> too. It doesn't need the fence_lock either.
> And vmwgfx now also has a synccpu IOCTL (see drm-next).
>
> So assuming that we converted the functions that can be converted to
> wait outside of reservation, the same way you have done with Nouveau,
> leaving the ones that fall under 1) and 2) above, I would still argue
> that a spinlock should be used, because taking a reservation may
> implicitly mean waiting for the gpu, and that could give bad
> performance and latency characteristics. You shouldn't need to wait
> for the gpu to check for buffer idle.

Except that without reservation you can't tell if the buffer is really
idle, or is currently being used as part of some command
submission/eviction before the fence pointer is set.
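Which is exactly why the nouveau patch trylocks the reservation there.
Condensed, the idle check boils down to something like this (the
helper name is made up for illustration; the actual patch is
structured differently):

/* Hypothetical helper, for illustration only. */
static bool nouveau_bo_is_idle(struct ttm_buffer_object *bo)
{
	int ret;

	/*
	 * Trylock: if the buffer is reserved it's in command
	 * submission or eviction right now, possibly before the fence
	 * pointer has been set, so it can't be considered idle.
	 */
	ret = ttm_bo_reserve(bo, true, true, false, NULL);
	if (ret)
		return false;

	/*
	 * Reservation held: no new fence can appear under us, so a
	 * nonblocking ttm_bo_wait is a reliable idle test.
	 */
	ret = ttm_bo_wait(bo, true, true, true);
	ttm_bo_unreserve(bo);

	return ret == 0;
}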
~Maarten