On 22-01-14 10:40, Thomas Hellstrom wrote:
> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote:
>> On 21-01-14 18:44, Thomas Hellstrom wrote:
>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> On 21-01-14 16:17, Thomas Hellstrom wrote:
>>>>> Maarten, for this and the other patches in this series,
>>>>>
>>>>> I seem to recall we had this discussion before?
>>>>> IIRC I stated that reservation was too heavy-weight a lock to hold
>>>>> to determine whether a buffer was idle? It's a pretty nasty thing
>>>>> to build in.
>>>>>
>>>> I sent this patch after determining that this didn't actually end
>>>> up being heavyweight.
>>>> Most places were already using the fence_lock and reservation; I
>>>> just fixed up the few places that didn't hold a reservation while
>>>> waiting. Converting the few places that didn't ended up being
>>>> trivial, so I thought I'd submit it.
>>> Actually the only *valid* reasons for holding a reservation when
>>> waiting for idle are:
>>> 1) You want to block further command submission on the buffer.
>>> 2) You want to switch GPU engine and don't have access to gpu
>>> semaphores / barriers.
>>>
>>> Reservation has the nasty side effect that it blocks command
>>> submission and pins the buffer (and in addition now makes the evict
>>> list traversals skip the buffer), which in general is *not*
>>> necessary for most wait cases, so we should actually convert the
>>> wait cases that don't fulfill 1) and 2) above in the other direction
>>> instead, if we have performance and latency reduction in mind. I
>>> can't see how a spinlock protecting a fence pointer or fence list
>>> stops you from using RW fences, as long as the spinlock is held
>>> while manipulating the fence list?
>>>
>> You wish. Fine, I'll enumerate all the cases of ttm_bo_wait (with the
>> patchset applied, though) and note whether or not they can be changed
>> to work without reservation.
>>
>> ttm/ttm_bo.c
>> ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to
>> finish for the direct destroy fastpath; if either fails it needs to
>> be queued. Cannot work without reservation.
> Doesn't block, and no significant reservation contention expected.
>
>> ttm_bo_cleanup_refs_and_unlock: already drops the reservation to
>> wait, and doesn't need to re-acquire it. Simply reordering
>> ttm_bo_wait until after the re-reserve is enough.
> Currently follows the above rules.
>
>> ttm_bo_evict: already holds the reservation, which cannot be dropped
>> since only trylock is allowed. Dropping the reservation would cause
>> badness; cannot be converted.
> Follows rule 2 above. We're about to move the buffer, and if that
> can't be pipelined using the GPU (which TTM currently doesn't allow),
> we need to wait. Although eviction should be low priority compared to
> new command submission, so I can't really see why we couldn't wait
> before trying to reserve here?
>
>> ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop the
>> reservation for the same reason as ttm_bo_evict. It might be part of
>> a ticketed reservation, so really don't drop the lock here.
> Part of command submission, and as such follows rule 2 above. If we
> can pipeline the move with the GPU, there's no need to wait (but that
> needs to be implemented, of course).
>
>> ttm_bo_synccpu_write_grab: the wait could be converted to be done
>> afterwards, without fence_lock. But in this case reservation could
>> take the role of fence_lock too, so no separate fence_lock would be
>> needed.
> With the exception that reservation is more likely to be contended.

True, but rule 1.
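To make the synccpu case concrete, this is roughly what the function
would look like after the series. A sketch only (error handling and
the cpu_writers release path abbreviated), but note how the
reservation alone protects the fence pointer:

static int ttm_bo_synccpu_write_grab(struct ttm_buffer_object *bo,
				     bool no_wait)
{
	int ret;

	/*
	 * Rule 1: blocking further command submission is the point
	 * here, so taking the reservation is not a side effect.
	 */
	ret = ttm_bo_reserve(bo, true, no_wait, false, NULL);
	if (unlikely(ret != 0))
		return ret;

	/*
	 * The reservation now protects bo->sync_obj: no new fence can
	 * be attached while we hold it, so no fence_lock is needed
	 * around the wait.
	 */
	ret = ttm_bo_wait(bo, false, true, no_wait);
	if (likely(ret == 0))
		atomic_inc(&bo->cpu_writers);

	ttm_bo_unreserve(bo);
	return ret;
}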
>> ttm_bo_swapout: see ttm_bo_evict.
>>
>> ttm/ttm_bo_util.c:
>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop the lock;
>> see ttm_bo_move_buffer, it can be called from that function.
> Rule 2.
>
>> ttm/ttm_bo_vm.c
>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here,
>> but you already had the reservation, so a similar optimization to
>> ttm_bo_synccpu_write_grab could be done without requiring fence_lock.
>> If you wrote it like that, you would end up with a patch similar to
>> drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep. I think
>> we should do this, an
>>
>> Ok, so the core does NOT need fence_lock, because we can never drop
>> reservations except in synccpu_write_grab and maybe
>> ttm_bo_vm_fault_idle, but even in those cases a reservation is taken.
>> So that could be used instead of fence_lock.
>>
>> nouveau_gem_ioctl_cpu_prep:
>> Either block on a global spinlock or a local reservation lock. It
>> doesn't matter much which; I don't see the need to keep a global lock
>> for this function...
>> 2 cases can happen when trylocking the reservation: the buffer is not
>> reserved, so it's not in the process of being evicted; or the buffer
>> is reserved, which means it's being used in command submission right
>> now, or in one of the functions described above (i.e. not idle).
>>
>> nouveau_gem_pushbuf_reloc_apply:
>> has to call ttm_bo_wait with the reservation held; it cannot be
>> dropped.
>>
>> So for core ttm and nouveau the fence_lock is never needed; radeon
>> has only 1 function that calls ttm_bo_wait, and it uses a reservation
>> too. It doesn't need the fence_lock either.
> And vmwgfx now also has a synccpu IOCTL (see drm-next).
>
> So assuming that we converted the functions that can be converted to
> wait outside of reservation, the same way you have done with Nouveau,
> leaving the ones that fall under 1) and 2) above, I would still argue
> that a spinlock should be used, because taking a reservation may
> implicitly mean waiting for the gpu, and that could give bad
> performance and latency characteristics. You shouldn't need to wait
> for the gpu to check for buffer idle.

Except that without reservation you can't tell if the buffer is really
idle, or is currently being used as part of some command
submission/eviction before the fence pointer is set.
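Which is exactly why the nouveau patch trylocks the reservation there.
Condensed, the idle check boils down to something like this (the
helper name is made up for illustration; the actual patch is
structured differently):

/* Hypothetical helper, for illustration only. */
static bool nouveau_bo_is_idle(struct ttm_buffer_object *bo)
{
	int ret;

	/*
	 * Trylock: if the buffer is reserved it's in command
	 * submission or eviction right now, possibly before the fence
	 * pointer has been set, so it can't be considered idle.
	 */
	ret = ttm_bo_reserve(bo, true, true, false, NULL);
	if (ret)
		return false;

	/*
	 * Reservation held: no new fence can appear under us, so a
	 * nonblocking ttm_bo_wait is a reliable idle test.
	 */
	ret = ttm_bo_wait(bo, true, true, true);
	ttm_bo_unreserve(bo);

	return ret == 0;
}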
~Maarten