op 22-01-14 13:11, Thomas Hellstrom schreef: > On 01/22/2014 11:58 AM, Maarten Lankhorst wrote: >> op 22-01-14 11:27, Thomas Hellstrom schreef: >>> On 01/22/2014 10:55 AM, Maarten Lankhorst wrote: >>>> op 22-01-14 10:40, Thomas Hellstrom schreef: >>>>> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote: >>>>>> op 21-01-14 18:44, Thomas Hellstrom schreef: >>>>>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote: >>>>>>>> Hey, >>>>>>>> >>>>>>>> op 21-01-14 16:17, Thomas Hellstrom schreef: >>>>>>>>> Maarten, for this and the other patches in this series, >>>>>>>>> >>>>>>>>> I seem to recall we have this discussion before? >>>>>>>>> IIRC I stated that reservation was a too heavy-weight lock to >>>>>>>>> hold to >>>>>>>>> determine whether a buffer was idle? It's a pretty nasty thing to >>>>>>>>> build in. >>>>>>>>> >>>>>>>> I've sent this patch after determining that this already didn't >>>>>>>> end up >>>>>>>> being heavyweight. >>>>>>>> Most places were already using the fence_lock and reservation, I >>>>>>>> just >>>>>>>> fixed up the few >>>>>>>> places that didn't hold a reservation while waiting. Converting the >>>>>>>> few places that didn't >>>>>>>> ended up being trivial, so I thought I'd submit it. >>>>>>> Actually the only *valid* reason for holding a reservation when >>>>>>> waiting >>>>>>> for idle is >>>>>>> 1) You want to block further command submission on the buffer. >>>>>>> 2) You want to switch GPU engine and don't have access to gpu >>>>>>> semaphores >>>>>>> / barriers. >>>>>>> >>>>>>> Reservation has the nasty side effect that it blocks command >>>>>>> submission >>>>>>> and pins the buffer (in addition now makes the evict list traversals >>>>>>> skip the buffer) which in general is *not* necessary for most wait >>>>>>> cases, so we should instead actually convert the wait cases that >>>>>>> don't >>>>>>> fulfill 1) and 2) above in the other direction if we have >>>>>>> performance >>>>>>> and latency-reduction in mind. I can't see how a spinlock >>>>>>> protecting a >>>>>>> fence pointer or fence list is stopping you from using RW fences as >>>>>>> long >>>>>>> as the spinlock is held while manipulating the fence list? >>>>>>> >>>>>> You wish. Fine I'll enumerate all cases of ttm_bo_wait (with the >>>>>> patchset, though) and enumerate if they can be changed to work >>>>>> without >>>>>> reservation or not. >>>>>> >>>>>> ttm/ttm_bo.c >>>>>> ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to >>>>>> finish for the direct destroy fastpath, if either fails it needs >>>>>> to be >>>>>> queued. Cannot work without reservation. >>>>> Doesn't block and no significant reservation contention expected. >>>>> >>>>>> ttm_bo_cleanup_refs_and_unlock: already drops reservation to wait, >>>>>> doesn't need to re-acquire. Simply reordering ttm_bo_wait until after >>>>>> re-reserve is enough. >>>>> Currently follows the above rules. >>>>> >>>>>> ttm_bo_evict: already has the reservation, cannot be dropped since >>>>>> only trylock is allowed. Dropping reservation would cause badness, >>>>>> cannot be converted. >>>>> Follows rule 2 above. We're about to move the buffer and if that can't >>>>> be pipelined using the GPU (which TTM currently doesn't allow), we >>>>> need >>>>> to wait. Although eviction should be low priority compared to new >>>>> command submission, so I can't really see why we couldn't wait before >>>>> trying to reserve here? >>>>> >>>>>> ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop >>>>>> reservation for same reason as ttm_bo_evict. It might be part of a >>>>>> ticketed reservation so really don't drop lock here. >>>>> Part of command submission and as such follows rule 2 above. If we can >>>>> pipeline the move with the GPU, no need to wait (but needs to be >>>>> implemented, of course). >>>>> >>>>>> ttm_bo_synccpu_write_grab: the wait could be converted to be done >>>>>> afterwards, without fence_lock. But in this case reservation could >>>>>> take the role of fence_lock too, >>>>>> >>>>>> so no separate fence_lock would be needed. >>>>> With the exception that reservation is more likely to be contended. >>>> True but rule 1. >>>>>> ttm_bo_swapout: see ttm_bo_evict. >>>>>> >>>>>> ttm/ttm_bo_util.c: >>>>>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop lock, see >>>>>> ttm_bo_move_buffer, can be called from that function. >>>>> Rule 2. >>>>> >>>>>> ttm/ttm_bo_vm.c >>>>>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here, >>>>>> but >>>>>> you already had the reservation, so a similar optimization to >>>>>> ttm_bo_synccpu_write_grab could be done without requiring fence_lock. >>>>>> If you would write it like that, you would end up with a patch >>>>>> similar >>>>>> to drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep. I >>>>>> think >>>>>> we should do this, an >>>>>> >>>>>> Ok, so the core does NOT need fence_lock because we can never drop >>>>>> reservations except in synccpu_write_grab and maybe >>>>>> ttm_bo_vm_fault_idle, but even in those cases reservation is done. So >>>>>> that could be used instead of fence_lock. >>>>>> >>>>>> nouveau_gem_ioctl_cpu_prep: >>>>>> Either block on a global spinlock or a local reservation lock. >>>>>> Doesn't >>>>>> matter much which, I don't need the need to keep a global lock for >>>>>> this function... >>>>>> 2 cases can happen in the trylock reservation failure case: buffer is >>>>>> not reserved, so it's not in the process of being evicted. buffer is >>>>>> reserved, which means it's being used in command submission right >>>>>> now, >>>>>> or in one of the functions described above (eg not idle). >>>>>> >>>>>> nouveau_gem_pushbuf_reloc_apply: >>>>>> has to call ttm_bo_wait with reservation, cannot be dropped. >>>>>> >>>>>> So for core ttm and nouveau the fence_lock is never needed, radeon >>>>>> has >>>>>> only 1 function that calls ttm_bo_wait which uses a reservation too. >>>>>> It doesn't need the fence_lock either. >>>>> And vmwgfx now also has a syccpu IOCTL (see drm-next). >>>>> >>>>> So assuming that we converted the functions that can be converted to >>>>> wait outside of reservation, the same way you have done with Nouveau, >>>>> leaving the ones that fall under 1) and 2) above, I would still argue >>>>> that a spinlock should be used because taking a reservation may >>>>> implicitly mean wait for gpu, and could give bad performance- and >>>>> latency charateristics. You shouldn't need to wait for gpu to check >>>>> for >>>>> buffer idle. >>>> Except that without reservation you can't tell if the buffer is really >>>> idle, or is currently being >>>> used as part of some command submission/eviction before the fence >>>> pointer is set. >>>> >>> Yes, but when that matters, you're either in case 1 or case 2 again. >>> Otherwise, when you release the reservation, you still don't know. >>> A typical example of this is the vmwgfx synccpu ioctl, where you can >>> either choose to block command submission (not used currently) >>> or not (user-space inter-process synchronization). The former is a case >>> 1 wait and holds reservation while waiting for idle and then ups >>> cpu_writers. The latter waits without reservation for previously >>> submitted rendering to finish. >> Yeah you could, but what exactly are you waiting on then? If it's some >> specific existing rendering, >> I would argue that you should create an android userspace fence during >> command submission, >> or provide your own api to block on a specfic fence in userspace. >> >> If you don't then I think taking a reservation is not unreasonable. In >> the most common case the buffer is >> idle and not reserved, so it isn't contested. The actual waiting >> itself can be done without reservation held, >> by taking a reference on the fence. > Yeah, here is where we disagree. I'm afraid people will start getting > sloppy with reservations and use them to protect more stuff, and after a > while they start wondering why the GPU command queue drains... > Also the reservation_object's lock is a normal lock, so you should be able to pull info about it when enabling CONFIG_LOCKSTAT.
~Maarten