Op 14-10-2021 om 15:25 schreef Tvrtko Ursulin: > > On 14/10/2021 13:05, Maarten Lankhorst wrote: >> Op 14-10-2021 om 10:37 schreef Tvrtko Ursulin: >>> >>> On 13/10/2021 11:41, Maarten Lankhorst wrote: >>>> No memory should be allocated when calling i915_gem_object_wait, >>>> because it may be called to idle a BO when evicting memory. >>>> >>>> Fix this by using dma_resv_iter helpers to call >>>> i915_gem_object_wait_fence() on each fence, which cleans up the code a lot. >>>> Also remove dma_resv_prune, it's questionably. >>>> >>>> This will result in the following lockdep splat. >>> >>> <snip> >>> >>>> @@ -37,56 +36,17 @@ i915_gem_object_wait_reservation(struct dma_resv *resv, >>>> unsigned int flags, >>>> long timeout) >>>> { >>>> - struct dma_fence *excl; >>>> - bool prune_fences = false; >>>> - >>>> - if (flags & I915_WAIT_ALL) { >>>> - struct dma_fence **shared; >>>> - unsigned int count, i; >>>> - int ret; >>>> + struct dma_resv_iter cursor; >>>> + struct dma_fence *fence; >>>> - ret = dma_resv_get_fences(resv, &excl, &count, &shared); >>>> - if (ret) >>>> - return ret; >>>> - >>>> - for (i = 0; i < count; i++) { >>>> - timeout = i915_gem_object_wait_fence(shared[i], >>>> - flags, timeout); >>>> - if (timeout < 0) >>>> - break; >>>> + dma_resv_iter_begin(&cursor, resv, flags & I915_WAIT_ALL); >>>> + dma_resv_for_each_fence_unlocked(&cursor, fence) { >>>> - dma_fence_put(shared[i]); >>>> - } >>>> - >>>> - for (; i < count; i++) >>>> - dma_fence_put(shared[i]); >>>> - kfree(shared); >>>> - >>>> - /* >>>> - * If both shared fences and an exclusive fence exist, >>>> - * then by construction the shared fences must be later >>>> - * than the exclusive fence. If we successfully wait for >>>> - * all the shared fences, we know that the exclusive fence >>>> - * must all be signaled. If all the shared fences are >>>> - * signaled, we can prune the array and recover the >>>> - * floating references on the fences/requests. >>>> - */ >>>> - prune_fences = count && timeout >= 0; >>>> - } else { >>>> - excl = dma_resv_get_excl_unlocked(resv); >>>> + timeout = i915_gem_object_wait_fence(fence, flags, timeout); >>>> + if (timeout <= 0) >>>> + break; >>> >>> You have another change in behaviour here, well a bug really. When >>> userspace passes in zero timeout you fail to report activity in other than >>> the first fence. >> >> Hmm, not necessarily, passing 0 to i915_gem_object_wait_fence timeout = 0 is >> a special case and means test only. It will return 1 on success. > > I tried to enumerate the whole chain here. All for timeout == 0. Please > double check I did not make a mistake somewhere since there are many return > code inversions here. > > As building blocks for the whole "game" we have: > > 1. dma_fence_default_wait, it returns for states: > > not signaled -> 0 > signaled -> 1 > > 2. i915_request_wait > > not signaled -> -ETIME > signaled -> 0 > > Then i915_gem_object_wait_fence builds on top of it and has therefore these > possible outputs: > > signaled -> 0 > not signaled: > i915 path -> -ETIME > ext fence -> 0 > > So this looks a like problem already with 0 for signaled and not signaled. > Unless it is by design that the return value does not want to report external > fences? But it is not documented and it still waits on them so odd. > > Then in i915_gem_object_wait_reservation we have a loop: > > for (i = 0; i < count; i++) { > timeout = i915_gem_object_wait_fence(shared[i], > flags, timeout); > if (timeout < 0) > break; > > So short circuit happens only for i915 fences, by virtue of no negative > return codes otherwise. > > If we focus for i915 fences only for a moment. It means it keeps skipping > signaled to check if any is not, therefore returning -ETIME if any is not > signaled. i915_gem_object_wait passes the negative return on. > > With your patch you have: > > + timeout = i915_gem_object_wait_fence(fence, flags, timeout); > + if (timeout <= 0) > + break; > > Which means you break on first signaled fence (i915 or external), therefore > missing to report any possible subsequent unsignaled fences. So gem_wait > ioctl breaks unless I am missing something.
You're cc'd on a mail I sent to König regarding this. "Re: [PATCH 20/28] drm/i915: use new iterator in i915_gem_object_wait_reservation" 5accca25-8ac3-47ca-ee56-8b33c208f...@linux.intel.com timeout = 0 is a special case, fence_wait should return 1 if signaled, or 0 if waiting. Not -ETIME, as i915 does currently. This means our i915_fence_wait() handler is currently very wrong too, needs to be fixed. It returns 0 if timeout = 0 even if signaled. I think it cancels the fail in our gem_object_wait, but more consistency is definitely needed first. I think it's best to keep the current semantics for i915_reuest_wait, but make it a wrapper around a fixed i915_request_wait_timeout(), which would have the correct return semantics. ~Maarten