Chris Wilson writes:
> Track the position of the HWSP for each timeline.
>
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169
> Signed-off-by: Chris Wilson
> Cc: Mika Kuoppala
> ---
> drivers/gpu/drm/i915/gt/intel_timeline.c | 7 +++
> drivers/gpu/drm/i915/gt/selftest
Am 14.07.20 um 22:06 schrieb Chris Wilson:
From: Bas Nieuwenhuizen
Calltree:
timeline_fence_release
drm_sched_entity_wakeup
dma_fence_signal_locked
sync_timeline_signal
sw_sync_ioctl
Releasing the reference to the fence in the fence signal callback
seems reasonable to me, so thi
Quoting YueHaibing (2020-07-15 04:21:04)
> It is not used since commit 058179e72e09 ("drm/i915/gt: Replace
> hangcheck by heartbeats")
>
> Signed-off-by: YueHaibing
Indeed, it is no more.
Reviewed-by: Chris Wilson
-Chris
Still Reviewed-by: Bas Nieuwenhuizen
On Tue, Jul 14, 2020 at 11:24 PM Chris Wilson wrote:
>
> Since we decouple the sync_pt from the timeline tree upon release, in
> order to allow releasing the sync_pt from a signal callback we need to
> separate the sync_pt signaling lock from the timeline tre
dma_fence_release() objects to a fence being freed before it is
signaled, so instead of playing fancy tricks to avoid handling dying
requests, let's keep the syncpt alive until signaled. This neatly
removes the issue with having to decouple the syncpt from the timeline
upon fence release.
-Chris
If a signal callback releases the sw_sync fence, that will trigger a
deadlock as the timeline_fence_release recurses onto the fence->lock
(used both for signaling and the timeline tree).
If we always hold a reference for an unsignaled fence held by the
timeline, we no longer need to detach the
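A minimal sketch of that approach, assuming the reference is taken at
sync_pt creation and dropped once signaled (placement within sw_sync.c
is an assumption, and the tree/list insertion is elided):

    static struct sync_pt *sync_pt_create(struct sync_timeline *obj, u64 value)
    {
            struct sync_pt *pt;

            pt = kzalloc(sizeof(*pt), GFP_KERNEL);
            if (!pt)
                    return NULL;

            sync_timeline_get(obj);
            dma_fence_init(&pt->base, &timeline_fence_ops, &obj->lock,
                           obj->context, value);
            /* Hold an extra reference on behalf of the timeline until
             * signaled, so release can never recurse from the signal
             * callback into the timeline tree.
             */
            dma_fence_get(&pt->base);
            /* ... insert into obj->pt_tree/pt_list under obj->lock ... */
            return pt;
    }

    /* and in sync_timeline_signal(), for each completed point: */
    dma_fence_signal_locked(&pt->base);
    dma_fence_put(&pt->base); /* drop the timeline's reference */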
While sw_sync is purely a debug facility for userspace to create fences
and timelines it can control, nevertheless it has some tricky locking
semantics of its own. In particular, Bas Nieuwenhuizen reported that we
had reintroduced a deadlock if a signal callback attempted to destroy
the fence. So l
== Series Details ==
Series: series starting with [1/2] dma-buf/sw_sync: Avoid recursive lock during
fence signal
URL : https://patchwork.freedesktop.org/series/79510/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
f09f86114c26 dma-buf/sw_sync: Avoid recursive lock during fence
Hi Chris,
My concern with going in this direction was that we potentially allow
an application to allocate a lot of kernel memory but not a lot of fds
by creating lots of fences and then closing the fds but never
signaling them. Is that not an issue?
- Bas
On Wed, Jul 15, 2020 at 12:04 PM Chris
Hi,
On Wed, 15 Jul 2020 at 11:23, Bas Nieuwenhuizen
wrote:
> My concern with going in this direction was that we potentially allow
> an application to allocate a lot of kernel memory but not a lot of fds
> by creating lots of fences and then closing the fds but never
> signaling them. Is that no
== Series Details ==
Series: series starting with [1/2] dma-buf/sw_sync: Avoid recursive lock during
fence signal
URL : https://patchwork.freedesktop.org/series/79510/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_8748 -> Patchwork_18173
==
Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35)
> Hi Chris,
>
> My concern with going in this direction was that we potentially allow
> an application to allocate a lot of kernel memory but not a lot of fds
> by creating lots of fences and then closing the fds but never
> signaling them. Is that n
Rearrange the code to pull the operations before the fence->lock critical
section, and remove a small amount of redundancy:
Function                        old     new   delta
dma_fence_add_callback          156     145     -11
Signed-off-by: Chris Wilson
---
drivers/dm
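For illustration, the reordering might take this shape (a sketch
against the dma-fence internals, not the exact diff):

    int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
                               dma_fence_func_t func)
    {
            unsigned long flags;
            int ret = 0;

            if (WARN_ON(!fence || !func))
                    return -EINVAL;

            /* Prepare the callback before taking the lock... */
            cb->func = func;
            INIT_LIST_HEAD(&cb->node);

            /* ...so the fence->lock critical section does the minimum. */
            spin_lock_irqsave(fence->lock, flags);
            if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) &&
                __dma_fence_enable_signaling(fence))
                    list_add_tail(&cb->node, &fence->cb_list);
            else
                    ret = -ENOENT;
            spin_unlock_irqrestore(fence->lock, flags);

            return ret;
    }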
When waiting with a callback on the stack, we must remove the callback
upon wait completion. Since this will be notified by the fence signal
callback, the removal often contends with the fence->lock being held by
the signaler. We can look at the list entry to see if the callback was
already signale
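The opportunistic check might look like this (a sketch only: it relies
on dma_fence_signal() emptying each callback node before invoking it,
so an empty node means the callback has already run):

    /* Only contend on fence->lock if our callback is still linked. */
    if (!list_empty(&cb.node))
            dma_fence_remove_callback(fence, &cb);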
Make sure vma_lock is not used as an inner lock when the kernel context is used,
and add ww handling where appropriate.
Ensure that execbuf selftests keep passing by using ww handling.
Changes since v2:
- Fix i915_gem_context finally.
Signed-off-by: Maarten Lankhorst
---
.../i915/gem/selftests/i915_g
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fence is
signaled by an interrupt, we will then
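One way to sidestep the lock when the interrupt handler has already
signaled the fence (a sketch of the idea, not the patch itself):

    if (i915_request_completed(rq) &&
        !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
            dma_fence_signal(&rq->fence);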
On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson wrote:
>
> Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35)
> > Hi Chris,
> >
> > My concern with going in this direction was that we potentially allow
> > an application to allocate a lot of kernel memory but not a lot of fds
> > by creating lots of fe
Ping?
On 08/07/2020 16:17, Lionel Landwerlin wrote:
Hi all,
This is resuming the work on trying to get timeline semaphore support
for i915 upstream, now that some selftests have been added to
dma-fence-chain.
There are a few fixes from the last iteration and a rebase following the
changes in the
== Series Details ==
Series: series starting with [1/2] dma-buf/dma-fence: Trim
dma_fence_add_callback()
URL : https://patchwork.freedesktop.org/series/79513/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_8748 -> Patchwork_18174
===
== Series Details ==
Series: series starting with [01/23] Revert "drm/i915/gem: Async GPU
relocations only" (rev3)
URL : https://patchwork.freedesktop.org/series/79470/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
814752a982cc Revert "drm/i915/gem: Async GPU relocations only"
== Series Details ==
Series: series starting with [01/23] Revert "drm/i915/gem: Async GPU
relocations only" (rev3)
URL : https://patchwork.freedesktop.org/series/79470/
State : warning
== Summary ==
$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be
Reviewed-by: Bas Nieuwenhuizen
On Wed, Jul 15, 2020 at 12:04 PM Chris Wilson wrote:
>
> If a signal callback releases the sw_sync fence, that will trigger a
> deadlock as the timeline_fence_release recurses onto the fence->lock
> (used both for signaling and the timeline tree).
>
> If we alw
Am 14.07.20 um 16:31 schrieb Daniel Vetter:
On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote:
Am 14.07.20 um 12:49 schrieb Daniel Vetter:
On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
My dma-fence lockdep annotations caught an inversion because we
allocate memor
Em Tue, Jul 14, 2020 at 12:59:34PM +0200, Peter Zijlstra escreveu:
> On Mon, Jul 13, 2020 at 03:51:52PM -0300, Arnaldo Carvalho de Melo wrote:
>
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index 856d98c36f56..a2397f724c10 100644
> > > > --- a/kernel/events/core.c
> > >
== Series Details ==
Series: series starting with [01/23] Revert "drm/i915/gem: Async GPU
relocations only" (rev3)
URL : https://patchwork.freedesktop.org/series/79470/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_8748 -> Patchwork_18175
=
Hi,
On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen
wrote:
> On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson
> wrote:
> > Maybe now is the time to ask: are you using sw_sync outside of
> > validation?
>
> Yes, this is used as part of the Android stack on Chrome OS (need to
> see if ChromeOS spec
Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
completed request.
To support legacy ring buffer scheduling, we want a virtual ringbuffer
for each client. These rings are purely for holding the requests as they
are being constructed on the CPU and never accessed by the GPU, so they
should not be bound into the GGTT, and we can use plain old WB mapped
pages.
As th
As a prelude to the next step where we want to perform all the object
allocations together under the same lock, we first must delay the
i915_vma_pin() as that implicitly does the allocations for us, one by
one. As it only does the allocations one by one, it is not allowed to
wait/evict, whereas pul
Inside schedule_out, we do extra work upon idling the context, such as
updating the runtime, kicking off retires, kicking virtual engines.
However, if we are in a series of processing single requests per
contexts, we may find ourselves scheduling out the context, only to
immediately schedule it bac
Lift the list iteration defines for traversing the signaler/waiter lists
into i915_scheduler.h for reuse.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 10 --
drivers/gpu/drm/i915/i915_scheduler_types.h | 10 ++
2 files changed, 10 insertions(+), 1
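The defines in question are roughly:

    #define for_each_signaler(p__, rq__) \
            list_for_each_entry_rcu(p__, \
                                    &(rq__)->sched.signalers_list, \
                                    signal_link)

    #define for_each_waiter(p__, rq__) \
            list_for_each_entry_lockless(p__, \
                                         &(rq__)->sched.waiters_list, \
                                         wait_link)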
Allocate a few dma fence context id that we can use to associate async work
[for the CPU] launched on behalf of this context. For extra fun, we allow
a configurable concurrency width.
A current example would be that we spawn an unbound worker for every
userptr get_pages. In the future, we wish to
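A sketch of the reservation (the async.* field names are assumptions):

    /* dma_fence_context_alloc() hands back the first id of a
     * contiguous block, so a power-of-two width gives us a cheap
     * mask for picking a timeline per work item.
     */
    static void context_alloc_async_ids(struct i915_gem_context *ctx,
                                        unsigned int width)
    {
            width = rounddown_pow_of_two(width);
            ctx->async.width = width - 1; /* mask */
            ctx->async.context = dma_fence_context_alloc(width);
    }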
Looking to the future, we want to set the scheduling attributes
explicitly and so replace the generic engine->schedule() with the more
direct i915_request_set_priority().
What it loses in removing the 'schedule' name from the function, it
gains in having an explicit entry point with a stated goal.
Rename the current list of unbound objects so that we can keep track of all
objects that we need to bind, as well as the list of currently unbound
[unprocessed] objects.
Signed-off-by: Chris Wilson
Reviewed-by: Tvrtko Ursulin
---
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++---
1
Having recognised that we do not change the sibling until we schedule
out, we can then defer the decision to resubmit the virtual engine from
the unwind of the active queue to scheduling out of the virtual context.
By keeping the unwind order intact on the local engine, we can preserve
data depend
Since we are not using any internal priority levels, and in the next few
patches will introduce a new index for which the optimisation is not so
clear cut, discard the small table within the priolist.
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/gt/intel_engine_heartbeat.c | 2 +-
drivers/g
As we know when we expect the heartbeat to be checked for completion,
pass this information along as its deadline. We still do not complain if
the deadline is missed, at least until we have tried a few times, but it
will allow for quicker hang detection on systems where deadlines are
adhered to.
S
Lift the busy-stats context-in/out implementation out of intel_lrc, so
that we can reuse it for other scheduler implementations.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_stats.h | 49
drivers/gpu/drm/i915/gt/intel_lrc.c | 34 +
Rather than having special case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fas
Rather than going back and forth between the rb_node entry and the
virtual_engine type, store the ve local and reuse it. As the
container_of conversion from rb_node to virtual_engine requires a
variable offset, performing that conversion just once shaves off a bit
of code.
v2: Keep a single virtua
The Global GTT mmappings do not require any backing storage for the page
directories and so do not need extensive support for preallocations, or
for handling multiple bindings en masse. The Global GTT bindings also
need to take into account an eviction strategy for pinned vma, that we
want to explic
A key problem with legacy ring buffer submission is that it is an inherent
FIFO queue across all clients; if one blocks, they all block. A
scheduler allows us to avoid that limitation, and ensures that all
clients can submit in parallel, removing the resource contention of the
global ringbuffer.
Hav
For a modeset/pageflip, there is a very precise deadline by which the
frame must be completed in order to hit the vblank and be shown. While
we don't pass along that exact information, we can at least inform the
scheduler that this request-chain needs to be completed asap.
Signed-off-by: Chris Wil
Pull the repeated check for the last active request being completed to a
single spot, when deciding whether or not execlist preemption is
required.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 15 ---
1 file changed, 4 insertions(+), 11 deletions(-)
diff --g
Switch over from FIFO global submission to the priority-sorted
topological scheduler. At the cost of more busy work on the CPU to
keep the GPU supplied with the next packet of requests, this allows us
to reorder requests around submission stalls.
This also enables the timer based RPS, with the e
Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.
Signed-
As snb is the only one to require an alternative engine for performing
relocations, we know that we can reuse a common timeline between
engines.
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 22 +--
1 file changed, 5 insertions(+), 17 deletions(-
As we do not have any internal priority levels, the priority can be set
directly from the user values.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/display/intel_display.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +--
.../i915/gem/selftests/i915_gem_object_blt.c | 4
Rather than synchronously wait for the context to be bound, within the
intel_context_pin(), we can track the pending completion of the bind
fence and only submit requests along the context when signaled.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/Makefile | 1 +
drivers/g
We include a tasklet flush before waiting on a request as a precaution
against the HW being lax in event signaling. We now have a precautionary
flush in the engine's heartbeat and so do not need to be quite so
zealous on every request wait. If we focus on the request, the only
tasklet flush that ma
From: Maarten Lankhorst
i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but let's start adding the definition
first.
To use it, we have to pass a non-NULL ww to gem_object_lock, and don't
unlock directly. It is done in i915_gem_ww_ctx_fini.
Chan
Since schedule-in and schedule-out are now both always under the tasklet
bitlock, we can reduce the individual atomic operations to simple
instructions and worry less.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 44 +
1 file changed, 19 inser
In anticipation of wanting to be able to apply priority inheritance (pi)
from underneath an
engine's active.lock, rework the priority inheritance to primarily work
along an engine's priority queue, delegating any other engine that the
chain may traverse to a worker. This reduces the global spinlock from
governing the enti
One more list iterator variant, for when we want to unwind from inside
one list iterator with the intention of restarting from the current
entry as the new head of the list.
Signed-off-by: Chris Wilson
Reviewed-by: Tvrtko Ursulin
---
drivers/gpu/drm/i915/i915_utils.h | 6 ++
1 file changed,
If we allow for per-client timelines, even with legacy ring submission,
we open the door to a world full of possibilities [scheduling and
semaphores].
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 21 +
1 file changed, 9 insertions(+), 12 deletions
This was removed in commit 478ffad6d690 ("drm/i915: drop
engine_pin/unpin_breadcrumbs_irq") as the last user had been removed,
but now there is a promise of a new user in the next patch.
Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 22
Build a bare-bones scheduler to sit on top of the global legacy ringbuffer
submission. This virtual execlists scheme should be applicable to all
older platforms.
A key problem we have with the legacy ring buffer submission is that it
only allows for FIFO queuing. All clients share the global request
Currently, if an error is raised we always call the cleanup locally
[and skip the main work callback]. However, some future users may need
to take a mutex to cleanup and so we cannot immediately execute the
cleanup as we may still be in interrupt context.
With the execute-immediate flag, for most
Remove the stub i915_vma_pin() used for incrementally pinning objects for
execbuf (under the severe restriction that they must not wait on a
resource as we may have already pinned it) and replace it with a
i915_vma_pin_inplace() that is only allowed to reclaim the currently
bound location for the vm
Rather than require the next timeline after idling to match the MRU
before idling, reset the index on the node and allow it to match the
first request. However, this requires cmpxchg(u64) and so is not trivial
on 32b, so for compatibility we just fallback to keeping the cached node
pointing to the
We are using the i915_request.lock to serialise adding an execution
callback with __i915_request_submit. However, if we use an atomic
llist_add to serialise multiple waiters and then check to see if the
request is already executing, we can remove the irq-spinlock.
Fixes: 1d9221e9d395 ("drm/i915: S
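A sketch of the lockless attach (field names illustrative):

    /* llist_add() is atomic, so waiters can attach execution
     * callbacks without taking i915_request.lock; if the request
     * is already executing, run the callback immediately.
     */
    llist_add(&cb->link, &rq->execute_cb);
    if (i915_request_is_active(rq))
            irq_work_queue(&cb->work);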
Obsolete, last user removed.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 1 -
drivers/gpu/drm/i915/i915_gem_evict.c | 57 ---
.../gpu/drm/i915/selftests/i915_gem_evict.c | 40 -
3 files changed, 98 deletions(-)
diff --gi
Acquire all the objects and their backing storage, and page directories,
as used by execbuf under a single common ww_mutex. Albeit we have to
restart the critical section a few times in order to handle various
restrictions (such as avoiding copy_(from|to)_user and mmap_sem).
Signed-off-by: Chris W
Pull the cmdparser allocations in to the reservation phase, and then
they are included in the common vma pinning pass.
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 360 +++---
drivers/gpu/drm/i915/gem/i915_gem_object.h| 10 +
drivers/gpu/drm/i9
In preparation for making eb_vma bigger and heavy to run in parallel,
we need to stop applying an in-place swap() to reorder around ww_mutex
deadlocks. Keep the array intact and reorder the locks using a dedicated
list.
Signed-off-by: Chris Wilson
Reviewed-by: Tvrtko Ursulin
---
.../gpu/drm/i91
Once a virtual engine has been bound to a sibling, it will remain bound
until we finally schedule out the last active request. We can not rebind
the context to a new sibling while it is inflight as the context save
will conflict, hence we wait. As we cannot then use any other sibling
while the con
Treat the dependency between bonded requests as weak and leave the
remainder of the pair on the GPU if one hangs.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c
b/drivers/gpu/drm/i
We use i915_active_fini() as a debug check on the i915_active state
before freeing. If we forget to call it, we may end up angering the
debugobjects contained within.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/display/intel_frontbuffer.c| 2 ++
drivers/gpu/drm/i915/gt/selftest_engi
Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock with
In the next patch, we remove the strict priority system and continuously
re-evaluate the relative priority of tasks. As such we need to enable
the timeslice whenever there is more than one context in the pipeline.
This simplifies the decision and removes some of the tweaks to suppress
timeslicing,
Pull the GGTT binding for the secure batch dispatch into the common vma
pinning routine for execbuf, so that there is just a single central
place for all i915_vma_pin().
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 88 +++
1 file changed, 51 ins
As context-in/out is now always serialised, we do not have to worry
about concurrent enabling/disable of the busy-stats and can reduce the
atomic_t active to a plain unsigned int, and the seqlock to a seqcount.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 8 ++-
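A sketch of the reduced bookkeeping (struct layout assumed):

    struct engine_busy_stats {
            unsigned int active;    /* was atomic_t */
            seqcount_t lock;        /* was seqlock_t */
            ktime_t start;
    };

    static void stats_in(struct engine_busy_stats *s)
    {
            /* Writers are serialised by the tasklet; the seqcount
             * only guards readers sampling start/active.
             */
            write_seqcount_begin(&s->lock);
            if (s->active++ == 0)
                    s->start = ktime_get();
            write_seqcount_end(&s->lock);
    }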
It is reasonably common for userspace (even modern drivers like iris) to
reuse an active address for a new buffer. This would cause the
application to stall under its mutex (originally struct_mutex) until the
old batches were idle and it could synchronously remove the stale PTE.
However, we can que
The prospect of locking the entire submission sequence under a wide
ww_mutex re-imposes some key restrictions, in particular that we must
not call copy_(from|to)_user underneath the mutex (as the faulthandlers
themselves may need to take the ww_mutex). To satisfy this requirement,
we need to split
Now that the tasklet completely controls scheduling of the requests, and
we postpone scheduling out the old requests, we can keep a hanging
virtual request bound to the engine on which it hung, and remove it from
the queue. On release, it will be returned to the same engine and remain
in its queue u
Since the introduction of preempt-to-busy, requests can complete in the
background, even while they are not on the engine->active.requests list.
As such, the engine->active.request list itself is not in strict
retirement order, and we have to scan the entire list while unwinding to
not miss any. Ho
The first "scheduler" was a topographical sorting of requests into
priority order. The execution order was deterministic, the earliest
submitted, highest priority request would be executed first. Priority
inheritance ensured that inversions were kept at bay, and allowed us to
dynamically boost priori
Pull the individual acquisition of the context objects (state, ring,
timeline) under a common i915_acquire_ctx in preparation to allow the
context to evict memory (or rather the i915_acquire_ctx on its behalf).
The context objects maintain their semi-permanent status; that is they
are assumed to b
If any engine asks for the tasklet to be kicked from the CS interrupt,
do so. Currently, this is used by the execlists scheduler backends to
feed in the next request to the HW, and similarly could be used by a
ring scheduler, as will be seen in the next patch.
Signed-off-by: Chris Wilson
Reviewed
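The interrupt-side kick is then simply (a sketch; the helper deciding
whether a kick is wanted is hypothetical):

    if (iir & GT_CONTEXT_SWITCH_INTERRUPT &&
        intel_engine_needs_cs_kick(engine)) /* hypothetical */
            tasklet_hi_schedule(&engine->execlists.tasklet);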
It is illegal to wait on another vma while holding the vm->mutex, as
that easily leads to ABBA deadlocks (we wait on a second vma that waits
on us to release the vm->mutex). So while the vm->mutex exists, move the
waiting outside of the lock into the async binding pipeline.
Signed-off-by: Chris
When we introduced the saturated workload detection to tell us to back
off from semaphore usage [semaphores have a noticeable impact on
contended bus cycles with the CPU for some heavy workloads], we first
introduced it as a per-context tracker. This allows individual contexts
to try and optimise t
The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and that we need to break
away from simple system
Since schedule-in/out is now entirely serialised by the tasklet bitlock,
we do not need to worry about concurrent in/out operations and so reduce
the atomic operations to plain instructions.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 +-
drivers/gpu/drm/i915
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fence is
signaled by an interrupt, we will then
We need to make the DMA allocations used for page directories to be
performed up front so that we can include those allocations in our
memory reservation pass. The downside is that we have to assume the
worst case, even before we know the final layout, and always allocate
enough page directories fo
Now that we have pushed the binding itself outside of the vm->mutex, we
are clear of the potential wakeref inversions and can take the wakeref
around the actual duration of the HW interaction.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_ggtt.c | 39
Our goal is to pull all memory reservations (next iteration
obj->ops->get_pages()) under a ww_mutex, and to align those reservations
with other drivers, i.e. control all such allocations with the
reservation_ww_class. Currently, this is under the purview of the
obj->mm.mutex, and while obj->mm rema
On Wed, Jul 15, 2020 at 11:17 AM Christian König
wrote:
>
> Am 14.07.20 um 16:31 schrieb Daniel Vetter:
> > On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote:
> >> Am 14.07.20 um 12:49 schrieb Daniel Vetter:
> >>> On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
>
Our timeline lock is our defence against a concurrent execbuf
interrupting our request construction. We need to hold it throughout or,
for example, a second thread may interject a relocation request in
between our own relocation request and execution in the ring.
A second major benefit is that it a
If no active callback is defined for i915_active, we do not need to
serialise its enabling with the mutex. We still do only want to call the
debug activate once, and must still serialise with a concurrent retire.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_active.c | 25 +++
Before we can execute a request, we must wait for all of its vma to be
bound. This is a frequent operation for which we can optimise away a
few atomic operations (notably a cmpxchg) in lieu of the RCU protection.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_active.h | 15 +++
Currently, we construct and teardown the i915_dependency chains using a
global spinlock. As the lists are entirely local, it should be possible
to use a double-lock with an explicit nesting [signaler -> waiter,
always] and so avoid the costly convenience of a global spinlock.
Signed-off-by: Chris
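The nesting described would look roughly like this, assuming each
sched node gains its own spinlock:

    /* Lock order is always signaler -> waiter, so the inner lock
     * can carry an explicit lockdep nesting annotation.
     */
    spin_lock(&signal->lock);
    spin_lock_nested(&node->lock, SINGLE_DEPTH_NESTING);

    list_add_rcu(&dep->signal_link, &node->signalers_list);
    list_add_rcu(&dep->wait_link, &signal->waiters_list);

    spin_unlock(&node->lock);
    spin_unlock(&signal->lock);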
Whenever an i915_active idles, we prune its tree of old fence slots to
prevent a gradual leak should it be used to track many, many timelines.
The downside is that we then have to frequently reallocate the rbtree.
A compromise is that we keep the most recently used fence slot, and
reuse that for th
On Wed, Jul 15, 2020 at 1:47 PM Daniel Stone wrote:
>
> Hi,
>
> On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen
> wrote:
> > On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson
> > wrote:
> > > Maybe now is the time to ask: are you using sw_sync outside of
> > > validation?
> >
> > Yes, this is used
On 15/07/2020 11:50, Chris Wilson wrote:
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fe
On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> When waiting with a callback on the stack, we must remove the callback
> upon wait completion. Since this will be notified by the fence signal
> callback, the removal often contends with the fence->lock being held by
> the signaler. We
== Series Details ==
Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait
URL : https://patchwork.freedesktop.org/series/79514/
State : warning
== Summary ==
$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately
Quoting Daniel Vetter (2020-07-15 13:10:22)
> On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> > When waiting with a callback on the stack, we must remove the callback
> > upon wait completion. Since this will be notified by the fence signal
> > callback, the removal often contends w
On 15/07/2020 13:06, Tvrtko Ursulin wrote:
On 15/07/2020 11:50, Chris Wilson wrote:
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock
== Series Details ==
Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait
URL : https://patchwork.freedesktop.org/series/79514/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_8749 -> Patchwork_18176
Su