Re: [Intel-gfx] [PATCH] drm/i915/gt: Trace placement of timeline HWSP

2020-07-15 Thread Mika Kuoppala
Chris Wilson writes: > Track the position of the HWSP for each timeline. > > References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169 > Signed-off-by: Chris Wilson > Cc: Mika Kuoppala > --- > drivers/gpu/drm/i915/gt/intel_timeline.c| 7 +++ > drivers/gpu/drm/i915/gt/selftest

Re: [Intel-gfx] [PATCH 1/3] dma-buf/sw_sync: Avoid recursive lock during fence signal.

2020-07-15 Thread Christian König
On 14.07.20 at 22:06, Chris Wilson wrote: From: Bas Nieuwenhuizen Calltree: timeline_fence_release drm_sched_entity_wakeup dma_fence_signal_locked sync_timeline_signal sw_sync_ioctl Releasing the reference to the fence in the fence signal callback seems reasonable to me, so thi

Re: [Intel-gfx] [PATCH -next] drm/i915: Remove unused inline function drain_delayed_work()

2020-07-15 Thread Chris Wilson
Quoting YueHaibing (2020-07-15 04:21:04) > It is not used since commit 058179e72e09 ("drm/i915/gt: Replace > hangcheck by heartbeats") > > Signed-off-by: YueHaibing Indeed, it is no more. Reviewed-by: Chris Wilson -Chris

Re: [Intel-gfx] [PATCH v2 2/3] dma-buf/sw_sync: Separate signal/timeline locks

2020-07-15 Thread Bas Nieuwenhuizen
Still Reviewed-by: Bas Nieuwenhuizen On Tue, Jul 14, 2020 at 11:24 PM Chris Wilson wrote: > > Since we decouple the sync_pt from the timeline tree upon release, in > order to allow releasing the sync_pt from a signal callback we need to > separate the sync_pt signaling lock from the timeline tre

[Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Chris Wilson
dma_fence_release() objects to a fence being freed before it is signaled, so instead of playing fancy tricks to avoid handling dying requests, let's keep the syncpt alive until signaled. This neatly removes the issue with having to decouple the syncpt from the timeline upon fence release. -Chris

[Intel-gfx] [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal

2020-07-15 Thread Chris Wilson
If a signal callback releases the sw_sync fence, that will trigger a deadlock as the timeline_fence_release recurses onto the fence->lock (used both for signaling and the timeline tree). If we always hold a reference for an unsignaled fence held by the timeline, we no longer need to detach the
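The fix described above boils down to the timeline holding its own reference on every unsignaled fence and only dropping that reference after fence->lock is released. A minimal sketch, assuming a simplified timeline whose spinlock doubles as fence->lock; my_timeline and my_pt are hypothetical stand-ins, not the actual sw_sync structures:

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct my_timeline {
	spinlock_t lock;		/* shared with every fence as fence->lock */
	struct list_head active;	/* unsignaled points, each holding a reference */
};

struct my_pt {
	struct dma_fence base;
	struct list_head link;		/* entry on my_timeline.active */
};

static void my_timeline_signal_all(struct my_timeline *tl)
{
	struct my_pt *pt, *next;
	unsigned long flags;
	LIST_HEAD(signaled);

	spin_lock_irqsave(&tl->lock, flags);
	list_for_each_entry_safe(pt, next, &tl->active, link) {
		dma_fence_signal_locked(&pt->base);
		/* detach now, but defer the reference drop */
		list_move_tail(&pt->link, &signaled);
	}
	spin_unlock_irqrestore(&tl->lock, flags);

	/*
	 * Drop the timeline's references only after unlocking: if a signal
	 * callback already dropped the last user reference, the release
	 * callback runs here rather than recursing onto tl->lock.
	 */
	list_for_each_entry_safe(pt, next, &signaled, link) {
		list_del(&pt->link);
		dma_fence_put(&pt->base);
	}
}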

[Intel-gfx] [PATCH 2/2] dma-buf/selftests: Add locking selftests for sw_sync

2020-07-15 Thread Chris Wilson
While sw_sync is purely a debug facility for userspace to create fences and timelines it can control, nevertheless it has some tricky locking semantics of its own. In particular, Bas Nieuwenhuizen reported that we had reintroduced a deadlock if a signal callback attempted to destroy the fence. So l

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal

2020-07-15 Thread Patchwork
== Series Details == Series: series starting with [1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal URL : https://patchwork.freedesktop.org/series/79510/ State : warning == Summary == $ dim checkpatch origin/drm-tip f09f86114c26 dma-buf/sw_sync: Avoid recursive lock during fence

Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Bas Nieuwenhuizen
Hi Chris, My concern with going in this direction was that we potentially allow an application to allocate a lot of kernel memory but not a lot of fds by creating lots of fences and then closing the fds but never signaling them. Is that not an issue? - Bas On Wed, Jul 15, 2020 at 12:04 PM Chris

Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Daniel Stone
Hi, On Wed, 15 Jul 2020 at 11:23, Bas Nieuwenhuizen wrote: > My concern with going in this direction was that we potentially allow > an application to allocate a lot of kernel memory but not a lot of fds > by creating lots of fences and then closing the fds but never > signaling them. Is that no

[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal

2020-07-15 Thread Patchwork
== Series Details == Series: series starting with [1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal URL : https://patchwork.freedesktop.org/series/79510/ State : success == Summary == CI Bug Log - changes from CI_DRM_8748 -> Patchwork_18173 ==

Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Chris Wilson
Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35) > Hi Chris, > > My concern with going in this direction was that we potentially allow > an application to allocate a lot of kernel memory but not a lot of fds > by creating lots of fences and then closing the fds but never > signaling them. Is that n

[Intel-gfx] [PATCH 1/2] dma-buf/dma-fence: Trim dma_fence_add_callback()

2020-07-15 Thread Chris Wilson
Rearrange the code to pull the operations before the fence->lock critical section, and remove a small amount of redundancy: Function old new delta dma_fence_add_callback 156 145 -11 Signed-off-by: Chris Wilson --- drivers/dm
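A sketch of the rearranged shape inside drivers/dma-buf/dma-fence.c, assuming the existing helpers keep their current names; this is not the exact upstream diff, only the "do the work before the lock" pattern the message describes:

int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
			   dma_fence_func_t func)
{
	unsigned long flags;
	int ret = 0;

	if (WARN_ON(!fence || !func))
		return -EINVAL;

	/* Initialise the callback before entering the critical section. */
	cb->func = func;
	INIT_LIST_HEAD(&cb->node);

	/* Quick out if the fence is already signaled; no lock required. */
	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return -ENOENT;

	spin_lock_irqsave(fence->lock, flags);
	if (__dma_fence_enable_signaling(fence))
		list_add_tail(&cb->node, &fence->cb_list);
	else
		ret = -ENOENT;
	spin_unlock_irqrestore(fence->lock, flags);

	return ret;
}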

[Intel-gfx] [PATCH 2/2] dma-buf/dma-fence: Add quick tests before dma_fence_remove_callback

2020-07-15 Thread Chris Wilson
When waiting with a callback on the stack, we must remove the callback upon wait completion. Since this will be notified by the fence signal callback, the removal often contends with the fence->lock being held by the signaler. We can look at the list entry to see if the callback was already signale
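The quick test amounts to peeking at the callback node before touching fence->lock: dma_fence_signal() reinitialises each node as it fires it, so an empty node means there is nothing left to remove. A sketch of the waiter-side helper (the name remove_wait_cb is illustrative):

#include <linux/dma-fence.h>

static bool remove_wait_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	/* Already fired and unlinked by the signaler: skip the lock entirely. */
	if (list_empty(&cb->node))
		return false;

	/* Otherwise fall back to the locked removal. */
	return dma_fence_remove_callback(fence, cb);
}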

[Intel-gfx] [PATCH] drm/i915: Move i915_vma_lock in the selftests to avoid lock inversion, v3.

2020-07-15 Thread Maarten Lankhorst
Make sure vma_lock is not used as inner lock when kernel context is used, and add ww handling where appropriate. Ensure that execbuf selftests keep passing by using ww handling. Changes since v2: - Fix i915_gem_context finally. Signed-off-by: Maarten Lankhorst --- .../i915/gem/selftests/i915_g

[Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait

2020-07-15 Thread Chris Wilson
Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock as when the process is woken up after the fence is signaled by an interrupt, we will then
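The gist is to let dma_fence_is_signaled() do the test: it checks the signaled bit (and the ops->signaled hook) before ever taking fence->lock, whereas a completion test followed by an unconditional dma_fence_signal() grabs the lock even when the interrupt handler has already signaled the fence. A hedged before/after sketch, with request_completed() as a hypothetical stand-in:

#include <linux/dma-fence.h>

static bool request_wait_done(struct dma_fence *fence)
{
	/*
	 * Before (sketch): test completion, then always signal.
	 *
	 *	if (request_completed(fence)) {
	 *		dma_fence_signal(fence);	// takes fence->lock
	 *		return true;
	 *	}
	 *	return false;
	 *
	 * After: only reaches for the lock if the fence still needs signaling.
	 */
	return dma_fence_is_signaled(fence);
}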

Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Bas Nieuwenhuizen
On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson wrote: > > Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35) > > Hi Chris, > > > > My concern with going in this direction was that we potentially allow > > an application to allocate a lot of kernel memory but not a lot of fds > > by creating lots of fe

Re: [Intel-gfx] [PATCH v12 0/3] drm/i915: timeline semaphore support

2020-07-15 Thread Lionel Landwerlin
Ping? On 08/07/2020 16:17, Lionel Landwerlin wrote: Hi all, This is resuming the work on trying to get timeline semaphore support for i915 upstream, now that some selftests have been added to dma-fence-chain. There are a few fixes from the last iteration and a rebase following the changes in the

[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] dma-buf/dma-fence: Trim dma_fence_add_callback()

2020-07-15 Thread Patchwork
== Series Details == Series: series starting with [1/2] dma-buf/dma-fence: Trim dma_fence_add_callback() URL : https://patchwork.freedesktop.org/series/79513/ State : success == Summary == CI Bug Log - changes from CI_DRM_8748 -> Patchwork_18174 ===

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/23] Revert "drm/i915/gem: Async GPU relocations only" (rev3)

2020-07-15 Thread Patchwork
== Series Details == Series: series starting with [01/23] Revert "drm/i915/gem: Async GPU relocations only" (rev3) URL : https://patchwork.freedesktop.org/series/79470/ State : warning == Summary == $ dim checkpatch origin/drm-tip 814752a982cc Revert "drm/i915/gem: Async GPU relocations only"

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/23] Revert "drm/i915/gem: Async GPU relocations only" (rev3)

2020-07-15 Thread Patchwork
== Series Details == Series: series starting with [01/23] Revert "drm/i915/gem: Async GPU relocations only" (rev3) URL : https://patchwork.freedesktop.org/series/79470/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.0 Fast mode used, each commit won't be

Re: [Intel-gfx] [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal

2020-07-15 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen On Wed, Jul 15, 2020 at 12:04 PM Chris Wilson wrote: > > If a signal callback releases the sw_sync fence, that will trigger a > deadlock as the timeline_fence_release recurses onto the fence->lock > (used both for signaling and the timeline tree). > > If we alw

Re: [Intel-gfx] [PATCH 19/25] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code

2020-07-15 Thread Christian König
On 14.07.20 at 16:31, Daniel Vetter wrote: On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote: On 14.07.20 at 12:49, Daniel Vetter wrote: On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote: My dma-fence lockdep annotations caught an inversion because we allocate memor

Re: [Intel-gfx] [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability

2020-07-15 Thread Arnaldo Carvalho de Melo
On Tue, Jul 14, 2020 at 12:59:34PM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2020 at 03:51:52PM -0300, Arnaldo Carvalho de Melo wrote: > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > > > index 856d98c36f56..a2397f724c10 100644 > > > > --- a/kernel/events/core.c > > >

[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/23] Revert "drm/i915/gem: Async GPU relocations only" (rev3)

2020-07-15 Thread Patchwork
== Series Details == Series: series starting with [01/23] Revert "drm/i915/gem: Async GPU relocations only" (rev3) URL : https://patchwork.freedesktop.org/series/79470/ State : success == Summary == CI Bug Log - changes from CI_DRM_8748 -> Patchwork_18175 =

Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Daniel Stone
Hi, On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen wrote: > On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson > wrote: > > Maybe now is the time to ask: are you using sw_sync outside of > > validation? > > Yes, this is used as part of the Android stack on Chrome OS (need to > see if ChromeOS spec

[Intel-gfx] [PATCH 37/66] drm/i915/gt: Free stale request on destroying the virtual engine

2020-07-15 Thread Chris Wilson
Since preempt-to-busy, we may unsubmit a request while it is still on the HW and completes asynchronously. That means it may be retired and in the process destroy the virtual engine (as the user has closed their context), but that engine may still be holding onto the unsubmitted completed request.

[Intel-gfx] [PATCH 61/66] drm/i915/gt: Support creation of 'internal' rings

2020-07-15 Thread Chris Wilson
To support legacy ring buffer scheduling, we want a virtual ringbuffer for each client. These rings are purely for holding the requests as they are being constructed on the CPU and never accessed by the GPU, so they should not be bound into the GGTT, and we can use plain old WB mapped pages. As th

[Intel-gfx] [PATCH 15/66] drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup

2020-07-15 Thread Chris Wilson
As a prelude to the next step where we want to perform all the object allocations together under the same lock, we first must delay the i915_vma_pin() as that implicitly does the allocations for us, one by one. As it only does the allocations one by one, it is not allowed to wait/evict, whereas pul

[Intel-gfx] [PATCH 40/66] drm/i915/gt: Defer schedule_out until after the next dequeue

2020-07-15 Thread Chris Wilson
Inside schedule_out, we do extra work upon idling the context, such as updating the runtime, kicking off retires, kicking virtual engines. However, if we are in a series of processing single requests per contexts, we may find ourselves scheduling out the context, only to immediately schedule it bac

[Intel-gfx] [PATCH 47/66] drm/i915: Lift waiter/signaler iterators

2020-07-15 Thread Chris Wilson
Lift the list iteration defines for traversing the signaler/waiter lists into i915_scheduler.h for reuse. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 10 -- drivers/gpu/drm/i915/i915_scheduler_types.h | 10 ++ 2 files changed, 10 insertions(+), 1
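The lifted iterators are thin wrappers over the scheduler node's signalers/waiters lists, roughly of this shape (field names recalled from i915_scheduler_types.h, so treat this as a sketch rather than the verbatim patch):

#include <linux/rculist.h>

#define for_each_signaler(p__, rq__) \
	list_for_each_entry_rcu(p__, \
				&(rq__)->sched.signalers_list, \
				signal_link)

#define for_each_waiter(p__, rq__) \
	list_for_each_entry_lockless(p__, \
				     &(rq__)->sched.waiters_list, \
				     wait_link)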

[Intel-gfx] [PATCH 19/66] drm/i915/gem: Assign context id for async work

2020-07-15 Thread Chris Wilson
Allocate a few dma fence context ids that we can use to associate async work [for the CPU] launched on behalf of this context. For extra fun, we allow a configurable concurrency width. A current example would be that we spawn an unbound worker for every userptr get_pages. In the future, we wish to

[Intel-gfx] [PATCH 50/66] drm/i915: Replace engine->schedule() with a known request operation

2020-07-15 Thread Chris Wilson
Looking to the future, we want to set the scheduling attributes explicitly and so replace the generic engine->schedule() with the more direct i915_request_set_priority() What it loses in removing the 'schedule' name from the function, it gains in having an explicit entry point with a stated goal.

[Intel-gfx] [PATCH 14/66] drm/i915/gem: Rename execbuf.bind_link to unbound_link

2020-07-15 Thread Chris Wilson
Rename the current list of unbound objects so that we can keep track of all objects that we need to bind, as well as the list of currently unbound [unprocessed] objects. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++--- 1

[Intel-gfx] [PATCH 41/66] drm/i915/gt: Resubmit the virtual engine on schedule-out

2020-07-15 Thread Chris Wilson
Having recognised that we do not change the sibling until we schedule out, we can then defer the decision to resubmit the virtual engine from the unwind of the active queue to scheduling out of the virtual context. By keeping the unwind order intact on the local engine, we can preserve data depend

[Intel-gfx] [PATCH 48/66] drm/i915: Strip out internal priorities

2020-07-15 Thread Chris Wilson
Since we are not using any internal priority levels, and in the next few patches will introduce a new index for which the optimisation is not so clear cut, discard the small table within the priolist. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 2 +- drivers/g

[Intel-gfx] [PATCH 56/66] drm/i915/gt: Specify a deadline for the heartbeat

2020-07-15 Thread Chris Wilson
As we know when we expect the heartbeat to be checked for completion, pass this information along as its deadline. We still do not complain if the deadline is missed, at least until we have tried a few times, but it will allow for quicker hang detection on systems where deadlines are adhered to. S

[Intel-gfx] [PATCH 45/66] drm/i915/gt: Extract busy-stats for ring-scheduler

2020-07-15 Thread Chris Wilson
Lift the busy-stats context-in/out implementation out of intel_lrc, so that we can reuse it for other scheduler implementations. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_stats.h | 49 drivers/gpu/drm/i915/gt/intel_lrc.c | 34 +

[Intel-gfx] [PATCH 36/66] drm/i915/gt: Replace direct submit with direct call to tasklet

2020-07-15 Thread Chris Wilson
Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fas

[Intel-gfx] [PATCH 38/66] drm/i915/gt: Use virtual_engine during execlists_dequeue

2020-07-15 Thread Chris Wilson
Rather than going back and forth between the rb_node entry and the virtual_engine type, store the ve locally and reuse it. As the container_of conversion from rb_node to virtual_engine requires a variable offset, performing that conversion just once shaves off a bit of code. v2: Keep a single virtua

[Intel-gfx] [PATCH 30/66] drm/i915: Specialise GGTT binding

2020-07-15 Thread Chris Wilson
The Global GTT mmappings do not require any backing storage for the page directories and so do not need extensive support for preallocations, or for handling multiple bindings en masse. The Global GTT bindings also need to take into account an eviction strategy for pinned vma, that we want to explic

[Intel-gfx] [PATCH 64/66] drm/i915/gt: Implement ring scheduler for gen6/7

2020-07-15 Thread Chris Wilson
A key problem with legacy ring buffer submission is that it is an inherent FIFO queue across all clients; if one blocks, they all block. A scheduler allows us to avoid that limitation, and ensures that all clients can submit in parallel, removing the resource contention of the global ringbuffer. Hav

[Intel-gfx] [PATCH 57/66] drm/i915: Replace the priority boosting for the display with a deadline

2020-07-15 Thread Chris Wilson
For a modeset/pageflip, there is a very precise deadline by which the frame must be completed in order to hit the vblank and be shown. While we don't pass along that exact information, we can at least inform the scheduler that this request-chain needs to be completed asap. Signed-off-by: Chris Wil

[Intel-gfx] [PATCH 35/66] drm/i915/gt: Check for a completed last request once

2020-07-15 Thread Chris Wilson
Pull the repeated check for the last active request being completed to a single spot, when deciding whether or not execlist preemption is required. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) diff --g

[Intel-gfx] [PATCH 65/66] drm/i915/gt: Enable ring scheduling for gen6/7

2020-07-15 Thread Chris Wilson
Switch over from FIFO global submission to the priority-sorted topological scheduler. At the cost of more busy work on the CPU to keep the GPU supplied with the next packet of requests, this allows us to reorder requests around submission stalls. This also enables the timer based RPS, with the e

[Intel-gfx] [PATCH 06/66] drm/i915: Export a preallocate variant of i915_active_acquire()

2020-07-15 Thread Chris Wilson
Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. Signed-

[Intel-gfx] [PATCH 66/66] drm/i915/gem: Remove timeline nesting from snb relocs

2020-07-15 Thread Chris Wilson
As snb is the only one to require an alternative engine for performing relocations, we know that we can reuse a common timeline between engines. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 22 +-- 1 file changed, 5 insertions(+), 17 deletions(-

[Intel-gfx] [PATCH 49/66] drm/i915: Remove I915_USER_PRIORITY_SHIFT

2020-07-15 Thread Chris Wilson
As we do not have any internal priority levels, the priority can be set directly from the user values. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/display/intel_display.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +-- .../i915/gem/selftests/i915_gem_object_blt.c | 4

[Intel-gfx] [PATCH 32/66] drm/i915/gt: Push the wait for the context to bound to the request

2020-07-15 Thread Chris Wilson
Rather than synchronously wait for the context to be bound, within the intel_context_pin(), we can track the pending completion of the bind fence and only submit requests along the context when signaled. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile | 1 + drivers/g

[Intel-gfx] [PATCH 10/66] drm/i915: Soften the tasklet flush frequency before waits

2020-07-15 Thread Chris Wilson
We include a tasklet flush before waiting on a request as a precaution against the HW being lax in event signaling. We now have a precautionary flush in the engine's heartbeat and so do not need to be quite so zealous on every request wait. If we focus on the request, the only tasklet flush that ma

[Intel-gfx] [PATCH 26/66] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.

2020-07-15 Thread Chris Wilson
From: Maarten Lankhorst i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but let's start adding the definition first. To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini. Chan

[Intel-gfx] [PATCH 43/66] drm/i915/gt: ce->inflight updates are now serialised

2020-07-15 Thread Chris Wilson
Since schedule-in and schedule-out are now both always under the tasklet bitlock, we can reduce the individual atomic operations to simple instructions and worry less. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 44 + 1 file changed, 19 inser

[Intel-gfx] [PATCH 53/66] drm/i915: Restructure priority inheritance

2020-07-15 Thread Chris Wilson
In anticipation of wanting to be able to call pi from underneath an engine's active.lock, rework the priority inheritance to primarily work along an engine's priority queue, delegating any other engine that the chain may traverse to a worker. This reduces the global spinlock from governing the enti

[Intel-gfx] [PATCH 17/66] drm/i915: Add list_for_each_entry_safe_continue_reverse

2020-07-15 Thread Chris Wilson
One more list iterator variant, for when we want to unwind from inside one list iterator with the intention of restarting from the current entry as the new head of the list. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_utils.h | 6 ++ 1 file changed,
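Following the pattern of the existing list_for_each_entry_* helpers in <linux/list.h>, the new variant resumes a reverse walk from the current entry while staying safe against removal of the entry being processed; a sketch:

#include <linux/list.h>

#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
	for (pos = list_prev_entry(pos, member),				\
	     n = list_prev_entry(pos, member);					\
	     &pos->member != (head);						\
	     pos = n, n = list_prev_entry(n, member))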

[Intel-gfx] [PATCH 62/66] drm/i915/gt: Use client timeline address for seqno writes

2020-07-15 Thread Chris Wilson
If we allow for per-client timelines, even with legacy ring submission, we open the door to a world full of possibilities [scheduling and semaphores]. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 21 + 1 file changed, 9 insertions(+), 12 deletions

[Intel-gfx] [PATCH 59/66] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"

2020-07-15 Thread Chris Wilson
This was removed in commit 478ffad6d690 ("drm/i915: drop engine_pin/unpin_breadcrumbs_irq") as the last user had been removed, but now there is a promise of a new user in the next patch. Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 22

[Intel-gfx] [PATCH 63/66] drm/i915/gt: Infrastructure for ring scheduling

2020-07-15 Thread Chris Wilson
Build a bare bones scheduler to sit on top of the global legacy ringbuffer submission. This virtual execlists scheme should be applicable to all older platforms. A key problem we have with the legacy ring buffer submission is that it only allows for FIFO queuing. All clients share the global request

[Intel-gfx] [PATCH 18/66] drm/i915: Always defer fenced work to the worker

2020-07-15 Thread Chris Wilson
Currently, if an error is raised we always call the cleanup locally [and skip the main work callback]. However, some future users may need to take a mutex to cleanup and so we cannot immediately execute the cleanup as we may still be in interrupt context. With the execute-immediate flag, for most

[Intel-gfx] [PATCH 16/66] drm/i915/gem: Remove the call for no-evict i915_vma_pin

2020-07-15 Thread Chris Wilson
Remove the stub i915_vma_pin() used for incrementally pinning objects for execbuf (under the severe restriction that they must not wait on a resource as we may have already pinned it) and replace it with a i915_vma_pin_inplace() that is only allowed to reclaim the currently bound location for the vm

[Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline

2020-07-15 Thread Chris Wilson
Rather than require the next timeline after idling to match the MRU before idling, reset the index on the node and allow it to match the first request. However, this requires cmpxchg(u64) and so is not trivial on 32b, so for compatibility we just fallback to keeping the cached node pointing to the
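The compatibility split described above fits in a small helper: on 64-bit the cached node's timeline index is swapped atomically with cmpxchg64(), while 32-bit keeps the old most-recently-used behaviour. The helper name and layout are illustrative, not the patch itself:

#include <linux/atomic.h>

static bool reset_cached_timeline(u64 *idx, u64 old, u64 new)
{
#if IS_ENABLED(CONFIG_64BIT)
	/* Atomically retire the old timeline id so any timeline may claim the slot. */
	return cmpxchg64(idx, old, new) == old;
#else
	/* No u64 cmpxchg guaranteed on 32b: keep the node bound to the MRU timeline. */
	return false;
#endif
}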

[Intel-gfx] [PATCH 02/66] drm/i915: Remove i915_request.lock requirement for execution callbacks

2020-07-15 Thread Chris Wilson
We are using the i915_request.lock to serialise adding an execution callback with __i915_request_submit. However, if we use an atomic llist_add to serialise multiple waiters and then check to see if the request is already executing, we can remove the irq-spinlock. Fixes: 1d9221e9d395 ("drm/i915: S
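The llist approach removes the spinlock from callback registration: waiters push onto a lock-free list and the submit path atomically takes the whole list and runs it. A generic sketch with hypothetical names (the real patch additionally re-checks whether the request started executing in the meantime, as noted above):

#include <linux/llist.h>

struct execute_cb {
	struct llist_node node;
	void (*hook)(struct execute_cb *cb);
};

/* Waiter side: a single atomic push, no irq-spinlock. */
static void add_execute_cb(struct llist_head *cbs, struct execute_cb *cb)
{
	llist_add(&cb->node, cbs);
}

/* Submit side: atomically claim every pending callback and invoke it. */
static void notify_execute_cbs(struct llist_head *cbs)
{
	struct execute_cb *cb, *cn;

	llist_for_each_entry_safe(cb, cn, llist_del_all(cbs), node)
		cb->hook(cb);
}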

[Intel-gfx] [PATCH 33/66] drm/i915: Remove unused i915_gem_evict_vm()

2020-07-15 Thread Chris Wilson
Obsolete, last user removed. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_gem_evict.c | 57 --- .../gpu/drm/i915/selftests/i915_gem_evict.c | 40 - 3 files changed, 98 deletions(-) diff --gi

[Intel-gfx] [PATCH 27/66] drm/i915/gem: Pull execbuf dma resv under a single critical section

2020-07-15 Thread Chris Wilson
Acquire all the objects and their backing storage, and page directories, as used by execbuf under a single common ww_mutex. Albeit we have to restart the critical section a few times in order to handle various restrictions (such as avoiding copy_(from|to)_user and mmap_sem). Signed-off-by: Chris W

[Intel-gfx] [PATCH 23/66] drm/i915/gem: Include cmdparser in common execbuf pinning

2020-07-15 Thread Chris Wilson
Pull the cmdparser allocations in to the reservation phase, and then they are included in the common vma pinning pass. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 360 +++--- drivers/gpu/drm/i915/gem/i915_gem_object.h| 10 + drivers/gpu/drm/i9

[Intel-gfx] [PATCH 20/66] drm/i915/gem: Separate the ww_mutex walker into its own list

2020-07-15 Thread Chris Wilson
In preparation for making eb_vma bigger and heavier to run in parallel, we need to stop applying an in-place swap() to reorder around ww_mutex deadlocks. Keep the array intact and reorder the locks using a dedicated list. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../gpu/drm/i91

[Intel-gfx] [PATCH 39/66] drm/i915/gt: Decouple inflight virtual engines

2020-07-15 Thread Chris Wilson
Once a virtual engine has been bound to a sibling, it will remain bound until we finally schedule out the last active request. We can not rebind the context to a new sibling while it is inflight as the context save will conflict, hence we wait. As we cannot then use any other sibling while the con

[Intel-gfx] [PATCH 51/66] drm/i915/gt: Do not suspend bonded requests if one hangs

2020-07-15 Thread Chris Wilson
Treat the dependency between bonded requests as weak and leave the remainder of the pair on the GPU if one hangs. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i

[Intel-gfx] [PATCH 04/66] drm/i915: Add a couple of missing i915_active_fini()

2020-07-15 Thread Chris Wilson
We use i915_active_fini() as a debug check on the i915_active state before freeing. If we forget to call it, we may end up angering the debugobjects contained within. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/display/intel_frontbuffer.c| 2 ++ drivers/gpu/drm/i915/gt/selftest_engi

[Intel-gfx] [PATCH 03/66] drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs

2020-07-15 Thread Chris Wilson
Since the breadcrumb enabling/cancelling itself is serialised by the breadcrumbs.irq_lock, with a bit of care we can remove the outer serialisation with i915_request.lock for concurrent dma_fence_enable_signaling(). This has the important side-effect of eliminating the nested i915_request.lock with

[Intel-gfx] [PATCH 54/66] drm/i915/gt: Remove timeslice suppression

2020-07-15 Thread Chris Wilson
In the next patch, we remove the strict priority system and continuously re-evaluate the relative priority of tasks. As such we need to enable the timeslice whenever there is more than one context in the pipeline. This simplifies the decision and removes some of the tweaks to suppress timeslicing,

[Intel-gfx] [PATCH 24/66] drm/i915/gem: Include secure batch in common execbuf pinning

2020-07-15 Thread Chris Wilson
Pull the GGTT binding for the secure batch dispatch into the common vma pinning routine for execbuf, so that there is just a single central place for all i915_vma_pin(). Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 88 +++ 1 file changed, 51 ins

[Intel-gfx] [PATCH 46/66] drm/i915/gt: Convert stats.active to plain unsigned int

2020-07-15 Thread Chris Wilson
As context-in/out is now always serialised, we do not have to worry about concurrent enabling/disable of the busy-stats and can reduce the atomic_t active to a plain unsigned int, and the seqlock to a seqcount. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_cs.c| 8 ++-

[Intel-gfx] [PATCH 21/66] drm/i915/gem: Asynchronous GTT unbinding

2020-07-15 Thread Chris Wilson
It is reasonably common for userspace (even modern drivers like iris) to reuse an active address for a new buffer. This would cause the application to stall under its mutex (originally struct_mutex) until the old batches were idle and it could synchronously remove the stale PTE. However, we can que

[Intel-gfx] [PATCH 25/66] drm/i915/gem: Reintroduce multiple passes for reloc processing

2020-07-15 Thread Chris Wilson
The prospect of locking the entire submission sequence under a wide ww_mutex re-imposes some key restrictions, in particular that we must not call copy_(from|to)_user underneath the mutex (as the faulthandlers themselves may need to take the ww_mutex). To satisfy this requirement, we need to split

[Intel-gfx] [PATCH 42/66] drm/i915/gt: Simplify virtual engine handling for execlists_hold()

2020-07-15 Thread Chris Wilson
Now that the tasklet completely controls scheduling of the requests, and we postpone scheduling out the old requests, we can keep a hanging virtual request bound to the engine on which it hung, and remove it from the queue. On release, it will be returned to the same engine and remain in its queue u

[Intel-gfx] [PATCH 34/66] drm/i915/gt: Decouple completed requests on unwind

2020-07-15 Thread Chris Wilson
Since the introduction of preempt-to-busy, requests can complete in the background, even while they are not on the engine->active.requests list. As such, the engine->active.request list itself is not in strict retirement order, and we have to scan the entire list while unwinding to not miss any. Ho

[Intel-gfx] [PATCH 55/66] drm/i915: Fair low-latency scheduling

2020-07-15 Thread Chris Wilson
The first "scheduler" was a topological sorting of requests into priority order. The execution order was deterministic, the earliest submitted, highest priority request would be executed first. Priority inheritance ensured that inversions were kept at bay, and allowed us to dynamically boost priori

[Intel-gfx] [PATCH 31/66] drm/i915/gt: Acquire backing storage for the context

2020-07-15 Thread Chris Wilson
Pull the individual acquisition of the context objects (state, ring, timeline) under a common i915_acquire_ctx in preparation to allow the context to evict memory (or rather the i915_acquire_ctx on its behalf). The context objects maintain their semi-permanent status; that is they are assumed to b

[Intel-gfx] [PATCH 60/66] drm/i915/gt: Couple tasklet scheduling for all CS interrupts

2020-07-15 Thread Chris Wilson
If any engine asks for the tasklet to be kicked from the CS interrupt, do so. Currently, this is used by the execlists scheduler backends to feed in the next request to the HW, and similarly could be used by a ring scheduler, as will be seen in the next patch. Signed-off-by: Chris Wilson Reviewed

[Intel-gfx] [PATCH 22/66] drm/i915/gem: Bind the fence async for execbuf

2020-07-15 Thread Chris Wilson
It is illegal to wait on another vma while holding the vm->mutex, as that easily leads to ABBA deadlocks (we wait on a second vma that waits on us to release the vm->mutex). So while the vm->mutex exists, move the waiting outside of the lock into the async binding pipeline. Signed-off-by: Chris

[Intel-gfx] [PATCH 58/66] drm/i915: Move saturated workload detection to the GT

2020-07-15 Thread Chris Wilson
When we introduced the saturated workload detection to tell us to back off from semaphore usage [semaphores have a noticeable impact on contended bus cycles with the CPU for some heavy workloads], we first introduced it as a per-context tracker. This allows individual contexts to try and optimise t

[Intel-gfx] [PATCH 12/66] drm/i915: Switch to object allocations for page directories

2020-07-15 Thread Chris Wilson
The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system

[Intel-gfx] [PATCH 44/66] drm/i915/gt: Drop atomic for engine->fw_active tracking

2020-07-15 Thread Chris Wilson
Since schedule-in/out is now entirely serialised by the tasklet bitlock, we do not need to worry about concurrent in/out operations and so reduce the atomic operations to plain instructions. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_cs.c| 2 +- drivers/gpu/drm/i915

[Intel-gfx] [PATCH 01/66] drm/i915: Reduce i915_request.lock contention for i915_request_wait

2020-07-15 Thread Chris Wilson
Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock as when the process is woken up after the fence is signaled by an interrupt, we will then

[Intel-gfx] [PATCH 11/66] drm/i915: Preallocate stashes for vma page-directories

2020-07-15 Thread Chris Wilson
We need to make the DMA allocations used for page directories to be performed up front so that we can include those allocations in our memory reservation pass. The downside is that we have to assume the worst case, even before we know the final layout, and always allocate enough page directories fo

[Intel-gfx] [PATCH 29/66] drm/i915: Hold wakeref for the duration of the vma GGTT binding

2020-07-15 Thread Chris Wilson
Now that we have pushed the binding itself outside of the vm->mutex, we are clear of the potential wakeref inversions and can take the wakeref around the actual duration of the HW interaction. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 39

[Intel-gfx] [PATCH 28/66] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class

2020-07-15 Thread Chris Wilson
Our goal is to pull all memory reservations (next iteration obj->ops->get_pages()) under a ww_mutex, and to align those reservations with other drivers, i.e. control all such allocations with the reservation_ww_class. Currently, this is under the purview of the obj->mm.mutex, and while obj->mm rema

Re: [Intel-gfx] [PATCH 19/25] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code

2020-07-15 Thread Daniel Vetter
On Wed, Jul 15, 2020 at 11:17 AM Christian König wrote: > > Am 14.07.20 um 16:31 schrieb Daniel Vetter: > > On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote: > >> Am 14.07.20 um 12:49 schrieb Daniel Vetter: > >>> On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote: >

[Intel-gfx] [PATCH 13/66] drm/i915/gem: Don't drop the timeline lock during execbuf

2020-07-15 Thread Chris Wilson
Our timeline lock is our defence against a concurrent execbuf interrupting our request construction. We need to hold it throughout or, for example, a second thread may interject a relocation request in between our own relocation request and execution in the ring. A second, major benefit, is that it a

[Intel-gfx] [PATCH 05/66] drm/i915: Skip taking acquire mutex for no ref->active callback

2020-07-15 Thread Chris Wilson
If no active callback is defined for i915_active, we do not need to serialise its enabling with the mutex. We still do only want to call the debug activate once, and must still serialise with a concurrent retire. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_active.c | 25 +++

[Intel-gfx] [PATCH 09/66] drm/i915: Provide a fastpath for waiting on vma bindings

2020-07-15 Thread Chris Wilson
Before we can execute a request, we must wait for all of its vma to be bound. This is a frequent operation for which we can optimise away a few atomic operations (notably a cmpxchg) in lieu of the RCU protection. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_active.h | 15 +++

[Intel-gfx] [PATCH 52/66] drm/i915: Teach the i915_dependency to use a double-lock

2020-07-15 Thread Chris Wilson
Currently, we construct and tear down the i915_dependency chains using a global spinlock. As the lists are entirely local, it should be possible to use a double-lock with an explicit nesting [signaler -> waiter, always] and so avoid the costly convenience of a global spinlock. Signed-off-by: Chris
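With a fixed locking order (always signaler before waiter), the two per-node locks can be taken together without any global spinlock, using lockdep's nesting annotation; a minimal sketch of the idea:

#include <linux/spinlock.h>

static void lock_dependency_pair(spinlock_t *signaler, spinlock_t *waiter)
{
	spin_lock(signaler);
	spin_lock_nested(waiter, SINGLE_DEPTH_NESTING);
}

static void unlock_dependency_pair(spinlock_t *signaler, spinlock_t *waiter)
{
	spin_unlock(waiter);
	spin_unlock(signaler);
}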

[Intel-gfx] [PATCH 07/66] drm/i915: Keep the most recently used active-fence upon discard

2020-07-15 Thread Chris Wilson
Whenever an i915_active idles, we prune its tree of old fence slots to prevent a gradual leak should it be used to track many, many timelines. The downside is that we then have to frequently reallocate the rbtree. A compromise is that we keep the most recently used fence slot, and reuse that for th

Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Daniel Vetter
On Wed, Jul 15, 2020 at 1:47 PM Daniel Stone wrote: > > Hi, > > On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen > wrote: > > On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson > > wrote: > > > Maybe now is the time to ask: are you using sw_sync outside of > > > validation? > > > > Yes, this is used

Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait

2020-07-15 Thread Tvrtko Ursulin
On 15/07/2020 11:50, Chris Wilson wrote: Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock as when the process is woken up after the fe

Re: [Intel-gfx] [PATCH 2/2] dma-buf/dma-fence: Add quick tests before dma_fence_remove_callback

2020-07-15 Thread Daniel Vetter
On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote: > When waiting with a callback on the stack, we must remove the callback > upon wait completion. Since this will be notified by the fence signal > callback, the removal often contends with the fence->lock being held by > the signaler. We

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Reduce i915_request.lock contention for i915_request_wait

2020-07-15 Thread Patchwork
== Series Details == Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait URL : https://patchwork.freedesktop.org/series/79514/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.0 Fast mode used, each commit won't be checked separately

Re: [Intel-gfx] [PATCH 2/2] dma-buf/dma-fence: Add quick tests before dma_fence_remove_callback

2020-07-15 Thread Chris Wilson
Quoting Daniel Vetter (2020-07-15 13:10:22) > On Wed, Jul 15, 2020 at 11:49:05AM +0100, Chris Wilson wrote: > > When waiting with a callback on the stack, we must remove the callback > > upon wait completion. Since this will be notified by the fence signal > > callback, the removal often contends w

Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait

2020-07-15 Thread Tvrtko Ursulin
On 15/07/2020 13:06, Tvrtko Ursulin wrote: On 15/07/2020 11:50, Chris Wilson wrote: Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock

[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Reduce i915_request.lock contention for i915_request_wait

2020-07-15 Thread Patchwork
== Series Details == Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait URL : https://patchwork.freedesktop.org/series/79514/ State : success == Summary == CI Bug Log - changes from CI_DRM_8749 -> Patchwork_18176 Su
