[Intel-gfx] [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring

2016-01-11 Thread Chris Wilson
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 21 +++--- drivers/gpu/drm/i915/i915_drv.h| 2 +- drivers/gpu/drm/i915/i915_gem.c| 43 ++-- drivers/gpu/drm/i915/i915_gem_context.c| 2 +- drivers/gpu/drm/i915/i915_gem_execbuffe

[Intel-gfx] [PATCH 084/190] drm/i915: Track active vma requests

2016-01-11 Thread Chris Wilson
Hook the vma itself into the i915_gem_request_retire() so that we can accurately track when a solitary vma is inactive (as opposed to having to wait for the entire object to be idle). This improves the interaction when using multiple contexts (with full-ppgtt) and eliminates some frequent list walk

[Intel-gfx] [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency

2016-01-11 Thread Chris Wilson
Elsewhere we have adopted the convention of using '_link' to denote elements in the list (and '_list' for the actual list_head itself), and that the name should indicate which list the link belongs to (and preferably not just where the link is being stored). s/vma_link/obj_link/ (we iterate over
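
To illustrate the naming convention, here is a minimal userspace sketch loosely modelled on the kernel's `struct list_head` and `container_of()`. Both are re-implemented here and are not taken from `<linux/list.h>`. The owning `list_head` is named `*_list`, and the node embedded in each member is named `*_link` after the list it joins. The types `toy_object` and `toy_vma` are hypothetical stand-ins for the driver's structures.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *link, struct list_head *head)
{
	link->prev = head->prev;
	link->next = head;
	head->prev->next = link;
	head->prev = link;
}

/* Hypothetical object: it owns the list, so the field is named *_list. */
struct toy_object {
	struct list_head vma_list;
};

/* Hypothetical VMA: its node sits on obj->vma_list, so it is obj_link. */
struct toy_vma {
	int id;
	struct list_head obj_link;
};

/* Walk obj->vma_list through each vma's obj_link and sum the ids. */
static int sum_vma_ids(struct toy_object *obj)
{
	struct list_head *pos;
	int sum = 0;

	for (pos = obj->vma_list.next; pos != &obj->vma_list; pos = pos->next)
		sum += container_of(pos, struct toy_vma, obj_link)->id;
	return sum;
}
```

Reading `vma->obj_link` tells you which list the node belongs to (the object's), not merely which struct stores it.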

[Intel-gfx] [PATCH 028/190] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno

2016-01-11 Thread Chris Wilson
After the GPU reset, once we discard all of the incomplete requests, mark the GPU as having advanced to the last_submitted_seqno (i.e. as having completed those requests and being ready for fresh work). The impact of this is negligible, as all the requests will be considered completed by this point; it just brings

[Intel-gfx] [PATCH 082/190] drm/i915: Count how many VMA are bound for an object

2016-01-11 Thread Chris Wilson
Since we may have VMA allocated for an object, but we interrupted their binding, there is a disparity between having elements on the obj->vma_list and being bound. i915_gem_obj_bound_any() does this check, but this is not rigorously observed - add an explicit count to make it easier. Signed-off-by:
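
A minimal sketch of the explicit count, assuming a hypothetical `bind_count` that is incremented only once a binding completes. An interrupted bind therefore leaves a VMA allocated without counting it as bound. All names here are illustrative, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-object state: VMAs allocated vs. VMAs actually bound. */
struct toy_obj {
	unsigned int vma_count;   /* VMAs allocated (present on the vma list) */
	unsigned int bind_count;  /* VMAs whose binding completed */
};

static void toy_vma_create(struct toy_obj *obj)
{
	obj->vma_count++;
}

/* Binding may be interrupted; only a successful bind bumps bind_count. */
static int toy_vma_bind(struct toy_obj *obj, bool interrupted)
{
	if (interrupted)
		return -1; /* the VMA stays allocated but unbound */
	obj->bind_count++;
	return 0;
}

static void toy_vma_unbind(struct toy_obj *obj)
{
	assert(obj->bind_count > 0);
	obj->bind_count--;
}

/* O(1) answer to "is anything bound?" instead of walking the vma list. */
static bool toy_obj_bound_any(const struct toy_obj *obj)
{
	return obj->bind_count != 0;
}
```

The design choice is that the bound/unbound question becomes a single integer test, independent of how many unbound VMAs linger on the list.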

[Intel-gfx] [PATCH 064/190] drm/i915: Rename intel_pin_and_map_ring()

2016-01-11 Thread Chris Wilson
For more consistent oop-naming, we would use intel_ring_verb, so pick intel_ring_map(). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_lrc.c| 6 ++--- drivers/gpu/drm/i915/intel_ringbuffer.c | 44 - drivers/gpu/drm/i915/intel_ringbuffer.h | 4

[Intel-gfx] ✓ success: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test kms_pipe_crc_basic: Subgroup read-crc-pipe-b: dmesg-warn -> PASS (byt-nuc) bdw-ultratotal:138 pass:130 dwarn:1 dfa

Re: [Intel-gfx] [PATCH 03/13] drm/i915: Avoid invariant conditionals in lrc interrupt handler

2016-01-11 Thread Tvrtko Ursulin
On 11/01/16 08:29, Daniel Vetter wrote: On Fri, Jan 08, 2016 at 11:29:42AM +, Tvrtko Ursulin wrote: From: Tvrtko Ursulin There is no need to check on what Gen we are running on every interrupt and every command submission. We can instead set up some of that when engines are initialized, s

[Intel-gfx] ✗ failure: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_basic: Subgroup create-close: pass -> DMESG-WARN (skl-i7k-2) Test gem_cpu_reloc: Subgroup basic: pa

Re: [Intel-gfx] [PATCH 06/13] drm/i915: Only grab timestamps when needed

2016-01-11 Thread Tvrtko Ursulin
On 11/01/16 08:42, Daniel Vetter wrote: On Fri, Jan 08, 2016 at 11:29:45AM +, Tvrtko Ursulin wrote: From: Tvrtko Ursulin No need to call ktime_get_raw_ns twice per unlimited wait and can also eliminate a local variable. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem.c |

Re: [Intel-gfx] [PATCH 07/13] drm/i915: Introduce dedicated object VMA iterator

2016-01-11 Thread Tvrtko Ursulin
On 11/01/16 08:43, Daniel Vetter wrote: > On Fri, Jan 08, 2016 at 01:29:14PM +, Tvrtko Ursulin wrote: >> >> On 08/01/16 11:29, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin >>> >>> Purpose is to catch places which iterate the object VMA list >>> without holding the big lock. >>> >>> Implemen

[Intel-gfx] ✗ failure: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: pass -> DMESG-WARN (skl-i5k-2) UNSTABLE dmesg-warn -> PASS (bdw-

[Intel-gfx] ✗ warning: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: pass -> DMESG-WARN (skl-i5k-2) UNSTABLE dmesg-warn -> PASS (bdw-

Re: [Intel-gfx] [PATCH 5/5] drm/vmwgfx: Nuke preclose hook

2016-01-11 Thread Thomas Hellstrom
LGTM. Reviewed-by: Thomas Hellstrom On 01/10/2016 11:26 PM, Daniel Vetter wrote: > Again since the drm core takes care of event unlinking/disarming this > is now just needless code. > > v2: I've completely missed eaction->fpriv_head and all the related > code. We need to nuke that too to avoid

[Intel-gfx] ✗ failure: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: dmesg-warn -> PASS (bdw-ultra) Test kms_flip: Subgroup basic-flip-vs-dpms:

[Intel-gfx] ✗ warning: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: dmesg-warn -> PASS (bdw-ultra) Test kms_flip: Subgroup basic-flip-vs-dpms:

[Intel-gfx] [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size

2016-01-11 Thread Chris Wilson
Our GPUs impose certain requirements upon buffers that depend upon how exactly they are used. Typically this is expressed as that they require a larger surface than would be naively computed by pitch * height. Normally such requirements are hidden away in the userspace driver, but when we accept po

[Intel-gfx] [PATCH 106/190] drm/i915: Split insertion/binding of an object into the VM

2016-01-11 Thread Chris Wilson
Split the insertion into the address space's range manager and binding of that object into the GTT to simplify the code flow when pinning a VMA. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(

[Intel-gfx] [PATCH 089/190] drm/i915: Tidy execlists submission and tracking

2016-01-11 Thread Chris Wilson
Other than dramatically simplifying the submission code (requests ftw), we can reduce the execlist spinlock duration and importantly avoid having to hold it across the context switch register reads. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 20 +- drivers/gpu/

[Intel-gfx] [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction"

2016-01-11 Thread Chris Wilson
This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae. The patch was only a stop-gap measure that fixed half the problem - the leak of the fbcon when restarting X. A complete solution required releasing the VMA when the object itself was closed rather than rely on file/process exit. The pre

[Intel-gfx] [PATCH 108/190] drm/i915: Start passing around i915_vma from execbuffer

2016-01-11 Thread Chris Wilson
During execbuffer we look up the i915_vma in order to reserve them in the VM. However, we then do a double lookup of the vma in order to then pin them, all because we lack the necessary interfaces to operate on i915_vma. v2: Tidy parameter lists to remove one level of redirection in the hot path.

[Intel-gfx] [PATCH 096/190] drm/i915: Eliminate early submission of context enabling request

2016-01-11 Thread Chris Wilson
Now that the first request is simplified to a pure context enabling request (i.e. any request will do the required initialisation as appropriate), we can forgo explicitly sending the one required during early hw initialisation. The only reason we might want to do such is in enabling power contexts, i.

[Intel-gfx] [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half

2016-01-11 Thread Chris Wilson
[ 196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large: [ 196.988512] clocksource: 'refined-jiffies' wd_now: 9b48 wd_last: 9acb mask: [ 196.988559] clocksource: 'tsc' cs_n

[Intel-gfx] [PATCH 095/190] drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use

2016-01-11 Thread Chris Wilson
The code to switch_mm() is already handled by i915_switch_context(); the only difference required to set up the aliasing ppgtt is that we need to emit the switch_mm() on the first context, i.e. when transitioning from engine->last_context == NULL. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i9

[Intel-gfx] [PATCH 126/190] drm/i915: Print the batchbuffer offset next to BBADDR in error state

2016-01-11 Thread Chris Wilson
It is useful when looking at captured error states to check the recorded BBADDR register (the address of the last batchbuffer instruction loaded) against the expected offset of the batch buffer, and so do a quick check that (a) the capture is true or (b) HEAD hasn't wandered off into the badlands.

[Intel-gfx] [PATCH 101/190] drm/i915: Only retire if necessary when creating a userptr

2016-01-11 Thread Chris Wilson
We only want to retire requests if we have an existing object that conflicts with the fresh userptr range in order to avoid unnecessary work during creation of every userptr. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_userptr.c | 20 +--- 1 file changed, 13 ins

[Intel-gfx] [PATCH 114/190] drm/i915: Remove (struct_mutex) locking for wait-ioctl

2016-01-11 Thread Chris Wilson
With a bit of care (and leniency) we can iterate over the object and wait for previous rendering to complete with judicious use of atomic reference counting. The ABI requires us to ensure that an active object is eventually flushed (like the busy-ioctl) which is guaranteed by our management of reque

[Intel-gfx] [PATCH 093/190] drm/i915: Move the forced switch back to the kernel context into eviction

2016-01-11 Thread Chris Wilson
Currently, we always switch back to the kernel context (if available, i.e. legacy HW contexts not execlists) whenever we try and idle the GPU. We actually only require the switch when trying to evict everything (in order to prevent fragmentation from placement of the currently active context) from

[Intel-gfx] [PATCH 099/190] drm/i915: Check for request completion before choosing CS flips

2016-01-11 Thread Chris Wilson
Only queue a CS flip if the outstanding request is not complete, and in particular do not rely on the request tracking being fresh (since it is only updated when requests are retired). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_display.c | 5 - 1 file changed, 4 insertions(+)

[Intel-gfx] [PATCH 132/190] drm/i915: Tidy up flush cpu/gtt write domains

2016-01-11 Thread Chris Wilson
Since we know the write domain, we can drop the local variable and make the code look a tiny bit simpler. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers

[Intel-gfx] [PATCH 125/190] drm/i915: Track pinned VMA

2016-01-11 Thread Chris Wilson
Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is, we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searchi

[Intel-gfx] [PATCH 137/190] drm/i915: Shrink pages around failure to dma map

2016-01-11 Thread Chris Wilson
Similar to how we handle resource allocation failure of both physical memory and GGTT mmap space, if we fail to allocate our DMAR remapping, shrink some of our other objects and try again. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_gtt.c | 35 ++
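
The shrink-and-retry pattern described above can be sketched as follows. `toy_dma_map()` and `toy_shrink()` are hypothetical hooks standing in for the DMA remapping step and for shrinking other objects; the exact retry policy of the patch is not visible in this snippet, so this loop simply keeps retrying until shrinking releases nothing more.

```c
#include <assert.h>
#include <errno.h>

struct toy_state {
	unsigned int free_slots;   /* remapping space currently available */
	unsigned int reclaimable;  /* slots we could recover by shrinking */
};

/* Hypothetical mapping step: fails with -ENOMEM if space is short. */
static int toy_dma_map(struct toy_state *s, unsigned int need)
{
	if (s->free_slots < need)
		return -ENOMEM;
	s->free_slots -= need;
	return 0;
}

/* Hypothetical shrinker: releases up to 'target' slots, returns the amount. */
static unsigned int toy_shrink(struct toy_state *s, unsigned int target)
{
	unsigned int freed = target < s->reclaimable ? target : s->reclaimable;

	s->reclaimable -= freed;
	s->free_slots += freed;
	return freed;
}

static int toy_map_with_retry(struct toy_state *s, unsigned int need)
{
	for (;;) {
		int err = toy_dma_map(s, need);

		if (err != -ENOMEM)
			return err;
		/* Shrink some of our other objects; give up if nothing was freed. */
		if (toy_shrink(s, need) == 0)
			return err;
	}
}
```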

[Intel-gfx] [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA

2016-01-11 Thread Chris Wilson
By tracking the iomapping on the VMA itself, we can share that area between multiple users. Also by only revoking the iomapping upon unbinding from the mappable portion of the GGTT, we can keep that iomap across multiple invocations (e.g. execlists context pinning). Signed-off-by: Chris Wilson --

[Intel-gfx] [PATCH 109/190] drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()

2016-01-11 Thread Chris Wilson
Since i915_gem_obj_ggtt_pin() is an idiom-breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h

[Intel-gfx] [PATCH 124/190] drm/i915: Track pinned vma inside guc

2016-01-11 Thread Chris Wilson
Since the guc allocates and pins an object into the GGTT for its usage, it is more natural to use that pinned VMA as our resource cookie. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 10 +- drivers/gpu/drm/i915/i915_guc_submission.c | 142 ++-

[Intel-gfx] [PATCH 119/190] drm/i915: Reduce amount of duplicate buffer information captured on error

2016-01-11 Thread Chris Wilson
When capturing the error state, we do not need to know about every address space - just those that are related to the error. We know which context is active at the time, therefore we know which VM are implicated in the error. We can then restrict the VM which we report to the relevant subset. Sign

[Intel-gfx] [PATCH 122/190] drm/i915: Move setting of request->batch into its single callsite

2016-01-11 Thread Chris Wilson
request->batch_obj is only set by execbuffer for the convenience of debugging hangs. By moving that operation to the callsite, we can simplify all other callers and future patches. We also move the complications of reference handling of the request->batch_obj next to where the active tracking is se

[Intel-gfx] [PATCH 115/190] drm/i915: Remove (struct_mutex) locking for busy-ioctl

2016-01-11 Thread Chris Wilson
By applying the same logic as for wait-ioctl, we can query whether a request has completed without holding struct_mutex. The biggest impact system-wide is removing the flush_active and the contention that causes. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 51 ++

[Intel-gfx] [PATCH 104/190] drm/i915: Remove i915_gem_execbuffer_retire_commands()

2016-01-11 Thread Chris Wilson
Move the single line to the callsite as the name is now misleading, and the purpose is solely to add the request to the execution queue. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/

[Intel-gfx] [PATCH 097/190] drm/i915/shrinker: Flush active on objects before counting

2016-01-11 Thread Chris Wilson
As we inspect obj->active to decide how many objects we can shrink (we only shrink idle objects), it helps to flush the active lists first in order to have a more accurate count of available objects. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++ 1 file changed,

[Intel-gfx] [PATCH 091/190] drm/i915: Move context initialisation to first-use

2016-01-11 Thread Chris Wilson
Instead of allocating a new request when allocating a context, use the request that initiated the allocation to emit the context initialisation. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/intel_lrc.c | 42

[Intel-gfx] [PATCH 103/190] drm/i915: Move pinning of dev_priv->kernel_context into its creator

2016-01-11 Thread Chris Wilson
Rather than have every context ask "am I owned by the kernel? pin!", move that logic into the creator of the kernel context, in order to improve code comprehension. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_context.c | 53 +++-- 1 file changed, 24

[Intel-gfx] [PATCH 090/190] drm/i915: Refactor execlists default context pinning

2016-01-11 Thread Chris Wilson
Refactor pinning and unpinning of contexts, such that the default context for an engine is pinned during initialisation and unpinned during teardown (pinning of the context handles the reference counting). Thus we can eliminate the special case handling of the default context that was required to m

[Intel-gfx] [PATCH 134/190] drm/i915: Refactor execbuffer relocation writing

2016-01-11 Thread Chris Wilson
With the introduction of the reloc page cache, we are just one step away from refactoring the relocation write functions into one. Not only does it tidy the code (slightly), but it greatly simplifies the control logic much to gcc's satisfaction. Signed-off-by: Chris Wilson --- drivers/gpu/drm

[Intel-gfx] [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags

2016-01-11 Thread Chris Wilson
We are motivated to avoid using a bitfield for obj->active for a couple of reasons. Firstly, we wish to document our lockless read of obj->active using READ_ONCE inside i915_gem_busy_ioctl() and that requires an integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code when presented

[Intel-gfx] [PATCH 131/190] drm/i915: Pin the pages first in shmem prepare read/write

2016-01-11 Thread Chris Wilson
There is an improbable, but not impossible, case that if we leave the pages unpinned as we operate on the object, then somebody may steal the lock and change the cache domains after we have already inspected them. (Whilst here, avail ourselves of the opportunity to take a couple of steps to make the

[Intel-gfx] [PATCH 110/190] drm/i915: Move vma->pin_count:4 to vma->flags

2016-01-11 Thread Chris Wilson
Let's aid gcc in our pin_count tracking as i915_vma_pin()/i915_vma_unpin() are some of the hottest of the hot functions and gcc doesn't like bitfields that much! Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h| 20 +++ drivers/gpu/drm/i915/i915_gem.c
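
A sketch of how a small pin count can share a plain integer `flags` word with other bits, so that pin/unpin become single additions and subtractions instead of bitfield read-modify-write sequences. The 4-bit mask and the flag names (`TOY_PIN_MASK`, `TOY_GLOBAL_BIND`, `TOY_LOCAL_BIND`) are assumptions for illustration; the actual layout chosen by the patch is not shown in this snippet.

```c
#include <assert.h>

/* Hypothetical layout: the low 4 bits hold the pin count, higher bits are flags. */
#define TOY_PIN_MASK     0xfu
#define TOY_GLOBAL_BIND  (1u << 4)
#define TOY_LOCAL_BIND   (1u << 5)

struct toy_vma {
	unsigned int flags;
};

static unsigned int toy_pin_count(const struct toy_vma *vma)
{
	return vma->flags & TOY_PIN_MASK;
}

static void toy_vma_pin(struct toy_vma *vma)
{
	/* The count must not overflow out of the low bits into the flags. */
	assert(toy_pin_count(vma) < TOY_PIN_MASK);
	vma->flags++; /* count lives in the low bits, so +1 increments it */
}

static void toy_vma_unpin(struct toy_vma *vma)
{
	assert(toy_pin_count(vma) > 0);
	vma->flags--;
}
```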

[Intel-gfx] [PATCH 118/190] drm/i915: Remove locking for get_tiling

2016-01-11 Thread Chris Wilson
Since we are not concerned with userspace racing itself with set-tiling (the order is indeterminate even if we take a lock), then we can safely read back the single obj->tiling_mode and do the static lookup of swizzle mode without having to take a lock. get-tiling is reasonably frequent due to the

[Intel-gfx] [PATCH 135/190] drm/i915: Move map-and-fenceable tracking to the VMA

2016-01-11 Thread Chris Wilson
By moving map-and-fenceable tracking from the object to the VMA, we gain fine-grained tracking and the ability to track individual fences on the VMA (subsequent patch). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 46 +- drivers/gpu/drm

[Intel-gfx] [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device

2016-01-11 Thread Chris Wilson
We have a false notion of a default_context allocated per engine, whereas actually it is a singular context reserved for kernel use. Remove it from the engines, and rename it thus. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 19 ++- drivers/gpu/dr

[Intel-gfx] [PATCH 127/190] drm/i915: Cache kmap between relocations

2016-01-11 Thread Chris Wilson
When doing relocations, we have to obtain a mapping to the page containing the target address. This is either a kmap or iomap depending on GPU and its cache coherency. Neighbouring relocation entries are typically within the same page and so we can cache our kmapping between them and avoid those pe
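
A userspace sketch of the caching pattern: keep the last mapped page and only remap when a relocation falls into a different page. `toy_map_page()` is a hypothetical stand-in for a kmap or iomap; it simply returns a pointer into a flat buffer, and the `maps` counter shows how many (potentially expensive) mappings would have been created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

struct toy_reloc_cache {
	size_t page;        /* index of the currently mapped page */
	uint8_t *vaddr;     /* mapping of that page, NULL if none yet */
	unsigned int maps;  /* how many times we had to (re)map */
};

/* Hypothetical stand-in for kmap_atomic() or an atomic io-mapping. */
static uint8_t *toy_map_page(uint8_t *backing, size_t page)
{
	return backing + page * TOY_PAGE_SIZE;
}

/* Write a 32-bit value at 'offset' (assumed 4-byte aligned, so it never
 * straddles a page), reusing the cached mapping when the page matches. */
static void toy_relocate(struct toy_reloc_cache *cache, uint8_t *backing,
			 size_t offset, uint32_t value)
{
	size_t page = offset / TOY_PAGE_SIZE;

	if (!cache->vaddr || cache->page != page) {
		/* Different page: drop the old mapping and map the new one. */
		cache->vaddr = toy_map_page(backing, page);
		cache->page = page;
		cache->maps++;
	}
	memcpy(cache->vaddr + offset % TOY_PAGE_SIZE, &value, sizeof(value));
}
```

Four relocations spread over two pages cost only two mappings here, which is the saving the patch describes for neighbouring relocation entries.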

[Intel-gfx] [PATCH 098/190] drm/i915: Double check the active status on the batch pool

2016-01-11 Thread Chris Wilson
We should not rely on obj->active being up to date unless we manually flush it. Instead, we can verify that the next available batch object is idle by looking at its last active request (and checking it for completion). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_batch_pool.c | 2

[Intel-gfx] [PATCH 111/190] drm/i915: Make fb_tracking.lock a spinlock

2016-01-11 Thread Chris Wilson
We only need a very lightweight mechanism here as the locking is only used for co-ordinating a bitfield. Also double check that the object is still pinned to the display plane before processing the state change. v2: Move the cheap unlikely tests into the caller Signed-off-by: Chris Wilson ---

[Intel-gfx] [PATCH 113/190] drm/i915: Enable lockless lookup of request tracking via RCU

2016-01-11 Thread Chris Wilson
If we enable RCU for the requests (providing a grace period where we can inspect a "dead" request before it is freed), we can allow callers to carefully perform lockless lookup of an active request. However, by enabling deferred freeing of requests, we can potentially hog a lot of memory when deal

[Intel-gfx] [PATCH 100/190] drm/i915: Remove request retirement before each batch

2016-01-11 Thread Chris Wilson
This reimplements the denial-of-service protection against igt from commit 227f782e4667fc622810bce8be8ccdeee45f89c2 Author: Chris Wilson Date: Thu May 15 10:41:42 2014 +0100 drm/i915: Retire requests before creating a new one and transfers the stall from before each batch into the close

[Intel-gfx] [PATCH 116/190] drm/i915: Reduce locking inside swfinish ioctl

2016-01-11 Thread Chris Wilson
We only need to take the struct_mutex if the object is pinned to the display engine and so requires checking for clflush. (The race with userspace pinning the object to a framebuffer is irrelevant.) v2: Use access once for compiler hints (or not as it is a bitfield) Signed-off-by: Chris Wilson C

[Intel-gfx] [PATCH 107/190] drm/i915: Record allocated vma size

2016-01-11 Thread Chris Wilson
Tracking the size of the VMA as allocated allows us to dramatically reduce the complexity of later functions (like inserting the VMA into the drm_mm range manager). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 10 +-- drivers/gpu/drm/i915/i915_gem.c | 117 +

[Intel-gfx] [PATCH 139/190] drm/i915: Move fence tracking from object to vma

2016-01-11 Thread Chris Wilson
In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 15 +- drivers/gpu/drm/i915/i915_drv.h| 81 -- drivers/gpu/drm/i915/i915_gem.c| 34 ++

[Intel-gfx] [PATCH 123/190] drm/i915: Mark unmappable GGTT entries as PIN_HIGH

2016-01-11 Thread Chris Wilson
We allocate a few objects into the GGTT that we never need to access via the mappable aperture (such as contexts, status pages). We can request that these are bound high in the VM to increase the amount of mappable aperture available. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem

[Intel-gfx] [PATCH 141/190] drm/i915: Choose not to evict faultable objects from the GGTT

2016-01-11 Thread Chris Wilson
Often times we do not want to evict mapped objects from the GGTT as these are quite expensive to teardown and frequently reused (causing an equally, if not more so, expensive setup). In particular, when faulting in a new object we want to avoid evicting an active object, or else we may trigger a pa

[Intel-gfx] [PATCH 129/190] drm/i915: Before accessing an object via the cpu, flush GTT writes

2016-01-11 Thread Chris Wilson
If we want to read the pages directly via the CPU, we have to be sure that we flush the writes via the GTT (as the CPU cannot see the address aliasing). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/dr

[Intel-gfx] [PATCH 094/190] drm/i915: Remove early l3-remap

2016-01-11 Thread Chris Wilson
Since we do the l3-remap on context switch, and proceed to do a context switch immediately after manually doing the l3-remap, we can remove the redundant manual call. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_gem.c | 35 +---

[Intel-gfx] [PATCH 138/190] drm/i915/userptr: Make gup errors stickier

2016-01-11 Thread Chris Wilson
Keep any error reported by the gup_worker until we are notified that the arena has changed (via the mmu-notifier). This has the important effect of making two consecutive calls to i915_gem_object_get_pages() report the same error, and of curtailing a loop of detecting a fault and requeueing a gup_worker
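
A minimal state-machine sketch of a sticky error, assuming hypothetical `toy_get_pages()` and `toy_mmu_invalidate()` entry points in place of i915_gem_object_get_pages() and the mmu-notifier callback. Once the simulated worker reports an error, later calls return it unchanged instead of requeueing, and only an invalidation clears it.

```c
#include <assert.h>
#include <errno.h>

struct toy_userptr {
	int sticky_err;          /* 0, or the error last reported by the worker */
	unsigned int queued;     /* how many times a worker was queued */
	int next_worker_result;  /* simulated outcome of the next worker run */
};

/* Hypothetical: report the cached error, or "run" the worker once. */
static int toy_get_pages(struct toy_userptr *u)
{
	if (u->sticky_err)
		return u->sticky_err; /* do not requeue: same error every time */

	u->queued++;
	u->sticky_err = u->next_worker_result;
	return u->sticky_err;
}

/* Hypothetical mmu-notifier hook: the address range changed, allow a retry. */
static void toy_mmu_invalidate(struct toy_userptr *u)
{
	u->sticky_err = 0;
}
```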

[Intel-gfx] [PATCH 133/190] drm/i915: Convert known clflush paths over to clflush_cache_range()

2016-01-11 Thread Chris Wilson
A step towards removing redundant functions from the kernel, in this case both drm and arch/x86 define a clflush(addr, range) operation. The difference is that drm_clflush_virt_range() provides a wbinvd() fallback, but along most paths, we only clflush when we know we can. Signed-off-by: Chris Wils

[Intel-gfx] [PATCH 117/190] drm/i915: Remove pinned check from madvise ioctl

2016-01-11 Thread Chris Wilson
We don't need to incur the overhead of checking whether the object is pinned prior to changing its madvise. If the object is pinned, the madvise will not take effect until it is unpinned and so we cannot free the pages being pointed at by hardware. Marking a pinned object with allocated pages as DO

[Intel-gfx] [PATCH 121/190] drm/i915: Scan GGTT active list for context object

2016-01-11 Thread Chris Wilson
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gpu_error.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index a3090d7ac20a..9a18fc502145 100644 --- a/drivers/gpu/drm/i9

[Intel-gfx] [PATCH 130/190] drm/i915: Wait for writes through the GTT to land before reading back

2016-01-11 Thread Chris Wilson
If we quickly switch from writing through the GTT to a read of the physical page directly with the CPU (e.g. performing relocations through the GTT and then running the command parser), we can observe that the writes are not visible to the CPU. It is not a coherency problem, as extensive investigat

[Intel-gfx] [PATCH 128/190] drm/i915: Extract i915_gem_obj_prepare_shmem_write()

2016-01-11 Thread Chris Wilson
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares the backing storage for direct writes. It first serialises with the GPU, pins the backing storage and then indicates what clflushes are required in order for the writes to be coherent. Whilst here, fix support for ancient CPUs w

[Intel-gfx] [PATCH 120/190] drm/i915: Stop the machine whilst capturing the GPU crash dump

2016-01-11 Thread Chris Wilson
The error state is purposefully racy as we expect it to be called at any time and so have avoided any locking whilst capturing the crash dump. However, with multi-engine GPUs and multiple CPUs, those races can manifest as OOPSes as we attempt to chase dangling pointers freed on other CPUs. Under

[Intel-gfx] [PATCH 092/190] drm/i915: Move the magical deferred context allocation into the request

2016-01-11 Thread Chris Wilson
We can hide more details of execlists from higher level code by moving the explicit call to create an execlist context into its first use. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 drivers/gpu/drm/i915/intel_lrc.c | 14 ++ d

[Intel-gfx] [PATCH 140/190] drm/i915: Fix partial GGTT faulting

2016-01-11 Thread Chris Wilson
We want to always use the partial VMA as a fallback for a failure to bind the object into the GGTT. This extends the support for partial objects in the GGTT to cover everything, not just objects that are too large. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 64 +

[Intel-gfx] ✗ warning: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: pass -> DMESG-WARN (bdw-nuci7) dmesg-warn -> PASS (bdw-ultra) Te

[Intel-gfx] [PATCH 144/190] drm/i915: Bump the inactive MRU tracking for all VMA accessed

2016-01-11 Thread Chris Wilson
When we bump the MRU access tracking on set-to-gtt, we need to not only bump the primary GGTT VMA but all partials as well. Similarly we want to bump the MRU access for when unpinning an object from the scanout. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 27 +++

[Intel-gfx] [PATCH 145/190] drm/i915: Stop discarding GTT cache-domain on unbind vma

2016-01-11 Thread Chris Wilson
Since commit 43566dedde54f9729113f5f9fde77d53e75e61e9 Author: Chris Wilson Date: Fri Jan 2 16:29:29 2015 +0530 drm/i915: Broaden application of set-domain(GTT) we allowed objects to be in the GTT domain, but unbound. Therefore removing the GTT cache domain when removing the GGTT vma is no

[Intel-gfx] [PATCH 148/190] drm/i915: Stop marking the unaccessible scratch page as UC

2016-01-11 Thread Chris Wilson
Since by design, if not entirely by practice, nothing is allowed to access the scratch page we use to background fill the VM, then we do not need to ensure that it is coherent between the CPU and GPU. set_pages_uc() does a stop_machine() after changing the PAT, and that significantly impacts upon c

[Intel-gfx] [PATCH 143/190] drm/i915: Track display alignment on VMA

2016-01-11 Thread Chris Wilson
When using the aliasing ppgtt and pageflipping with the shrinker/eviction active, we note that we often have to rebind the backbuffer before flipping onto the scanout because it has an invalid alignment. If we store the worst-case alignment required for a VMA, we can avoid having to rebind at critic

[Intel-gfx] [PATCH 154/190] drm/i915: Move per-request pid from request to ctx

2016-01-11 Thread Chris Wilson
Since contexts are not currently shared between userspace processes, we have an exact correspondence between context creator and guilty batch submitter. Therefore we can save some per-batch work by inspecting the context->pid upon error instead. Note that we take the context's creator's pid rather

[Intel-gfx] [PATCH 146/190] io-mapping: Always create a struct to hold metadata about the io-mapping

2016-01-11 Thread Chris Wilson
Currently, we only allocate a structure to hold metadata if we need to allocate an ioremap for every access, such as on x86-32. However, it would be useful to store basic information about the io-mapping, such as its page protection, on all platforms. Signed-off-by: Chris Wilson Cc: linux...@kvac

[Intel-gfx] [PATCH 168/190] drm/i915: Skip holding context reference for duration of execbuffer call

2016-01-11 Thread Chris Wilson
Since the context can only be referenced and unreferenced whilst holding the bkl, we can safely forgo holding the reference on the context for the duration of our lock inside the execbuffer. After dropping the lock for the slow path, we then need to take care to reacquire the context, which has the

[Intel-gfx] [PATCH 142/190] drm/i915: Fallback to using unmappable memory for scanout

2016-01-11 Thread Chris Wilson
The existing ABI says that scanouts are pinned into the mappable region so that legacy clients (e.g. old Xorg or plymouthd) can write directly into the scanout through a GTT mapping. However if the surface does not fit into the mappable region, we are better off just trying to fit it anywhere and h

[Intel-gfx] [PATCH 149/190] drm/i915: Use i915_vm_to_ppgtt()

2016-01-11 Thread Chris Wilson
We have a typesafe wrapper to extract the ppgtt from a generic address space, but only used it in one out of a few dozen places. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_gem_gtt.c | 40 - 2 files changed,

[Intel-gfx] [PATCH 152/190] drm/i915: Replace request->postfix with ->head for space searching

2016-01-11 Thread Chris Wilson
We can simplify the request code slightly by removing the postfix marker and simply using the head of the request when calculating how much space will be available when retiring up to that request. (We ignore the end of the request in case the interrupt arrives before the ring is actually past the t
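A minimal sketch in C of the circular-buffer arithmetic involved, assuming a ring of `size` bytes where the CPU writes at `tail`, the GPU has consumed everything before `head`, and a small reserve (64 bytes here, an assumed value) keeps `tail` from catching up with `head`. The helper names are hypothetical, not the driver's. Once a request is retired, the GPU head is at or past that request's starting offset, so using the start gives a conservative lower bound on the space freed.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes kept free so tail never catches up with head (assumed value). */
#define RING_RESERVED_BYTES 64

/* Free bytes in a circular ring: CPU writes at tail, GPU consumed up to head. */
static int ring_space(int head, int tail, int size)
{
    assert(size > 0);
    int space = head - tail;
    if (space <= 0)
        space += size; /* head is at or behind tail: wrap around */
    return space - RING_RESERVED_BYTES;
}

/*
 * Hypothetical: req_heads[] holds the ring offset at which each outstanding
 * request starts, oldest first. Return the index of the oldest request whose
 * retirement frees at least `bytes`, or -1 if even the newest is not enough.
 */
static int first_request_with_space(const int *req_heads, size_t count,
                                    int tail, int size, int bytes)
{
    for (size_t i = 0; i < count; i++)
        if (ring_space(req_heads[i], tail, size) >= bytes)
            return (int)i;
    return -1;
}
```

For example, with a 4096-byte ring, `tail` at 1000 and outstanding requests starting at 2000, 3000, 500 and 800, waiting on the second request is the earliest point at which 1500 bytes become free.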

[Intel-gfx] [PATCH 159/190] drm/i915: Defer active reference until required

2016-01-11 Thread Chris Wilson
We only need the active reference to keep the object alive after the handle has been deleted (so as to prevent a synchronous gem_close). Why pay the price of a kref on every execbuf when we can insert that final active ref just in time for the handle deletion? Signed-off-by: Chris Wilson ---

[Intel-gfx] [PATCH 158/190] drm/i915: Skip holding an object reference for execbuf preparation

2016-01-11 Thread Chris Wilson
This is a golden oldie! We can shave a couple of locked instructions for about 10% of the per-object overhead by not taking an extra kref whilst reserving objects for an execbuf. Due to lock management this is safe, as we cannot lose the original object reference without the lock. Equally, because

[Intel-gfx] [PATCH 170/190] drm/i915: Store a direct lookup from object handle to vma

2016-01-11 Thread Chris Wilson
The advent of full-ppgtt led to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizeable hashtable to ju
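A minimal userspace sketch in C of a handle-to-VMA lookup table: a chained hashtable that doubles its bucket array once the load factor reaches 1. All names (`struct handle_lut`, `lut_insert()`, `lut_lookup()`, `struct vma_stub`, ...) are hypothetical, the multiplicative hash is an arbitrary choice, and the sketch assumes each handle is inserted at most once; it does not reproduce the patch's actual data structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-(object, VM) binding. */
struct vma_stub { uint32_t handle; };

struct lut_entry {
    uint32_t handle;
    struct vma_stub *vma;
    struct lut_entry *next; /* chain of entries sharing a bucket */
};

struct handle_lut {
    struct lut_entry **buckets;
    size_t nbuckets; /* always a power of two */
    size_t count;
};

static size_t lut_hash(uint32_t handle, size_t nbuckets)
{
    /* Multiplicative hash (arbitrary choice), masked to the table size. */
    return (size_t)(handle * 2654435761u) & (nbuckets - 1);
}

static int lut_init(struct handle_lut *lut, size_t nbuckets)
{
    assert(nbuckets && (nbuckets & (nbuckets - 1)) == 0);
    lut->buckets = calloc(nbuckets, sizeof(*lut->buckets));
    lut->nbuckets = nbuckets;
    lut->count = 0;
    return lut->buckets ? 0 : -1;
}

/* Double the bucket array and rehash every entry into it. */
static int lut_grow(struct handle_lut *lut)
{
    size_t n = lut->nbuckets * 2;
    struct lut_entry **nb = calloc(n, sizeof(*nb));
    if (!nb)
        return -1;
    for (size_t i = 0; i < lut->nbuckets; i++) {
        struct lut_entry *e = lut->buckets[i];
        while (e) {
            struct lut_entry *next = e->next;
            size_t b = lut_hash(e->handle, n);
            e->next = nb[b];
            nb[b] = e;
            e = next;
        }
    }
    free(lut->buckets);
    lut->buckets = nb;
    lut->nbuckets = n;
    return 0;
}

static int lut_insert(struct handle_lut *lut, uint32_t handle, struct vma_stub *vma)
{
    if (lut->count >= lut->nbuckets && lut_grow(lut))
        return -1; /* keep the load factor at or below 1 */
    struct lut_entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    size_t b = lut_hash(handle, lut->nbuckets);
    e->handle = handle;
    e->vma = vma;
    e->next = lut->buckets[b];
    lut->buckets[b] = e;
    lut->count++;
    return 0;
}

/* One hash plus a short chain walk: no detour through the object. */
static struct vma_stub *lut_lookup(const struct handle_lut *lut, uint32_t handle)
{
    for (const struct lut_entry *e = lut->buckets[lut_hash(handle, lut->nbuckets)]; e; e = e->next)
        if (e->handle == handle)
            return e->vma;
    return NULL;
}

static void lut_destroy(struct handle_lut *lut)
{
    for (size_t i = 0; i < lut->nbuckets; i++) {
        struct lut_entry *e = lut->buckets[i];
        while (e) {
            struct lut_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(lut->buckets);
}
```

Growing by doubling keeps lookups O(1) on average as the number of handles in a context grows, which is what lets execbuffer go straight from a user handle to its VMA.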

[Intel-gfx] [PATCH 155/190] drm/i915: Merge legacy+execlists context structs

2016-01-11 Thread Chris Wilson
struct intel_context contains two substructs, one for the legacy RCS and one for every execlists engine. Since legacy RCS is a subset of the execlists engine support, just combine the two substructs. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 34 +- dri

[Intel-gfx] [PATCH 156/190] drm/i915: Store the active context object on all engines upon error

2016-01-11 Thread Chris Wilson
With execlists, we have context objects everywhere, not just RCS. So store them for post-mortem debugging. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gpu_error.c | 26 -- 1 file changed, 4 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i9

[Intel-gfx] [PATCH 162/190] drm/i915: Allow the user to pass a context to any ring

2016-01-11 Thread Chris Wilson
With full-ppgtt, we want the user to have full control over their memory layout, with a separate instance per context. Forcing them to use a shared memory layout for !RCS not only duplicates the amount of work we have to do, but also defeats the memory segregation on offer. Signed-off-by: Chris Wi

[Intel-gfx] [PATCH 171/190] drm/i915: Pass vma to relocate entry

2016-01-11 Thread Chris Wilson
We can simplify our tracking of pending writes in an execbuf to the single bit in the vma->exec_entry->flags, but that requires the relocation function knowing the object's vma. Pass it along. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h| 3 +- drivers/gpu/drm/i9

[Intel-gfx] [PATCH 182/190] drm/i915: Avoid allocating a vmap arena for a single page

2016-01-11 Thread Chris Wilson
If we want a contiguous mapping of a single-page-sized object, we can forgo using vmap() and just use a regular kmap(). (This may be worth lifting to the core, with the additional proviso that the pgprot_t is compatible.) Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 28 +

[Intel-gfx] [PATCH 179/190] drm/i915: Skip MI_SET_CONTEXT for the same context

2016-01-11 Thread Chris Wilson
Fixes regression from commit 71b7e54f71b899db9f8def67a0e976969384e699 Author: Daniel Vetter Date: Tue Apr 14 17:35:18 2015 +0200 drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_context.c | 12 1 file ch

[Intel-gfx] [PATCH 173/190] drm/i915: Wait upon userptr get-user-pages within execbuffer

2016-01-11 Thread Chris Wilson
This simply hides the EAGAIN caused by userptr when userspace causes resource contention. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_dma.c| 1 + drivers/gpu/drm/i915/i915_drv.h| 8 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++ drivers/gpu/

[Intel-gfx] [PATCH 180/190] drm/i915: Micro-optimise i915_gem_object_get_dirty_page()

2016-01-11 Thread Chris Wilson
We can skip the set_page_dirty() calls if we already know that the entire object is dirty. Furthermore, the WARN is redundant (we'll crash shortly afterwards) but adds substantial overhead to the function (roughly increasing the relocation per-page cost by 10%). Fixes regression from commit 033908a

[Intel-gfx] [PATCH 181/190] drm/i915: Introduce an internal allocator for disposable private objects

2016-01-11 Thread Chris Wilson
Quite a few of our objects used for internal hardware programming do not benefit from being swappable or from being zero initialised. As such they do not benefit from using a shmemfs backing storage and since they are internal and never directly exposed to the user, we do not need to worry about pr

[Intel-gfx] [PATCH 188/190] drm/i915: Use VMA for ringbuffer tracking

2016-01-11 Thread Chris Wilson
Use the GGTT VMA as the primary cookie for handling ring objects, as the most common action upon the ring is mapping and unmapping, which act upon the VMA itself. By restructuring the code to work with the ring VMA, we can shrink the code and remove a few cycles from context pinning. Signed-off-by: C

[Intel-gfx] [PATCH 175/190] drm/i915: Remove superfluous i915_add_request_no_flush() helper

2016-01-11 Thread Chris Wilson
The only time we need to emit a flush inside request emission is after an execbuffer, for which we can use the full __i915_add_request(). All other instances want the simpler i915_add_request() without flushing, so remove the useless helper. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i

[Intel-gfx] [PATCH 176/190] drm/i915: Use the MRU stack search after evicting

2016-01-11 Thread Chris Wilson
When we evict from the GTT to make room for an object, the hole we create is put onto the MRU stack inside the drm_mm range manager. On the next search pass, we can speed up a PIN_HIGH allocation by referencing that stack for the new hole. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i91

[Intel-gfx] [PATCH 165/190] drm/i915: Use the precomputed value for whether to enable command parsing

2016-01-11 Thread Chris Wilson
As i915.enable_cmd_parser is an unsafe option, make it read-only at runtime. Now that it is constant, we can use the value determined during initialisation as to whether we need the cmdparser at execbuffer time. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_cmd_parser.c | 36

[Intel-gfx] [PATCH 150/190] drm/i915: Embed the scratch page struct into each VM

2016-01-11 Thread Chris Wilson
As the scratch page is no longer shared between all VMs, and each has its own, forgo the small allocation and simply embed the scratch page struct into the i915_address_space. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_gtt.c | 83 +++-- drivers
