Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 21 +++---
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 43 ++--
drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
drivers/gpu/drm/i915/i915_gem_execbuffe
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walk
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferably not just where the link is being stored).
s/vma_link/obj_link/ (we iterate over
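A rough sketch of the convention being applied here (struct bodies elided; only the two list fields are shown):

struct drm_i915_gem_object {
        struct list_head vma_list;      /* the list itself: '_list' */
};

struct i915_vma {
        struct list_head obj_link;      /* element of obj->vma_list: '_link',
                                         * named after the list it is on */
};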
After the GPU reset and we discard all of the incomplete requests, mark
the GPU as having advanced to the last_submitted_seqno (as having
completed the requests and ready for fresh work). The impact of this is
negligible, as all the requests will be considered completed by this
point; it just brings
Since we may have VMAs allocated for an object, but have interrupted their
binding, there is a disparity between having elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.
Signed-off-by:
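A minimal sketch of such an explicit count (the bind_count name matches the eventual i915 field; the helpers are invented for illustration):

static void vma_bound(struct i915_vma *vma)
{
        vma->obj->bind_count++;
}

static void vma_unbound(struct i915_vma *vma)
{
        WARN_ON(vma->obj->bind_count == 0);
        vma->obj->bind_count--;
}

/* i915_gem_obj_bound_any() then reduces to a cheap test: */
static bool i915_gem_obj_bound_any(struct drm_i915_gem_object *obj)
{
        return obj->bind_count != 0;
}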
For more consistent oop-naming, we would use intel_ring_verb, so pick
intel_ring_map().
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/intel_lrc.c | 6 ++---
drivers/gpu/drm/i915/intel_ringbuffer.c | 44 -
drivers/gpu/drm/i915/intel_ringbuffer.h | 4
== Summary ==
Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly:
2016y-01m-11d-07h-30m-16s UTC integration manifest
Test kms_pipe_crc_basic:
Subgroup read-crc-pipe-b:
dmesg-warn -> PASS (byt-nuc)
bdw-ultra total:138 pass:130 dwarn:1 dfa
On 11/01/16 08:29, Daniel Vetter wrote:
On Fri, Jan 08, 2016 at 11:29:42AM +, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
There is no need to check on what Gen we are running on every
interrupt and every command submission. We can instead set up
some of that when engines are initialized, s
== Summary ==
Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly:
2016y-01m-11d-07h-30m-16s UTC integration manifest
Test gem_basic:
Subgroup create-close:
pass -> DMESG-WARN (skl-i7k-2)
Test gem_cpu_reloc:
Subgroup basic:
pa
On 11/01/16 08:42, Daniel Vetter wrote:
On Fri, Jan 08, 2016 at 11:29:45AM +, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
No need to call ktime_get_raw_ns twice per unlimited wait and we can
also eliminate a local variable.
Signed-off-by: Tvrtko Ursulin
---
drivers/gpu/drm/i915/i915_gem.c |
On 11/01/16 08:43, Daniel Vetter wrote:
> On Fri, Jan 08, 2016 at 01:29:14PM +, Tvrtko Ursulin wrote:
>>
>> On 08/01/16 11:29, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin
>>>
>>> Purpose is to catch places which iterate the object VMA list
>>> without holding the big lock.
>>>
>>> Implemen
== Summary ==
Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly:
2016y-01m-11d-07h-30m-16s UTC integration manifest
Test gem_storedw_loop:
Subgroup basic-render:
pass -> DMESG-WARN (skl-i5k-2) UNSTABLE
dmesg-warn -> PASS (bdw-
LGTM.
Reviewed-by: Thomas Hellstrom
On 01/10/2016 11:26 PM, Daniel Vetter wrote:
> Again since the drm core takes care of event unlinking/disarming this
> is now just needless code.
>
> v2: I've completely missed eaction->fpriv_head and all the related
> code. We need to nuke that too to avoid
== Summary ==
Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly:
2016y-01m-11d-07h-30m-16s UTC integration manifest
Test gem_storedw_loop:
Subgroup basic-render:
dmesg-warn -> PASS (bdw-ultra)
Test kms_flip:
Subgroup basic-flip-vs-dpms:
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as a requirement for
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept po
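As a worked example (the 512x8-byte X-tile dimensions are purely illustrative), the real requirement rounds both pitch and height up to tile boundaries before multiplying:

/* Illustrative only: surface size for a tiled buffer. */
static u64 tiled_surface_size(u32 pitch, u32 height)
{
        pitch = roundup(pitch, 512);    /* tile width in bytes */
        height = roundup(height, 8);    /* tile height in rows */
        return (u64)pitch * height;     /* >= naive pitch * height */
}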
Split the insertion into the address space's range manager and binding
of that object into the GTT to simplify the code flow when pinning a
VMA.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 33 +++--
1 file changed, 15 insertions(+), 18 deletions(
Other than dramatically simplifying the submission code (requests ftw),
we can reduce the execlist spinlock duration and importantly avoid
having to hold it across the context switch register reads.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 20 +-
drivers/gpu/
This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae.
The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The pre
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a second lookup of the vma in order to pin
them, all because we lack the necessary interfaces to operate on
i915_vma.
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Now that the first request is simplified to a pure context enabling
request (i.e. any request will do the required initialisation as
appropriate), we can forgo explicitly sending that request during early
hw initialisation. The only reason we might want to do such is in
enabling power contexts, i.
[ 196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as
unstable because the skew is too large:
[ 196.988512] clocksource: 'refined-jiffies' wd_now:
9b48 wd_last: 9acb mask:
[ 196.988559] clocksource: 'tsc' cs_n
The code to switch_mm() is already handled by i915_switch_context(), the
only difference required to setup the aliasing ppgtt is that we need to
emit the switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i9
It is useful when looking at captured error states to check the recorded
BBADDR register (the address of the last batchbuffer instruction loaded)
against the expected offset of the batch buffer, and so do a quick check
that (a) the capture is true or (b) HEAD hasn't wandered off into the
badlands.
We only want to retire requests if we have an existing object that
conflicts with the fresh userptr range in order to avoid unnecessary
work during creation of every userptr.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 20 +---
1 file changed, 13 ins
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicious use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of reque
Currently, we always switch back to the kernel context (if available,
i.e. legacy HW contexts not execlists) whenever we try and idle the GPU.
We actually only require the switch when trying to evict everything (in
order to prevent fragmentation from placement of the currently active
context) from
Only queue a CS flip if the outstanding request is not complete, and in
particular do not rely on the request tracking being fresh (since it is
only updated when requests are retired).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/intel_display.c | 5 -
1 file changed, 4 insertions(+)
Since we know the write domain, we can drop the local variable and make
the code look a tiny bit simpler.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 15 ---
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searchi
Similar to how we handle resource allocation failure of both physical
memory and GGTT mmap space, if we fail to allocate our DMAR remapping,
shrink some of our other objects and try again.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 35 ++
By tracking the iomapping on the VMA itself, we can share that area
between multiple users. Also by only revoking the iomapping upon
unbinding from the mappable portion of the GGTT, we can keep that iomap
across multiple invocations (e.g. execlists context pinning).
Signed-off-by: Chris Wilson
--
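A simplified sketch of the interface (ggtt_mappable() stands in for looking up the GGTT's io_mapping; locking and error unwind trimmed):

void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
{
        /* Created on first use, then shared by every user of the VMA;
         * only released when the VMA is unbound from the mappable
         * aperture, not on every unpin.
         */
        if (vma->iomap == NULL)
                vma->iomap = io_mapping_map_wc(ggtt_mappable(vma->vm),
                                               vma->node.start);
        if (vma->iomap == NULL)
                return NULL;    /* caller treats NULL as -ENOMEM */

        vma->pin_count++;
        return vma->iomap;
}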
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h
Since the guc allocates and pins an object into the GGTT for its usage,
it is more natural to use that pinned VMA as our resource cookie.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 10 +-
drivers/gpu/drm/i915/i915_guc_submission.c | 142 ++-
When capturing the error state, we do not need to know about every
address space - just those that are related to the error. We know which
context is active at the time, therefore we know which VMs are implicated
in the error. We can then restrict the VM which we report to the
relevant subset.
Sign
request->batch_obj is only set by execbuffer for the convenience of
debugging hangs. By moving that operation to the callsite, we can
simplify all other callers and future patches. We also move the
complications of reference handling of the request->batch_obj next to
where the active tracking is se
By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention that causes.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 51 ++
Move the single line to the callsite as the name is now misleading, and
the purpose is solely to add the request to the execution queue.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/
As we inspect obj->active to decide how many objects we can shrink (we
only shrink idle objects), it helps to flush the active lists first
in order to have a more accurate count of available objects.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++
1 file changed,
Instead of allocating a new request when allocating a context, use the
request that initiated the allocation to emit the context
initialisation.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/intel_lrc.c | 42
Rather than have every context ask "am I owned by the kernel? pin!",
move that logic into the creator of the kernel context, in order to
improve code comprehension.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_context.c | 53 +++--
1 file changed, 24
Refactor pinning and unpinning of contexts, such that the default
context for an engine is pinned during initialisation and unpinned
during teardown (pinning of the context handles the reference counting).
Thus we can eliminate the special case handling of the default context
that was required to m
With the introduction of the reloc page cache, we are just one step
away from refactoring the relocation write functions into one. Not only
does it tidy the code (slightly), but it greatly simplifies the control
logic much to gcc's satisfaction.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented
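A sketch of the change (the helper is invented for illustration):

struct drm_i915_gem_object {
        /* was: unsigned active : I915_NUM_ENGINES; -- a bitfield can
         * neither be read with READ_ONCE() nor updated without a
         * read-modify-write of the containing word.
         */
        unsigned int active;    /* bitmask, one bit per engine */
};

static bool object_is_active(const struct drm_i915_gem_object *obj)
{
        return READ_ONCE(obj->active);  /* documented lockless read */
}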
There is an improbable, but not impossible, case that if we leave the
pages unpin as we operate on the object, then somebody may steal the
lock and change the cache domains after we have already inspected them.
(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the
Let's aid gcc in our pin_count tracking as
i915_vma_pin()/i915_vma_unpin() are some of the hottest of the hot
functions and gcc doesn't like bitfields that much!
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 20 +++
drivers/gpu/drm/i915/i915_gem.c
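A sketch of the bitfield-free tracking (the mask and helper names follow the shape of the eventual i915 code, shown illustratively):

#define I915_VMA_PIN_MASK 0xf           /* pin count in the low bits */

static inline void __i915_vma_pin(struct i915_vma *vma)
{
        vma->flags++;                   /* plain add, no bitfield RMW */
        WARN_ON((vma->flags & I915_VMA_PIN_MASK) == 0); /* overflow */
}

static inline void __i915_vma_unpin(struct i915_vma *vma)
{
        WARN_ON((vma->flags & I915_VMA_PIN_MASK) == 0);
        vma->flags--;
}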
Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminate even if we take a lock), then we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.
get-tiling is reasonably frequent due to the
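A sketch of the lockless read path (lookup_object(), put_object() and lookup_swizzle() are invented stand-ins for the ref-counted handle lookup and the static per-device swizzle table):

int i915_gem_get_tiling(struct drm_device *dev, void *data,
                        struct drm_file *file)
{
        struct drm_i915_gem_get_tiling *args = data;
        struct drm_i915_gem_object *obj;

        obj = lookup_object(file, args->handle);
        if (obj == NULL)
                return -ENOENT;

        /* No struct_mutex: a racing set-tiling is indeterminate anyway. */
        args->tiling_mode = READ_ONCE(obj->tiling_mode);
        args->swizzle_mode = lookup_swizzle(dev, args->tiling_mode);

        put_object(obj);
        return 0;
}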
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 46 +-
drivers/gpu/drm
We have a false notion of a default_context allocated per engine,
whereas actually it is a singular context reserved for kernel use.
Remove it from the engines, and rename it thus.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 19 ++-
drivers/gpu/dr
When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pe
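A sketch of such a cache (field and function names invented; the kmap path shown, the iomap path is analogous):

/* Cache the mapping of the last relocation page so neighbouring
 * relocations in the same page skip the map/unmap cycle.
 */
struct reloc_cache {
        void *vaddr;            /* current mapping, NULL if none */
        unsigned long page;     /* which page of the target is mapped */
};

static void *reloc_vaddr(struct drm_i915_gem_object *obj,
                         struct reloc_cache *cache, unsigned long page)
{
        if (cache->vaddr && cache->page == page)
                return cache->vaddr;    /* hit: reuse the mapping */

        if (cache->vaddr)
                kunmap_atomic(cache->vaddr);

        cache->vaddr = kmap_atomic(i915_gem_object_get_page(obj, page));
        cache->page = page;
        return cache->vaddr;
}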
We should not rely on obj->active being uptodate unless we manually
flush it. Instead, we can verify that the next available batch object is
idle by looking at its last active request (and checking it for
completion).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_batch_pool.c | 2
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.
Also double check that the object is still pinned to the display plane
before processing the state change.
v2: Move the cheap unlikely tests into the caller
Signed-off-by: Chris Wilson
---
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when deal
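The careful lookup under RCU would follow the usual pattern, roughly (obj->last_request is a stand-in for the real active-request tracking):

struct drm_i915_gem_request *
request_get_rcu(struct drm_i915_gem_object *obj)
{
        struct drm_i915_gem_request *rq;

        rcu_read_lock();
        rq = rcu_dereference(obj->last_request);
        if (rq && !kref_get_unless_zero(&rq->ref))
                rq = NULL;      /* raced with the final unreference */
        rcu_read_unlock();

        return rq;      /* safe to inspect; caller drops the ref */
}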
This reimplements the denial-of-service protection against igt from
commit 227f782e4667fc622810bce8be8ccdeee45f89c2
Author: Chris Wilson
Date: Thu May 15 10:41:42 2014 +0100
drm/i915: Retire requests before creating a new one
and transfers the stall from before each batch into the close
We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)
v2: Use access once for compiler hints (or not as it is a bitfield)
Signed-off-by: Chris Wilson
C
Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA in to
the drm_mm range manager).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 10 +--
drivers/gpu/drm/i915/i915_gem.c | 117 +
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 15 +-
drivers/gpu/drm/i915/i915_drv.h | 81 --
drivers/gpu/drm/i915/i915_gem.c | 34 ++
We allocate a few objects into the GGTT that we never need to access via
the mappable aperture (such as contexts, status pages). We can request
that these are bound high in the VM to increase the amount of mappable
aperture available.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem
Often times we do not want to evict mapped objects from the GGTT as
these are quite expensive to teardown and frequently reused (causing an
equally, if not more so, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a pa
If we want to read the pages directly via the CPU, we have to be sure
that we flush the writes via the GTT (as the CPU cannot see the
address aliasing).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/dr
Since we do the l3-remap on context switch, and proceed to do a context
switch immediately after manually doing the l3-remap, we can remove the
redundant manual call.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 1 -
drivers/gpu/drm/i915/i915_gem.c | 35 +---
Keep any error reported by the gup_worker until we are notified that the
arena has changed (via the mmu-notifier). This has the importance of
making two consecutive calls to i915_gem_object_get_pages() reporting
the same error, and curtailing a loop of detecting a fault and requeueing
a gup_worker
A step towards removing redundant functions from the kernel, in this
case both drm and arch/x86 define a clflush(addr, range) operation. The
difference is that drm_clflush_virt_range() provides a wbinvd()
fallback, but along most paths, we only clflush when we know we can.
Signed-off-by: Chris Wils
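For reference, the two overlapping interfaces (usage illustrative):

/* arch/x86: flush a virtual address range from the CPU caches */
clflush_cache_range(vaddr, size);

/* drm: the same, but with a wbinvd() fallback if clflush is absent */
drm_clflush_virt_range(vaddr, size);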
We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DO
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gpu_error.c | 15 +++
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c
b/drivers/gpu/drm/i915/i915_gpu_error.c
index a3090d7ac20a..9a18fc502145 100644
--- a/drivers/gpu/drm/i9
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigat
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clflushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs w
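Its signature mirrors the read variant, something like:

/* Serialise with the GPU, pin the backing store, and report in
 * *needs_clflush whether flushes are required before and/or after
 * the CPU writes for them to be coherent.
 */
int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
                                     unsigned int *needs_clflush);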
The error state is purposefully racy as we expect it to be called at any
time and so have avoided any locking whilst capturing the crash dump.
However, with multi-engine GPUs and multiple CPUs, those races can
manifest into OOPSes as we attempt to chase dangling pointers freed on
other CPUs. Under
We can hide more details of execlists from higher level code by moving
the explicit call to create an execlist context into its first use.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8
drivers/gpu/drm/i915/intel_lrc.c | 14 ++
d
We want to always use the partial VMA as a fallback for a failure to
bind the object into the GGTT. This extends the support for partial objects
in the GGTT to cover everything, not just objects too large.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 64 +
== Summary ==
Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly:
2016y-01m-11d-07h-30m-16s UTC integration manifest
Test gem_storedw_loop:
Subgroup basic-render:
pass -> DMESG-WARN (bdw-nuci7)
dmesg-warn -> PASS (bdw-ultra)
Te
When we bump the MRU access tracking on set-to-gtt, we need to not only
bump the primary GGTT VMA but all partials as well. Similarly we want to
bump the MRU access when unpinning an object from the scanout.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 27 +++
Since
commit 43566dedde54f9729113f5f9fde77d53e75e61e9
Author: Chris Wilson
Date: Fri Jan 2 16:29:29 2015 +0530
drm/i915: Broaden application of set-domain(GTT)
we allowed objects to be in the GTT domain, but unbound. Therefore
removing the GTT cache domain when removing the GGTT vma is no
Since by design, if not entirely by practice, nothing is allowed to
access the scratch page we use to background fill the VM, then we do not
need to ensure that it is coherent between the CPU and GPU.
set_pages_uc() does a stop_machine() after changing the PAT, and that
significantly impacts upon c
When using the aliasing ppgtt and pageflipping with the shrinker/eviction
active, we note that we often have to rebind the backbuffer before
flipping onto the scanout because it has an invalid alignment. If we
store the worst-case alignment required for a VMA, we can avoid having
to rebind at critic
Since contexts are not currently shared between userspace processes, we
have an exact correspondence between context creator and guilty batch
submitter. Therefore we can save some per-batch work by inspecting the
context->pid upon error instead. Note that we take the context's
creator's pid rather
Currently, we only allocate a structure to hold metadata if we need to
allocate an ioremap for every access, such as on x86-32. However, it
would be useful to store basic information about the io-mapping, such as
its page protection, on all platforms.
Signed-off-by: Chris Wilson
Cc: linux...@kvac
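The resulting metadata, kept on all platforms, is roughly:

struct io_mapping {
        resource_size_t base;
        unsigned long size;
        pgprot_t prot;          /* page protection, now always recorded */
        void __iomem *iomem;    /* full mapping, when address space allows;
                                 * x86-32 highmem instead maps page-by-page
                                 * from base using prot */
};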
Since the context can only be referenced and unreferenced whilst holding
the bkl, we can safely forgo holding the reference on the context for
the duration of our lock inside the execbuffer. After dropping the lock
for the slow path, we then need to take care to reacquire the context,
which has the
The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and h
We have a typesafe wrapper to extract the ppgtt from a generic address
space, but only used it once out of a few dozen places.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 1 -
drivers/gpu/drm/i915/i915_gem_gtt.c | 40 -
2 files changed,
We can simplify the request code slightly by removing the postfix marker
and simply using the head of the request when calculating how much space
will be available when retiring up to that request. (We ignore the end of
the request in case the interrupt arrives before the ring is actually
past the t
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
Signed-off-by: Chris Wilson
---
This is a golden oldie! We can shave a couple of locked instructions for
about 10% of the per-object overhead by not taking an extra kref whilst
reserving objects for an execbuf. Due to lock management this is safe,
as we cannot lose the original object reference without the lock.
Equally, because
The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizeable hashtable to ju
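A sketch of the fast path (the kernel's static hashtable macros stand in for the resizeable table, and the field names are invented):

/* Per-context: userspace handle -> i915_vma, no list walk. */
static struct i915_vma *eb_lookup_vma(struct i915_gem_context *ctx,
                                      u32 handle)
{
        struct i915_vma *vma;

        hash_for_each_possible(ctx->vma_ht, vma, ctx_node, handle)
                if (vma->ctx_handle == handle)
                        return vma;

        return NULL;    /* slow path: full object lookup, create the vma */
}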
struct intel_context contains two substructs, one for the legacy RCS and
one for every execlists engine. Since legacy RCS is a subset of the
execlists engine support, just combine the two substructs.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_debugfs.c | 34 +-
dri
With execlists, we have context objects everywhere, not just RCS. So
store them for post-mortem debugging.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gpu_error.c | 26 --
1 file changed, 4 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/i9
With full-ppgtt, we want the user to have full control over their memory
layout, with a separate instance per context. Forcing them to use a
shared memory layout for !RCS not only duplicates the amount of work we
have to do, but also defeats the memory segregation on offer.
Signed-off-by: Chris Wi
We can simplify our tracking of pending writes in an execbuf to the
single bit in the vma->exec_entry->flags, but that requires the
relocation function knowing the object's vma. Pass it along.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 3 +-
drivers/gpu/drm/i9
If we want a contiguous mapping of a single page sized object, we can
forgo using vmap() and just use a regular kmap().
(This may be worth lifting to the core, with the additional proviso that
the pgprot_t is compatible.)
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 28 +
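A sketch of the shortcut (object_map_vmap() is an invented name for the multi-page path):

static void *object_map(struct drm_i915_gem_object *obj)
{
        /* A single page needs no vmap(): kmap() already yields a
         * contiguous CPU mapping of the whole object.
         */
        if (obj->base.size == PAGE_SIZE)
                return kmap(sg_page(obj->pages->sgl));

        return object_map_vmap(obj);
}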
Fixes regression from
commit 71b7e54f71b899db9f8def67a0e976969384e699
Author: Daniel Vetter
Date: Tue Apr 14 17:35:18 2015 +0200
drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_context.c | 12
1 file ch
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_dma.c | 1 +
drivers/gpu/drm/i915/i915_drv.h | 8
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++
drivers/gpu/
We can skip the set_page_dirty() calls if we already know that the
entire object is dirty. Furthermore, the WARN is redundant (we'll crash
shortly afterwards) but adds substantial overhead to the function
(roughly increasing the relocation per-page cost by 10%).
Fixes regression from
commit 033908a
Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using a shmemfs backing storage and since they
are internal and never directly exposed to the user, we do not need to
worry about pr
Use the GGTT VMA as the primary cookie for handling ring objects as
the most common action upon the ring is mapping and unmapping which act
upon the VMA itself. By restructuring the code to work with the ring
VMA, we can shrink the code and remove a few cycles from context pinning.
Signed-off-by: C
The only time we need to emit a flush inside request emission is after
an execbuffer, for which we can use the full __i915_add_request(). All
other instances want the simpler i915_add_request() without flushing, so
remove the useless helper.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i
When we evict from the GTT to make room for an object, the hole we
create is put onto the MRU stack inside the drm_mm range manager. On the
next search pass, we can speed up a PIN_HIGH allocation by referencing
that stack for the new hole.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i91
As i915.enable_cmd_parser is an unsafe option, make it read-only at
runtime. Now that it is constant, we can use the value determined during
initialisation as to whether we need the cmdparser at execbuffer time.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_cmd_parser.c | 36
As the scratch page is no longer shared between all VM, and each has
their own, forgo the small allocation and simply embed the scratch page
struct into the i915_address_space.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 83 +++--
drivers