Re: [Intel-gfx] [PATCH 0/4] Steer multicast register workaround verification

2020-05-01 Thread Tvrtko Ursulin



Hi,

On 01/05/2020 00:15, Matt Roper wrote:

We're seeing some CI errors indicating that a workaround did not apply
properly on EHL/JSL.  The workaround in question is updating a multicast
register, the failures are only seen on specific CI machines, and the
failures only seem to happen on resets and such rather than on initial
driver load.  It seems likely that the culprit here is failure to steer
the multicast register readback on a SKU that has slice0 / subslice0
fused off.

This series makes a couple changes:
  * Workaround verification will explicitly steer MCR registers by
calling read_subslice_reg rather than a regular read.
  * New multicast ranges are added for gen11 and gen12.  Sadly this
information is still missing from the bspec (just like the updated
forcewake tables).  The hardware guys have given us a spreadsheet
with both the forcewake and the multicast information while they work
on getting the spec properly updated, so that's where the new ranges
come from.


I think there are multiple things here. To begin with, the newly 
discovered ranges are of course a savior.


But I am not sure about the approach of using intel_read_subslice_reg in 
wa_verify. One suspicion is that 0xfdc is lost on reset, but we do 
reprogram it afterwards, don't we? And since it is the first register in 
the list, it should be in place before the rest of the verification 
runs, no?


A year or two ago I tried figuring this out for Icelake and failed, but 
AFAIR (my experiments may still be findable on trybot patchwork) I even 
tried both applying the affected workarounds via unicast (for each 
subslice, or L3 bank where applicable) and verifying a single register 
in all enabled subslices. AFAIR there were still some issues. Granted, 
my memory could be leaky, but I think this multiple write/verify 
approach could still be useful.


(Now that I think about it, the problem area back when I was 
experimenting with this was more suspend/resume... hm.)


My main concern is that with the current code we effectively have, after reset:

intel_gt_apply_workarounds:
   program 0xfdc
   program the rest of wa
verify_wa
   do reads using configured 0xfdc

So MCR should be correct. This series seems to be doing:

intel_gt_apply_workarounds:
   program 0xfdc
* store ss used for MCR configuration
   program the rest of wa
verify_wa
   do reads, but reconfigure 0xfdc before every register in an MCR
   range, to the same value as in the initial configuration

Is this correct? Is the thinking then that simply writing the same 
value to 0xfdc multiple times fixes things?
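
(For readers following along, the explicit steering being proposed 
boils down to roughly the following -- a sketch modelled on 
read_subslice_reg from intel_engine_cs.c, with forcewake and locking 
elided and the helper name invented for illustration:)

	static u32 wa_read_steered(struct intel_uncore *uncore, i915_reg_t reg,
				   int slice, int subslice)
	{
		u32 old_mcr, val;

		old_mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);

		/* Point multicast reads at an instance that is not fused off. */
		intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR,
				      GEN8_MCR_SLICE(slice) |
				      GEN8_MCR_SUBSLICE(subslice));

		val = intel_uncore_read_fw(uncore, reg);

		/* Restore the previous steering for subsequent readers. */
		intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, old_mcr);

		return val;
	}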


Regards,

Tvrtko

P.S. Update, found the experiments, listing some of them:

https://patchwork.freedesktop.org/series/64183/
https://patchwork.freedesktop.org/series/64013/

It reminded me that there were some unexplained issues around whether I 
used ffs or fls for finding a valid common MCR setting between L3 and 
SSEU fusing. I think we use a different one than Windows, but ours works 
better for our verification, empirically at least. The usual disclaimer 
about my leaky memory applies here.
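
(A hypothetical sketch of that choice -- the mask names are invented; 
the idea is to steer to an instance present in both fuse masks:)

	u32 common = subslice_mask & l3_en_mask; /* assumed fuse masks */
	int ss_lo = ffs(common) - 1; /* lowest common instance */
	int ss_hi = fls(common) - 1; /* highest common instance */
	/* i915 and Windows reportedly differ on which of the two they pick. */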



In addition to MCR and forcewake, there are supposed to be some more
bspec updates coming soon that deal with steering (i.e., different MCR
ranges should actually be using different registers to steer rather
than just the 0xFDC register we're familiar with); I don't have the
full details on that yet, so those updates will have to wait until we
actually have an updated spec.

References: https://gitlab.freedesktop.org/drm/intel/issues/1222

Matt Roper (4):
   drm/i915: Setup multicast register steering for all gen >= 10
   drm/i915: Steer multicast register readback in wa_verify
   drm/i915: Don't skip verification of MCR engine workarounds
   drm/i915: Add MCR ranges for gen11 and gen12

  drivers/gpu/drm/i915/gt/intel_engine.h|   3 +
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  17 +-
  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 146 --
  .../gpu/drm/i915/gt/intel_workarounds_types.h |   2 +
  4 files changed, 110 insertions(+), 58 deletions(-)


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/pmu: Keep a reference to module while active

2020-05-01 Thread Tvrtko Ursulin



On 30/04/2020 19:33, Chris Wilson wrote:

While a perf event is open, keep a reference to the module so we don't
remove the driver internals mid-sampling.

Testcase: igt/perf_pmu/module-unload
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: sta...@vger.kernel.org
---
  drivers/gpu/drm/i915/i915_pmu.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 83c6a8ccd2cb..e991a707bdb7 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -442,6 +442,7 @@ static u64 count_interrupts(struct drm_i915_private *i915)
  static void i915_pmu_event_destroy(struct perf_event *event)
  {
WARN_ON(event->parent);
+   module_put(THIS_MODULE);
  }
  
  static int

@@ -533,8 +534,10 @@ static int i915_pmu_event_init(struct perf_event *event)
if (ret)
return ret;
  
-	if (!event->parent)

+   if (!event->parent) {
+   __module_get(THIS_MODULE);
event->destroy = i915_pmu_event_destroy;
+   }
  
  	return 0;

  }



Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76793/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8403_full -> Patchwork_17535_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_17535_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17535_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_17535_full:

### IGT changes ###

 Possible regressions 

  * igt@gem_exec_whisper@basic-contexts:
- shard-iclb: [PASS][1] -> [FAIL][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-iclb3/igt@gem_exec_whis...@basic-contexts.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-iclb3/igt@gem_exec_whis...@basic-contexts.html
- shard-tglb: [PASS][3] -> [FAIL][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-tglb1/igt@gem_exec_whis...@basic-contexts.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-tglb3/igt@gem_exec_whis...@basic-contexts.html

  
 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_reloc@basic-parallel}:
- shard-kbl:  [PASS][5] -> [TIMEOUT][6]
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-kbl4/igt@gem_exec_re...@basic-parallel.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-kbl1/igt@gem_exec_re...@basic-parallel.html
- shard-snb:  [PASS][7] -> [FAIL][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-snb1/igt@gem_exec_re...@basic-parallel.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-snb4/igt@gem_exec_re...@basic-parallel.html
- shard-tglb: [PASS][9] -> [TIMEOUT][10]
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-tglb8/igt@gem_exec_re...@basic-parallel.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-tglb3/igt@gem_exec_re...@basic-parallel.html
- shard-skl:  [PASS][11] -> [INCOMPLETE][12]
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-skl6/igt@gem_exec_re...@basic-parallel.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-skl2/igt@gem_exec_re...@basic-parallel.html
- shard-apl:  [PASS][13] -> [INCOMPLETE][14]
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-apl1/igt@gem_exec_re...@basic-parallel.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-apl3/igt@gem_exec_re...@basic-parallel.html
- shard-iclb: [PASS][15] -> [TIMEOUT][16]
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-iclb2/igt@gem_exec_re...@basic-parallel.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-iclb3/igt@gem_exec_re...@basic-parallel.html
- shard-glk:  [PASS][17] -> [TIMEOUT][18]
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-glk2/igt@gem_exec_re...@basic-parallel.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-glk8/igt@gem_exec_re...@basic-parallel.html

  
Known issues


  Here are the changes found in Patchwork_17535_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@gem_eio@in-flight-suspend:
- shard-kbl:  [PASS][19] -> [DMESG-WARN][20] ([i915#180])
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-kbl2/igt@gem_...@in-flight-suspend.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-kbl7/igt@gem_...@in-flight-suspend.html

  * igt@gem_exec_whisper@basic-contexts-forked:
- shard-glk:  [PASS][21] -> [FAIL][22] ([i915#1479])
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-glk9/igt@gem_exec_whis...@basic-contexts-forked.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-glk9/igt@gem_exec_whis...@basic-contexts-forked.html

  * igt@gen9_exec_parse@allowed-all:
- shard-kbl:  [PASS][23] -> [DMESG-WARN][24] ([i915#1436] / 
[i915#716])
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-kbl7/igt@gen9_exec_pa...@allowed-all.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17535/shard-kbl6/igt@gen9_exec_pa...@allowed-all.html

  * igt@gen9_exec_parse@allowed-single:
- shard-skl:  [PASS][25] -> [DMESG-WARN][26] ([i915#716])
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8403/shard-skl3/igt@gen9_e

[Intel-gfx] [PATCH 2/4] drm/i915/gem: Use a single chained reloc batches for a single execbuf

2020-05-01 Thread Chris Wilson
As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoids emitting a new request into the
ring for each target, consuming precious ring space and risking a
stall.

Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 23 +++
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 47b1192a159e..9d68d66555b0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -268,6 +268,7 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
 
+   struct i915_vma *target;
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
@@ -1070,9 +1071,6 @@ static void reloc_cache_reset(struct reloc_cache *cache)
 {
void *vaddr;
 
-   if (cache->rq)
-   reloc_gpu_flush(cache);
-
if (!cache->vaddr)
return;
 
@@ -1265,7 +1263,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-struct i915_vma *vma,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1288,7 +1285,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
goto out_pool;
}
 
-   batch = i915_vma_instance(pool->obj, vma->vm, NULL);
+   batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
@@ -1308,10 +1305,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
 
-   err = reloc_move_to_gpu(rq, vma);
-   if (err)
-   goto err_request;
-
err = eb->engine->emit_bb_start(rq,
batch->node.start, PAGE_SIZE,
cache->gen > 5 ? 0 : 
I915_DISPATCH_SECURE);
@@ -1361,9 +1354,17 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
if (!intel_engine_can_store_dword(eb->engine))
return ERR_PTR(-ENODEV);
 
-   err = __reloc_gpu_alloc(eb, vma, len);
+   err = __reloc_gpu_alloc(eb, len);
+   if (unlikely(err))
+   return ERR_PTR(err);
+   }
+
+   if (vma != cache->target) {
+   err = reloc_move_to_gpu(cache->rq, vma);
if (unlikely(err))
return ERR_PTR(err);
+
+   cache->target = vma;
}
 
if (unlikely(cache->rq_size > PAGE_SIZE / sizeof(u32) - len - 4)) {
@@ -1680,6 +1681,8 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
}
+
+   reloc_gpu_flush(&eb->reloc_cache);
}
 
return 0;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 1/4] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Chris Wilson
The ring is a precious resource: we anticipate using only a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
inserting the command into the precious ring, we can chain the next page
of relocation entries as an MI_BB_START from the end of the previous one.
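
Conceptually (a simplified preview of the link emitted by the patch
below; next_batch_addr stands in for the new page's GPU address):

	/* End of a full page: link to the next page instead of ending. */
	*cmd++ = MI_ARB_CHECK;
	*cmd++ = MI_BATCH_BUFFER_START_GEN8;
	*cmd++ = lower_32_bits(next_batch_addr);
	*cmd++ = upper_32_bits(next_batch_addr);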

Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 104 +++---
 1 file changed, 91 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..47b1192a159e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -975,20 +975,95 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
reloc_cache *cache)
return &i915->ggtt;
 }
 
+static int reloc_gpu_chain(struct reloc_cache *cache)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_request *rq = cache->rq;
+   struct i915_vma *batch;
+   u32 *cmd;
+   int err;
+
+   pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
+   if (IS_ERR(pool))
+   return PTR_ERR(pool);
+
+   batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
+   if (IS_ERR(batch)) {
+   err = PTR_ERR(batch);
+   goto out_pool;
+   }
+
+   err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
+   if (err)
+   goto out_pool;
+
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8) {
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start);
+   } else {
+   *cmd++ = MI_BATCH_BUFFER_START;
+   *cmd++ = lower_32_bits(batch->node.start);
+   }
+   i915_gem_object_flush_map(rq->batch->obj);
+   i915_gem_object_unpin_map(rq->batch->obj);
+   rq->batch = NULL;
+
+   err = intel_gt_buffer_pool_mark_active(pool, rq);
+   if (err == 0) {
+   i915_vma_lock(batch);
+   err = i915_request_await_object(rq, batch->obj, false);
+   if (err == 0)
+   err = i915_vma_move_to_active(batch, rq, 0);
+   i915_vma_unlock(batch);
+   }
+   i915_vma_unpin(batch);
+   if (err)
+   goto out_pool;
+
+   cmd = i915_gem_object_pin_map(pool->obj,
+ cache->has_llc ?
+ I915_MAP_FORCE_WB :
+ I915_MAP_FORCE_WC);
+   if (IS_ERR(cmd)) {
+   err = PTR_ERR(cmd);
+   goto out_pool;
+   }
+
+   cache->rq_cmd = cmd;
+   cache->rq_size = 0;
+
+   /* Return with batch mapping (cmd) still pinned */
+   rq->batch = batch;
+
+out_pool:
+   intel_gt_buffer_pool_put(pool);
+   return err;
+}
+
 static void reloc_gpu_flush(struct reloc_cache *cache)
 {
-   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
+   struct i915_request *rq;
 
-   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+   rq = fetch_and_zero(&cache->rq);
+   if (!rq)
+   return;
 
-   __i915_gem_object_flush_map(obj, 0, sizeof(u32) * (cache->rq_size + 1));
-   i915_gem_object_unpin_map(obj);
+   if (rq->batch) {
+   struct drm_i915_gem_object *obj = rq->batch->obj;
 
-   intel_gt_chipset_flush(cache->rq->engine->gt);
+   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
+   cache->rq_cmd[cache->rq_size++] = MI_BATCH_BUFFER_END;
 
-   i915_request_add(cache->rq);
-   cache->rq = NULL;
+   __i915_gem_object_flush_map(obj,
+   0, sizeof(u32) * cache->rq_size);
+   i915_gem_object_unpin_map(obj);
+   }
+
+   intel_gt_chipset_flush(rq->engine->gt);
+   i915_request_add(rq);
 }
 
 static void reloc_cache_reset(struct reloc_cache *cache)
@@ -1280,13 +1355,9 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 {
struct reloc_cache *cache = &eb->reloc_cache;
u32 *cmd;
-
-   if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))
-   reloc_gpu_flush(cache);
+   int err;
 
if (unlikely(!cache->rq)) {
-   int err;
-
if (!intel_engine_can_store_dword(eb->engine))
ret

[Intel-gfx] [PATCH 4/4] drm/i915/gt: Stop holding onto the pinned_default_state

2020-05-01 Thread Chris Wilson
As we only restore the default context state upon banning a context, we
only need enough of the state to run the ring and nothing more. That is,
we only need our bare protocontext.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: Mika Kuoppala 
Cc: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c| 14 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 -
 drivers/gpu/drm/i915/gt/intel_lrc.c  | 14 ++
 drivers/gpu/drm/i915/gt/selftest_context.c   | 11 ++--
 drivers/gpu/drm/i915/gt/selftest_lrc.c   | 53 +++-
 5 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 811debefebc0..d0a1078ef632 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -21,18 +21,11 @@ static int __engine_unpark(struct intel_wakeref *wf)
struct intel_engine_cs *engine =
container_of(wf, typeof(*engine), wakeref);
struct intel_context *ce;
-   void *map;
 
ENGINE_TRACE(engine, "\n");
 
intel_gt_pm_get(engine->gt);
 
-   /* Pin the default state for fast resets from atomic context. */
-   map = NULL;
-   if (engine->default_state)
-   map = shmem_pin_map(engine->default_state);
-   engine->pinned_default_state = map;
-
/* Discard stale context state from across idling */
ce = engine->kernel_context;
if (ce) {
@@ -42,6 +35,7 @@ static int __engine_unpark(struct intel_wakeref *wf)
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) && ce->state) {
struct drm_i915_gem_object *obj = ce->state->obj;
int type = i915_coherent_map_type(engine->i915);
+   void *map;
 
map = i915_gem_object_pin_map(obj, type);
if (!IS_ERR(map)) {
@@ -260,12 +254,6 @@ static int __engine_park(struct intel_wakeref *wf)
if (engine->park)
engine->park(engine);
 
-   if (engine->pinned_default_state) {
-   shmem_unpin_map(engine->default_state,
-   engine->pinned_default_state);
-   engine->pinned_default_state = NULL;
-   }
-
engine->execlists.no_priolist = false;
 
/* While gt calls i915_vma_parked(), we have to break the lock cycle */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 3c3225c0332f..489deb3b8358 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -339,7 +339,6 @@ struct intel_engine_cs {
unsigned long wakeref_serial;
struct intel_wakeref wakeref;
struct file *default_state;
-   void *pinned_default_state;
 
struct {
struct intel_ring *ring;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4311b12542fb..dc5517c85df1 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1271,14 +1271,11 @@ execlists_check_context(const struct intel_context *ce,
 static void restore_default_state(struct intel_context *ce,
  struct intel_engine_cs *engine)
 {
-   u32 *regs = ce->lrc_reg_state;
+   u32 *regs;
 
-   if (engine->pinned_default_state)
-   memcpy(regs, /* skip restoring the vanilla PPHWSP */
-  engine->pinned_default_state + LRC_STATE_OFFSET,
-  engine->context_size - PAGE_SIZE);
+   regs = memset(ce->lrc_reg_state, 0, engine->context_size - PAGE_SIZE);
+   execlists_init_reg_state(regs, ce, engine, ce->ring, true);
 
-   execlists_init_reg_state(regs, ce, engine, ce->ring, false);
ce->runtime.last = intel_context_get_runtime(ce);
 }
 
@@ -4166,8 +4163,6 @@ static void __execlists_reset(struct intel_engine_cs 
*engine, bool stalled)
 * image back to the expected values to skip over the guilty request.
 */
__i915_request_reset(rq, stalled);
-   if (!stalled)
-   goto out_replay;
 
/*
 * We want a simple context + ring to execute the breadcrumb update.
@@ -4177,9 +4172,6 @@ static void __execlists_reset(struct intel_engine_cs 
*engine, bool stalled)
 * future request will be after userspace has had the opportunity
 * to recreate its own state.
 */
-   GEM_BUG_ON(!intel_context_is_pinned(ce));
-   restore_default_state(ce, engine);
-
 out_replay:
ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n",
 head, ce->ring->tail);
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
b/drivers/gpu/drm/i915/gt/selftest_context.c
index b8ed3cbe1277..a56dff3b157a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_cont

[Intel-gfx] [PATCH 3/4] drm/i915: Implement vm_ops->access for gdb access into mmaps

2020-05-01 Thread Chris Wilson
gdb uses ptrace() to peek and poke bytes of the target's address space.
The driver must implement a vm_ops->access() handler, or else gdb will
be unable to inspect the pointer and will report it as out-of-bounds.
Worse than useless, as it casts immediate suspicion on the valid GTT
pointer, distracting the poor programmer trying to find his bug.
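
(For context, the core-kernel path that reaches this hook is roughly
the following -- a simplified sketch of the fallback in mm/memory.c's
__access_remote_vm(), not the exact code:)

	/* When get_user_pages() cannot pin a page (e.g. a PFN/IO mapping),
	 * ptrace access falls back to the driver-provided hook. */
	if (vma->vm_ops && vma->vm_ops->access)
		bytes = vma->vm_ops->access(vma, addr, buf, len, write);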

Testcase: igt/gem_mmap_gtt/ptrace
Testcase: igt/gem_mmap_offset/ptrace
Suggested-by: Kristian H. Kristensen 
Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Maciej Patelczyk 
Cc: Kristian H. Kristensen 
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  31 +
 .../drm/i915/gem/selftests/i915_gem_mman.c| 124 ++
 2 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index b39c24dae64e..aef917b7f168 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -396,6 +396,35 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
return i915_error_to_vmf_fault(ret);
 }
 
+static int
+vm_access(struct vm_area_struct *area, unsigned long addr,
+ void *buf, int len, int write)
+{
+   struct i915_mmap_offset *mmo = area->vm_private_data;
+   struct drm_i915_gem_object *obj = mmo->obj;
+   void *vaddr;
+
+   addr -= area->vm_start;
+   if (addr >= obj->base.size)
+   return -EINVAL;
+
+   /* As this is primarily for debugging, let's focus on simplicity */
+   vaddr = i915_gem_object_pin_map(obj, I915_MAP_FORCE_WC);
+   if (IS_ERR(vaddr))
+   return PTR_ERR(vaddr);
+
+   if (write) {
+   memcpy(vaddr + addr, buf, len);
+   __i915_gem_object_flush_map(obj, addr, len);
+   } else {
+   memcpy(buf, vaddr + addr, len);
+   }
+
+   i915_gem_object_unpin_map(obj);
+
+   return len;
+}
+
 void __i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj)
 {
struct i915_vma *vma;
@@ -745,12 +774,14 @@ static void vm_close(struct vm_area_struct *vma)
 
 static const struct vm_operations_struct vm_ops_gtt = {
.fault = vm_fault_gtt,
+   .access = vm_access,
.open = vm_open,
.close = vm_close,
 };
 
 static const struct vm_operations_struct vm_ops_cpu = {
.fault = vm_fault_cpu,
+   .access = vm_access,
.open = vm_open,
.close = vm_close,
 };
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index ef7abcb3f4ee..9c7402ce5bf9 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -952,6 +952,129 @@ static int igt_mmap(void *arg)
return 0;
 }
 
+static const char *repr_mmap_type(enum i915_mmap_type type)
+{
+   switch (type) {
+   case I915_MMAP_TYPE_GTT: return "gtt";
+   case I915_MMAP_TYPE_WB: return "wb";
+   case I915_MMAP_TYPE_WC: return "wc";
+   case I915_MMAP_TYPE_UC: return "uc";
+   default: return "unknown";
+   }
+}
+
+static bool can_access(const struct drm_i915_gem_object *obj)
+{
+   unsigned int flags =
+   I915_GEM_OBJECT_HAS_STRUCT_PAGE | I915_GEM_OBJECT_HAS_IOMEM;
+
+   return i915_gem_object_type_has(obj, flags);
+}
+
+static int __igt_mmap_access(struct drm_i915_private *i915,
+struct drm_i915_gem_object *obj,
+enum i915_mmap_type type)
+{
+   struct i915_mmap_offset *mmo;
+   unsigned long __user *ptr;
+   unsigned long A, B;
+   unsigned long x, y;
+   unsigned long addr;
+   int err;
+
+   memset(&A, 0xAA, sizeof(A));
+   memset(&B, 0xBB, sizeof(B));
+
+   if (!can_mmap(obj, type) || !can_access(obj))
+   return 0;
+
+   mmo = mmap_offset_attach(obj, type, NULL);
+   if (IS_ERR(mmo))
+   return PTR_ERR(mmo);
+
+   addr = igt_mmap_node(i915, &mmo->vma_node, 0, PROT_WRITE, MAP_SHARED);
+   if (IS_ERR_VALUE(addr))
+   return addr;
+   ptr = (unsigned long __user *)addr;
+
+   err = __put_user(A, ptr);
+   if (err) {
+   pr_err("%s(%s): failed to write into user mmap\n",
+  obj->mm.region->name, repr_mmap_type(type));
+   goto out_unmap;
+   }
+
+   intel_gt_flush_ggtt_writes(&i915->gt);
+
+   err = access_process_vm(current, addr, &x, sizeof(x), 0);
+   if (err != sizeof(x)) {
+   pr_err("%s(%s): access_process_vm() read failed\n",
+  obj->mm.region->name, repr_mmap_type(type));
+   goto out_unmap;
+   }
+
+   err = access_process_vm(current, addr, &B, sizeof(B), FOLL_WRITE);
+   if (err != sizeof(B)) {
+   pr_err("%s(%s): access_process_vm() write failed\n",
+  obj->mm.region->name, repr_mmap_type(type));
+   

[Intel-gfx] ✗ Fi.CI.IGT: failure for Rebased Big Joiner patch series for 8K 2p1p (rev2)

2020-05-01 Thread Patchwork
== Series Details ==

Series: Rebased Big Joiner patch series for 8K 2p1p (rev2)
URL   : https://patchwork.freedesktop.org/series/76791/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8404_full -> Patchwork_17536_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_17536_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17536_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_17536_full:

### IGT changes ###

 Possible regressions 

  * igt@kms_dp_dsc@basic-dsc-enable-edp:
- shard-tglb: [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-tglb8/igt@kms_dp_...@basic-dsc-enable-edp.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-tglb1/igt@kms_dp_...@basic-dsc-enable-edp.html

  * igt@runner@aborted:
- shard-tglb: NOTRUN -> [FAIL][3]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-tglb1/igt@run...@aborted.html

  
 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_reloc@basic-many-active@rcs0}:
- shard-glk:  NOTRUN -> [FAIL][4] +3 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-glk6/igt@gem_exec_reloc@basic-many-act...@rcs0.html

  * {igt@kms_atomic_transition@plane-all-modeset-transition-fencing@pipe-a}:
- shard-kbl:  [PASS][5] -> [FAIL][6]
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-kbl4/igt@kms_atomic_transition@plane-all-modeset-transition-fenc...@pipe-a.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-kbl7/igt@kms_atomic_transition@plane-all-modeset-transition-fenc...@pipe-a.html

  * {igt@kms_flip@dpms-vs-vblank-race-interruptible@c-hdmi-a1}:
- shard-glk:  [PASS][7] -> [FAIL][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-glk9/igt@kms_flip@dpms-vs-vblank-race-interrupti...@c-hdmi-a1.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-glk1/igt@kms_flip@dpms-vs-vblank-race-interrupti...@c-hdmi-a1.html

  
Known issues


  Here are the changes found in Patchwork_17536_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@gem_eio@in-flight-suspend:
- shard-skl:  [PASS][9] -> [INCOMPLETE][10] ([i915#69])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-skl10/igt@gem_...@in-flight-suspend.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-skl8/igt@gem_...@in-flight-suspend.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
- shard-iclb: [PASS][11] -> [INCOMPLETE][12] ([i915#1185])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-iclb2/igt@i915_susp...@fence-restore-tiled2untiled.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-iclb3/igt@i915_susp...@fence-restore-tiled2untiled.html

  * igt@kms_cursor_legacy@pipe-b-torture-move:
- shard-snb:  [PASS][13] -> [DMESG-WARN][14] ([i915#128])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-snb5/igt@kms_cursor_leg...@pipe-b-torture-move.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-snb1/igt@kms_cursor_leg...@pipe-b-torture-move.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
- shard-kbl:  [PASS][15] -> [DMESG-WARN][16] ([i915#180] / 
[i915#93] / [i915#95])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-kbl2/igt@kms_pipe_crc_ba...@suspend-read-crc-pipe-b.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-kbl7/igt@kms_pipe_crc_ba...@suspend-read-crc-pipe-b.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
- shard-apl:  [PASS][17] -> [DMESG-WARN][18] ([i915#180])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-apl2/igt@kms_pl...@plane-panning-bottom-right-suspend-pipe-b-planes.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-apl2/igt@kms_pl...@plane-panning-bottom-right-suspend-pipe-b-planes.html
- shard-kbl:  [PASS][19] -> [DMESG-WARN][20] ([i915#180]) +2 
similar issues
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8404/shard-kbl3/igt@kms_pl...@plane-panning-bottom-right-suspend-pipe-b-planes.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17536/shard-kbl3/igt@kms_pl...@plane-panning-bottom-right-suspend-pipe-b-planes.html

  * igt@kms_psr@psr2_curso

[Intel-gfx] [PATCH 2/3] drm/i915/gem: Use a single chained reloc batches for a single execbuf

2020-05-01 Thread Chris Wilson
As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoids emitting a new request into the
ring for each target, consuming precious ring space and risking a
stall.

Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 23 +++
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 293bf06b65b2..b224a453e2a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -268,6 +268,7 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
 
+   struct i915_vma *target;
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
@@ -1087,9 +1088,6 @@ static void reloc_cache_reset(struct reloc_cache *cache)
 {
void *vaddr;
 
-   if (cache->rq)
-   reloc_gpu_flush(cache);
-
if (!cache->vaddr)
return;
 
@@ -1282,7 +1280,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-struct i915_vma *vma,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1305,7 +1302,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
goto out_pool;
}
 
-   batch = i915_vma_instance(pool->obj, vma->vm, NULL);
+   batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
@@ -1325,10 +1322,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
 
-   err = reloc_move_to_gpu(rq, vma);
-   if (err)
-   goto err_request;
-
i915_vma_lock(batch);
err = i915_request_await_object(rq, batch->obj, false);
if (err == 0)
@@ -1373,9 +1366,17 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
if (!intel_engine_can_store_dword(eb->engine))
return ERR_PTR(-ENODEV);
 
-   err = __reloc_gpu_alloc(eb, vma, len);
+   err = __reloc_gpu_alloc(eb, len);
+   if (unlikely(err))
+   return ERR_PTR(err);
+   }
+
+   if (vma != cache->target) {
+   err = reloc_move_to_gpu(cache->rq, vma);
if (unlikely(err))
return ERR_PTR(err);
+
+   cache->target = vma;
}
 
if (unlikely(cache->rq_size + len > PAGE_SIZE / sizeof(u32) - 4)) {
@@ -1694,6 +1695,8 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
}
+
+   reloc_gpu_flush(&eb->reloc_cache);
}
 
return 0;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Chris Wilson
The ring is a precious resource: we anticipate using only a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
inserting the command into the precious ring, we can chain the next page
of relocation entries as an MI_BB_START from the end of the previous one.

v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may have some fancy async GPU bindings
for new vma!

Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 130 +++---
 1 file changed, 111 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..293bf06b65b2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -271,6 +271,7 @@ struct i915_execbuffer {
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
+   struct i915_vma *rq_vma;
} reloc_cache;
 
u64 invalid_flags; /** Set of execobj.flags that are invalid */
@@ -975,20 +976,111 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
reloc_cache *cache)
return &i915->ggtt;
 }
 
+static int reloc_gpu_chain(struct reloc_cache *cache)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_request *rq = cache->rq;
+   struct i915_vma *batch;
+   u32 *cmd;
+   int err;
+
+   pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
+   if (IS_ERR(pool))
+   return PTR_ERR(pool);
+
+   batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
+   if (IS_ERR(batch)) {
+   err = PTR_ERR(batch);
+   goto out_pool;
+   }
+
+   err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
+   if (err)
+   goto out_pool;
+
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8) {
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start);
+   } else {
+   *cmd++ = MI_BATCH_BUFFER_START;
+   *cmd++ = lower_32_bits(batch->node.start);
+   }
+   i915_gem_object_flush_map(cache->rq_vma->obj);
+   i915_gem_object_unpin_map(cache->rq_vma->obj);
+   cache->rq_vma = NULL;
+
+   err = intel_gt_buffer_pool_mark_active(pool, rq);
+   if (err == 0) {
+   i915_vma_lock(batch);
+   err = i915_request_await_object(rq, batch->obj, false);
+   if (err == 0)
+   err = i915_vma_move_to_active(batch, rq, 0);
+   i915_vma_unlock(batch);
+   }
+   i915_vma_unpin(batch);
+   if (err)
+   goto out_pool;
+
+   cmd = i915_gem_object_pin_map(pool->obj,
+ cache->has_llc ?
+ I915_MAP_FORCE_WB :
+ I915_MAP_FORCE_WC);
+   if (IS_ERR(cmd)) {
+   err = PTR_ERR(cmd);
+   goto out_pool;
+   }
+
+   /* Return with batch mapping (cmd) still pinned */
+   cache->rq_cmd = cmd;
+   cache->rq_size = 0;
+   cache->rq_vma = batch;
+
+out_pool:
+   intel_gt_buffer_pool_put(pool);
+   return err;
+}
+
+static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
+{
+   return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
+}
+
 static void reloc_gpu_flush(struct reloc_cache *cache)
 {
-   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
+   struct i915_request *rq;
+   int err;
 
-   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+   rq = fetch_and_zero(&cache->rq);
+   if (!rq)
+   return;
 
-   __i915_gem_object_flush_map(obj, 0, sizeof(u32) * (cache->rq_size + 1));
-   i915_gem_object_unpin_map(obj);
+   if (cache->rq_vma) {
+   struct drm_i915_gem_object *obj = cache->rq_vma->obj;
 
-   intel_gt_chipset_flush(cache->rq->engine->gt);
+   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
+   cache->rq_cmd[cache->rq_size++] = MI_BATCH_BUFFER_END;
 
-   i915_request_add(cache->rq);
-   cache->rq = NULL;
+   __i915_gem_object_flush_map(obj,
+   

[Intel-gfx] [PATCH 3/3] drm/i915/gem: Try an alternate engine for relocations

2020-05-01 Thread Chris Wilson
If at first we don't succeed, try try again.

Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that has a liability of blocking. However, Tvrtko pointed out
that we don't need to use the same engine to perform the relocations as
we are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.

This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engine. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little-used engine on Sandybridge, and only
if relocations are actually required.

Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!

Suggested-by: Tvrtko Ursulin 
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 32 ---
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b224a453e2a3..6d649de3a796 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1280,6 +1280,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+struct intel_engine_cs *engine,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1289,7 +1290,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
u32 *cmd;
int err;
 
-   pool = intel_gt_get_buffer_pool(eb->engine->gt, PAGE_SIZE);
+   pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
if (IS_ERR(pool))
return PTR_ERR(pool);
 
@@ -1312,7 +1313,23 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_unmap;
 
-   rq = i915_request_create(eb->context);
+   if (engine == eb->context->engine) {
+   rq = i915_request_create(eb->context);
+   } else {
+   struct intel_context *ce;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto err_unpin;
+   }
+
+   i915_vm_put(ce->vm);
+   ce->vm = i915_vm_get(eb->context->vm);
+
+   rq = intel_context_create_request(ce);
+   intel_context_put(ce);
+   }
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto err_unpin;
@@ -1363,10 +1380,15 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
int err;
 
if (unlikely(!cache->rq)) {
-   if (!intel_engine_can_store_dword(eb->engine))
-   return ERR_PTR(-ENODEV);
+   struct intel_engine_cs *engine = eb->engine;
+
+   if (!intel_engine_can_store_dword(engine)) {
+   engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+   if (!engine || !intel_engine_can_store_dword(engine))
+   return ERR_PTR(-ENODEV);
+   }
 
-   err = __reloc_gpu_alloc(eb, len);
+   err = __reloc_gpu_alloc(eb, engine, len);
if (unlikely(err))
return ERR_PTR(err);
}
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [1/4] drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76812/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8405 -> Patchwork_17537


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_17537 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17537, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17537/index.html

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_17537:

### IGT changes ###

 Possible regressions 

  * igt@i915_selftest@live@gt_pm:
- fi-bdw-5557u:   [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-bdw-5557u/igt@i915_selftest@live@gt_pm.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17537/fi-bdw-5557u/igt@i915_selftest@live@gt_pm.html

  
Known issues


  Here are the changes found in Patchwork_17537 that come from known issues:

### IGT changes ###

 Possible fixes 

  * igt@i915_selftest@live@hugepages:
- fi-bwr-2160:[INCOMPLETE][3] ([i915#489]) -> [PASS][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17537/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html

  
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489


Participating hosts (50 -> 43)
--

  Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8405 -> Patchwork_17537

  CI-20190529: 20190529
  CI_DRM_8405: 83efffba539b475ce7e3fb96aeae7ee744309ff7 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5623: 8838c73169ea249e6e86aaed35e5178f60f4ef7d @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17537: 134d63bc8109428cb3b71887be45470a1a2c3d9c @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

134d63bc8109 drm/i915/gt: Stop holding onto the pinned_default_state
af0c7a99d7e4 drm/i915: Implement vm_ops->access for gdb access into mmaps
ec1e3cc0a091 drm/i915/gem: Use a single chained reloc batches for a single 
execbuf
fd28ecfab694 drm/i915/gem: Use chained reloc batches

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17537/index.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-01 Thread Sebastian Andrzej Siewior
On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
> Sometimes it's not okay to use SIMD registers, the conditions for which
> have changed subtly from kernel release to kernel release. Usually the
> pattern is to check for may_use_simd() and then fallback to using
> something slower in the unlikely case SIMD registers aren't available.
> So, this patch fixes up i915's accelerated memcpy routines to fallback
> to boring memcpy if may_use_simd() is false.

That would indicate that these functions are used from IRQ/softirq
context, which would otherwise break if the kernel is also using the
registers. The crypto code uses it for that purpose.

So
   Reviewed-by: Sebastian Andrzej Siewior 

May I ask how large the memcpy can be? I'm asking in case it is large
and an explicit rescheduling point might be needed.
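
(For reference, the fallback pattern under discussion looks roughly
like this -- a sketch based on i915's WC memcpy helpers; the helper
names are from memory and may differ:)

	if (static_branch_likely(&has_movntdqa) && may_use_simd()) {
		kernel_fpu_begin();
		__memcpy_ntdqa(dst, src, len); /* SSE4.1 streaming loads */
		kernel_fpu_end();
	} else {
		memcpy(dst, src, len); /* slower, but safe in any context */
	}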

> Cc: sta...@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld 

Sebastian
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [1/3] drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76813/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8405 -> Patchwork_17538


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/index.html

Known issues


  Here are the changes found in Patchwork_17538 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@kms_chamelium@dp-crc-fast:
- fi-cml-u2:  [PASS][1] -> [FAIL][2] ([i915#262])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-cml-u2/igt@kms_chamel...@dp-crc-fast.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/fi-cml-u2/igt@kms_chamel...@dp-crc-fast.html

  
 Possible fixes 

  * igt@i915_selftest@live@hugepages:
- fi-bwr-2160:[INCOMPLETE][3] ([i915#489]) -> [PASS][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html

  
  [i915#262]: https://gitlab.freedesktop.org/drm/intel/issues/262
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489


Participating hosts (50 -> 43)
--

  Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8405 -> Patchwork_17538

  CI-20190529: 20190529
  CI_DRM_8405: 83efffba539b475ce7e3fb96aeae7ee744309ff7 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5623: 8838c73169ea249e6e86aaed35e5178f60f4ef7d @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17538: 91008ee19b392a99ff3cc190aa6c12b2c1959c11 @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

91008ee19b39 drm/i915/gem: Try an alternate engine for relocations
7b4c8219a242 drm/i915/gem: Use a single chained reloc batches for a single 
execbuf
20503d783f12 drm/i915/gem: Use chained reloc batches

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/index.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-01 Thread David Laight
From: Sebastian Andrzej Siewior
> Sent: 01 May 2020 11:42
> On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
> 
> That would indicate that these functions are used from IRQ/softirq which
> break otherwise if the kernel is also using the registers. The crypto
> code uses it for that purpose.
> 
> So
>Reviewed-by: Sebastian Andrzej Siewior 
> 
> May I ask how large the memcpy can be? I'm asking in case it is large
> and an explicit rescheduling point might be needed.

It is also quite likely that a 'rep movs' copy will be at least as
fast on modern hardware.

Clearly if you are copying to/from PCIe memory you need the largest
registers possible - but I think the graphics buffers are mapped cached?
(Otherwise I wouldn't see 3ms 'spins' while it invalidates the
entire screen buffer cache.)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 11/24] drm/i915: Nuke arguments to eb_pin_engine

2020-05-01 Thread Maarten Lankhorst
Those arguments are already set as eb.file and eb.args, so kill off
the extra arguments. This will allow us to move eb_pin_engine() to
after we have reserved all BOs.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 96b172f9b9f7..ffe6853119bb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2468,11 +2468,10 @@ static void eb_unpin_engine(struct i915_execbuffer *eb)
 }
 
 static unsigned int
-eb_select_legacy_ring(struct i915_execbuffer *eb,
- struct drm_file *file,
- struct drm_i915_gem_execbuffer2 *args)
+eb_select_legacy_ring(struct i915_execbuffer *eb)
 {
struct drm_i915_private *i915 = eb->i915;
+   struct drm_i915_gem_execbuffer2 *args = eb->args;
unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
 
if (user_ring_id != I915_EXEC_BSD &&
@@ -2487,7 +2486,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb,
unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
 
if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
-   bsd_idx = gen8_dispatch_bsd_engine(i915, file);
+   bsd_idx = gen8_dispatch_bsd_engine(i915, eb->file);
} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
   bsd_idx <= I915_EXEC_BSD_RING2) {
bsd_idx >>= I915_EXEC_BSD_SHIFT;
@@ -2512,18 +2511,16 @@ eb_select_legacy_ring(struct i915_execbuffer *eb,
 }
 
 static int
-eb_pin_engine(struct i915_execbuffer *eb,
- struct drm_file *file,
- struct drm_i915_gem_execbuffer2 *args)
+eb_pin_engine(struct i915_execbuffer *eb)
 {
struct intel_context *ce;
unsigned int idx;
int err;
 
if (i915_gem_context_user_engines(eb->gem_context))
-   idx = args->flags & I915_EXEC_RING_MASK;
+   idx = eb->args->flags & I915_EXEC_RING_MASK;
else
-   idx = eb_select_legacy_ring(eb, file, args);
+   idx = eb_select_legacy_ring(eb);
 
ce = i915_gem_context_get_engine(eb->gem_context, idx);
if (IS_ERR(ce))
@@ -2822,7 +2819,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (unlikely(err))
goto err_destroy;
 
-   err = eb_pin_engine(&eb, file, args);
+   err = eb_pin_engine(&eb);
if (unlikely(err))
goto err_context;
 
-- 
2.26.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 01/24] perf/core: Only copy-to-user after completely unlocking all locks, v3.

2020-05-01 Thread Maarten Lankhorst
We inadvertently create a dependency on mmap_sem via a whole dependency
chain.

This breaks any user who wants to take a lock and call rcu_barrier()
while also taking that lock inside mmap_sem:

<4> [604.892532] ==
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G U
<4> [604.892537] --
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] 8264a558 (rcu_state.barrier_mutex){+.+.}, at: 
rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] 888484716050 (reservation_ww_class_mutex){+.+.}, at: 
i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]__ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]ww_mutex_lock+0x39/0x70
<4> [604.892600]dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]do_one_initcall+0x58/0x300
<4> [604.892604]kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]kernel_init+0x5/0x100
<4> [604.892606]ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]do_one_initcall+0x58/0x300
<4> [604.892610]kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]kernel_init+0x5/0x100
<4> [604.892612]ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){}:
<4> [604.892615]__might_fault+0x63/0x90
<4> [604.892617]_copy_to_user+0x1e/0x80
<4> [604.892619]perf_read+0x200/0x2b0
<4> [604.892621]vfs_read+0x96/0x160
<4> [604.892622]ksys_read+0x9f/0xe0
<4> [604.892623]do_syscall_64+0x4f/0x220
<4> [604.892624]entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]__mutex_lock+0x9a/0x9c0
<4> [604.892627]perf_event_init_cpu+0xa4/0x140
<4> [604.892629]perf_event_init+0x19d/0x1cd
<4> [604.892630]start_kernel+0x362/0x4e4
<4> [604.892631]secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]__mutex_lock+0x9a/0x9c0
<4> [604.892633]perf_event_init_cpu+0x6b/0x140
<4> [604.892635]cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]_cpu_up+0xa2/0x140
<4> [604.892637]do_cpu_up+0x61/0xa0
<4> [604.892639]smp_init+0x57/0x96
<4> [604.892639]kernel_init_freeable+0x87/0x1dc
<4> [604.892640]kernel_init+0x5/0x100
<4> [604.892642]ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){}:
<4> [604.892643]cpus_read_lock+0x34/0xd0
<4> [604.892644]rcu_barrier+0xaa/0x190
<4> [604.892645]kernel_init+0x21/0x100
<4> [604.892647]ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]__lock_acquire+0x1328/0x15d0
<4> [604.892650]lock_acquire+0xa7/0x1c0
<4> [604.892651]__mutex_lock+0x9a/0x9c0
<4> [604.892652]rcu_barrier+0x23/0x190
<4> [604.892680]i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]drm_ioctl+0x2e1/0x390
<4> [604.892833]ksys_ioctl+0x7b/0x90
<4> [604.892835]__x64_sys_ioctl+0x11/0x20
<4> [604.892835]do_syscall_64+0x4f/0x220
<4> [604.892836]entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.
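
(In outline, the fix is to snapshot the counters into a kernel buffer
first and only call copy_to_user() -- which may fault and take
mmap_sem -- after all locks are dropped. A simplified sketch, not the
exact diff below:)

	static int perf_read_group(struct perf_event *event,
				   u64 read_format, char __user *buf)
	{
		struct perf_event *leader = event->group_leader;
		u64 *values;
		int ret;

		values = kzalloc(event->read_size, GFP_KERNEL);
		if (!values)
			return -ENOMEM;

		/* Gathers the group's counters; locking handled inside. */
		ret = __perf_read_group_add(leader, read_format, values);

		/* Copy to userspace only once every lock has been dropped. */
		if (!ret && copy_to_user(buf, values, event->read_size))
			ret = -EFAULT;

		kfree(values);
		return ret ? ret : event->read_size;
	}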

Signed-off-by: Maarten Lankhorst 

fixup perf patch

Signed-off-by: Maarten Lankhorst 
---
 kernel/events/core.c | 45 +---
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 33054521ea5d..f06dd7e68e7f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5102,20 +5102,16 @@ static int __perf_read_group_add(struct perf_event *leader,
 }
 
 static int perf_read_group(struct perf_event *event,
-  u64 read_format, char __user *buf)
+  u64 r

[Intel-gfx] [PATCH 24/24] drm/i915: Ensure we hold the pin mutex

2020-05-01 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_renderstate.c | 2 +-
 drivers/gpu/drm/i915/i915_vma.c | 9 -
 drivers/gpu/drm/i915/i915_vma.h | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
index e35e17810ac8..2f8bb8c44f90 100644
--- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
@@ -207,7 +207,7 @@ int intel_renderstate_init(struct intel_renderstate *so,
if (err)
goto err_context;
 
-   err = i915_vma_pin(so->vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+   err = i915_vma_pin_ww(so->vma, &so->ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
if (err)
goto err_context;
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 164e23e0fc11..837706d28cc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -869,6 +869,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 #ifdef CONFIG_PROVE_LOCKING
if (debug_locks && lockdep_is_held(&vma->vm->i915->drm.struct_mutex))
WARN_ON(!ww);
+   if (debug_locks && ww && vma->resv)
+   assert_vma_held(vma);
 #endif
 
BUILD_BUG_ON(PIN_GLOBAL != I915_VMA_GLOBAL_BIND);
@@ -1009,8 +1011,13 @@ int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 
GEM_BUG_ON(!i915_vma_is_ggtt(vma));
 
+   WARN_ON(!ww && vma->resv && dma_resv_held(vma->resv));
+
do {
-   err = i915_vma_pin_ww(vma, ww, 0, align, flags | PIN_GLOBAL);
+   if (ww)
+   err = i915_vma_pin_ww(vma, ww, 0, align, flags | PIN_GLOBAL);
+   else
+   err = i915_vma_pin(vma, 0, align, flags | PIN_GLOBAL);
if (err != -ENOSPC) {
if (!err) {
err = i915_vma_wait_for_bind(vma);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 2e3779a8a437..d937ce950481 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -242,6 +242,7 @@ i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 static inline int __must_check
 i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
+   WARN_ON_ONCE(vma->resv && dma_resv_held(vma->resv));
return i915_vma_pin_ww(vma, NULL, size, alignment, flags);
 }
 
-- 
2.26.1



[Intel-gfx] [PATCH 19/24] drm/i915/selftests: Fix locking inversion in lrc selftest.

2020-05-01 Thread Maarten Lankhorst
This function does not use intel_context_create_request(), so it has
to follow the same locking order as normal code. This is required to
shut up lockdep in the selftests.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index dabfd9348670..613557075ec9 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -4661,6 +4661,7 @@ static int __live_lrc_state(struct intel_engine_cs *engine,
 {
struct intel_context *ce;
struct i915_request *rq;
+   struct i915_gem_ww_ctx ww;
enum {
RING_START_IDX = 0,
RING_TAIL_IDX,
@@ -4675,7 +4676,11 @@ static int __live_lrc_state(struct intel_engine_cs *engine,
if (IS_ERR(ce))
return PTR_ERR(ce);
 
-   err = intel_context_pin(ce);
+   i915_gem_ww_ctx_init(&ww, false);
+retry:
+   err = i915_gem_object_lock(scratch->obj, &ww);
+   if (!err)
+   err = intel_context_pin_ww(ce, &ww);
if (err)
goto err_put;
 
@@ -4704,11 +4709,9 @@ static int __live_lrc_state(struct intel_engine_cs *engine,
*cs++ = i915_ggtt_offset(scratch) + RING_TAIL_IDX * sizeof(u32);
*cs++ = 0;
 
-   i915_vma_lock(scratch);
err = i915_request_await_object(rq, scratch->obj, true);
if (!err)
err = i915_vma_move_to_active(scratch, rq, EXEC_OBJECT_WRITE);
-   i915_vma_unlock(scratch);
 
i915_request_get(rq);
i915_request_add(rq);
@@ -4745,6 +4748,12 @@ static int __live_lrc_state(struct intel_engine_cs *engine,
 err_unpin:
intel_context_unpin(ce);
 err_put:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
intel_context_put(ce);
return err;
 }
-- 
2.26.1



[Intel-gfx] [PATCH 04/24] drm/i915: Remove locking from i915_gem_object_prepare_read/write

2020-05-01 Thread Maarten Lankhorst
Execbuffer submission will perform its own WW locking, and we
cannot rely on the implicit lock there.

This also exposes that the GVT code would otherwise hit a lockdep splat
when multiple batchbuffer shadows need to be performed in the same
instance; fix that up as well.
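
With the implicit lock gone, the contract for callers becomes roughly
the following (a sketch of the new rules, mirroring the reloc_kmap()
hunk below; error handling trimmed):

    err = i915_gem_object_lock_interruptible(obj, NULL);
    if (err)
        return err;

    err = i915_gem_object_prepare_write(obj, &flushes);
    if (err) {
        i915_gem_object_unlock(obj);
        return err;
    }

    /* ... access the pages through the CPU ... */

    i915_gem_object_finish_access(obj);
    i915_gem_object_unlock(obj);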

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c| 20 ++-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 13 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  1 -
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  5 -
 .../i915/gem/selftests/i915_gem_coherency.c   | 14 +
 .../drm/i915/gem/selftests/i915_gem_context.c | 12 ---
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |  5 -
 drivers/gpu/drm/i915/gvt/cmd_parser.c |  9 -
 drivers/gpu/drm/i915/i915_gem.c   | 20 +--
 9 files changed, 70 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index c0acfc97fae3..8ebceebd11b0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -576,19 +576,17 @@ int i915_gem_object_prepare_read(struct drm_i915_gem_object *obj,
if (!i915_gem_object_has_struct_page(obj))
return -ENODEV;
 
-   ret = i915_gem_object_lock_interruptible(obj, NULL);
-   if (ret)
-   return ret;
+   assert_object_held(obj);
 
ret = i915_gem_object_wait(obj,
   I915_WAIT_INTERRUPTIBLE,
   MAX_SCHEDULE_TIMEOUT);
if (ret)
-   goto err_unlock;
+   return ret;
 
ret = i915_gem_object_pin_pages(obj);
if (ret)
-   goto err_unlock;
+   return ret;
 
if (obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ ||
!static_cpu_has(X86_FEATURE_CLFLUSH)) {
@@ -616,8 +614,6 @@ int i915_gem_object_prepare_read(struct drm_i915_gem_object *obj,
 
 err_unpin:
i915_gem_object_unpin_pages(obj);
-err_unlock:
-   i915_gem_object_unlock(obj);
return ret;
 }
 
@@ -630,20 +626,18 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj,
if (!i915_gem_object_has_struct_page(obj))
return -ENODEV;
 
-   ret = i915_gem_object_lock_interruptible(obj, NULL);
-   if (ret)
-   return ret;
+   assert_object_held(obj);
 
ret = i915_gem_object_wait(obj,
   I915_WAIT_INTERRUPTIBLE |
   I915_WAIT_ALL,
   MAX_SCHEDULE_TIMEOUT);
if (ret)
-   goto err_unlock;
+   return ret;
 
ret = i915_gem_object_pin_pages(obj);
if (ret)
-   goto err_unlock;
+   return ret;
 
if (obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE ||
!static_cpu_has(X86_FEATURE_CLFLUSH)) {
@@ -680,7 +674,5 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj,
 
 err_unpin:
i915_gem_object_unpin_pages(obj);
-err_unlock:
-   i915_gem_object_unlock(obj);
return ret;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 4ced8865d8eb..0d1d64bcd964 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1003,11 +1003,14 @@ static void reloc_cache_reset(struct reloc_cache *cache)
 
vaddr = unmask_page(cache->vaddr);
if (cache->vaddr & KMAP) {
+   struct drm_i915_gem_object *obj =
+   (struct drm_i915_gem_object *)cache->node.mm;
if (cache->vaddr & CLFLUSH_AFTER)
mb();
 
kunmap_atomic(vaddr);
-   i915_gem_object_finish_access((struct drm_i915_gem_object *)cache->node.mm);
+   i915_gem_object_finish_access(obj);
+   i915_gem_object_unlock(obj);
} else {
struct i915_ggtt *ggtt = cache_to_ggtt(cache);
 
@@ -1042,10 +1045,16 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj,
unsigned int flushes;
int err;
 
-   err = i915_gem_object_prepare_write(obj, &flushes);
+   err = i915_gem_object_lock_interruptible(obj, NULL);
if (err)
return ERR_PTR(err);
 
+   err = i915_gem_object_prepare_write(obj, &flushes);
+   if (err) {
+   i915_gem_object_unlock(obj);
+   return ERR_PTR(err);
+   }
+
BUILD_BUG_ON(KMAP & CLFLUSH_FLAGS);
BUILD_BUG_ON((KMAP | CLFLUSH_FLAGS) & PAGE_MASK);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 5103067269b0..11b8e27

[Intel-gfx] [PATCH 12/24] drm/i915: Pin engine before pinning all objects, v4.

2020-05-01 Thread Maarten Lankhorst
We want to lock all gem objects, including the engine context objects;
rework the throttling to ensure that we can do this. Now we only throttle
once, but we can take eb_pin_engine while acquiring objects. This means
we will have to drop the lock to wait. If we don't have to throttle, we
can still take the fastpath; if we do, we take the slowpath and wait for
the throttle request while unlocked.

The engine has to be pinned as the first step, otherwise gpu relocations
won't work.
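
Schematically, the reworked flow looks like this (a loose sketch only;
the real code below also handles the nonblocking case and -EDEADLK):

    retry:
        rq = eb_pin_engine(eb, throttle);  /* pin engine before objects */
        if (rq) {
            /* throttled: drop all ww locks, then wait unlocked */
            eb_release_vmas(eb, false);    /* also unpins the engine */
            i915_gem_ww_ctx_fini(&eb->ww);
            i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
                              MAX_SCHEDULE_TIMEOUT);
            i915_request_put(rq);
            throttle = false;              /* we only throttle once */
            i915_gem_ww_ctx_init(&eb->ww, true);
            goto retry;
        }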

Changes since v1:
- Only need to get a throttled request in the fastpath, no need for
  a global flag any more.
- Always free the waited request correctly.
Changes since v2:
- Use intel_engine_pm_get()/put() to keep engine pool alive during
  EDEADLK handling.
Changes since v3:
- Fix small rq leak.

Signed-off-by: Maarten Lankhorst 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 185 --
 1 file changed, 129 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index ffe6853119bb..2127b9d249c9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -55,7 +55,8 @@ enum {
 #define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
 
 #define __EXEC_HAS_RELOC   BIT(31)
-#define __EXEC_INTERNAL_FLAGS  (~0u << 31)
+#define __EXEC_ENGINE_PINNED   BIT(30)
+#define __EXEC_INTERNAL_FLAGS  (~0u << 30)
 #define UPDATE PIN_OFFSET_FIXED
 
 #define BATCH_OFFSET_BIAS (256*1024)
@@ -292,6 +293,9 @@ struct i915_execbuffer {
 };
 
 static int eb_parse(struct i915_execbuffer *eb);
+static struct i915_request *eb_pin_engine(struct i915_execbuffer *eb,
+ bool throttle);
+static void eb_unpin_engine(struct i915_execbuffer *eb);
 
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
@@ -924,7 +928,7 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
}
 }
 
-static void eb_release_vmas(const struct i915_execbuffer *eb, bool final)
+static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
 {
const unsigned int count = eb->buffer_count;
unsigned int i;
@@ -941,6 +945,8 @@ static void eb_release_vmas(const struct i915_execbuffer *eb, bool final)
if (final)
i915_vma_put(vma);
}
+
+   eb_unpin_engine(eb);
 }
 
 static void eb_destroy(const struct i915_execbuffer *eb)
@@ -1768,7 +1774,8 @@ static int eb_prefault_relocations(const struct i915_execbuffer *eb)
return 0;
 }
 
-static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb)
+static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb,
+  struct i915_request *rq)
 {
bool have_copy = false;
struct eb_vma *ev;
@@ -1784,6 +1791,21 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb)
eb_release_vmas(eb, false);
i915_gem_ww_ctx_fini(&eb->ww);
 
+   if (rq) {
+   /* nonblocking is always false */
+   if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
+ MAX_SCHEDULE_TIMEOUT) < 0) {
+   i915_request_put(rq);
+   rq = NULL;
+
+   err = -EINTR;
+   goto err_relock;
+   }
+
+   i915_request_put(rq);
+   rq = NULL;
+   }
+
/*
 * We take 3 passes through the slowpatch.
 *
@@ -1807,14 +1829,25 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb)
err = 0;
}
 
-   flush_workqueue(eb->i915->mm.userptr_wq);
+   if (!err)
+   flush_workqueue(eb->i915->mm.userptr_wq);
 
+err_relock:
i915_gem_ww_ctx_init(&eb->ww, true);
if (err)
goto out;
 
/* reacquire the objects */
 repeat_validate:
+   rq = eb_pin_engine(eb, false);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto err;
+   }
+
+   /* We didn't throttle, should be NULL */
+   GEM_WARN_ON(rq);
+
err = eb_validate_vmas(eb);
if (err)
goto err;
@@ -1878,14 +1911,49 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb)
}
}
 
+   if (rq)
+   i915_request_put(rq);
+
return err;
 }
 
 static int eb_relocate_parse(struct i915_execbuffer *eb)
 {
int err;
+   struct i915_request *rq = NULL;
+   bool throttle = true;
 
 retry:
+   rq = eb_pin_engine(eb, throttle);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   rq = NULL;
+   if (err != -EDEADLK)
+   return err;
+
+   goto err;
+   }
+
+   if (rq) {
+   bool nonblock = eb->fil

[Intel-gfx] [PATCH 03/24] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.

2020-05-01 Thread Maarten Lankhorst
i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but let's start adding the definition
first.

To use it, we have to pass a non-NULL ww to gem_object_lock, and must
not unlock directly; unlocking is done in i915_gem_ww_ctx_fini.
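
For illustration, the canonical acquire/backoff loop that users of this
API end up with looks like this (a minimal sketch using only the helpers
added here):

    struct i915_gem_ww_ctx ww;
    int err;

    i915_gem_ww_ctx_init(&ww, true /* interruptible */);
retry:
    err = i915_gem_object_lock(obj, &ww);
    if (!err) {
        /* ... operate on obj while the ww lock is held ... */
    }
    if (err == -EDEADLK) {
        /* we lost the lock ordering: unlock everything and retry */
        err = i915_gem_ww_ctx_backoff(&ww);
        if (!err)
            goto retry;
    }
    i915_gem_ww_ctx_fini(&ww);  /* unlocks any object still held */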

Changes since v1:
- Change ww_ctx and obj order in locking functions (Joonas Lahtinen)

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/display/intel_display.c  |  4 +-
 .../gpu/drm/i915/gem/i915_gem_client_blt.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_domain.c| 10 ++--
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h| 38 +++---
 .../gpu/drm/i915/gem/i915_gem_object_blt.c|  2 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  9 
 drivers/gpu/drm/i915/gem/i915_gem_pm.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c|  2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  |  2 +-
 .../i915/gem/selftests/i915_gem_coherency.c   | 10 ++--
 .../drm/i915/gem/selftests/i915_gem_context.c |  4 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|  4 +-
 .../drm/i915/gem/selftests/i915_gem_phys.c|  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c|  2 +-
 drivers/gpu/drm/i915/gvt/cmd_parser.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   | 52 +--
 drivers/gpu/drm/i915/i915_gem.h   | 11 
 drivers/gpu/drm/i915/selftests/i915_gem.c | 41 +++
 drivers/gpu/drm/i915/selftests/i915_vma.c |  2 +-
 .../drm/i915/selftests/intel_memory_region.c  |  2 +-
 25 files changed, 174 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 2a17cf38d3dc..dd98732e9c5d 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -2309,7 +2309,7 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 
 void intel_unpin_fb_vma(struct i915_vma *vma, unsigned long flags)
 {
-   i915_gem_object_lock(vma->obj);
+   i915_gem_object_lock(vma->obj, NULL);
if (flags & PLANE_HAS_FENCE)
i915_vma_unpin_fence(vma);
i915_gem_object_unpin_from_display_plane(vma);
@@ -16971,7 +16971,7 @@ static int intel_framebuffer_init(struct intel_framebuffer *intel_fb,
if (!intel_fb->frontbuffer)
return -ENOMEM;
 
-   i915_gem_object_lock(obj);
+   i915_gem_object_lock(obj, NULL);
tiling = i915_gem_object_get_tiling(obj);
stride = i915_gem_object_get_stride(obj);
i915_gem_object_unlock(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 3a146aa2593b..2f1d8150256b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -286,7 +286,7 @@ int i915_gem_schedule_fill_pages_blt(struct drm_i915_gem_object *obj,
dma_fence_init(&work->dma, &clear_pages_work_ops, &fence_lock, 0, 0);
i915_sw_fence_init(&work->wait, clear_pages_work_notify);
 
-   i915_gem_object_lock(obj);
+   i915_gem_object_lock(obj, NULL);
err = i915_sw_fence_await_reservation(&work->wait,
  obj->base.resv, NULL,
  true, I915_FENCE_TIMEOUT,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 900ea8b7fc8f..7abb2deb1327 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -113,7 +113,7 @@ static void lut_close(struct i915_gem_context *ctx)
continue;
 
rcu_read_unlock();
-   i915_gem_object_lock(obj);
+   i915_gem_object_lock(obj, NULL);
list_for_each_entry(lut, &obj->lut_list, obj_link) {
if (lut->ctx != ctx)
continue;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 7db5a793739d..cfadccfc2990 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -128,7 +128,7 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire
if (err)
return err;
 
-   err = i915_gem_object_lock_interruptible(obj);
+   err = i915_gem_object_lock_interruptible(obj, NULL);
if (err)
goto out;
 
@@ -149,7 +149,7 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct
if (err)
return err;
 
-   err = i915_gem_obj

[Intel-gfx] [PATCH 14/24] drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin.

2020-05-01 Thread Maarten Lankhorst
As a preparation step for full object locking and wait/wound handling
during pin and object mapping, ensure that we always pass the ww context
in i915_gem_execbuffer.c to i915_vma_pin, and use lockdep to ensure this
happens.

This also requires changing the order of eb_parse slightly, to ensure
we pass ww at a point where we could still handle -EDEADLK safely.
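
The rule this enforces can be sketched as follows (the lockdep checks
themselves land in the i915_vma_pin() wrapper, see patch 24 in this
series):

    /* inside execbuf, pinning must always go through the ww context: */
    err = i915_vma_pin_ww(vma, &eb->ww, size, alignment, flags);

    /*
     * plain i915_vma_pin(vma, ...) becomes a ww=NULL wrapper, and is
     * only legal when the object's reservation is not already held.
     */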

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/display/intel_display.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   4 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 137 ++
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |   4 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.h  |   4 +-
 drivers/gpu/drm/i915/gt/intel_context.c   |  65 ++---
 drivers/gpu/drm/i915/gt/intel_context.h   |  13 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|   2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c  |  10 +-
 drivers/gpu/drm/i915/gt/intel_ring.h  |   3 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  15 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c  |  12 +-
 drivers/gpu/drm/i915/gt/intel_timeline.h  |   3 +-
 drivers/gpu/drm/i915/gt/mock_engine.c |   3 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|   2 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |   4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c|   2 +-
 drivers/gpu/drm/i915/i915_drv.h   |  13 +-
 drivers/gpu/drm/i915/i915_gem.c   |  11 +-
 drivers/gpu/drm/i915/i915_vma.c   |  13 +-
 drivers/gpu/drm/i915/i915_vma.h   |  13 +-
 25 files changed, 214 insertions(+), 133 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index dd98732e9c5d..fd88f9d76d0f 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -3449,7 +3449,7 @@ initial_plane_vma(struct drm_i915_private *i915,
if (IS_ERR(vma))
goto err_obj;
 
-   if (i915_ggtt_pin(vma, 0, PIN_MAPPABLE | PIN_OFFSET_FIXED | base))
+   if (i915_ggtt_pin(vma, NULL, 0, PIN_MAPPABLE | PIN_OFFSET_FIXED | base))
goto err_obj;
 
if (i915_gem_object_is_tiled(obj) &&
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index c640f70f29f1..aaea0e51fd91 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1142,7 +1142,7 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 
i915_gem_ww_ctx_init(&ww, true);
 retry:
-   err = intel_context_pin(ce);
+   err = intel_context_pin_ww(ce, &ww);
if (err)
goto err;
 
@@ -1235,7 +1235,7 @@ static int pin_ppgtt_update(struct intel_context *ce, struct i915_gem_ww_ctx *ww
 
if (!HAS_LOGICAL_RING_CONTEXTS(vm->i915))
/* ppGTT is not part of the legacy context image */
-   return gen6_ppgtt_pin(i915_vm_to_ppgtt(vm));
+   return gen6_ppgtt_pin(i915_vm_to_ppgtt(vm), ww);
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 2127b9d249c9..f0f81191992c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -424,16 +424,17 @@ eb_pin_vma(struct i915_execbuffer *eb,
pin_flags |= PIN_GLOBAL;
 
/* Attempt to reuse the current location if available */
-   if (unlikely(i915_vma_pin(vma, 0, 0, pin_flags))) {
+   /* TODO: Add -EDEADLK handling here */
+   if (unlikely(i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags))) {
if (entry->flags & EXEC_OBJECT_PINNED)
return false;
 
/* Failing that pick any _free_ space if suitable */
-   if (unlikely(i915_vma_pin(vma,
- entry->pad_to_size,
- entry->alignment,
- eb_pin_flags(entry, ev->flags) |
- PIN_USER | PIN_NOEVICT)))
+   if (unlikely(i915_vma_pin_ww(vma, &eb->ww,
+entry->pad_to_size,
+entry->alignment,
+eb_pin_flags(entry, ev->flags) |
+PIN_USER | PIN_NOEVICT)))
return false;
}
 
@@ -575,7 +576,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
obj->cache_level != I915_

[Intel-gfx] [PATCH 17/24] drm/i915: Convert i915_perf to ww locking as well

2020-05-01 Thread Maarten Lankhorst
We have the ordering of timeline->mutex vs resv_lock wrong; convert
the i915_vma_pin and intel_context_pin calls as well to future-proof
this.

We may need to make this more transaction-like in future changes, and
get down to a single i915_gem_ww_ctx, but for now this should work.
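
The target ordering, as a sketch (this is what the emit_oa_config()
conversion below implements):

    i915_gem_ww_ctx_init(&ww, true);
retry:
    err = i915_gem_object_lock(vma->obj, &ww);   /* dma-resv (ww) first */
    if (!err)
        err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
    if (!err)
        rq = i915_request_create(ce);    /* takes timeline->mutex second */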

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/i915_perf.c | 57 +++-
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c533f569dd42..c675e6cd5967 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1195,24 +1195,39 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
struct i915_gem_engines_iter it;
struct i915_gem_context *ctx = stream->ctx;
struct intel_context *ce;
-   int err;
+   struct i915_gem_ww_ctx ww;
+   int err = -ENODEV;
 
for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) {
if (ce->engine != stream->engine) /* first match! */
continue;
 
-   /*
-* As the ID is the gtt offset of the context's vma we
-* pin the vma to ensure the ID remains fixed.
-*/
-   err = intel_context_pin(ce);
-   if (err == 0) {
-   stream->pinned_ctx = ce;
-   break;
-   }
+   err = 0;
+   break;
}
i915_gem_context_unlock_engines(ctx);
 
+   if (err)
+   return ERR_PTR(err);
+
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   /*
+* As the ID is the gtt offset of the context's vma we
+* pin the vma to ensure the ID remains fixed.
+*/
+   err = intel_context_pin_ww(ce, &ww);
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
+
+   if (err)
+   return ERR_PTR(err);
+
+   stream->pinned_ctx = ce;
return stream->pinned_ctx;
 }
 
@@ -1925,15 +1940,22 @@ emit_oa_config(struct i915_perf_stream *stream,
 {
struct i915_request *rq;
struct i915_vma *vma;
+   struct i915_gem_ww_ctx ww;
int err;
 
vma = get_oa_vma(stream, oa_config);
if (IS_ERR(vma))
return PTR_ERR(vma);
 
-   err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   err = i915_gem_object_lock(vma->obj, &ww);
+   if (err)
+   goto err;
+
+   err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
if (err)
-   goto err_vma_put;
+   goto err;
 
intel_engine_pm_get(ce->engine);
rq = i915_request_create(ce);
@@ -1955,11 +1977,9 @@ emit_oa_config(struct i915_perf_stream *stream,
goto err_add_request;
}
 
-   i915_vma_lock(vma);
err = i915_request_await_object(rq, vma->obj, 0);
if (!err)
err = i915_vma_move_to_active(vma, rq, 0);
-   i915_vma_unlock(vma);
if (err)
goto err_add_request;
 
@@ -1973,7 +1993,14 @@ emit_oa_config(struct i915_perf_stream *stream,
i915_request_add(rq);
 err_vma_unpin:
i915_vma_unpin(vma);
-err_vma_put:
+err:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+
+   i915_gem_ww_ctx_fini(&ww);
i915_vma_put(vma);
return err;
 }
-- 
2.26.1



[Intel-gfx] [PATCH 23/24] drm/i915: Add ww locking to pin_to_display_plane

2020-05-01 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 65 --
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  1 +
 2 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 8ebceebd11b0..c0d153284984 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -37,6 +37,12 @@ void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj)
i915_gem_object_unlock(obj);
 }
 
+void i915_gem_object_flush_if_display_locked(struct drm_i915_gem_object *obj)
+{
+   if (i915_gem_object_is_framebuffer(obj))
+   __i915_gem_object_flush_for_display(obj);
+}
+
 /**
  * Moves a single object to the WC read, and possibly write domain.
  * @obj: object to act on
@@ -197,18 +203,12 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
if (ret)
return ret;
 
-   ret = i915_gem_object_lock_interruptible(obj, NULL);
-   if (ret)
-   return ret;
-
/* Always invalidate stale cachelines */
if (obj->cache_level != cache_level) {
i915_gem_object_set_cache_coherency(obj, cache_level);
obj->cache_dirty = true;
}
 
-   i915_gem_object_unlock(obj);
-
/* The cache-level will be applied when each vma is rebound. */
return i915_gem_object_unbind(obj,
  I915_GEM_OBJECT_UNBIND_ACTIVE |
@@ -255,6 +255,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
struct drm_i915_gem_caching *args = data;
struct drm_i915_gem_object *obj;
enum i915_cache_level level;
+   struct i915_gem_ww_ctx ww;
int ret = 0;
 
switch (args->caching) {
@@ -293,7 +294,18 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
goto out;
}
 
-   ret = i915_gem_object_set_cache_level(obj, level);
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   ret = i915_gem_object_lock(obj, &ww);
+   if (!ret)
+   ret = i915_gem_object_set_cache_level(obj, level);
+
+   if (ret == -EDEADLK) {
+   ret = i915_gem_ww_ctx_backoff(&ww);
+   if (!ret)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
 
 out:
i915_gem_object_put(obj);
@@ -313,6 +325,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 unsigned int flags)
 {
struct drm_i915_private *i915 = to_i915(obj->base.dev);
+   struct i915_gem_ww_ctx ww;
struct i915_vma *vma;
int ret;
 
@@ -320,6 +333,11 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj))
return ERR_PTR(-EINVAL);
 
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   ret = i915_gem_object_lock(obj, &ww);
+   if (ret)
+   goto err;
/*
 * The display engine is not coherent with the LLC cache on gen6.  As
 * a result, we make sure that the pinning that is about to occur is
@@ -334,7 +352,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
  HAS_WT(i915) ?
  I915_CACHE_WT : I915_CACHE_NONE);
if (ret)
-   return ERR_PTR(ret);
+   goto err;
 
/*
 * As the user may map the buffer once pinned in the display plane
@@ -347,18 +365,31 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
vma = ERR_PTR(-ENOSPC);
if ((flags & PIN_MAPPABLE) == 0 &&
(!view || view->type == I915_GGTT_VIEW_NORMAL))
-   vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
-  flags |
-  PIN_MAPPABLE |
-  PIN_NONBLOCK);
-   if (IS_ERR(vma))
-   vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment, flags);
-   if (IS_ERR(vma))
-   return vma;
+   vma = i915_gem_object_ggtt_pin_ww(obj, &ww, view, 0, alignment,
+ flags | PIN_MAPPABLE |
+ PIN_NONBLOCK);
+   if (IS_ERR(vma) && vma != ERR_PTR(-EDEADLK))
+   vma = i915_gem_object_ggtt_pin_ww(obj, &ww, view, 0,
+ alignment, flags);
+   if (IS_ERR(vma)) {
+   ret = PTR_ERR(vma);
+   goto err;
+   }
 
vma->display_alignment = max_t(u64, vma->display_alignment, alignment);
 
-   i915_gem_object_flush_if_display(obj);
+   i915_gem_object_flush_if_display_loc

[Intel-gfx] [PATCH 22/24] drm/i915: Add ww locking to vm_fault_gtt

2020-05-01 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 51 +++-
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index b39c24dae64e..e35e8d0b6938 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -283,37 +283,46 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
struct intel_runtime_pm *rpm = &i915->runtime_pm;
struct i915_ggtt *ggtt = &i915->ggtt;
bool write = area->vm_flags & VM_WRITE;
+   struct i915_gem_ww_ctx ww;
intel_wakeref_t wakeref;
struct i915_vma *vma;
pgoff_t page_offset;
int srcu;
int ret;
 
-   /* Sanity check that we allow writing into this object */
-   if (i915_gem_object_is_readonly(obj) && write)
-   return VM_FAULT_SIGBUS;
-
/* We don't use vmf->pgoff since that has the fake offset */
page_offset = (vmf->address - area->vm_start) >> PAGE_SHIFT;
 
trace_i915_gem_object_fault(obj, page_offset, true, write);
 
-   ret = i915_gem_object_pin_pages(obj);
+   wakeref = intel_runtime_pm_get(rpm);
+
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   ret = i915_gem_object_lock(obj, &ww);
if (ret)
-   goto err;
+   goto err_rpm;
 
-   wakeref = intel_runtime_pm_get(rpm);
+   /* Sanity check that we allow writing into this object */
+   if (i915_gem_object_is_readonly(obj) && write) {
+   ret = -EFAULT;
+   goto err_rpm;
+   }
 
-   ret = intel_gt_reset_trylock(ggtt->vm.gt, &srcu);
+   ret = i915_gem_object_pin_pages(obj);
if (ret)
goto err_rpm;
 
+   ret = intel_gt_reset_trylock(ggtt->vm.gt, &srcu);
+   if (ret)
+   goto err_pages;
+
/* Now pin it into the GTT as needed */
-   vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
-  PIN_MAPPABLE |
-  PIN_NONBLOCK /* NOWARN */ |
-  PIN_NOEVICT);
-   if (IS_ERR(vma)) {
+   vma = i915_gem_object_ggtt_pin_ww(obj, &ww, NULL, 0, 0,
+ PIN_MAPPABLE |
+ PIN_NONBLOCK /* NOWARN */ |
+ PIN_NOEVICT);
+   if (IS_ERR(vma) && vma != ERR_PTR(-EDEADLK)) {
/* Use a partial view if it is bigger than available space */
struct i915_ggtt_view view =
compute_partial_view(obj, page_offset, MIN_CHUNK_PAGES);
@@ -328,11 +337,11 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 * all hope that the hardware is able to track future writes.
 */
 
-   vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, flags);
-   if (IS_ERR(vma)) {
+   vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
+   if (IS_ERR(vma) && vma != ERR_PTR(-EDEADLK)) {
flags = PIN_MAPPABLE;
view.type = I915_GGTT_VIEW_PARTIAL;
-   vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, flags);
+   vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
}
 
/* The entire mappable GGTT is pinned? Unexpected! */
@@ -389,10 +398,16 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
__i915_vma_unpin(vma);
 err_reset:
intel_gt_reset_unlock(ggtt->vm.gt, srcu);
+err_pages:
+   i915_gem_object_unpin_pages(obj);
 err_rpm:
+   if (ret == -EDEADLK) {
+   ret = i915_gem_ww_ctx_backoff(&ww);
+   if (!ret)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
intel_runtime_pm_put(rpm, wakeref);
-   i915_gem_object_unpin_pages(obj);
-err:
return i915_error_to_vmf_fault(ret);
 }
 
-- 
2.26.1



[Intel-gfx] [PATCH 09/24] drm/i915: Use ww locking in intel_renderstate.

2020-05-01 Thread Maarten Lankhorst
We want to start using ww locking in intel_context_pin; for this
we need to lock multiple objects, and the single i915_gem_object_lock
is not enough.

Convert to using ww-waiting, and make sure we always pin the
intel_context state, even if we don't have a renderstate object.
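
The caller side (see the __engines_record_defaults() hunk below) then
follows this shape, with the ww context owned by struct intel_renderstate
(a sketch; error unwinding trimmed, and intel_renderstate_emit() is the
existing emit helper):

    err = intel_renderstate_init(&so, ce);   /* opens so->ww, pins ce */

    rq = i915_request_create(ce);            /* instead of
                                                intel_context_create_request() */
    err = intel_renderstate_emit(&so, rq);

    intel_renderstate_fini(&so, ce);         /* unwinds so->ww and the pin */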

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_gt.c  | 21 +++---
 drivers/gpu/drm/i915/gt/intel_renderstate.c | 78 ++---
 drivers/gpu/drm/i915/gt/intel_renderstate.h |  9 ++-
 3 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f069551e412f..3c674aa76dae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -398,21 +398,20 @@ static int __engines_record_defaults(struct intel_gt *gt)
/* We must be able to switch to something! */
GEM_BUG_ON(!engine->kernel_context);
 
-   err = intel_renderstate_init(&so, engine);
-   if (err)
-   goto out;
-
ce = intel_context_create(engine);
if (IS_ERR(ce)) {
err = PTR_ERR(ce);
goto out;
}
 
-   rq = intel_context_create_request(ce);
+   err = intel_renderstate_init(&so, ce);
+   if (err)
+   goto err;
+
+   rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
-   intel_context_put(ce);
-   goto out;
+   goto err_fini;
}
 
err = intel_engine_emit_ctx_wa(rq);
@@ -426,9 +425,13 @@ static int __engines_record_defaults(struct intel_gt *gt)
 err_rq:
requests[id] = i915_request_get(rq);
i915_request_add(rq);
-   intel_renderstate_fini(&so);
-   if (err)
+err_fini:
+   intel_renderstate_fini(&so, ce);
+err:
+   if (err) {
+   intel_context_put(ce);
goto out;
+   }
}
 
/* Flush the default context image to memory, and enable powersaving. */
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c
index ee5ca48d6953..d2cfa521fe89 100644
--- a/drivers/gpu/drm/i915/gt/intel_renderstate.c
+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c
@@ -27,6 +27,7 @@
 
 #include "i915_drv.h"
 #include "intel_renderstate.h"
+#include "gt/intel_context.h"
 #include "intel_ring.h"
 
 static const struct intel_renderstate_rodata *
@@ -74,10 +75,9 @@ static int render_state_setup(struct intel_renderstate *so,
u32 *d;
int ret;
 
-   i915_gem_object_lock(so->vma->obj, NULL);
ret = i915_gem_object_prepare_write(so->vma->obj, &needs_clflush);
if (ret)
-   goto out_unlock;
+   return ret;
 
d = kmap_atomic(i915_gem_object_get_dirty_page(so->vma->obj, 0));
 
@@ -158,8 +158,6 @@ static int render_state_setup(struct intel_renderstate *so,
ret = 0;
 out:
i915_gem_object_finish_access(so->vma->obj);
-out_unlock:
-   i915_gem_object_unlock(so->vma->obj);
return ret;
 
 err:
@@ -171,33 +169,47 @@ static int render_state_setup(struct intel_renderstate *so,
 #undef OUT_BATCH
 
 int intel_renderstate_init(struct intel_renderstate *so,
-  struct intel_engine_cs *engine)
+  struct intel_context *ce)
 {
-   struct drm_i915_gem_object *obj;
+   struct intel_engine_cs *engine = ce->engine;
+   struct drm_i915_gem_object *obj = NULL;
int err;
 
memset(so, 0, sizeof(*so));
 
so->rodata = render_state_get_rodata(engine);
-   if (!so->rodata)
-   return 0;
+   if (so->rodata) {
+   if (so->rodata->batch_items * 4 > PAGE_SIZE)
+   return -EINVAL;
+
+   obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
+   if (IS_ERR(obj))
+   return PTR_ERR(obj);
+
+   so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+   if (IS_ERR(so->vma)) {
+   err = PTR_ERR(so->vma);
+   goto err_obj;
+   }
+   }
 
-   if (so->rodata->batch_items * 4 > PAGE_SIZE)
-   return -EINVAL;
+   i915_gem_ww_ctx_init(&so->ww, true);
+retry:
+   err = intel_context_pin(ce);
+   if (err)
+   goto err_fini;
 
-   obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
-   if (IS_ERR(obj))
-   return PTR_ERR(obj);
+   /* return early if there's nothing to setup */
+   if (!err && !so->rodata)
+   return 0;
 
-   so->vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
-   if (

[Intel-gfx] [PATCH 06/24] Revert "drm/i915/gem: Split eb_vma into its own allocation"

2020-05-01 Thread Maarten Lankhorst
This reverts commit 0f1dd02295f35dcdcbaafcbcbbec0753884ab974.
The split eb_vma allocation conflicts with the ww mutex handling, which
needs to drop the references after gpu submission anyway, because
otherwise we risk unlocking a BO after it has already been freed.

Signed-off-by: Maarten Lankhorst 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 131 --
 1 file changed, 58 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 057e0ba14b47..9922dc68311f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -40,11 +40,6 @@ struct eb_vma {
u32 handle;
 };
 
-struct eb_vma_array {
-   struct kref kref;
-   struct eb_vma vma[];
-};
-
 enum {
FORCE_CPU_RELOC = 1,
FORCE_GTT_RELOC,
@@ -57,6 +52,7 @@ enum {
 #define __EXEC_OBJECT_NEEDS_MAP   BIT(29)
 #define __EXEC_OBJECT_NEEDS_BIAS   BIT(28)
 #define __EXEC_OBJECT_INTERNAL_FLAGS   (~0u << 28) /* all of the above */
+#define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
 
 #define __EXEC_HAS_RELOC   BIT(31)
 #define __EXEC_INTERNAL_FLAGS  (~0u << 31)
@@ -287,7 +283,6 @@ struct i915_execbuffer {
 */
int lut_size;
struct hlist_head *buckets; /** ht for relocation handles */
-   struct eb_vma_array *array;
 };
 
 static int eb_parse(struct i915_execbuffer *eb);
@@ -299,62 +294,8 @@ static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 eb->args->batch_len);
 }
 
-static struct eb_vma_array *eb_vma_array_create(unsigned int count)
-{
-   struct eb_vma_array *arr;
-
-   arr = kvmalloc(struct_size(arr, vma, count), GFP_KERNEL | __GFP_NOWARN);
-   if (!arr)
-   return NULL;
-
-   kref_init(&arr->kref);
-   arr->vma[0].vma = NULL;
-
-   return arr;
-}
-
-static inline void eb_unreserve_vma(struct eb_vma *ev)
-{
-   struct i915_vma *vma = ev->vma;
-
-   if (unlikely(ev->flags & __EXEC_OBJECT_HAS_FENCE))
-   __i915_vma_unpin_fence(vma);
-
-   if (ev->flags & __EXEC_OBJECT_HAS_PIN)
-   __i915_vma_unpin(vma);
-
-   ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
-  __EXEC_OBJECT_HAS_FENCE);
-}
-
-static void eb_vma_array_destroy(struct kref *kref)
-{
-   struct eb_vma_array *arr = container_of(kref, typeof(*arr), kref);
-   struct eb_vma *ev = arr->vma;
-
-   while (ev->vma) {
-   eb_unreserve_vma(ev);
-   i915_vma_put(ev->vma);
-   ev++;
-   }
-
-   kvfree(arr);
-}
-
-static void eb_vma_array_put(struct eb_vma_array *arr)
-{
-   kref_put(&arr->kref, eb_vma_array_destroy);
-}
-
 static int eb_create(struct i915_execbuffer *eb)
 {
-   /* Allocate an extra slot for use by the command parser + sentinel */
-   eb->array = eb_vma_array_create(eb->buffer_count + 2);
-   if (!eb->array)
-   return -ENOMEM;
-
-   eb->vma = eb->array->vma;
-
if (!(eb->args->flags & I915_EXEC_HANDLE_LUT)) {
unsigned int size = 1 + ilog2(eb->buffer_count);
 
@@ -388,10 +329,8 @@ static int eb_create(struct i915_execbuffer *eb)
break;
} while (--size);
 
-   if (unlikely(!size)) {
-   eb_vma_array_put(eb->array);
+   if (unlikely(!size))
return -ENOMEM;
-   }
 
eb->lut_size = size;
} else {
@@ -502,6 +441,26 @@ eb_pin_vma(struct i915_execbuffer *eb,
return !eb_vma_misplaced(entry, vma, ev->flags);
 }
 
+static inline void __eb_unreserve_vma(struct i915_vma *vma, unsigned int flags)
+{
+   GEM_BUG_ON(!(flags & __EXEC_OBJECT_HAS_PIN));
+
+   if (unlikely(flags & __EXEC_OBJECT_HAS_FENCE))
+   __i915_vma_unpin_fence(vma);
+
+   __i915_vma_unpin(vma);
+}
+
+static inline void
+eb_unreserve_vma(struct eb_vma *ev)
+{
+   if (!(ev->flags & __EXEC_OBJECT_HAS_PIN))
+   return;
+
+   __eb_unreserve_vma(ev->vma, ev->flags);
+   ev->flags &= ~__EXEC_OBJECT_RESERVED;
+}
+
 static int
 eb_validate_vma(struct i915_execbuffer *eb,
struct drm_i915_gem_exec_object2 *entry,
@@ -944,13 +903,31 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
}
 }
 
+static void eb_release_vmas(const struct i915_execbuffer *eb)
+{
+   const unsigned int count = eb->buffer_count;
+   unsigned int i;
+
+   for (i = 0; i < count; i++) {
+   struct eb_vma *ev = &eb->vma[i];
+   struct i915_vma *vma = ev->vma;
+
+   if (!vma)
+   break;
+
+   eb->vma[i].vma = NULL;
+
+   if (ev->flags & __EXEC_OBJECT_HAS_PIN)
+   __eb_unreserve_vma(vma, ev->flags);
+
+   i915_vma_put(vma);
+   }
+}
+
 static v

[Intel-gfx] [PATCH 08/24] drm/i915: Use per object locking in execbuf, v9.

2020-05-01 Thread Maarten Lankhorst
Now that we have changed execbuf submission slightly to allow us to do
all pinning in one place, we can simply add ww versions on top of
struct_mutex. All we have to do is add a separate path for -EDEADLK
handling, which needs to unpin all gem bo's before dropping the lock,
then start over.

This finally allows us to do parallel submission, but because not
all of the pinning code uses the ww ctx yet, we cannot completely
drop struct_mutex yet.
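
The -EDEADLK path then becomes, as a rough sketch (helper names as used
in this series; the real code below differs in the details):

    i915_gem_ww_ctx_init(&eb->ww, true);
retry:
    err = eb_validate_vmas(eb);         /* lock, pin and reserve objects */
    /* ... relocations, cmd parsing, request submission ... */
    if (err == -EDEADLK) {
        eb_release_vmas(eb, false);     /* unpin everything we hold */
        err = i915_gem_ww_ctx_backoff(&eb->ww);
        if (!err)
            goto retry;
    }
    i915_gem_ww_ctx_fini(&eb->ww);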

Changes since v1:
- Keep struct_mutex for now. :(
Changes since v2:
- Make sure we always lock the ww context in slowpath.
Changes since v3:
- Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be
  done on normal unlock path.
- Unconditionally release vmas and context.
Changes since v4:
- Rebased on top of struct_mutex reduction.
Changes since v5:
- Remove training wheels.
Changes since v6:
- Fix accidentally broken -ENOSPC handling.
Changes since v7:
- Handle gt buffer pool better.
Changes since v8:
- Properly clear variables, to make -EDEADLK handling not BUG.

Signed-off-by: Maarten Lankhorst 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 350 ++
 drivers/gpu/drm/i915/i915_gem.c   |   6 +
 drivers/gpu/drm/i915/i915_gem.h   |   1 +
 3 files changed, 207 insertions(+), 150 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5b4f6fb1428c..96b172f9b9f7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -249,6 +249,8 @@ struct i915_execbuffer {
/** list of vma that have execobj.relocation_count */
struct list_head relocs;
 
+   struct i915_gem_ww_ctx ww;
+
/**
 * Track the most recently used object for relocations, as we
 * frequently have to perform multiple relocations within the same
@@ -267,14 +269,18 @@ struct i915_execbuffer {
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
+   struct intel_gt_buffer_pool_node *pool;
} reloc_cache;
 
+   struct intel_gt_buffer_pool_node *reloc_pool; /** relocation pool for -EDEADLK handling */
+
u64 invalid_flags; /** Set of execobj.flags that are invalid */
u32 context_flags; /** Set of execobj.flags to insert from the ctx */
 
u32 batch_start_offset; /** Location within object of batch */
u32 batch_len; /** Length of batch within object */
u32 batch_flags; /** Flags composed for emit_bb_start() */
+   struct intel_gt_buffer_pool_node *batch_pool; /** pool node for batch buffer */
 
/**
 * Indicate either the size of the hastable used to resolve
@@ -441,24 +447,18 @@ eb_pin_vma(struct i915_execbuffer *eb,
return !eb_vma_misplaced(entry, vma, ev->flags);
 }
 
-static inline void __eb_unreserve_vma(struct i915_vma *vma, unsigned int flags)
-{
-   GEM_BUG_ON(!(flags & __EXEC_OBJECT_HAS_PIN));
-
-   if (unlikely(flags & __EXEC_OBJECT_HAS_FENCE))
-   __i915_vma_unpin_fence(vma);
-
-   __i915_vma_unpin(vma);
-}
-
 static inline void
 eb_unreserve_vma(struct eb_vma *ev)
 {
if (!(ev->flags & __EXEC_OBJECT_HAS_PIN))
return;
 
-   __eb_unreserve_vma(ev->vma, ev->flags);
ev->flags &= ~__EXEC_OBJECT_RESERVED;
+
+   if (unlikely(ev->flags & __EXEC_OBJECT_HAS_FENCE))
+   __i915_vma_unpin_fence(ev->vma);
+
+   __i915_vma_unpin(ev->vma);
 }
 
 static int
@@ -552,16 +552,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 
eb->batch = ev;
}
-
-   if (eb_pin_vma(eb, entry, ev)) {
-   if (entry->offset != vma->node.start) {
-   entry->offset = vma->node.start | UPDATE;
-   eb->args->flags |= __EXEC_HAS_RELOC;
-   }
-   } else {
-   eb_unreserve_vma(ev);
-   list_add_tail(&ev->bind_link, &eb->unbound);
-   }
 }
 
 static inline int use_cpu_reloc(const struct reloc_cache *cache,
@@ -646,10 +636,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 * This avoid unnecessary unbinding of later objects in order to make
 * room for the earlier objects *unless* we need to defragment.
 */
-
-   if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
-   return -EINTR;
-
pass = 0;
do {
list_for_each_entry(ev, &eb->unbound, bind_link) {
@@ -657,8 +643,8 @@ static int eb_reserve(struct i915_execbuffer *eb)
if (err)
break;
}
-   if (!(err == -ENOSPC || err == -EAGAIN))
-   break;
+   if (err != -ENOSPC)
+   return err;
 
/* Resort *all* the objects into priority order */
INIT_LIST_HEAD(&eb->unbound);
@@ -688,13 +674,6 @@ static int eb_r

[Intel-gfx] [PATCH 05/24] drm/i915: Parse command buffer earlier in eb_relocate(slow)

2020-05-01 Thread Maarten Lankhorst
We want to introduce backoff logic, but we need to lock the
pool object as well for command parsing. Because of this, we
will need backoff logic for the engine pool obj too. Move the batch
validation up slightly into eb_lookup_vmas, and split the actual command
parsing into a separate function which can be called from both the
execbuf relocation fastpath and slowpath.
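
The resulting order of operations, schematically:

    eb_lookup_vmas(eb);       /* now also validates the batch boundaries */
    eb_relocate_parse(eb);    /* relocations, fast path or slow path ...  */
        eb_parse(eb);         /* ... with command parsing as the last step */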

Signed-off-by: Maarten Lankhorst 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 68 ++-
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0d1d64bcd964..057e0ba14b47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -290,6 +290,8 @@ struct i915_execbuffer {
struct eb_vma_array *array;
 };
 
+static int eb_parse(struct i915_execbuffer *eb);
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -873,6 +875,7 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
 
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
+   struct drm_i915_private *i915 = eb->i915;
unsigned int batch = eb_batch_index(eb);
unsigned int i;
int err = 0;
@@ -886,18 +889,37 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
vma = eb_lookup_vma(eb, eb->exec[i].handle);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
-   break;
+   goto err;
}
 
err = eb_validate_vma(eb, &eb->exec[i], vma);
if (unlikely(err)) {
i915_vma_put(vma);
-   break;
+   goto err;
}
 
eb_add_vma(eb, i, batch, vma);
}
 
+   if (unlikely(eb->batch->flags & EXEC_OBJECT_WRITE)) {
+   drm_dbg(&i915->drm,
+   "Attempting to use self-modifying batch buffer\n");
+   return -EINVAL;
+   }
+
+   if (range_overflows_t(u64,
+ eb->batch_start_offset, eb->batch_len,
+ eb->batch->vma->size)) {
+   drm_dbg(&i915->drm, "Attempting to use out-of-bounds batch\n");
+   return -EINVAL;
+   }
+
+   if (eb->batch_len == 0)
+   eb->batch_len = eb->batch->vma->size - eb->batch_start_offset;
+
+   return 0;
+
+err:
eb->vma[i].vma = NULL;
return err;
 }
@@ -1737,7 +1759,7 @@ static int eb_prefault_relocations(const struct i915_execbuffer *eb)
return 0;
 }
 
-static noinline int eb_relocate_slow(struct i915_execbuffer *eb)
+static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb)
 {
bool have_copy = false;
struct eb_vma *ev;
@@ -1788,6 +1810,11 @@ static noinline int eb_relocate_slow(struct i915_execbuffer *eb)
}
}
 
+   /* as last step, parse the command buffer */
+   err = eb_parse(eb);
+   if (err)
+   goto err;
+
/*
 * Leave the user relocations as are, this is the painfully slow path,
 * and we want to avoid the complication of dropping the lock whilst
@@ -1820,7 +1847,7 @@ static noinline int eb_relocate_slow(struct i915_execbuffer *eb)
return err;
 }
 
-static int eb_relocate(struct i915_execbuffer *eb)
+static int eb_relocate_parse(struct i915_execbuffer *eb)
 {
int err;
 
@@ -1840,11 +1867,11 @@ static int eb_relocate(struct i915_execbuffer *eb)
 
list_for_each_entry(ev, &eb->relocs, reloc_link) {
if (eb_relocate_vma(eb, ev))
-   return eb_relocate_slow(eb);
+   return eb_relocate_parse_slow(eb);
}
}
 
-   return 0;
+   return eb_parse(eb);
 }
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
@@ -2775,7 +2802,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (unlikely(err))
goto err_context;
 
-   err = eb_relocate(&eb);
+   err = eb_relocate_parse(&eb);
if (err) {
/*
 * If the user expects the execobject.offset and
@@ -2788,33 +2815,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
goto err_vma;
}
 
-   if (unlikely(eb.batch->flags & EXEC_OBJECT_WRITE)) {
-   drm_dbg(&i915->drm,
-   "Attempting to use self-modifying batch buffer\n");
-   err = -EINVAL;
-   goto err_vma;
-   }
-
-   if (range_overflows_t(u64,
- eb.batch_start_offset, eb.batch_len,
- eb.batch->vma->size)) {
-   drm_dbg(&i915->drm, "Attempting to use out-of-bounds batch\n");
-   err = -EINVAL;
- 

[Intel-gfx] [PATCH 15/24] drm/i915: Convert i915_gem_object/client_blt.c to use ww locking as well, v2.

2020-05-01 Thread Maarten Lankhorst
This is the last part outside of selftests that still doesn't use the
correct lock ordering of timeline->mutex vs resv_lock.

With gem fixed, there are a few places that still get locking wrong:
- gvt/scheduler.c
- i915_perf.c
- Most if not all selftests.

Changes since v1:
- Add intel_engine_pm_get/put() calls to fix use-after-free when using
  intel_engine_get_pool().

Signed-off-by: Maarten Lankhorst 
---
 .../gpu/drm/i915/gem/i915_gem_client_blt.c|  80 +++--
 .../gpu/drm/i915/gem/i915_gem_object_blt.c| 156 +++---
 .../gpu/drm/i915/gem/i915_gem_object_blt.h|   3 +
 3 files changed, 165 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 2f1d8150256b..6d2f6ac500dc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -156,6 +156,7 @@ static void clear_pages_worker(struct work_struct *work)
struct clear_pages_work *w = container_of(work, typeof(*w), work);
struct drm_i915_gem_object *obj = w->sleeve->vma->obj;
struct i915_vma *vma = w->sleeve->vma;
+   struct i915_gem_ww_ctx ww;
struct i915_request *rq;
struct i915_vma *batch;
int err = w->dma.error;
@@ -171,17 +172,20 @@ static void clear_pages_worker(struct work_struct *work)
obj->read_domains = I915_GEM_GPU_DOMAINS;
obj->write_domain = 0;
 
-   err = i915_vma_pin(vma, 0, 0, PIN_USER);
-   if (unlikely(err))
+   i915_gem_ww_ctx_init(&ww, false);
+   intel_engine_pm_get(w->ce->engine);
+retry:
+   err = intel_context_pin_ww(w->ce, &ww);
+   if (err)
goto out_signal;
 
-   batch = intel_emit_vma_fill_blt(w->ce, vma, w->value);
+   batch = intel_emit_vma_fill_blt(w->ce, vma, &ww, w->value);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
-   goto out_unpin;
+   goto out_ctx;
}
 
-   rq = intel_context_create_request(w->ce);
+   rq = i915_request_create(w->ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_batch;
@@ -223,9 +227,19 @@ static void clear_pages_worker(struct work_struct *work)
i915_request_add(rq);
 out_batch:
intel_emit_vma_release(w->ce, batch);
-out_unpin:
-   i915_vma_unpin(vma);
+out_ctx:
+   intel_context_unpin(w->ce);
 out_signal:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
+
+   i915_vma_unpin(w->sleeve->vma);
+   intel_engine_pm_put(w->ce->engine);
+
if (unlikely(err)) {
dma_fence_set_error(&w->dma, err);
dma_fence_signal(&w->dma);
@@ -233,6 +247,45 @@ static void clear_pages_worker(struct work_struct *work)
}
 }
 
+static int pin_wait_clear_pages_work(struct clear_pages_work *w,
+struct intel_context *ce)
+{
+   struct i915_vma *vma = w->sleeve->vma;
+   struct i915_gem_ww_ctx ww;
+   int err;
+
+   i915_gem_ww_ctx_init(&ww, false);
+retry:
+   err = i915_gem_object_lock(vma->obj, &ww);
+   if (err)
+   goto out;
+
+   err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER);
+   if (unlikely(err))
+   goto out;
+
+   err = i915_sw_fence_await_reservation(&w->wait,
+ vma->obj->base.resv, NULL,
+ true, I915_FENCE_TIMEOUT,
+ I915_FENCE_GFP);
+   if (err)
+   goto err_unpin_vma;
+
+   dma_resv_add_excl_fence(vma->obj->base.resv, &w->dma);
+
+err_unpin_vma:
+   if (err)
+   i915_vma_unpin(vma);
+out:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
+   return err;
+}
+
 static int __i915_sw_fence_call
 clear_pages_work_notify(struct i915_sw_fence *fence,
enum i915_sw_fence_notify state)
@@ -286,18 +339,9 @@ int i915_gem_schedule_fill_pages_blt(struct drm_i915_gem_object *obj,
dma_fence_init(&work->dma, &clear_pages_work_ops, &fence_lock, 0, 0);
i915_sw_fence_init(&work->wait, clear_pages_work_notify);
 
-   i915_gem_object_lock(obj, NULL);
-   err = i915_sw_fence_await_reservation(&work->wait,
- obj->base.resv, NULL,
- true, I915_FENCE_TIMEOUT,
- I915_FENCE_GFP);
-   if (err < 0) {
+   err = pin_wait_clear_pages_work(work, ce);
+   if (err < 0)
dma_fence_set_error(&work->dma, err);
-   } else {
-   dma_resv_add_e

[Intel-gfx] [PATCH 16/24] drm/i915: Kill last user of intel_context_create_request outside of selftests

2020-05-01 Thread Maarten Lankhorst
Instead of using intel_context_create_request(), use intel_context_pin()
and i915_request_create() directly.

Now all those calls are gone outside of selftests. :)
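
Open-coded, the replacement looks like this (a sketch; it is essentially
what the engine_wa_list_verify() hunk below does):

    i915_gem_ww_ctx_init(&ww, false);
retry:
    err = i915_gem_object_lock(vma->obj, &ww);
    if (err == 0)
        err = intel_context_pin_ww(ce, &ww);
    if (err == 0)
        rq = i915_request_create(ce);
    /* ... -EDEADLK backoff and intel_context_unpin() on the way out ... */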

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 43 ++---
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index adddc5c93b48..51a0e114c367 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1746,6 +1746,7 @@ static int engine_wa_list_verify(struct intel_context *ce,
const struct i915_wa *wa;
struct i915_request *rq;
struct i915_vma *vma;
+   struct i915_gem_ww_ctx ww;
unsigned int i;
u32 *results;
int err;
@@ -1758,29 +1759,34 @@ static int engine_wa_list_verify(struct intel_context *ce,
return PTR_ERR(vma);
 
intel_engine_pm_get(ce->engine);
-   rq = intel_context_create_request(ce);
-   intel_engine_pm_put(ce->engine);
+   i915_gem_ww_ctx_init(&ww, false);
+retry:
+   err = i915_gem_object_lock(vma->obj, &ww);
+   if (err == 0)
+   err = intel_context_pin_ww(ce, &ww);
+   if (err)
+   goto err_pm;
+
+   rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
-   goto err_vma;
+   goto err_unpin;
}
 
-   i915_vma_lock(vma);
err = i915_request_await_object(rq, vma->obj, true);
if (err == 0)
err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
-   i915_vma_unlock(vma);
-   if (err) {
-   i915_request_add(rq);
-   goto err_vma;
-   }
-
-   err = wa_list_srm(rq, wal, vma);
-   if (err)
-   goto err_vma;
+   if (err == 0)
+   err = wa_list_srm(rq, wal, vma);
 
i915_request_get(rq);
+   if (err)
+   i915_request_set_error_once(rq, err);
i915_request_add(rq);
+
+   if (err)
+   goto err_rq;
+
if (i915_request_wait(rq, 0, HZ / 5) < 0) {
err = -ETIME;
goto err_rq;
@@ -1805,7 +1811,16 @@ static int engine_wa_list_verify(struct intel_context *ce,
 
 err_rq:
i915_request_put(rq);
-err_vma:
+err_unpin:
+   intel_context_unpin(ce);
+err_pm:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
+   intel_engine_pm_put(ce->engine);
i915_vma_unpin(vma);
i915_vma_put(vma);
return err;
-- 
2.26.1



[Intel-gfx] [PATCH 13/24] drm/i915: Rework intel_context pinning to do everything outside of pin_mutex

2020-05-01 Thread Maarten Lankhorst
Instead of doing everything inside of pin_mutex, we move all pinning
outside. Because i915_active has its own reference counting, and
pinning runs into the same lock-inversion issues against mutexes, we
make sure everything is pinned first, so that the pinning in
i915_active only needs to bump refcounts. This allows us to take pin
refcounts correctly all the time.
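
The resulting pin flow is roughly the following (a sketch of the
ordering only; the real __intel_context_do_pin() below also handles
pin_mutex, the pin_count fast path and error unwinding):

    err = intel_context_pre_pin(ce); /* pin ring, timeline and state
                                      * before taking any mutex */
    if (err)
            return err;
    /* ... take ce->pin_mutex; i915_active acquisition now only bumps
     * refcounts, so nothing blocking happens under the lock ... */
    intel_context_post_unpin(ce);    /* on error, or on the final unpin */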

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_context.c   | 232 +++---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c   |  34 ++-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
 drivers/gpu/drm/i915/gt/mock_engine.c |  13 +-
 5 files changed, 190 insertions(+), 106 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index e4aece20bc80..c039e87a46c4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -93,79 +93,6 @@ static void intel_context_active_release(struct 
intel_context *ce)
i915_active_release(&ce->active);
 }
 
-int __intel_context_do_pin(struct intel_context *ce)
-{
-   int err;
-
-   if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
-   err = intel_context_alloc_state(ce);
-   if (err)
-   return err;
-   }
-
-   err = i915_active_acquire(&ce->active);
-   if (err)
-   return err;
-
-   if (mutex_lock_interruptible(&ce->pin_mutex)) {
-   err = -EINTR;
-   goto out_release;
-   }
-
-   if (unlikely(intel_context_is_closed(ce))) {
-   err = -ENOENT;
-   goto out_unlock;
-   }
-
-   if (likely(!atomic_add_unless(&ce->pin_count, 1, 0))) {
-   err = intel_context_active_acquire(ce);
-   if (unlikely(err))
-   goto out_unlock;
-
-   err = ce->ops->pin(ce);
-   if (unlikely(err))
-   goto err_active;
-
-   CE_TRACE(ce, "pin ring:{start:%08x, head:%04x, tail:%04x}\n",
-i915_ggtt_offset(ce->ring->vma),
-ce->ring->head, ce->ring->tail);
-
-   smp_mb__before_atomic(); /* flush pin before it is visible */
-   atomic_inc(&ce->pin_count);
-   }
-
-   GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
-   GEM_BUG_ON(i915_active_is_idle(&ce->active));
-   goto out_unlock;
-
-err_active:
-   intel_context_active_release(ce);
-out_unlock:
-   mutex_unlock(&ce->pin_mutex);
-out_release:
-   i915_active_release(&ce->active);
-   return err;
-}
-
-void intel_context_unpin(struct intel_context *ce)
-{
-   if (!atomic_dec_and_test(&ce->pin_count))
-   return;
-
-   CE_TRACE(ce, "unpin\n");
-   ce->ops->unpin(ce);
-
-   /*
-* Once released, we may asynchronously drop the active reference.
-* As that may be the only reference keeping the context alive,
-* take an extra now so that it is not freed before we finish
-* dereferencing it.
-*/
-   intel_context_get(ce);
-   intel_context_active_release(ce);
-   intel_context_put(ce);
-}
-
 static int __context_pin_state(struct i915_vma *vma)
 {
unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS;
@@ -225,6 +152,138 @@ static void __ring_retire(struct intel_ring *ring)
i915_active_release(&ring->vma->active);
 }
 
+static int intel_context_pre_pin(struct intel_context *ce)
+{
+   int err;
+
+   CE_TRACE(ce, "active\n");
+
+   err = __ring_active(ce->ring);
+   if (err)
+   return err;
+
+   err = intel_timeline_pin(ce->timeline);
+   if (err)
+   goto err_ring;
+
+   if (!ce->state)
+   return 0;
+
+   err = __context_pin_state(ce->state);
+   if (err)
+   goto err_timeline;
+
+
+   return 0;
+
+err_timeline:
+   intel_timeline_unpin(ce->timeline);
+err_ring:
+   __ring_retire(ce->ring);
+   return err;
+}
+
+static void intel_context_post_unpin(struct intel_context *ce)
+{
+   if (ce->state)
+   __context_unpin_state(ce->state);
+
+   intel_timeline_unpin(ce->timeline);
+   __ring_retire(ce->ring);
+}
+
+int __intel_context_do_pin(struct intel_context *ce)
+{
+   bool handoff = false;
+   void *vaddr;
+   int err = 0;
+
+   if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
+   err = intel_context_alloc_state(ce);
+   if (err)
+   return err;
+   }
+
+   /*
+* We always pin the context/ring/timeline here, to ensure a pin
+* refcount for __intel_context_active(), which prevent a lock
+* inversion of ce->pin_mutex vs dma_resv_lock().
+*/
+   err = intel_context_pre_pin(ce);
+   if (err)
+   return err;
+
+ 

[Intel-gfx] [PATCH 10/24] drm/i915: Add ww context handling to context_barrier_task

2020-05-01 Thread Maarten Lankhorst
This is required if we want to pass a ww context to intel_context_pin()
and gen6_ppgtt_pin().
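
The new pin hook slots in between pinning the context and creating the
request; for illustration, the converted ppGTT caller at the bottom of
the diff ends up as:

    err = context_barrier_task(ctx, ALL_ENGINES,
                               skip_ppgtt_update,  /* skip(ce, data) */
                               pin_ppgtt_update,   /* pin(ce, &ww, data) */
                               emit_ppgtt_update,  /* emit(rq, data) */
                               set_ppgtt_barrier,  /* task(data) */
                               old);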

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 55 ++-
 .../drm/i915/gem/selftests/i915_gem_context.c | 22 +++-
 2 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7abb2deb1327..c640f70f29f1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1094,6 +1094,7 @@ I915_SELFTEST_DECLARE(static intel_engine_mask_t 
context_barrier_inject_fault);
 static int context_barrier_task(struct i915_gem_context *ctx,
intel_engine_mask_t engines,
bool (*skip)(struct intel_context *ce, void 
*data),
+   int (*pin)(struct intel_context *ce, struct 
i915_gem_ww_ctx *ww, void *data),
int (*emit)(struct i915_request *rq, void 
*data),
void (*task)(void *data),
void *data)
@@ -1101,6 +1102,7 @@ static int context_barrier_task(struct i915_gem_context 
*ctx,
struct context_barrier_task *cb;
struct i915_gem_engines_iter it;
struct i915_gem_engines *e;
+   struct i915_gem_ww_ctx ww;
struct intel_context *ce;
int err = 0;
 
@@ -1138,10 +1140,21 @@ static int context_barrier_task(struct i915_gem_context 
*ctx,
if (skip && skip(ce, data))
continue;
 
-   rq = intel_context_create_request(ce);
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   err = intel_context_pin(ce);
+   if (err)
+   goto err;
+
+   if (pin)
+   err = pin(ce, &ww, data);
+   if (err)
+   goto err_unpin;
+
+   rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
-   break;
+   goto err_unpin;
}
 
err = 0;
@@ -1151,6 +1164,16 @@ static int context_barrier_task(struct i915_gem_context 
*ctx,
err = i915_active_add_request(&cb->base, rq);
 
i915_request_add(rq);
+err_unpin:
+   intel_context_unpin(ce);
+err:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
+
if (err)
break;
}
@@ -1206,6 +1229,17 @@ static void set_ppgtt_barrier(void *data)
i915_vm_close(old);
 }
 
+static int pin_ppgtt_update(struct intel_context *ce, struct i915_gem_ww_ctx 
*ww, void *data)
+{
+   struct i915_address_space *vm = ce->vm;
+
+   if (!HAS_LOGICAL_RING_CONTEXTS(vm->i915))
+   /* ppGTT is not part of the legacy context image */
+   return gen6_ppgtt_pin(i915_vm_to_ppgtt(vm));
+
+   return 0;
+}
+
 static int emit_ppgtt_update(struct i915_request *rq, void *data)
 {
struct i915_address_space *vm = rq->context->vm;
@@ -1262,20 +1296,10 @@ static int emit_ppgtt_update(struct i915_request *rq, 
void *data)
 
 static bool skip_ppgtt_update(struct intel_context *ce, void *data)
 {
-   if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))
-   return true;
-
if (HAS_LOGICAL_RING_CONTEXTS(ce->engine->i915))
-   return false;
-
-   if (!atomic_read(&ce->pin_count))
-   return true;
-
-   /* ppGTT is not part of the legacy context image */
-   if (gen6_ppgtt_pin(i915_vm_to_ppgtt(ce->vm)))
-   return true;
-
-   return false;
+   return !ce->state;
+   else
+   return !atomic_read(&ce->pin_count);
 }
 
 static int set_ppgtt(struct drm_i915_file_private *file_priv,
@@ -1326,6 +1350,7 @@ static int set_ppgtt(struct drm_i915_file_private 
*file_priv,
 */
err = context_barrier_task(ctx, ALL_ENGINES,
   skip_ppgtt_update,
+  pin_ppgtt_update,
   emit_ppgtt_update,
   set_ppgtt_barrier,
   old);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index bcfe0f230cef..be9d4b45b289 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1902,8 +1902,8 @@ static int mock_context_barrier(void *arg)
return -ENOMEM;
 
counter = 0;
-   err = context_barrier_task(ctx, 0,
-

[Intel-gfx] [PATCH 18/24] drm/i915: Dirty hack to fix selftests locking inversion

2020-05-01 Thread Maarten Lankhorst
Some i915 selftests still take i915_vma_lock() as the inner lock, with
intel_context_create_request()'s intel_timeline->mutex as the outer
lock. Fortunately this is not an issue for selftests; they should be
fixed, but we can move ahead and clean up the lockdep annotations now.
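
For clarity, the hack below only touches lockdep's bookkeeping; the
timeline mutex itself stays held throughout:

    lockdep_unpin_lock(&ce->timeline->mutex, rq->cookie); /* permit the
                                                           * fake release */
    mutex_release(&ce->timeline->mutex.dep_map, _RET_IP_); /* tell lockdep
                                                            * we dropped it */
    mutex_acquire(&ce->timeline->mutex.dep_map,
                  SINGLE_DEPTH_NESTING, 0, _RET_IP_); /* re-acquire nested */
    rq->cookie = lockdep_pin_lock(&ce->timeline->mutex);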

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_context.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 64948386630f..fe9fff5a63b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -459,6 +459,18 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
rq = i915_request_create(ce);
intel_context_unpin(ce);
 
+   if (IS_ERR(rq))
+   return rq;
+
+   /*
+* timeline->mutex should be the inner lock, but is used as outer lock.
+* Hack around this to shut up lockdep in selftests..
+*/
+   lockdep_unpin_lock(&ce->timeline->mutex, rq->cookie);
+   mutex_release(&ce->timeline->mutex.dep_map, _RET_IP_);
+   mutex_acquire(&ce->timeline->mutex.dep_map, SINGLE_DEPTH_NESTING, 0, 
_RET_IP_);
+   rq->cookie = lockdep_pin_lock(&ce->timeline->mutex);
+
return rq;
 }
 
-- 
2.26.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 02/24] Revert "drm/i915/gem: Drop relocation slowpath"

2020-05-01 Thread Maarten Lankhorst
This reverts commit 7dc8f1143778 ("drm/i915/gem: Drop relocation
slowpath"). We need the slowpath relocation for taking ww-mutex
inside the page fault handler, and we will take this mutex when
pinning all objects.
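
The shape being restored is the usual fast/slow split: try the copy
atomically with page faults disabled, and if that fails, drop the locks
and take the fault-capable slowpath (sketch; eb_relocate_vma_slow and
eb_copy_relocations come back with the diff below):

    pagefault_disable();
    copied = __copy_from_user_inatomic(r, urelocs, count * sizeof(r[0]));
    pagefault_enable();
    if (unlikely(copied)) {
            /* cannot fault here: unwind, copy the relocation entries
             * with faults allowed, then retry via the slowpath */
            remain = -EFAULT;
            goto out;
    }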

Cc: Chris Wilson 
Cc: Matthew Auld 
Signed-off-by: Maarten Lankhorst 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 239 +-
 1 file changed, 235 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..73dfcbf07886 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1531,7 +1531,9 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, 
struct eb_vma *ev)
 * we would try to acquire the struct mutex again. Obviously
 * this is bad and so lockdep complains vehemently.
 */
-   copied = __copy_from_user(r, urelocs, count * sizeof(r[0]));
+   pagefault_disable();
+   copied = __copy_from_user_inatomic(r, urelocs, count * 
sizeof(r[0]));
+   pagefault_enable();
if (unlikely(copied)) {
remain = -EFAULT;
goto out;
@@ -1579,6 +1581,236 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, 
struct eb_vma *ev)
return remain;
 }
 
+static int
+eb_relocate_vma_slow(struct i915_execbuffer *eb, struct eb_vma *ev)
+{
+   const struct drm_i915_gem_exec_object2 *entry = ev->exec;
+   struct drm_i915_gem_relocation_entry *relocs =
+   u64_to_ptr(typeof(*relocs), entry->relocs_ptr);
+   unsigned int i;
+   int err;
+
+   for (i = 0; i < entry->relocation_count; i++) {
+   u64 offset = eb_relocate_entry(eb, ev, &relocs[i]);
+
+   if ((s64)offset < 0) {
+   err = (int)offset;
+   goto err;
+   }
+   }
+   err = 0;
+err:
+   reloc_cache_reset(&eb->reloc_cache);
+   return err;
+}
+
+static int check_relocations(const struct drm_i915_gem_exec_object2 *entry)
+{
+   const char __user *addr, *end;
+   unsigned long size;
+   char __maybe_unused c;
+
+   size = entry->relocation_count;
+   if (size == 0)
+   return 0;
+
+   if (size > N_RELOC(ULONG_MAX))
+   return -EINVAL;
+
+   addr = u64_to_user_ptr(entry->relocs_ptr);
+   size *= sizeof(struct drm_i915_gem_relocation_entry);
+   if (!access_ok(addr, size))
+   return -EFAULT;
+
+   end = addr + size;
+   for (; addr < end; addr += PAGE_SIZE) {
+   int err = __get_user(c, addr);
+   if (err)
+   return err;
+   }
+   return __get_user(c, end - 1);
+}
+
+static int eb_copy_relocations(const struct i915_execbuffer *eb)
+{
+   struct drm_i915_gem_relocation_entry *relocs;
+   const unsigned int count = eb->buffer_count;
+   unsigned int i;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   const unsigned int nreloc = eb->exec[i].relocation_count;
+   struct drm_i915_gem_relocation_entry __user *urelocs;
+   unsigned long size;
+   unsigned long copied;
+
+   if (nreloc == 0)
+   continue;
+
+   err = check_relocations(&eb->exec[i]);
+   if (err)
+   goto err;
+
+   urelocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
+   size = nreloc * sizeof(*relocs);
+
+   relocs = kvmalloc_array(size, 1, GFP_KERNEL);
+   if (!relocs) {
+   err = -ENOMEM;
+   goto err;
+   }
+
+   /* copy_from_user is limited to < 4GiB */
+   copied = 0;
+   do {
+   unsigned int len =
+   min_t(u64, BIT_ULL(31), size - copied);
+
+   if (__copy_from_user((char *)relocs + copied,
+(char __user *)urelocs + copied,
+len))
+   goto end;
+
+   copied += len;
+   } while (copied < size);
+
+   /*
+* As we do not update the known relocation offsets after
+* relocating (due to the complexities in lock handling),
+* we need to mark them as invalid now so that we force the
+* relocation processing next time. Just in case the target
+* object is evicted and then rebound into its old
+* presumed_offset before the next execbuffer - if that
+* happened we would make the mistake of assuming that the
+* relocations were valid.
+*/
+   if (!user_access_begin

[Intel-gfx] [PATCH 07/24] drm/i915/gem: Make eb_add_lut interruptible wait on object lock.

2020-05-01 Thread Maarten Lankhorst
The lock here should be interruptible, so we can back off if needed.
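
The interruptible variant returns an error (e.g. -EINTR) if a signal
arrives while waiting for the object lock, rather than blocking
unconditionally; the caller then unwinds instead of hanging (sketch of
the call from the diff below):

    err = i915_gem_object_lock_interruptible(obj, NULL);
    if (err)
            goto unlock; /* undo the handle insertion and bail out */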

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9922dc68311f..5b4f6fb1428c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -774,7 +774,12 @@ static int __eb_add_lut(struct i915_execbuffer *eb,
if (err == 0) { /* And nor has this handle */
struct drm_i915_gem_object *obj = vma->obj;
 
-   i915_gem_object_lock(obj, NULL);
+   err = i915_gem_object_lock_interruptible(obj, NULL);
+   if (err) {
+   radix_tree_delete(&ctx->handles_vma, handle);
+   goto unlock;
+   }
+
if (idr_find(&eb->file->object_idr, handle) == obj) {
list_add(&lut->obj_link, &obj->lut_list);
} else {
@@ -783,6 +788,7 @@ static int __eb_add_lut(struct i915_execbuffer *eb,
}
i915_gem_object_unlock(obj);
}
+unlock:
mutex_unlock(&ctx->mutex);
}
if (unlikely(err))
-- 
2.26.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 21/24] drm/i915: Move i915_vma_lock in the selftests to avoid lock inversion, v2.

2020-05-01 Thread Maarten Lankhorst
Make sure vma_lock is not used as inner lock when kernel context is used,
and add ww handling where appropriate.

Signed-off-by: Maarten Lankhorst 
---
 .../i915/gem/selftests/i915_gem_coherency.c   | 26 ++--
 .../drm/i915/gem/selftests/i915_gem_mman.c| 41 ++-
 drivers/gpu/drm/i915/gt/selftest_rps.c| 30 --
 drivers/gpu/drm/i915/selftests/i915_request.c | 18 +---
 4 files changed, 75 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
index 99f8466a108a..d93b7d9ad174 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
@@ -199,25 +199,25 @@ static int gpu_set(struct context *ctx, unsigned long 
offset, u32 v)
 
i915_gem_object_lock(ctx->obj, NULL);
err = i915_gem_object_set_to_gtt_domain(ctx->obj, true);
-   i915_gem_object_unlock(ctx->obj);
if (err)
-   return err;
+   goto out_unlock;
 
vma = i915_gem_object_ggtt_pin(ctx->obj, NULL, 0, 0, 0);
-   if (IS_ERR(vma))
-   return PTR_ERR(vma);
+   if (IS_ERR(vma)) {
+   err = PTR_ERR(vma);
+   goto out_unlock;
+   }
 
rq = intel_engine_create_kernel_request(ctx->engine);
if (IS_ERR(rq)) {
-   i915_vma_unpin(vma);
-   return PTR_ERR(rq);
+   err = PTR_ERR(rq);
+   goto out_unpin;
}
 
cs = intel_ring_begin(rq, 4);
if (IS_ERR(cs)) {
-   i915_request_add(rq);
-   i915_vma_unpin(vma);
-   return PTR_ERR(cs);
+   err = PTR_ERR(cs);
+   goto out_rq;
}
 
if (INTEL_GEN(ctx->engine->i915) >= 8) {
@@ -238,14 +238,16 @@ static int gpu_set(struct context *ctx, unsigned long 
offset, u32 v)
}
intel_ring_advance(rq, cs);
 
-   i915_vma_lock(vma);
err = i915_request_await_object(rq, vma->obj, true);
if (err == 0)
err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
-   i915_vma_unlock(vma);
-   i915_vma_unpin(vma);
 
+out_rq:
i915_request_add(rq);
+out_unpin:
+   i915_vma_unpin(vma);
+out_unlock:
+   i915_gem_object_unlock(ctx->obj);
 
return err;
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index eec58da734bd..c8b9343cc88c 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -528,31 +528,42 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
for_each_uabi_engine(engine, i915) {
struct i915_request *rq;
struct i915_vma *vma;
+   struct i915_gem_ww_ctx ww;
int err;
 
vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
if (IS_ERR(vma))
return PTR_ERR(vma);
 
-   err = i915_vma_pin(vma, 0, 0, PIN_USER);
+   i915_gem_ww_ctx_init(&ww, false);
+retry:
+   err = i915_gem_object_lock(obj, &ww);
+   if (!err)
+   err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER);
if (err)
-   return err;
+   goto err;
 
rq = intel_engine_create_kernel_request(engine);
if (IS_ERR(rq)) {
-   i915_vma_unpin(vma);
-   return PTR_ERR(rq);
+   err = PTR_ERR(rq);
+   goto err_unpin;
}
 
-   i915_vma_lock(vma);
err = i915_request_await_object(rq, vma->obj, true);
if (err == 0)
err = i915_vma_move_to_active(vma, rq,
  EXEC_OBJECT_WRITE);
-   i915_vma_unlock(vma);
 
i915_request_add(rq);
+err_unpin:
i915_vma_unpin(vma);
+err:
+   if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   }
+   i915_gem_ww_ctx_fini(&ww);
if (err)
return err;
}
@@ -1000,6 +1011,7 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
for_each_uabi_engine(engine, i915) {
struct i915_request *rq;
struct i915_vma *vma;
+   struct i915_gem_ww_ctx ww;
 
vma = i915_vma_instance(obj, engine->kernel_context->vm, NULL);
if (IS_ERR(vma)) {
@@ -1007,9 +1019,13 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
goto out_unmap;
}
 
- 

[Intel-gfx] [PATCH 20/24] drm/i915: Use ww pinning for intel_context_create_request()

2020-05-01 Thread Maarten Lankhorst
We want to get rid of intel_context_pin(); convert
intel_context_create_request() first. :)

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_context.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index fe9fff5a63b1..e148e2d69ae1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -449,15 +449,25 @@ int intel_context_prepare_remote_request(struct 
intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce)
 {
+   struct i915_gem_ww_ctx ww;
struct i915_request *rq;
int err;
 
-   err = intel_context_pin(ce);
-   if (unlikely(err))
-   return ERR_PTR(err);
+   i915_gem_ww_ctx_init(&ww, true);
+retry:
+   err = intel_context_pin_ww(ce, &ww);
+   if (!err) {
+   rq = i915_request_create(ce);
+   intel_context_unpin(ce);
+   } else if (err == -EDEADLK) {
+   err = i915_gem_ww_ctx_backoff(&ww);
+   if (!err)
+   goto retry;
+   } else {
+   rq = ERR_PTR(err);
+   }
 
-   rq = i915_request_create(ce);
-   intel_context_unpin(ce);
+   i915_gem_ww_ctx_fini(&ww);
 
if (IS_ERR(rq))
return rq;
-- 
2.26.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915/gt: Make timeslicing an explicit engine property

2020-05-01 Thread Chris Wilson
In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of having
kernel support and HW support.

Suggested-by: Tvrtko Ursulin 
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine.h   |  9 -
 drivers/gpu/drm/i915/gt/intel_engine_types.h | 18 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c  |  5 -
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index d10e52ff059f..19d0b8830905 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -332,13 +332,4 @@ intel_engine_has_preempt_reset(const struct 
intel_engine_cs *engine)
return intel_engine_has_preemption(engine);
 }
 
-static inline bool
-intel_engine_has_timeslices(const struct intel_engine_cs *engine)
-{
-   if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-   return false;
-
-   return intel_engine_has_semaphores(engine);
-}
-
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 3c3225c0332f..6c676774dcd9 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -492,10 +492,11 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4)
-#define I915_ENGINE_IS_VIRTUAL   BIT(5)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_HAS_TIMESLICES   BIT(4)
+#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
+#define I915_ENGINE_IS_VIRTUAL   BIT(6)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
unsigned int flags;
 
/*
@@ -593,6 +594,15 @@ intel_engine_has_semaphores(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
+static inline bool
+intel_engine_has_timeslices(const struct intel_engine_cs *engine)
+{
+   if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+   return false;
+
+   return engine->flags & I915_ENGINE_HAS_TIMESLICES;
+}
+
 static inline bool
 intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4311b12542fb..d4ef344657b0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4801,8 +4801,11 @@ void intel_execlists_set_default_submission(struct 
intel_engine_cs *engine)
engine->flags |= I915_ENGINE_SUPPORTS_STATS;
if (!intel_vgpu_active(engine->i915)) {
engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-   if (HAS_LOGICAL_RING_PREEMPTION(engine->i915))
+   if (HAS_LOGICAL_RING_PREEMPTION(engine->i915)) {
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
+   }
}
 
if (INTEL_GEN(engine->i915) >= 12)
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/24] perf/core: Only copy-to-user after completely unlocking all locks, v3.

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [01/24] perf/core: Only copy-to-user after 
completely unlocking all locks, v3.
URL   : https://patchwork.freedesktop.org/series/76816/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
c7e83b1d5e8c perf/core: Only copy-to-user after completely unlocking all locks, 
v3.
-:17: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
(prefer a maximum 75 chars per line)
#17: 
<4> [604.892540] 8264a558 (rcu_state.barrier_mutex){+.+.}, at: 
rcu_barrier+0x23/0x190

-:106: WARNING:BAD_SIGN_OFF: Duplicate signature
#106: 
Signed-off-by: Maarten Lankhorst 

-:180: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#180: FILE: kernel/events/core.c:5174:
+__perf_read(struct perf_event *event, char __user *buf,
+   size_t count, u64 *values)

total: 0 errors, 2 warnings, 1 checks, 106 lines checked
767e1d98b897 Revert "drm/i915/gem: Drop relocation slowpath"
-:78: WARNING:LINE_SPACING: Missing a blank line after declarations
#78: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1628:
+   int err = __get_user(c, addr);
+   if (err)

total: 0 errors, 1 warnings, 0 checks, 257 lines checked
7fd405b049ee drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
-:493: WARNING:LONG_LINE: line over 100 characters
#493: FILE: drivers/gpu/drm/i915/i915_gem.c:1341:
+   while ((obj = list_first_entry_or_null(&ww->obj_list, struct 
drm_i915_gem_object, obj_link))) {

total: 0 errors, 1 warnings, 0 checks, 473 lines checked
47239de18d02 drm/i915: Remove locking from i915_gem_object_prepare_read/write
acd8e0b1b5dc drm/i915: Parse command buffer earlier in eb_relocate(slow)
9cd5eb751161 Revert "drm/i915/gem: Split eb_vma into its own allocation"
ce37b6d2a7dc drm/i915/gem: Make eb_add_lut interruptible wait on object lock.
cbaecd19dd78 drm/i915: Use per object locking in execbuf, v9.
-:703: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#703: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2761:
+   eb.reloc_pool = eb.batch_pool = NULL;

total: 0 errors, 0 warnings, 1 checks, 708 lines checked
81ee09773f68 drm/i915: Use ww locking in intel_renderstate.
-:10: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
(prefer a maximum 75 chars per line)
#10: 
Convert to using ww-waiting, and make sure we always pin intel_context_state,

total: 0 errors, 1 warnings, 0 checks, 207 lines checked
853dabe4de66 drm/i915: Add ww context handling to context_barrier_task
-:19: WARNING:LONG_LINE: line over 100 characters
#19: FILE: drivers/gpu/drm/i915/gem/i915_gem_context.c:1097:
+   int (*pin)(struct intel_context *ce, struct 
i915_gem_ww_ctx *ww, void *data),

total: 0 errors, 1 warnings, 0 checks, 146 lines checked
4edaac13184b drm/i915: Nuke arguments to eb_pin_engine
030b8005d968 drm/i915: Pin engine before pinning all objects, v4.
09c8db4fbd68 drm/i915: Rework intel_context pinning to do everything outside of 
pin_mutex
-:125: CHECK:LINE_SPACING: Please don't use multiple blank lines
#125: FILE: drivers/gpu/drm/i915/gt/intel_context.c:176:
+
+

-:338: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#338: FILE: drivers/gpu/drm/i915/gt/intel_lrc.c:3443:
+   *vaddr = i915_gem_object_pin_map(ce->state->obj,
+   
i915_coherent_map_type(ce->engine->i915) |

total: 0 errors, 0 warnings, 2 checks, 435 lines checked
7b3ccc070938 drm/i915: Make sure execbuffer always passes ww state to 
i915_vma_pin.
-:95: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#95: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:595:
+   err = i915_vma_pin_ww(vma, &eb->ww,
   entry->pad_to_size, entry->alignment,

-:203: WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a 
separate line
#203: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2252:
+* hsw should have this fixed, but bdw mucks it up again. */

total: 0 errors, 1 warnings, 1 checks, 841 lines checked
29441f87c632 drm/i915: Convert i915_gem_object/client_blt.c to use ww locking 
as well, v2.
62e8d8511668 drm/i915: Kill last user of intel_context_create_request outside 
of selftests
f046ee760a15 drm/i915: Convert i915_perf to ww locking as well
5b95a6acfd4a drm/i915: Dirty hack to fix selftests locking inversion
3bd190df8a2e drm/i915/selftests: Fix locking inversion in lrc selftest.
4156a9384b8c drm/i915: Use ww pinning for intel_context_create_request()
6e5953cd8828 drm/i915: Move i915_vma_lock in the selftests to avoid lock 
inversion, v2.
56909ff76ec7 drm/i915: Add ww locking to vm_fault_gtt
-:7: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

total: 0 errors, 1 warnings, 0 checks, 91 lines checked
e7190066c37f drm/i915: Add ww locking to pin_to_display_plane
-:7: WARNING:COMMIT_MESSAGE: Missing commit description - 

Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Tvrtko Ursulin



On 01/05/2020 11:18, Chris Wilson wrote:

The ring is a precious resource: we anticipate using only a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
inserting the command into the precious ring, we can chain the next page
of relocation entries as MI_BB_START from the end of the previous.

v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may have some fancy async GPU
bindings for new vma!

Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 130 +++---
  1 file changed, 111 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..293bf06b65b2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -271,6 +271,7 @@ struct i915_execbuffer {
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
+   struct i915_vma *rq_vma;
} reloc_cache;
  
  	u64 invalid_flags; /** Set of execobj.flags that are invalid */

@@ -975,20 +976,111 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
reloc_cache *cache)
return &i915->ggtt;
  }
  
+static int reloc_gpu_chain(struct reloc_cache *cache)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_request *rq = cache->rq;
+   struct i915_vma *batch;
+   u32 *cmd;
+   int err;
+
+   pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
+   if (IS_ERR(pool))
+   return PTR_ERR(pool);
+
+   batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
+   if (IS_ERR(batch)) {
+   err = PTR_ERR(batch);
+   goto out_pool;
+   }
+
+   err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
+   if (err)
+   goto out_pool;
+
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8) {
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start);
+   } else {
+   *cmd++ = MI_BATCH_BUFFER_START;
+   *cmd++ = lower_32_bits(batch->node.start);
+   }
+   i915_gem_object_flush_map(cache->rq_vma->obj);
+   i915_gem_object_unpin_map(cache->rq_vma->obj);
+   cache->rq_vma = NULL;
+
+   err = intel_gt_buffer_pool_mark_active(pool, rq);
+   if (err == 0) {
+   i915_vma_lock(batch);
+   err = i915_request_await_object(rq, batch->obj, false);
+   if (err == 0)
+   err = i915_vma_move_to_active(batch, rq, 0);
+   i915_vma_unlock(batch);
+   }
+   i915_vma_unpin(batch);
+   if (err)
+   goto out_pool;
+
+   cmd = i915_gem_object_pin_map(pool->obj,


Maybe use batch->obj here to be consistent in this block? A few lines
above you get to it via batch.



+ cache->has_llc ?
+ I915_MAP_FORCE_WB :
+ I915_MAP_FORCE_WC);
+   if (IS_ERR(cmd)) {
+   err = PTR_ERR(cmd);
+   goto out_pool;
+   }
+
+   /* Return with batch mapping (cmd) still pinned */
+   cache->rq_cmd = cmd;
+   cache->rq_size = 0;
+   cache->rq_vma = batch;
+
+out_pool:
+   intel_gt_buffer_pool_put(pool);
+   return err;
+}
+
+static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
+{
+   return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
+}
+
  static void reloc_gpu_flush(struct reloc_cache *cache)
  {
-   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
+   struct i915_request *rq;
+   int err;
  
-	GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+   rq = fetch_and_zero(&cache->rq);
+   if (!rq)
+   return;
  
-	__i915_gem_object_flush_map(obj, 0, sizeof(u32) * (cache->rq_size + 1));
-   i915_gem_object_unpin_map(obj);
+   if (cache->rq_vma) {
+   struct drm_i915_gem_object *obj = cache->rq_vma->obj;
  
-	intel_gt_chipset_flush(cache->rq->engine->gt);
+   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
+   cache->rq_cmd[cache->rq_size++] = MI_BATCH_BUFFER_END;
 

Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2020-05-01 13:33:14)
> 
> On 01/05/2020 11:18, Chris Wilson wrote:
> > +
> > + err = 0;
> > + if (rq->engine->emit_init_breadcrumb)
> > + err = rq->engine->emit_init_breadcrumb(rq);
> > + if (!err)
> > + err = rq->engine->emit_bb_start(rq,
> > + rq->batch->node.start,
> > + PAGE_SIZE,
> > + reloc_bb_flags(cache));
> > + if (err)
> > + i915_request_set_error_once(rq, err);
> 
> Will this error propagate and fail the execbuf?

It fails the execution, but not the execbuf... I was thinking it was too
late for the execbuf, but returning err at the end propagates nicely!

> 
> > +
> > + intel_gt_chipset_flush(rq->engine->gt);
> > + i915_request_add(rq);
> >   }

Thanks,
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Chris Wilson
Quoting Chris Wilson (2020-05-01 13:38:03)
> Quoting Tvrtko Ursulin (2020-05-01 13:33:14)
> > 
> > On 01/05/2020 11:18, Chris Wilson wrote:
> > > +
> > > + err = 0;
> > > + if (rq->engine->emit_init_breadcrumb)
> > > + err = rq->engine->emit_init_breadcrumb(rq);
> > > + if (!err)
> > > + err = rq->engine->emit_bb_start(rq,
> > > + rq->batch->node.start,
> > > + PAGE_SIZE,
> > > + reloc_bb_flags(cache));
> > > + if (err)
> > > + i915_request_set_error_once(rq, err);
> > 
> > Will this error propagate and fail the execbuf?
> 
> It fails the execution, but not the execbuf... I was thinking it was too
> late for the execbuf, but returning err at the end propagates nicely!

This is much easier in #2, so I'll let the bug slide for a patch.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/3] drm/i915/gem: Use a single chained reloc batches for a single execbuf

2020-05-01 Thread Tvrtko Ursulin



On 01/05/2020 11:18, Chris Wilson wrote:

As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoids emitting a new request into the
ring for each target, which would consume precious ring space and risk
a stall.

Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 23 +++
  1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 293bf06b65b2..b224a453e2a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -268,6 +268,7 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
  
+		struct i915_vma *target;
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
@@ -1087,9 +1088,6 @@ static void reloc_cache_reset(struct reloc_cache *cache)
  {
void *vaddr;
  
-	if (cache->rq)
-   reloc_gpu_flush(cache);
-
if (!cache->vaddr)
return;
  
@@ -1282,7 +1280,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)

  }
  
  static int __reloc_gpu_alloc(struct i915_execbuffer *eb,

-struct i915_vma *vma,
 unsigned int len)
  {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1305,7 +1302,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
goto out_pool;
}
  
-	batch = i915_vma_instance(pool->obj, vma->vm, NULL);
+   batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
@@ -1325,10 +1322,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
  
-	err = reloc_move_to_gpu(rq, vma);
-   if (err)
-   goto err_request;
-
i915_vma_lock(batch);
err = i915_request_await_object(rq, batch->obj, false);
if (err == 0)
@@ -1373,9 +1366,17 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
if (!intel_engine_can_store_dword(eb->engine))
return ERR_PTR(-ENODEV);
  
-		err = __reloc_gpu_alloc(eb, vma, len);
+   err = __reloc_gpu_alloc(eb, len);
+   if (unlikely(err))
+   return ERR_PTR(err);
+   }
+
+   if (vma != cache->target) {
+   err = reloc_move_to_gpu(cache->rq, vma);
if (unlikely(err))
return ERR_PTR(err);
+
+   cache->target = vma;
}
  
  	if (unlikely(cache->rq_size + len > PAGE_SIZE / sizeof(u32) - 4)) {

@@ -1694,6 +1695,8 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
}
+
+   reloc_gpu_flush(&eb->reloc_cache);
}
  
  	return 0;




Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/3] drm/i915/gem: Try an alternate engine for relocations

2020-05-01 Thread Tvrtko Ursulin



On 01/05/2020 11:19, Chris Wilson wrote:

If at first we don't succeed, try try again.

Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that has a liability of blocking. However, Tvrtko pointed out
we don't need to use the same engine to perform the relocations as we
are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.

This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engine. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little used engine on Sandybridge, and only
if relocations are actually required.

Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!

Suggested-by: Tvrtko Ursulin 
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 32 ---
  1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b224a453e2a3..6d649de3a796 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1280,6 +1280,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
  }
  
  static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+struct intel_engine_cs *engine,
 unsigned int len)
  {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1289,7 +1290,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
u32 *cmd;
int err;
  
-	pool = intel_gt_get_buffer_pool(eb->engine->gt, PAGE_SIZE);
+   pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
if (IS_ERR(pool))
return PTR_ERR(pool);
  
@@ -1312,7 +1313,23 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,

if (err)
goto err_unmap;
  
-	rq = i915_request_create(eb->context);
+   if (engine == eb->context->engine) {
+   rq = i915_request_create(eb->context);
+   } else {
+   struct intel_context *ce;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(rq);
+   goto err_unpin;
+   }
+
+   i915_vm_put(ce->vm);
+   ce->vm = i915_vm_get(eb->context->vm);
+
+   rq = intel_context_create_request(ce);
+   intel_context_put(ce);
+   }
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto err_unpin;
@@ -1363,10 +1380,15 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
int err;
  
  	if (unlikely(!cache->rq)) {
-   if (!intel_engine_can_store_dword(eb->engine))
-   return ERR_PTR(-ENODEV);
+   struct intel_engine_cs *engine = eb->engine;
+
+   if (!intel_engine_can_store_dword(engine)) {
+   engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+   if (!engine || !intel_engine_can_store_dword(engine))
+   return ERR_PTR(-ENODEV);
+   }
  
-		err = __reloc_gpu_alloc(eb, len);
+   err = __reloc_gpu_alloc(eb, engine, len);
if (unlikely(err))
return ERR_PTR(err);
}



If you are not worried about the context create dance on SNB, and it is 
limited to VCS, then neither am I.


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/3] drm/i915/gem: Try an alternate engine for relocations

2020-05-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2020-05-01 13:47:36)
> 
> On 01/05/2020 11:19, Chris Wilson wrote:
> If you are not worried about the context create dance on SNB, and it is 
> limited to VCS, then neither am I.

In the short term, since it's limited to VCS on SNB, it is just a
plain kmalloc (as there is no logical state), so I'm not worrying.

Longer term, I do intend on having a pool of logical states cached on
the engine.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [01/24] perf/core: Only copy-to-user after completely unlocking all locks, v3.

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [01/24] perf/core: Only copy-to-user after 
completely unlocking all locks, v3.
URL   : https://patchwork.freedesktop.org/series/76816/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8405 -> Patchwork_17539


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_17539 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17539, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/index.html

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_17539:

### IGT changes ###

 Possible regressions 

  * igt@gem_render_tiled_blits@basic:
- fi-pnv-d510:[PASS][1] -> [DMESG-WARN][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-pnv-d510/igt@gem_render_tiled_bl...@basic.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-pnv-d510/igt@gem_render_tiled_bl...@basic.html
- fi-gdg-551: [PASS][3] -> [DMESG-WARN][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-gdg-551/igt@gem_render_tiled_bl...@basic.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-gdg-551/igt@gem_render_tiled_bl...@basic.html
- fi-blb-e6850:   [PASS][5] -> [DMESG-WARN][6]
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-blb-e6850/igt@gem_render_tiled_bl...@basic.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-blb-e6850/igt@gem_render_tiled_bl...@basic.html

  * igt@i915_selftest@live@gem_contexts:
- fi-cfl-8109u:   [PASS][7] -> [DMESG-WARN][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-cfl-8109u/igt@i915_selftest@live@gem_contexts.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-cfl-8109u/igt@i915_selftest@live@gem_contexts.html
- fi-skl-lmem:[PASS][9] -> [DMESG-WARN][10]
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-skl-lmem/igt@i915_selftest@live@gem_contexts.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-skl-lmem/igt@i915_selftest@live@gem_contexts.html

  * igt@i915_selftest@live@gt_pm:
- fi-kbl-x1275:   [PASS][11] -> [INCOMPLETE][12]
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-kbl-x1275/igt@i915_selftest@live@gt_pm.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-kbl-x1275/igt@i915_selftest@live@gt_pm.html

  
Known issues


  Here are the changes found in Patchwork_17539 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@i915_selftest@live@evict:
- fi-bwr-2160:[PASS][13] -> [INCOMPLETE][14] ([i915#489])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-bwr-2160/igt@i915_selftest@l...@evict.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17539/fi-bwr-2160/igt@i915_selftest@l...@evict.html

  
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489


Participating hosts (50 -> 43)
--

  Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8405 -> Patchwork_17539

  CI-20190529: 20190529
  CI_DRM_8405: 83efffba539b475ce7e3fb96aeae7ee744309ff7 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5623: 8838c73169ea249e6e86aaed35e5178f60f4ef7d @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17539: 2d420146a045549c0759cf0f7ebc984bc09d0dd6 @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

2d420146a045 drm/i915: Ensure we hold the pin mutex
e7190066c37f drm/i915: Add ww locking to pin_to_display_plane
56909ff76ec7 drm/i915: Add ww locking to vm_fault_gtt
6e5953cd8828 drm/i915: Move i915_vma_lock in the selftests to avoid lock 
inversion, v2.
4156a9384b8c drm/i915: Use ww pinning for intel_context_create_request()
3bd190df8a2e drm/i915/selftests: Fix locking inversion in lrc selftest.
5b95a6acfd4a drm/i915: Dirty hack to fix selftests locking inversion
f046ee760a15 drm/i915: Convert i915_perf to ww locking as well
62e8d8511668 drm/i915: Kill last user of intel_context_create_request outside 
of selftests
29441f87c632 drm/i915: Convert i915_gem_object/client_blt.c to use ww locking 
as well, v2.
7b3ccc070938 drm/i915: Make sure execbuffer always passes ww state to 
i915_vma_pin.
09c8db4fbd68 drm/i915: Rework intel_context pinning to do everything outside of 
pin_mutex
030b8005d968 drm/i915: Pin engine before pinning all objects, v4.
4edaac13184b drm/i915: Nuke arguments to eb_pin_engine
853dabe4de66 

[Intel-gfx] [PATCH 2/3] drm/i915/gem: Use a single chained reloc batches for a single execbuf

2020-05-01 Thread Chris Wilson
As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoids emitting a new request into the
ring for each target, which would consume precious ring space and risk
a stall.

v2: Propagate the failure from submitting the relocation batch.
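
The key new piece is caching the last relocation target so the
await/move-to-active dance runs once per object instead of once per
relocation entry (the central hunk from the diff below):

    if (vma != cache->target) {
            err = reloc_move_to_gpu(cache->rq, vma);
            if (unlikely(err))
                    return ERR_PTR(err);

            cache->target = vma;
    }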

Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Reviewed-by: Tvrtko Ursulin  #v1
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 31 ---
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0874976b1cf7..4c4b9e0e75bc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -268,6 +268,7 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
 
+   struct i915_vma *target;
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
@@ -1051,14 +1052,14 @@ static unsigned int reloc_bb_flags(const struct 
reloc_cache *cache)
return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
 }
 
-static void reloc_gpu_flush(struct reloc_cache *cache)
+static int reloc_gpu_flush(struct reloc_cache *cache)
 {
struct i915_request *rq;
int err;
 
rq = fetch_and_zero(&cache->rq);
if (!rq)
-   return;
+   return 0;
 
if (cache->rq_vma) {
struct drm_i915_gem_object *obj = cache->rq_vma->obj;
@@ -1084,15 +1085,14 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 
intel_gt_chipset_flush(rq->engine->gt);
i915_request_add(rq);
+
+   return err;
 }
 
 static void reloc_cache_reset(struct reloc_cache *cache)
 {
void *vaddr;
 
-   if (cache->rq)
-   reloc_gpu_flush(cache);
-
if (!cache->vaddr)
return;
 
@@ -1285,7 +1285,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-struct i915_vma *vma,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1308,7 +1307,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
goto out_pool;
}
 
-   batch = i915_vma_instance(pool->obj, vma->vm, NULL);
+   batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
@@ -1328,10 +1327,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
 
-   err = reloc_move_to_gpu(rq, vma);
-   if (err)
-   goto err_request;
-
i915_vma_lock(batch);
err = i915_request_await_object(rq, batch->obj, false);
if (err == 0)
@@ -1376,11 +1371,19 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
if (!intel_engine_can_store_dword(eb->engine))
return ERR_PTR(-ENODEV);
 
-   err = __reloc_gpu_alloc(eb, vma, len);
+   err = __reloc_gpu_alloc(eb, len);
if (unlikely(err))
return ERR_PTR(err);
}
 
+   if (vma != cache->target) {
+   err = reloc_move_to_gpu(cache->rq, vma);
+   if (unlikely(err))
+   return ERR_PTR(err);
+
+   cache->target = vma;
+   }
+
if (unlikely(cache->rq_size + len >
 PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
err = reloc_gpu_chain(cache);
@@ -1698,6 +1701,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
}
+
+   err = reloc_gpu_flush(&eb->reloc_cache);
+   if (err)
+   return err;
}
 
return 0;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Chris Wilson
The ring is a precious resource: we anticipate using only a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
inserting the command into the precious ring, we can chain the next page
of relocation entries as MI_BB_START from the end of the previous.

v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may have some fancy async GPU
bindings for new vma!

v3: Use pool/batch consistently; once we start thinking in terms of the
batch vma, use batch->obj.
v4: Explain the magic number 4.

Tvrtko spotted that we lose propagation of the error for failing to
submit the relocation request; that's easier to fix up in the next
patch.
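
For the record, the magic number: the chain epilogue emits at most four
dwords, an MI_ARB_CHECK plus a gen8+ MI_BATCH_BUFFER_START (opcode and
two address dwords), hence RELOC_TAIL is 4 in the diff below:

    *cmd++ = MI_ARB_CHECK;                      /* 1 dword */
    *cmd++ = MI_BATCH_BUFFER_START_GEN8;        /* 1 dword */
    *cmd++ = lower_32_bits(batch->node.start);  /* 2 address dwords */
    *cmd++ = upper_32_bits(batch->node.start);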

Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 134 +++---
 1 file changed, 115 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..0874976b1cf7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -271,6 +271,7 @@ struct i915_execbuffer {
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
+   struct i915_vma *rq_vma;
} reloc_cache;
 
u64 invalid_flags; /** Set of execobj.flags that are invalid */
@@ -975,20 +976,114 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
reloc_cache *cache)
return &i915->ggtt;
 }
 
+#define RELOC_TAIL 4
+
+static int reloc_gpu_chain(struct reloc_cache *cache)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_request *rq = cache->rq;
+   struct i915_vma *batch;
+   u32 *cmd;
+   int err;
+
+   pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
+   if (IS_ERR(pool))
+   return PTR_ERR(pool);
+
+   batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
+   if (IS_ERR(batch)) {
+   err = PTR_ERR(batch);
+   goto out_pool;
+   }
+
+   err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
+   if (err)
+   goto out_pool;
+
+   GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8) {
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start);
+   } else {
+   *cmd++ = MI_BATCH_BUFFER_START;
+   *cmd++ = lower_32_bits(batch->node.start);
+   }
+   i915_gem_object_flush_map(cache->rq_vma->obj);
+   i915_gem_object_unpin_map(cache->rq_vma->obj);
+   cache->rq_vma = NULL;
+
+   err = intel_gt_buffer_pool_mark_active(pool, rq);
+   if (err == 0) {
+   i915_vma_lock(batch);
+   err = i915_request_await_object(rq, batch->obj, false);
+   if (err == 0)
+   err = i915_vma_move_to_active(batch, rq, 0);
+   i915_vma_unlock(batch);
+   }
+   i915_vma_unpin(batch);
+   if (err)
+   goto out_pool;
+
+   cmd = i915_gem_object_pin_map(batch->obj,
+ cache->has_llc ?
+ I915_MAP_FORCE_WB :
+ I915_MAP_FORCE_WC);
+   if (IS_ERR(cmd)) {
+   err = PTR_ERR(cmd);
+   goto out_pool;
+   }
+
+   /* Return with batch mapping (cmd) still pinned */
+   cache->rq_cmd = cmd;
+   cache->rq_size = 0;
+   cache->rq_vma = batch;
+
+out_pool:
+   intel_gt_buffer_pool_put(pool);
+   return err;
+}
+
+static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
+{
+   return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
+}
+
 static void reloc_gpu_flush(struct reloc_cache *cache)
 {
-   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
+   struct i915_request *rq;
+   int err;
 
-   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+   rq = fetch_and_zero(&cache->rq);
+   if (!rq)
+   return;
 
-   __i915_gem_object_flush_map(obj, 0, sizeof(u32) * (cache->rq_size + 1));
-   i915_gem_object_unpin_map(obj);
+   if (cache->rq_vma) {
+   struct drm_i915_g

[Intel-gfx] [PATCH 3/3] drm/i915/gem: Try an alternate engine for relocations

2020-05-01 Thread Chris Wilson
If at first we don't succeed, try try again.

Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that has a liability of blocking. However, Tvrtko pointed out
we don't need to use the same engine to perform the relocations as we
are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.

This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engine. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little used engine on Sandybridge, and only
if relocations are actually required to an active batch buffer.

Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!
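
One detail worth calling out: the borrowed context must patch
relocations in the same address space the execbuf executes in, which is
why the diff swaps the new context's vm for the execbuf's before
creating the request:

    i915_vm_put(ce->vm);
    ce->vm = i915_vm_get(eb->context->vm);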

Suggested-by: Tvrtko Ursulin 
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Reviewed-by: Tvrtko Ursulin 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 32 ---
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 4c4b9e0e75bc..3e02922bf68e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1285,6 +1285,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+struct intel_engine_cs *engine,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1294,7 +1295,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
u32 *cmd;
int err;
 
-   pool = intel_gt_get_buffer_pool(eb->engine->gt, PAGE_SIZE);
+   pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
if (IS_ERR(pool))
return PTR_ERR(pool);
 
@@ -1317,7 +1318,23 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_unmap;
 
-   rq = i915_request_create(eb->context);
+   if (engine == eb->context->engine) {
+   rq = i915_request_create(eb->context);
+   } else {
+   struct intel_context *ce;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto err_unpin;
+   }
+
+   i915_vm_put(ce->vm);
+   ce->vm = i915_vm_get(eb->context->vm);
+
+   rq = intel_context_create_request(ce);
+   intel_context_put(ce);
+   }
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto err_unpin;
@@ -1368,10 +1385,15 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
int err;
 
if (unlikely(!cache->rq)) {
-   if (!intel_engine_can_store_dword(eb->engine))
-   return ERR_PTR(-ENODEV);
+   struct intel_engine_cs *engine = eb->engine;
+
+   if (!intel_engine_can_store_dword(engine)) {
+   engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+   if (!engine || !intel_engine_can_store_dword(engine))
+   return ERR_PTR(-ENODEV);
+   }
 
-   err = __reloc_gpu_alloc(eb, len);
+   err = __reloc_gpu_alloc(eb, engine, len);
if (unlikely(err))
return ERR_PTR(err);
}
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gt: Make timeslicing an explicit engine property

2020-05-01 Thread Patchwork
== Series Details ==

Series: drm/i915/gt: Make timeslicing an explicit engine property
URL   : https://patchwork.freedesktop.org/series/76817/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8405 -> Patchwork_17540


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/index.html

Known issues


  Here are the changes found in Patchwork_17540 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@i915_selftest@live@active:
- fi-kbl-r:   [PASS][1] -> [DMESG-FAIL][2] ([i915#666])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-kbl-r/igt@i915_selftest@l...@active.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/fi-kbl-r/igt@i915_selftest@l...@active.html

  
 Possible fixes 

  * igt@i915_selftest@live@hugepages:
- fi-bwr-2160:[INCOMPLETE][3] ([i915#489]) -> [PASS][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html

  
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489
  [i915#666]: https://gitlab.freedesktop.org/drm/intel/issues/666


Participating hosts (50 -> 43)
--

  Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8405 -> Patchwork_17540

  CI-20190529: 20190529
  CI_DRM_8405: 83efffba539b475ce7e3fb96aeae7ee744309ff7 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5623: 8838c73169ea249e6e86aaed35e5178f60f4ef7d @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17540: 65124f85c5e60d2be8cf16168be081e836bb7619 @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

65124f85c5e6 drm/i915/gt: Make timeslicing an explicit engine property

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/index.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Tvrtko Ursulin



On 01/05/2020 14:02, Chris Wilson wrote:

The ring is a precious resource: we anticipate using only a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
inserting the command into the precious ring, we can chain the next page
of relocation entries as an MI_BB_START from the end of the previous one.

v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may have some fancy async GPU bindings
for new vma!

v3: Use pool/batch consistently; once we start thinking in terms of the
batch vma, use batch->obj.
v4: Explain the magic number 4.

Tvrtko spotted that we lose propagation of the error for failing to
submit the relocation request; that's easier to fix up in the next
patch.
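
(An annotated sketch of where the magic number comes from: RELOC_TAIL is
the worst-case tail we must keep free at the end of every relocation page,
which on gen8+ is the 4-dword chain emitted by reloc_gpu_chain() below:)

	*cmd++ = MI_ARB_CHECK;                     /* 1: arbitration point */
	*cmd++ = MI_BATCH_BUFFER_START_GEN8;       /* 2: jump to the next page */
	*cmd++ = lower_32_bits(batch->node.start); /* 3: address, low dword */
	*cmd++ = upper_32_bits(batch->node.start); /* 4: address, high dword */
	/* pre-gen8 uses the 2-dword MI_BATCH_BUFFER_START form, and the
	 * MI_BATCH_BUFFER_END written at flush time fits the same reservation */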

Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 134 +++---
  1 file changed, 115 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..0874976b1cf7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -271,6 +271,7 @@ struct i915_execbuffer {
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
+   struct i915_vma *rq_vma;
} reloc_cache;
  
  	u64 invalid_flags; /** Set of execobj.flags that are invalid */

@@ -975,20 +976,114 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
reloc_cache *cache)
return &i915->ggtt;
  }
  
+#define RELOC_TAIL 4
+
+static int reloc_gpu_chain(struct reloc_cache *cache)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_request *rq = cache->rq;
+   struct i915_vma *batch;
+   u32 *cmd;
+   int err;
+
+   pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
+   if (IS_ERR(pool))
+   return PTR_ERR(pool);
+
+   batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
+   if (IS_ERR(batch)) {
+   err = PTR_ERR(batch);
+   goto out_pool;
+   }
+
+   err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
+   if (err)
+   goto out_pool;
+
+   GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8) {
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start);
+   } else {
+   *cmd++ = MI_BATCH_BUFFER_START;
+   *cmd++ = lower_32_bits(batch->node.start);
+   }
+   i915_gem_object_flush_map(cache->rq_vma->obj);
+   i915_gem_object_unpin_map(cache->rq_vma->obj);
+   cache->rq_vma = NULL;
+
+   err = intel_gt_buffer_pool_mark_active(pool, rq);
+   if (err == 0) {
+   i915_vma_lock(batch);
+   err = i915_request_await_object(rq, batch->obj, false);
+   if (err == 0)
+   err = i915_vma_move_to_active(batch, rq, 0);
+   i915_vma_unlock(batch);
+   }
+   i915_vma_unpin(batch);
+   if (err)
+   goto out_pool;
+
+   cmd = i915_gem_object_pin_map(batch->obj,
+ cache->has_llc ?
+ I915_MAP_FORCE_WB :
+ I915_MAP_FORCE_WC);
+   if (IS_ERR(cmd)) {
+   err = PTR_ERR(cmd);
+   goto out_pool;
+   }
+
+   /* Return with batch mapping (cmd) still pinned */
+   cache->rq_cmd = cmd;
+   cache->rq_size = 0;
+   cache->rq_vma = batch;
+
+out_pool:
+   intel_gt_buffer_pool_put(pool);
+   return err;
+}
+
+static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
+{
+   return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
+}
+
  static void reloc_gpu_flush(struct reloc_cache *cache)
  {
-   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
+   struct i915_request *rq;
+   int err;
  
-	GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-	cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+   rq = fetch_and_zero(&cache->rq);
+   if (!rq)
+   return;
  
-	__i915_gem_object_flush_map(obj, 0, sizeof(u32) * (cache->rq_size + 1));
-	i915_gem_object_unpin_map(obj);
+   if (cache->rq

Re: [Intel-gfx] [PATCH 2/3] drm/i915/gem: Use a single chained reloc batch for a single execbuf

2020-05-01 Thread Tvrtko Ursulin



On 01/05/2020 14:02, Chris Wilson wrote:

As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoids emitting a new request into the
ring for each target, which consumes precious ring space and risks a
stall.

v2: Propagate the failure from submitting the relocation batch.

Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Reviewed-by: Tvrtko Ursulin  #v1
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 31 ---
  1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0874976b1cf7..4c4b9e0e75bc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -268,6 +268,7 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
  
+	struct i915_vma *target;
	struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
@@ -1051,14 +1052,14 @@ static unsigned int reloc_bb_flags(const struct 
reloc_cache *cache)
return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
  }
  
-static void reloc_gpu_flush(struct reloc_cache *cache)
+static int reloc_gpu_flush(struct reloc_cache *cache)
  {
struct i915_request *rq;
int err;
  
	rq = fetch_and_zero(&cache->rq);
	if (!rq)
-   return;
+   return 0;
  
	if (cache->rq_vma) {
		struct drm_i915_gem_object *obj = cache->rq_vma->obj;
@@ -1084,15 +1085,14 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
  
	intel_gt_chipset_flush(rq->engine->gt);
	i915_request_add(rq);
+
+   return err;
  }
  
static void reloc_cache_reset(struct reloc_cache *cache)
{
void *vaddr;
  
-	if (cache->rq)
-		reloc_gpu_flush(cache);
-
if (!cache->vaddr)
return;
  
@@ -1285,7 +1285,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
}
  
static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-			     struct i915_vma *vma,
			     unsigned int len)
  {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1308,7 +1307,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
goto out_pool;
}
  
-	batch = i915_vma_instance(pool->obj, vma->vm, NULL);
+	batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
@@ -1328,10 +1327,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
  
-	err = reloc_move_to_gpu(rq, vma);
-	if (err)
-		goto err_request;
-
i915_vma_lock(batch);
err = i915_request_await_object(rq, batch->obj, false);
if (err == 0)
@@ -1376,11 +1371,19 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
if (!intel_engine_can_store_dword(eb->engine))
return ERR_PTR(-ENODEV);
  
-		err = __reloc_gpu_alloc(eb, vma, len);
+		err = __reloc_gpu_alloc(eb, len);
if (unlikely(err))
return ERR_PTR(err);
}
  
+	if (vma != cache->target) {
+		err = reloc_move_to_gpu(cache->rq, vma);
+   if (unlikely(err))
+   return ERR_PTR(err);
+
+   cache->target = vma;
+   }
+
if (unlikely(cache->rq_size + len >
 PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
err = reloc_gpu_chain(cache);
@@ -1698,6 +1701,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
}
+
+   err = reloc_gpu_flush(&eb->reloc_cache);
+   if (err)
+   return err;
}
  
  	return 0;




Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/gt: Make timeslicing an explicit engine property

2020-05-01 Thread Tvrtko Ursulin



On 01/05/2020 13:22, Chris Wilson wrote:

In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of kernel
support and HW support.
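
(A sketch of where the new flag bites on the scheduling side; set_timer_ms()
and timeslice() below are assumed from the intel_lrc.c of this era and are
not part of this diff:)

	/* only arm the timeslice timer when the engine has both the
	 * Kconfig and the HW support, i.e. the new engine flag */
	if (intel_engine_has_timeslices(engine))
		set_timer_ms(&engine->execlists.timer, timeslice(engine));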

Suggested-by: Tvrtko Ursulin 
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/gt/intel_engine.h   |  9 -
  drivers/gpu/drm/i915/gt/intel_engine_types.h | 18 ++
  drivers/gpu/drm/i915/gt/intel_lrc.c  |  5 -
  3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index d10e52ff059f..19d0b8830905 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -332,13 +332,4 @@ intel_engine_has_preempt_reset(const struct 
intel_engine_cs *engine)
return intel_engine_has_preemption(engine);
  }
  
-static inline bool
-intel_engine_has_timeslices(const struct intel_engine_cs *engine)
-{
-   if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-   return false;
-
-   return intel_engine_has_semaphores(engine);
-}
-
  #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 3c3225c0332f..6c676774dcd9 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -492,10 +492,11 @@ struct intel_engine_cs {
  #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
  #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
  #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4)
-#define I915_ENGINE_IS_VIRTUAL   BIT(5)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_HAS_TIMESLICES   BIT(4)
+#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
+#define I915_ENGINE_IS_VIRTUAL   BIT(6)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
unsigned int flags;
  
	/*
@@ -593,6 +594,15 @@ intel_engine_has_semaphores(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
  }
  
+static inline bool
+intel_engine_has_timeslices(const struct intel_engine_cs *engine)
+{
+   if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+   return false;
+
+   return engine->flags & I915_ENGINE_HAS_TIMESLICES;
+}
+
  static inline bool
  intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
  {
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4311b12542fb..d4ef344657b0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4801,8 +4801,11 @@ void intel_execlists_set_default_submission(struct 
intel_engine_cs *engine)
engine->flags |= I915_ENGINE_SUPPORTS_STATS;
if (!intel_vgpu_active(engine->i915)) {
engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-   if (HAS_LOGICAL_RING_PREEMPTION(engine->i915))
+   if (HAS_LOGICAL_RING_PREEMPTION(engine->i915)) {
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
+   }
}
  
  	if (INTEL_GEN(engine->i915) >= 12)




Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [1/3] drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76813/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8405_full -> Patchwork_17538_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_17538_full:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_reloc@basic-parallel}:
- shard-hsw:  [PASS][1] -> [FAIL][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-hsw7/igt@gem_exec_re...@basic-parallel.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-hsw6/igt@gem_exec_re...@basic-parallel.html
- shard-kbl:  [PASS][3] -> [TIMEOUT][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl6/igt@gem_exec_re...@basic-parallel.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-kbl6/igt@gem_exec_re...@basic-parallel.html
- shard-snb:  [PASS][5] -> [FAIL][6]
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-snb1/igt@gem_exec_re...@basic-parallel.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-snb1/igt@gem_exec_re...@basic-parallel.html
- shard-tglb: [PASS][7] -> [TIMEOUT][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-tglb2/igt@gem_exec_re...@basic-parallel.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-tglb5/igt@gem_exec_re...@basic-parallel.html
- shard-skl:  NOTRUN -> [INCOMPLETE][9]
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-skl9/igt@gem_exec_re...@basic-parallel.html
- shard-apl:  [PASS][10] -> [TIMEOUT][11]
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-apl7/igt@gem_exec_re...@basic-parallel.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-apl3/igt@gem_exec_re...@basic-parallel.html
- shard-iclb: [PASS][12] -> [TIMEOUT][13]
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-iclb5/igt@gem_exec_re...@basic-parallel.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-iclb6/igt@gem_exec_re...@basic-parallel.html
- shard-glk:  [PASS][14] -> [TIMEOUT][15]
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-glk5/igt@gem_exec_re...@basic-parallel.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-glk2/igt@gem_exec_re...@basic-parallel.html

  * {igt@gem_mmap_offset@ptrace@gtt}:
- shard-snb:  NOTRUN -> [FAIL][16] +3 similar issues
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-snb1/igt@gem_mmap_offset@ptr...@gtt.html

  
Known issues


  Here are the changes found in Patchwork_17538_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@kms_cursor_crc@pipe-a-cursor-256x85-sliding:
- shard-skl:  [PASS][17] -> [FAIL][18] ([i915#54])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl7/igt@kms_cursor_...@pipe-a-cursor-256x85-sliding.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-skl4/igt@kms_cursor_...@pipe-a-cursor-256x85-sliding.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
- shard-apl:  [PASS][19] -> [DMESG-WARN][20] ([i915#180]) +2 
similar issues
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-apl6/igt@kms_cursor_...@pipe-c-cursor-suspend.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-apl2/igt@kms_cursor_...@pipe-c-cursor-suspend.html

  * igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic:
- shard-glk:  [PASS][21] -> [FAIL][22] ([IGT#5])
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-glk6/igt@kms_cursor_leg...@flip-vs-cursor-busy-crc-atomic.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-glk4/igt@kms_cursor_leg...@flip-vs-cursor-busy-crc-atomic.html

  * igt@kms_hdr@bpc-switch:
- shard-skl:  [PASS][23] -> [FAIL][24] ([i915#1188])
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl9/igt@kms_...@bpc-switch.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shard-skl7/igt@kms_...@bpc-switch.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
- shard-kbl:  [PASS][25] -> [DMESG-WARN][26] ([i915#180] / 
[i915#93] / [i915#95])
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl4/igt@kms_pipe_crc_ba...@suspend-read-crc-pipe-b.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17538/shar

[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [1/3] drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76818/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8405 -> Patchwork_17541


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/index.html

Known issues


  Here are the changes found in Patchwork_17541 that come from known issues:

### IGT changes ###

 Possible fixes 

  * igt@i915_selftest@live@hugepages:
- fi-bwr-2160:[INCOMPLETE][1] ([i915#489]) -> [PASS][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/fi-bwr-2160/igt@i915_selftest@l...@hugepages.html

  
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489


Participating hosts (50 -> 43)
--

  Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8405 -> Patchwork_17541

  CI-20190529: 20190529
  CI_DRM_8405: 83efffba539b475ce7e3fb96aeae7ee744309ff7 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5623: 8838c73169ea249e6e86aaed35e5178f60f4ef7d @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17541: 33804e5f45d5966bd57174bcd53fe7cc0cf1ec01 @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

33804e5f45d5 drm/i915/gem: Try an alternate engine for relocations
6490e47bafca drm/i915/gem: Use a single chained reloc batch for a single
execbuf
25152ee6dda9 drm/i915/gem: Use chained reloc batches

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/index.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [CI 3/3] drm/i915/gem: Try an alternate engine for relocations

2020-05-01 Thread Chris Wilson
If at first we don't succeed, try try again.

Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that is liable to block. However, Tvrtko pointed out that
we don't need to use the same engine to perform the relocations as we
are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.

This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engine. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little-used engine on Sandybridge, and only
if relocations are actually required to an active batch buffer.

Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!

Suggested-by: Tvrtko Ursulin 
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Reviewed-by: Tvrtko Ursulin 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 32 ---
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 22992ec80172..2e26cd4610a6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1285,6 +1285,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+struct intel_engine_cs *engine,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1294,7 +1295,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
u32 *cmd;
int err;
 
-   pool = intel_gt_get_buffer_pool(eb->engine->gt, PAGE_SIZE);
+   pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
if (IS_ERR(pool))
return PTR_ERR(pool);
 
@@ -1317,7 +1318,23 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_unmap;
 
-   rq = i915_request_create(eb->context);
+   if (engine == eb->context->engine) {
+   rq = i915_request_create(eb->context);
+   } else {
+   struct intel_context *ce;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto err_unpin;
+   }
+
+   i915_vm_put(ce->vm);
+   ce->vm = i915_vm_get(eb->context->vm);
+
+   rq = intel_context_create_request(ce);
+   intel_context_put(ce);
+   }
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto err_unpin;
@@ -1368,10 +1385,15 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
int err;
 
if (unlikely(!cache->rq)) {
-   if (!intel_engine_can_store_dword(eb->engine))
-   return ERR_PTR(-ENODEV);
+   struct intel_engine_cs *engine = eb->engine;
+
+   if (!intel_engine_can_store_dword(engine)) {
+   engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+   if (!engine || !intel_engine_can_store_dword(engine))
+   return ERR_PTR(-ENODEV);
+   }
 
-   err = __reloc_gpu_alloc(eb, len);
+   err = __reloc_gpu_alloc(eb, engine, len);
if (unlikely(err))
return ERR_PTR(err);
}
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [CI 2/3] drm/i915/gem: Use a single chained reloc batch for a single execbuf

2020-05-01 Thread Chris Wilson
As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoids emitting a new request into the
ring for each target, which consumes precious ring space and risks a
stall.

v2: Propagate the failure from submitting the relocation batch.

Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Reviewed-by: Tvrtko Ursulin 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 31 ---
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0411618d66a9..22992ec80172 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -268,6 +268,7 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
 
+   struct i915_vma *target;
struct i915_request *rq;
struct i915_vma *rq_vma;
u32 *rq_cmd;
@@ -1051,14 +1052,14 @@ static unsigned int reloc_bb_flags(const struct 
reloc_cache *cache)
return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
 }
 
-static void reloc_gpu_flush(struct reloc_cache *cache)
+static int reloc_gpu_flush(struct reloc_cache *cache)
 {
struct i915_request *rq;
int err;
 
rq = fetch_and_zero(&cache->rq);
if (!rq)
-   return;
+   return 0;
 
if (cache->rq_vma) {
struct drm_i915_gem_object *obj = cache->rq_vma->obj;
@@ -1084,15 +1085,14 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 
intel_gt_chipset_flush(rq->engine->gt);
i915_request_add(rq);
+
+   return err;
 }
 
 static void reloc_cache_reset(struct reloc_cache *cache)
 {
void *vaddr;
 
-   if (cache->rq)
-   reloc_gpu_flush(cache);
-
if (!cache->vaddr)
return;
 
@@ -1285,7 +1285,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, 
struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-struct i915_vma *vma,
 unsigned int len)
 {
struct reloc_cache *cache = &eb->reloc_cache;
@@ -1308,7 +1307,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
goto out_pool;
}
 
-   batch = i915_vma_instance(pool->obj, vma->vm, NULL);
+   batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
@@ -1328,10 +1327,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
 
-   err = reloc_move_to_gpu(rq, vma);
-   if (err)
-   goto err_request;
-
i915_vma_lock(batch);
err = i915_request_await_object(rq, batch->obj, false);
if (err == 0)
@@ -1376,11 +1371,19 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
if (!intel_engine_can_store_dword(eb->engine))
return ERR_PTR(-ENODEV);
 
-   err = __reloc_gpu_alloc(eb, vma, len);
+   err = __reloc_gpu_alloc(eb, len);
if (unlikely(err))
return ERR_PTR(err);
}
 
+   if (vma != cache->target) {
+   err = reloc_move_to_gpu(cache->rq, vma);
+   if (unlikely(err))
+   return ERR_PTR(err);
+
+   cache->target = vma;
+   }
+
if (unlikely(cache->rq_size + len >
 PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
err = reloc_gpu_chain(cache);
@@ -1698,6 +1701,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
}
+
+   err = reloc_gpu_flush(&eb->reloc_cache);
+   if (err)
+   return err;
}
 
return 0;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [CI 1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Chris Wilson
The ring is a precious resource: we anticipate using only a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
inserting the command into the precious ring, we can chain the next page
of relocation entries as an MI_BB_START from the end of the previous one.

v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may have some fancy async GPU bindings
for new vma!

v3: Use pool/batch consistently; once we start thinking in terms of the
batch vma, use batch->obj.
v4: Explain the magic number 4.

Tvrtko spotted that we lose propagation of the error for failing to
submit the relocation request; that's easier to fix up in the next
patch.

Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 134 +++---
 1 file changed, 115 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 414859fa2673..0411618d66a9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -269,6 +269,7 @@ struct i915_execbuffer {
bool needs_unfenced : 1;
 
struct i915_request *rq;
+   struct i915_vma *rq_vma;
u32 *rq_cmd;
unsigned int rq_size;
} reloc_cache;
@@ -975,20 +976,114 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
reloc_cache *cache)
return &i915->ggtt;
 }
 
+#define RELOC_TAIL 4
+
+static int reloc_gpu_chain(struct reloc_cache *cache)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_request *rq = cache->rq;
+   struct i915_vma *batch;
+   u32 *cmd;
+   int err;
+
+   pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
+   if (IS_ERR(pool))
+   return PTR_ERR(pool);
+
+   batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
+   if (IS_ERR(batch)) {
+   err = PTR_ERR(batch);
+   goto out_pool;
+   }
+
+   err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
+   if (err)
+   goto out_pool;
+
+   GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
+   cmd = cache->rq_cmd + cache->rq_size;
+   *cmd++ = MI_ARB_CHECK;
+   if (cache->gen >= 8) {
+   *cmd++ = MI_BATCH_BUFFER_START_GEN8;
+   *cmd++ = lower_32_bits(batch->node.start);
+   *cmd++ = upper_32_bits(batch->node.start);
+   } else {
+   *cmd++ = MI_BATCH_BUFFER_START;
+   *cmd++ = lower_32_bits(batch->node.start);
+   }
+   i915_gem_object_flush_map(cache->rq_vma->obj);
+   i915_gem_object_unpin_map(cache->rq_vma->obj);
+   cache->rq_vma = NULL;
+
+   err = intel_gt_buffer_pool_mark_active(pool, rq);
+   if (err == 0) {
+   i915_vma_lock(batch);
+   err = i915_request_await_object(rq, batch->obj, false);
+   if (err == 0)
+   err = i915_vma_move_to_active(batch, rq, 0);
+   i915_vma_unlock(batch);
+   }
+   i915_vma_unpin(batch);
+   if (err)
+   goto out_pool;
+
+   cmd = i915_gem_object_pin_map(batch->obj,
+ cache->has_llc ?
+ I915_MAP_FORCE_WB :
+ I915_MAP_FORCE_WC);
+   if (IS_ERR(cmd)) {
+   err = PTR_ERR(cmd);
+   goto out_pool;
+   }
+
+   /* Return with batch mapping (cmd) still pinned */
+   cache->rq_cmd = cmd;
+   cache->rq_size = 0;
+   cache->rq_vma = batch;
+
+out_pool:
+   intel_gt_buffer_pool_put(pool);
+   return err;
+}
+
+static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
+{
+   return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
+}
+
 static void reloc_gpu_flush(struct reloc_cache *cache)
 {
-   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
+   struct i915_request *rq;
+   int err;
 
-   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+   rq = fetch_and_zero(&cache->rq);
+   if (!rq)
+   return;
 
-   __i915_gem_object_flush_map(obj, 0, sizeof(u32) * (cache->rq_size + 1));
-   i915_gem_object_unpin_map(obj);
+   if (cache->rq_vma) {
+   struct drm_i915_gem

Re: [Intel-gfx] [PATCH 3/4] drm/i915: Implement vm_ops->access for gdb access into mmaps

2020-05-01 Thread Matthew Auld

On 01/05/2020 09:42, Chris Wilson wrote:

gdb uses ptrace() to peek and poke bytes of the target's address space.
The driver must implement a vm_ops->access() handler or else gdb will
be unable to inspect the pointer and will report it as out-of-bounds.
Worse than useless, as it casts immediate suspicion on the valid GTT
pointer, distracting the poor programmer trying to find his bug.

Testcase: igt/gem_mmap_gtt/ptrace
Testcase: igt/gem_mmap_offset/ptrace
Suggested-by: Kristian H. Kristensen 
Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Maciej Patelczyk 
Cc: Kristian H. Kristensen 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  31 +
  .../drm/i915/gem/selftests/i915_gem_mman.c| 124 ++
  2 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index b39c24dae64e..aef917b7f168 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -396,6 +396,35 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
return i915_error_to_vmf_fault(ret);
  }
  
+static int
+vm_access(struct vm_area_struct *area, unsigned long addr,
+ void *buf, int len, int write)
+{
+   struct i915_mmap_offset *mmo = area->vm_private_data;
+   struct drm_i915_gem_object *obj = mmo->obj;
+   void *vaddr;
+


What's the story with object_is_readonly and write=true here? Shouldn't 
we reject, or what?


Reviewed-by: Matthew Auld 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [CI] drm/i915: Implement vm_ops->access for gdb access into mmaps

2020-05-01 Thread Chris Wilson
gdb uses ptrace() to peek and poke bytes of the target's address space.
The driver must implement a vm_ops->access() handler or else gdb will
be unable to inspect the pointer and will report it as out-of-bounds.
Worse than useless, as it casts immediate suspicion on the valid GTT
pointer, distracting the poor programmer trying to find his bug.

v2: Write-protect readonly objects (Matthew).
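
(A minimal userspace sketch of the scenario, using anonymous memory as a
stand-in for the GTT mmap; the IGT tests below do the real thing, where the
kernel services the peek through vma->vm_ops->access():)

	#include <signal.h>
	#include <stdio.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <unistd.h>

	int main(void)
	{
		static long page[512];	/* stands in for a GTT mmap */
		pid_t pid;

		page[0] = 0x5555;
		pid = fork();
		if (pid == 0) {	/* child: make ourselves traceable */
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			raise(SIGSTOP);
			_exit(0);
		}

		waitpid(pid, NULL, 0);	/* wait for the child's SIGSTOP */
		printf("peeked %#lx\n",
		       ptrace(PTRACE_PEEKDATA, pid, page, NULL));
		ptrace(PTRACE_DETACH, pid, NULL, NULL);
		return 0;
	}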

Testcase: igt/gem_mmap_gtt/ptrace
Testcase: igt/gem_mmap_offset/ptrace
Suggested-by: Kristian H. Kristensen 
Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Maciej Patelczyk 
Cc: Kristian H. Kristensen 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  34 +
 .../drm/i915/gem/selftests/i915_gem_mman.c| 124 ++
 2 files changed, 158 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index b39c24dae64e..70f5f82da288 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -396,6 +396,38 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
return i915_error_to_vmf_fault(ret);
 }
 
+static int
+vm_access(struct vm_area_struct *area, unsigned long addr,
+ void *buf, int len, int write)
+{
+   struct i915_mmap_offset *mmo = area->vm_private_data;
+   struct drm_i915_gem_object *obj = mmo->obj;
+   void *vaddr;
+
+   if (i915_gem_object_is_readonly(obj) && write)
+   return -EACCES;
+
+   addr -= area->vm_start;
+   if (addr >= obj->base.size)
+   return -EINVAL;
+
+   /* As this is primarily for debugging, let's focus on simplicity */
+   vaddr = i915_gem_object_pin_map(obj, I915_MAP_FORCE_WC);
+   if (IS_ERR(vaddr))
+   return PTR_ERR(vaddr);
+
+   if (write) {
+   memcpy(vaddr + addr, buf, len);
+   __i915_gem_object_flush_map(obj, addr, len);
+   } else {
+   memcpy(buf, vaddr + addr, len);
+   }
+
+   i915_gem_object_unpin_map(obj);
+
+   return len;
+}
+
 void __i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj)
 {
struct i915_vma *vma;
@@ -745,12 +777,14 @@ static void vm_close(struct vm_area_struct *vma)
 
 static const struct vm_operations_struct vm_ops_gtt = {
.fault = vm_fault_gtt,
+   .access = vm_access,
.open = vm_open,
.close = vm_close,
 };
 
 static const struct vm_operations_struct vm_ops_cpu = {
.fault = vm_fault_cpu,
+   .access = vm_access,
.open = vm_open,
.close = vm_close,
 };
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index ef7abcb3f4ee..9c7402ce5bf9 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -952,6 +952,129 @@ static int igt_mmap(void *arg)
return 0;
 }
 
+static const char *repr_mmap_type(enum i915_mmap_type type)
+{
+   switch (type) {
+   case I915_MMAP_TYPE_GTT: return "gtt";
+   case I915_MMAP_TYPE_WB: return "wb";
+   case I915_MMAP_TYPE_WC: return "wc";
+   case I915_MMAP_TYPE_UC: return "uc";
+   default: return "unknown";
+   }
+}
+
+static bool can_access(const struct drm_i915_gem_object *obj)
+{
+   unsigned int flags =
+   I915_GEM_OBJECT_HAS_STRUCT_PAGE | I915_GEM_OBJECT_HAS_IOMEM;
+
+   return i915_gem_object_type_has(obj, flags);
+}
+
+static int __igt_mmap_access(struct drm_i915_private *i915,
+struct drm_i915_gem_object *obj,
+enum i915_mmap_type type)
+{
+   struct i915_mmap_offset *mmo;
+   unsigned long __user *ptr;
+   unsigned long A, B;
+   unsigned long x, y;
+   unsigned long addr;
+   int err;
+
+   memset(&A, 0xAA, sizeof(A));
+   memset(&B, 0xBB, sizeof(B));
+
+   if (!can_mmap(obj, type) || !can_access(obj))
+   return 0;
+
+   mmo = mmap_offset_attach(obj, type, NULL);
+   if (IS_ERR(mmo))
+   return PTR_ERR(mmo);
+
+   addr = igt_mmap_node(i915, &mmo->vma_node, 0, PROT_WRITE, MAP_SHARED);
+   if (IS_ERR_VALUE(addr))
+   return addr;
+   ptr = (unsigned long __user *)addr;
+
+   err = __put_user(A, ptr);
+   if (err) {
+   pr_err("%s(%s): failed to write into user mmap\n",
+  obj->mm.region->name, repr_mmap_type(type));
+   goto out_unmap;
+   }
+
+   intel_gt_flush_ggtt_writes(&i915->gt);
+
+   err = access_process_vm(current, addr, &x, sizeof(x), 0);
+   if (err != sizeof(x)) {
+   pr_err("%s(%s): access_process_vm() read failed\n",
+  obj->mm.region->name, repr_mmap_type(type));
+   goto out_unmap;
+   }
+
+   err = access_process_vm(current, addr, &B, sizeof(B), FOLL_WRITE);
+   if (err != siz

Re: [Intel-gfx] [PATCH i-g-t] i915/gem_mmap_gtt: Simulate gdb inspecting a GTT mmap using ptrace()

2020-05-01 Thread Matthew Auld
On Thu, 30 Apr 2020 at 20:42, Chris Wilson  wrote:
>
> gdb uses ptrace() to peek and poke bytes of the target's address space.
> The kernel must implement a vm_ops->access() handler or else gdb will
> be unable to inspect the pointer and will report it as out-of-bounds. Worse
> than useless, as it casts immediate suspicion on the valid GTT pointer.
>
> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_mmap_gtt.c | 79 ++-
>  1 file changed, 78 insertions(+), 1 deletion(-)
>
> diff --git a/tests/i915/gem_mmap_gtt.c b/tests/i915/gem_mmap_gtt.c
> index 1f4655af4..38b4d02d7 100644
> --- a/tests/i915/gem_mmap_gtt.c
> +++ b/tests/i915/gem_mmap_gtt.c
> @@ -34,8 +34,11 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include "drm.h"
>
>  #include "igt.h"
> @@ -501,6 +504,78 @@ test_write_gtt(int fd)
> munmap(src, OBJECT_SIZE);
>  }
>
> +static void *memchr_inv(const void *s, int c, size_t n)
> +{
> +   const uint8_t *us = s;
> +   const uint8_t uc = c;
> +
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +   while (n--) {
> +   if (*us != uc)
> +   return (void *) us;
> +   us++;
> +   }
> +#pragma GCC diagnostic pop
> +
> +   return NULL;
> +}
> +
> +static void
> +test_ptrace(int fd)
> +{
> +   unsigned long AA, CC;
> +   unsigned long *gtt, *cpy;
> +   uint32_t bo;
> +   pid_t pid;
> +
> +   memset(&AA, 0xaa, sizeof(AA));
> +   memset(&CC, 0x55, sizeof(CC));
> +
> +   cpy = malloc(OBJECT_SIZE);
> +   memset(cpy, AA, OBJECT_SIZE);
> +
> +   bo = gem_create(fd, OBJECT_SIZE);
> +   gtt = mmap_bo(fd, bo, OBJECT_SIZE);
> +   memset(gtt, CC, OBJECT_SIZE);
> +   gem_close(fd, bo);
> +
> +   igt_assert(!memchr_inv(gtt, CC, OBJECT_SIZE));
> +   igt_assert(!memchr_inv(cpy, AA, OBJECT_SIZE));
> +
> +   igt_fork(child, 1) {
> +   ptrace(PTRACE_TRACEME, 0, NULL, NULL);
> +   raise(SIGSTOP);
> +   }
> +
> +   /* Wait for the child to ready themselves [SIGSTOP] */
> +   pid = wait(NULL);
> +
> +   ptrace(PTRACE_ATTACH, pid, NULL, NULL);
> +   for (int i = 0; i < OBJECT_SIZE / sizeof(long); i++) {
> +   long ret;
> +
> +   ret = ptrace(PTRACE_PEEKDATA, pid, gtt + i);
> +   igt_assert_eq_u64(ret, CC);
> +   cpy[i] = ret;
> +
> +   ret = ptrace(PTRACE_POKEDATA, pid, gtt + i, AA);
> +   igt_assert_eq(ret, 0l);

igt_assert_eq_u64() ?

Reviewed-by: Matthew Auld 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] i915/gem_mmap_gtt: Simulate gdb inspecting a GTT mmap using ptrace()

2020-05-01 Thread Chris Wilson
Quoting Matthew Auld (2020-05-01 15:58:29)
> On Thu, 30 Apr 2020 at 20:42, Chris Wilson  wrote:
> > +   ptrace(PTRACE_ATTACH, pid, NULL, NULL);
> > +   for (int i = 0; i < OBJECT_SIZE / sizeof(long); i++) {
> > +   long ret;
> > +
> > +   ret = ptrace(PTRACE_PEEKDATA, pid, gtt + i);
> > +   igt_assert_eq_u64(ret, CC);
> > +   cpy[i] = ret;
> > +
> > +   ret = ptrace(PTRACE_POKEDATA, pid, gtt + i, AA);
> > +   igt_assert_eq(ret, 0l);
> 
> igt_assert_eq_u64() ?

In this case it will either be 0 or -1 + errno.

So "%d" vs "%llx" should not affect debugging.
-Chris
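
(For completeness, the asymmetry Chris is describing, as a short hedged
sketch: PTRACE_PEEKDATA returns the data in-band, so errno must
disambiguate a genuine -1, while PTRACE_POKEDATA simply returns 0 or -1:)

	errno = 0;
	long v = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
	if (v == -1 && errno)	/* -1 may be valid data; check errno */
		perror("peek");

	if (ptrace(PTRACE_POKEDATA, pid, addr, value))	/* 0, or -1 + errno */
		perror("poke");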
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] igt/gem_mmap_offset: Simulate gdb inspecting any mmap using ptrace()

2020-05-01 Thread Matthew Auld
On Thu, 30 Apr 2020 at 20:51, Chris Wilson  wrote:
>
> gdb uses ptrace() to peek and poke bytes of the target's address space.
> The kernel must implement a vm_ops->access() handler or else gdb will
> be unable to inspect the pointer and will report it as out-of-bounds. Worse
> than useless, as it casts immediate suspicion on the valid GPU pointer.
>
> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_mmap_offset.c | 91 +++-
>  1 file changed, 90 insertions(+), 1 deletion(-)
>
> diff --git a/tests/i915/gem_mmap_offset.c b/tests/i915/gem_mmap_offset.c
> index 1ec963b25..c10cf606f 100644
> --- a/tests/i915/gem_mmap_offset.c
> +++ b/tests/i915/gem_mmap_offset.c
> @@ -23,9 +23,12 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
> -#include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include "drm.h"
>
>  #include "igt.h"
> @@ -265,6 +268,89 @@ static void pf_nonblock(int i915)
> igt_spin_free(i915, spin);
>  }
>
> +static void *memchr_inv(const void *s, int c, size_t n)
> +{
> +   const uint8_t *us = s;
> +   const uint8_t uc = c;
> +
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +   while (n--) {
> +   if (*us != uc)
> +   return (void *) us;
> +   us++;
> +   }
> +#pragma GCC diagnostic pop
> +
> +   return NULL;
> +}
> +
> +static void test_ptrace(int i915)
> +{
> +   const unsigned int SZ = 3 * 4096;
> +   unsigned long *ptr, *cpy;
> +   unsigned long AA, CC;
> +   uint32_t bo;
> +
> +   memset(&AA, 0xaa, sizeof(AA));
> +   memset(&CC, 0x55, sizeof(CC));
> +
> +   cpy = malloc(SZ);
> +   bo = gem_create(i915, SZ);
> +
> +   for_each_mmap_offset_type(i915, t) {
> +   igt_dynamic_f("%s", t->name) {
> +   pid_t pid;
> +
> +   ptr = __mmap_offset(i915, bo, 0, SZ,
> +   PROT_READ | PROT_WRITE,
> +   t->type);
> +   if (!ptr)
> +   continue;
> +
> +   memset(cpy, AA, SZ);
> +   memset(ptr, CC, SZ);
> +
> +   igt_assert(!memchr_inv(ptr, CC, SZ));
> +   igt_assert(!memchr_inv(cpy, AA, SZ));
> +
> +   igt_fork(child, 1) {
> +   ptrace(PTRACE_TRACEME, 0, NULL, NULL);
> +   raise(SIGSTOP);
> +   }
> +
> +   /* Wait for the child to ready themselves [SIGSTOP] */
> +   pid = wait(NULL);
> +
> +   ptrace(PTRACE_ATTACH, pid, NULL, NULL);
> +   for (int i = 0; i < SZ / sizeof(long); i++) {
> +   long ret;
> +
> +   ret = ptrace(PTRACE_PEEKDATA, pid, ptr + i);
> +   igt_assert_eq_u64(ret, CC);
> +   cpy[i] = ret;
> +
> +   ret = ptrace(PTRACE_POKEDATA, pid, ptr + i, 
> AA);
> +   igt_assert_eq(ret, 0l);

igt_assert_eq_u64() ?

Reviewed-by: Matthew Auld 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] i915 HDCP 2.2 TX encryption on Teledyne test instrument

2020-05-01 Thread Voldman, Mikhail
Thanks

Mikhail Voldman
System Architect

Teledyne LeCroy, Protocol Solutions Group
2111 Big Timber Road
Elgin, IL  60123
email address:  mikhail.vold...@teledyne.com
847-888-0450 x136

Send me a file securely


-Original Message-
From: Ramalingam C  
Sent: Thursday, April 30, 2020 12:02 AM
To: Voldman, Mikhail 
Cc: Kurmi Suresh Kumar ; intel-gfx 

Subject: Re: i915 HDCP 2.2 TX encryption on Teledyne test instrument

---External Email---

On 2020-04-29 at 18:12:32 +, Voldman, Mikhail wrote:
> Hi Ram,
> 
> Thank you for your help in the past.
> We can control HDCP on our products as needed.
> 
> One issue on the new motherboard used in our product:
> in this case, i915 advertises itself as a DP-1 sink repeater and is
> authenticated as an HDCP 1.4 capable device, but the downstream HDMI
> device is HDCP 2.3 capable and correctly authenticated as HDCP 2.3.
I assume this is due to the LSPCon used in your motherboard/SoC itself,
which is converting DP->HDMI. For i915, the LSPCon is an external device:
i915 authenticates with it based on the LSPCon's HDCP capability and the
requested HDCP type, and authentication between the LSPCon and the display
sink connected to it is the LSPCon's responsibility. i915 will assume the
LSPCon has followed the HDCP spec for repeaters.

-Ram
> Is there any way I can determine at what HDCP level the downstream
> device is authenticated?
> Lack of i915 documentation makes this not very obvious.
> Can you just point us in the right direction?
> 
> Thank You,
> 
> Mikhail Voldman
> System Architect
> 
> Teledyne LeCroy, Protocol Solutions Group
> 2111 Big Timber Road
> Elgin, IL  60123
> email address:  mikhail.vold...@teledyne.com
> 847-888-0450 x136
> 
> Send me a file securely
> 
> 
> -Original Message-
> From: Ramalingam C 
> Sent: Tuesday, November 5, 2019 11:12 PM
> To: Voldman, Mikhail 
> Cc: Kurmi Suresh Kumar ; intel-gfx 
> 
> Subject: Re: i915 HDCP 2.2 TX encryption on Teledyne test instrument
> 
> ---External Email---
> 
> Moving to #intel-gfx
> 
> Hi,
> 
> Glad that I could help you!
> 
> On 2019-11-05 at 21:56:28 +, Voldman, Mikhail wrote:
> > Hello Ramalingam,
> > 
> > Thank you for quick response. 
> > Your information is very helpful. 
> > But can you elaborate:
> > 
> > In your product, If you want to enable the HDCP always based on the 
> > sink capability, set the "Content protection" to DESIRED state along 
> > with desired content type.  [MV] should I set DESIRED protection level as 
> > DRM master?
This needs an additional kernel patch for your product to set the desired
state as the default state of the property at creation.
> > 
> > As these are properties, non DRM Masters can only read them. can set 
> > them. [MV] do you mean: " non DRM Masters can only read them, but  can't 
> > set them."
> Yes.
> > Can I use MEI interface to control HDCP?
> Not needed if you set the default state as desired.
> -Ram
> > 
> > Mikhail Voldman
> > System Architect
> > 
> > Teledyne LeCroy, Protocol Solutions Group
> > 2111 Big Timber Road
> > Elgin, IL  60123
> > Note new email address:  mikhail.vold...@teledyne.com
> > 847-888-0450 x136
> > 
> > Send me a file securely
> > 
> > 
> > -Original Message-
> > From: Ramalingam C 
> > Sent: Monday, November 4, 2019 10:44 PM
> > To: Voldman, Mikhail 
> > Cc: Kurmi Suresh Kumar 
> > Subject: Re: i915 HDCP 2.2 TX encryption on Teledyne test instrument
> > 
> > ---External Email---
> > 
> > On 2019-11-04 at 20:42:49 +, Voldman, Mikhail wrote:
> > > Hello Ramalingam,
> > > 
> > > We exchanged number of e-mails few months ago regarding Linux i915 HDCP 
> > > 2.2 encryption  support in the new Teledyne video test instrument.
> > > Thanks for your help we were able to control HDCP 2.2 encryption as DRM 
> > > masters.
> > > 
> > > Unfortunately our product requirement specified than we need to  enable 
> > > HDCP 2.2 always if attached monitor capabilities shows HDCP 2.2 support.
> > > Also i915 based TX required to execute HDCP 2.2 re-authentication if Sink 
> > > HPD is detected.
> > > 
> > > Is current Intel i915 kernel driver implementation can support desired 
> > > functionality? Do you have plans to support this?
> > 
> > "HDCP always" will never be an upstream solution. always userspace 
> > driven.
> > 
> > In your product, If you want to enable the HDCP always based on the 
> > sink capability, set the "Content protection" to DESIRED state along 
> > with desired content type.
> > 
> > As these are properties, non DRM Masters can only read them. can set 
> > them.
> > 
> > Hope you are unblocked. All the best!
> > 
> > -Ram
> > > 
> > > Are current i915 allow control HDCP encryption by NOT DRM master 
> > > application?
> > > 
> > > Any suggestion or advice by Intel HDCP 2.2 experts will be really 
> > > appreciated.
> > > 
> > > Best Regards,
> > > 
> > > Mikhail Voldman
> > > System Architect
> > > Teledyne LeCroy, Protocol Solutions Group
> > > 2111 Big Timber Road
> > > Elgin, IL  60123

[Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/gt: Make timeslicing an explicit engine property

2020-05-01 Thread Patchwork
== Series Details ==

Series: drm/i915/gt: Make timeslicing an explicit engine property
URL   : https://patchwork.freedesktop.org/series/76817/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8405_full -> Patchwork_17540_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_17540_full:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_reloc@basic-many-active@vcs1}:
- shard-iclb: NOTRUN -> [FAIL][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-iclb4/igt@gem_exec_reloc@basic-many-act...@vcs1.html

  * {igt@gem_mmap_offset@ptrace@gtt}:
- shard-snb:  NOTRUN -> [FAIL][2] +3 similar issues
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-snb2/igt@gem_mmap_offset@ptr...@gtt.html

  
Known issues


  Here are the changes found in Patchwork_17540_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@gem_workarounds@suspend-resume:
- shard-kbl:  [PASS][3] -> [DMESG-WARN][4] ([i915#180] / [i915#93] 
/ [i915#95]) +1 similar issue
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl2/igt@gem_workarou...@suspend-resume.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-kbl6/igt@gem_workarou...@suspend-resume.html

  * igt@kms_cursor_crc@pipe-a-cursor-256x85-sliding:
- shard-skl:  [PASS][5] -> [FAIL][6] ([i915#54])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl7/igt@kms_cursor_...@pipe-a-cursor-256x85-sliding.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-skl1/igt@kms_cursor_...@pipe-a-cursor-256x85-sliding.html

  * igt@kms_cursor_crc@pipe-b-cursor-suspend:
- shard-apl:  [PASS][7] -> [DMESG-WARN][8] ([i915#180]) +2 similar 
issues
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-apl2/igt@kms_cursor_...@pipe-b-cursor-suspend.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-apl2/igt@kms_cursor_...@pipe-b-cursor-suspend.html

  * igt@kms_cursor_legacy@pipe-b-torture-move:
- shard-iclb: [PASS][9] -> [DMESG-WARN][10] ([i915#128])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-iclb4/igt@kms_cursor_leg...@pipe-b-torture-move.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-iclb5/igt@kms_cursor_leg...@pipe-b-torture-move.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min:
- shard-skl:  [PASS][11] -> [FAIL][12] ([fdo#108145] / [i915#265])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl7/igt@kms_plane_alpha_bl...@pipe-a-constant-alpha-min.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-skl1/igt@kms_plane_alpha_bl...@pipe-a-constant-alpha-min.html

  * igt@kms_psr@psr2_cursor_mmap_cpu:
- shard-iclb: [PASS][13] -> [SKIP][14] ([fdo#109441]) +2 similar 
issues
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_cpu.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-iclb8/igt@kms_psr@psr2_cursor_mmap_cpu.html

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
- shard-kbl:  [PASS][15] -> [DMESG-WARN][16] ([i915#180]) +1 
similar issue
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl7/igt@kms_vbl...@pipe-c-ts-continuation-suspend.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-kbl7/igt@kms_vbl...@pipe-c-ts-continuation-suspend.html

  
 Possible fixes 

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
- shard-kbl:  [INCOMPLETE][17] ([i915#155]) -> [PASS][18]
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl3/igt@kms_cursor_...@pipe-a-cursor-suspend.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-kbl7/igt@kms_cursor_...@pipe-a-cursor-suspend.html

  * {igt@kms_flip@flip-vs-expired-vblank@c-hdmi-a2}:
- shard-glk:  [FAIL][19] ([i915#79]) -> [PASS][20]
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-glk4/igt@kms_flip@flip-vs-expired-vbl...@c-hdmi-a2.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-glk9/igt@kms_flip@flip-vs-expired-vbl...@c-hdmi-a2.html

  * {igt@kms_flip@flip-vs-suspend-interruptible@b-edp1}:
- shard-skl:  [INCOMPLETE][21] ([i915#198]) -> [PASS][22]
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl3/igt@kms_flip@flip-vs-suspend-interrupti...@b-edp1.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17540/shard-skl4/igt@kms_flip@flip-vs-

[Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [1/3] drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76818/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8405_full -> Patchwork_17541_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_17541_full:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_reloc@basic-parallel}:
- shard-kbl:  [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl6/igt@gem_exec_re...@basic-parallel.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-kbl7/igt@gem_exec_re...@basic-parallel.html
- shard-snb:  [PASS][3] -> [FAIL][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-snb1/igt@gem_exec_re...@basic-parallel.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-snb4/igt@gem_exec_re...@basic-parallel.html
- shard-tglb: [PASS][5] -> [TIMEOUT][6]
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-tglb2/igt@gem_exec_re...@basic-parallel.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-tglb6/igt@gem_exec_re...@basic-parallel.html
- shard-apl:  [PASS][7] -> [TIMEOUT][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-apl7/igt@gem_exec_re...@basic-parallel.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-apl7/igt@gem_exec_re...@basic-parallel.html
- shard-iclb: [PASS][9] -> [TIMEOUT][10]
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-iclb5/igt@gem_exec_re...@basic-parallel.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-iclb3/igt@gem_exec_re...@basic-parallel.html

  * {igt@gem_mmap_offset@ptrace@gtt}:
- shard-snb:  NOTRUN -> [FAIL][11] +3 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-snb4/igt@gem_mmap_offset@ptr...@gtt.html

  
Known issues


  Here are the changes found in Patchwork_17541_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@i915_pm_rpm@system-suspend-execbuf:
- shard-skl:  [PASS][12] -> [INCOMPLETE][13] ([i915#151] / 
[i915#69])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl6/igt@i915_pm_...@system-suspend-execbuf.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-skl9/igt@i915_pm_...@system-suspend-execbuf.html

  * igt@i915_suspend@forcewake:
- shard-kbl:  [PASS][14] -> [DMESG-WARN][15] ([i915#180]) +3 
similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl4/igt@i915_susp...@forcewake.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-kbl4/igt@i915_susp...@forcewake.html

  * igt@kms_cursor_crc@pipe-a-cursor-256x85-sliding:
- shard-skl:  [PASS][16] -> [FAIL][17] ([i915#54])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl7/igt@kms_cursor_...@pipe-a-cursor-256x85-sliding.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-skl10/igt@kms_cursor_...@pipe-a-cursor-256x85-sliding.html

  * igt@kms_cursor_legacy@all-pipes-torture-bo:
- shard-tglb: [PASS][18] -> [DMESG-WARN][19] ([i915#128])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-tglb1/igt@kms_cursor_leg...@all-pipes-torture-bo.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-tglb8/igt@kms_cursor_leg...@all-pipes-torture-bo.html

  * igt@kms_draw_crc@draw-method-rgb565-mmap-wc-xtiled:
- shard-glk:  [PASS][20] -> [FAIL][21] ([i915#52] / [i915#54])
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-glk2/igt@kms_draw_...@draw-method-rgb565-mmap-wc-xtiled.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-glk7/igt@kms_draw_...@draw-method-rgb565-mmap-wc-xtiled.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
- shard-kbl:  [PASS][22] -> [DMESG-WARN][23] ([i915#180] / 
[i915#93] / [i915#95])
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-kbl4/igt@kms_frontbuffer_track...@fbc-suspend.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-kbl4/igt@kms_frontbuffer_track...@fbc-suspend.html

  * igt@kms_hdr@bpc-switch:
- shard-skl:  [PASS][24] -> [FAIL][25] ([i915#1188])
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8405/shard-skl9/igt@kms_...@bpc-switch.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17541/shard-skl5/igt@kms_...@bpc

Re: [Intel-gfx] [PATCH] drm: Replace drm_modeset_lock/unlock_all with DRM_MODESET_LOCK_ALL_* helpers

2020-05-01 Thread Michał Orzeł



On 30.04.2020 20:30, Daniel Vetter wrote:
> On Thu, Apr 30, 2020 at 5:38 PM Sean Paul  wrote:
>>
>> On Wed, Apr 29, 2020 at 4:57 AM Jani Nikula  
>> wrote:
>>>
>>> On Tue, 28 Apr 2020, Michal Orzel  wrote:
 As suggested by the TODO list for the kernel DRM subsystem, replace
 the deprecated functions that take/drop modeset locks with new helpers.

 Signed-off-by: Michal Orzel 
 ---
  drivers/gpu/drm/drm_mode_object.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/drm_mode_object.c 
 b/drivers/gpu/drm/drm_mode_object.c
 index 35c2719..901b078 100644
 --- a/drivers/gpu/drm/drm_mode_object.c
 +++ b/drivers/gpu/drm/drm_mode_object.c
 @@ -402,12 +402,13 @@ int drm_mode_obj_get_properties_ioctl(struct 
 drm_device *dev, void *data,
  {
   struct drm_mode_obj_get_properties *arg = data;
   struct drm_mode_object *obj;
 + struct drm_modeset_acquire_ctx ctx;
   int ret = 0;

   if (!drm_core_check_feature(dev, DRIVER_MODESET))
   return -EOPNOTSUPP;

 - drm_modeset_lock_all(dev);
 + DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, ret);
>>>
>>> I cry a little every time I look at the DRM_MODESET_LOCK_ALL_BEGIN and
>>> DRM_MODESET_LOCK_ALL_END macros. :(
>>>
>>> Currently only six users... but there are ~60 calls to
>>> drm_modeset_lock_all{,_ctx} that I presume are to be replaced. I wonder
>>> if this will come back and haunt us.
>>>
>>
>> What's the alternative? Seems like the options without the macros is
>> to use incorrect scope or have a bunch of retry/backoff cargo-cult
>> everywhere (and hope the copy source is done correctly).
> 
> Yeah Sean & me had a bunch of bikesheds and this is the least worst
> option we could come up with. You can't make it a function because of
> the control flow. You don't want to open code this because it's tricky
> to get right, if all you want is to just grab all locks. But it is
> magic hidden behind a macro, which occasionally ends up hurting.
> -Daniel
So what are we doing about this problem? Should we replace all ~60 calls at
once?
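
For reference, the open-coded retry/backoff dance those two macros wrap looks
roughly like this (a sketch using the standard drm_modeset_acquire_ctx API,
assuming dev is the struct drm_device; error handling for the non-deadlock
case trimmed):

	struct drm_modeset_acquire_ctx ctx;
	int ret;

	drm_modeset_acquire_init(&ctx, 0);
retry:
	ret = drm_modeset_lock_all_ctx(dev, &ctx);
	if (ret == -EDEADLK) {
		/* someone else holds one of the locks; drop ours and retry */
		drm_modeset_backoff(&ctx);
		goto retry;
	}

	/* ... locked section touching modeset state ... */

	drm_modeset_drop_locks(&ctx);
	drm_modeset_acquire_fini(&ctx);

Getting that retry loop and the cleanup on every exit path right in ~60 call
sites is exactly the boilerplate the macros hide.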

Michal
> 
>> Sean
>>
>>> BR,
>>> Jani.
>>>
>>>

   obj = drm_mode_object_find(dev, file_priv, arg->obj_id, 
 arg->obj_type);
   if (!obj) {
 @@ -427,7 +428,7 @@ int drm_mode_obj_get_properties_ioctl(struct 
 drm_device *dev, void *data,
  out_unref:
   drm_mode_object_put(obj);
  out:
 - drm_modeset_unlock_all(dev);
 + DRM_MODESET_LOCK_ALL_END(ctx, ret);
   return ret;
  }

 @@ -449,12 +450,13 @@ static int set_property_legacy(struct 
 drm_mode_object *obj,
  {
   struct drm_device *dev = prop->dev;
   struct drm_mode_object *ref;
 + struct drm_modeset_acquire_ctx ctx;
   int ret = -EINVAL;

   if (!drm_property_change_valid_get(prop, prop_value, &ref))
   return -EINVAL;

 - drm_modeset_lock_all(dev);
 + DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, ret);
   switch (obj->type) {
   case DRM_MODE_OBJECT_CONNECTOR:
   ret = drm_connector_set_obj_prop(obj, prop, prop_value);
 @@ -468,7 +470,7 @@ static int set_property_legacy(struct drm_mode_object 
 *obj,
   break;
   }
   drm_property_change_valid_put(prop, ref);
 - drm_modeset_unlock_all(dev);
 + DRM_MODESET_LOCK_ALL_END(ctx, ret);

   return ret;
  }
>>>
>>> --
>>> Jani Nikula, Intel Open Source Graphics Center
> 
> 
> 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [CI,1/3] drm/i915/gem: Use chained reloc batches

2020-05-01 Thread Patchwork
== Series Details ==

Series: series starting with [CI,1/3] drm/i915/gem: Use chained reloc batches
URL   : https://patchwork.freedesktop.org/series/76821/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8406 -> Patchwork_17542


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17542/index.html

Known issues


  Here are the changes found in Patchwork_17542 that come from known issues:

### IGT changes ###

 Warnings 

  * igt@i915_pm_rpm@module-reload:
- fi-kbl-x1275:   [FAIL][1] ([i915#62] / [i915#95]) -> [SKIP][2] 
([fdo#109271])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8406/fi-kbl-x1275/igt@i915_pm_...@module-reload.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17542/fi-kbl-x1275/igt@i915_pm_...@module-reload.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (51 -> 43)
--

  Missing(8): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-kbl-7560u fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8406 -> Patchwork_17542

  CI-20190529: 20190529
  CI_DRM_8406: 591ff846d6332ae3d11fa34f14107ce23da02790 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5626: f27fdfff026276ac75c69e487c929a843f66f6ca @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17542: aca39b6c9b2e3860e03574bee1a412ee833b49ff @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

aca39b6c9b2e drm/i915/gem: Try an alternate engine for relocations
11820479052b drm/i915/gem: Use a single chained reloc batches for a single 
execbuf
c556477bf85f drm/i915/gem: Use chained reloc batches

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17542/index.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 0/4] Steer multicast register workaround verification

2020-05-01 Thread Matt Roper
On Fri, May 01, 2020 at 09:01:42AM +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 01/05/2020 00:15, Matt Roper wrote:
> > We're seeing some CI errors indicating that a workaround did not apply
> > properly on EHL/JSL.  The workaround in question is updating a multicast
> > register, the failures are only seen on specific CI machines, and the
> > failures only seem to happen on resets and such rather than on initial
> > driver load.  It seems likely that the culprit here is failure to steer
> > the multicast register readback on a SKU that has slice0 / subslice0
> > fused off.
> > 
> > This series makes a couple changes:
> >   * Workaround verification will explicitly steer MCR registers by
> > calling read_subslice_reg rather than a regular read.
> >   * New multicast ranges are added for gen11 and gen12.  Sadly this
> > information is still missing from the bspec (just like the updated
> > forcewake tables).  The hardware guys have given us a spreadsheet
> > with both the forcewake and the multicast information while they work
> > on getting the spec properly updated, so that's where the new ranges
> > come from.
> 
> I think there are multiple things here. To begin with, newly discovered
> ranges are of course a savior.
> 
> But I am not sure about the approach of using intel_read_subslice_reg in
> wa_verify. It is one suspicion that 0xfdc is lost on reset, but we do
> reprogram it afterwards don't we? And since it is the first register in the
> list it is supposed to be in place before the rest of verification runs, no?
> 
> A year or two I tried figuring this for Icelake and failed, but AFAIR (maybe
> my experiments can be found somewhere on trybot patchwork), I even tried
> both applying the affected ones via unicast (for each ss, or l3 where
> applicable) and also verifying a single register in all enabled ss. AFAIR
> there were still some issues there. Granted my memory could be leaky.. But I
> think this multiple write/verify could still be useful.
> 
> (Now that I think about it, I think that the problem area back when I
> experiementing with it was more suspend/resume.. hm..)
> 
> My main concern is that with current code we effectively have, after reset:
> 
> intel_gt_apply_workarounds:
>program 0xfdc
>program the rest of wa
> verify_wa
>do reads using configured 0xfdc
> 
> So MCR should be correct. This series seems to be doing:
> 
> intel_gt_apply_workarounds:
>program 0xfdc
>   * store ss used for MCR configuration
>program the rest of wa
> verify_wa
>Do reads but reconfigure 0xfdc before every register in range,
>but to the same value as in initial configuration.
> 
> Is this correct? Is the thinking then simply writing the same value to 0xfdc
> multiple times fixes things?

We stick the MCR steering at the beginning of the GT workaround list,
but the workaround that CI is complaining about is one of the RCS engine
workarounds, which is on a separate WA list.  At startup I think the RCS
workarounds are applied shortly after the GT workarounds, so the
steering is still in place, but on things like engine resets and such
that might not be the case.

A simpler approach might be to just stick the steering settings at the
front of the RCS workaround list, as we already do for the GT list.  That
was actually what I intended to try initially, but I started thinking
that we might want to handle registers individually once we get the gen12
details about different steering registers for different ranges.  But I
guess we should probably hold off on that until we actually have all the
details documented and just keep things simple for now.

I'll send a v2 of this series later today that just sticks the steering
settings at the front of the RCS engine's WA list instead of trying to
individually steer each register.  That's probably a simpler and cleaner
approach for now.
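
Concretely, that would amount to something like the sketch below at the top
of the RCS engine's workaround setup.  An illustration only: wa_init_mcr here
stands in for whatever helper the GT list uses today, so the final v2 may
differ in the details.

	static void rcs_engine_wa_init(struct intel_engine_cs *engine,
				       struct i915_wa_list *wal)
	{
		struct drm_i915_private *i915 = engine->i915;

		/*
		 * Emit the MCR steering value at the head of this list too,
		 * mirroring the GT WA list, so post-reset verification of
		 * multicast registers reads from an enabled slice/subslice.
		 */
		if (INTEL_GEN(i915) >= 11)
			wa_init_mcr(i915, wal);

		/* ... the rest of the RCS workarounds ... */
	}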


Matt


> 
> Regards,
> 
> Tvrtko
> 
> P.S. Update, found the experiments, listing some of them:
> 
> https://patchwork.freedesktop.org/series/64183/
> https://patchwork.freedesktop.org/series/64013/
> 
> It reminded me that there were some unexplained issues with regards of where
> I used ffs or fls for finding the valid common MCR setting between L3 and
> SSEU. I think we use a different one than Windows but ours works better for
> our verification, empirically at least. Usual disclaimer about my leaky
> memory applies here.
> 
> > In addition to MCR and forcewake, there's supposed to be some more bspec
> > updates coming soon that deal with steering (i.e., different MCR ranges
> > should actually be using different registers to steer rather than just
> > the 0xFDC register we're familiar with); I don't have the full details
> > on that yet, so those updates will have to wait until we actually have
> > an updated spec.
> > 
> > References: https://gitlab.freedesktop.org/drm/intel/issues/1222
> > 
> > Matt Roper (4):
> >drm/i915: Setup multicast register steering for all gen >= 10
>

[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Implement vm_ops->access for gdb access into mmaps (rev4)

2020-05-01 Thread Patchwork
== Series Details ==

Series: drm/i915: Implement vm_ops->access for gdb access into mmaps (rev4)
URL   : https://patchwork.freedesktop.org/series/76783/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8406 -> Patchwork_17543


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_17543 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17543, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17543/index.html

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_17543:

### IGT changes ###

 Possible regressions 

  * igt@i915_selftest@live@gtt:
- fi-icl-guc: [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8406/fi-icl-guc/igt@i915_selftest@l...@gtt.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17543/fi-icl-guc/igt@i915_selftest@l...@gtt.html

  
Known issues


  Here are the changes found in Patchwork_17543 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@i915_selftest@live@workarounds:
- fi-bwr-2160:[PASS][3] -> [INCOMPLETE][4] ([i915#489])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8406/fi-bwr-2160/igt@i915_selftest@l...@workarounds.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17543/fi-bwr-2160/igt@i915_selftest@l...@workarounds.html

  
 Warnings 

  * igt@i915_pm_rpm@module-reload:
- fi-kbl-x1275:   [FAIL][5] ([i915#62] / [i915#95]) -> [SKIP][6] 
([fdo#109271])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8406/fi-kbl-x1275/igt@i915_pm_...@module-reload.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17543/fi-kbl-x1275/igt@i915_pm_...@module-reload.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (51 -> 43)
--

  Missing(8): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-kbl-7560u fi-byt-clapper fi-bdw-samus 


Build changes
-

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8406 -> Patchwork_17543

  CI-20190529: 20190529
  CI_DRM_8406: 591ff846d6332ae3d11fa34f14107ce23da02790 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5626: f27fdfff026276ac75c69e487c929a843f66f6ca @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17543: ae8ec16d0ac6ce7d65dc64b8d3dc2f3b38c55dd0 @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ae8ec16d0ac6 drm/i915: Implement vm_ops->access for gdb access into mmaps

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17543/index.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-01 Thread Jason A. Donenfeld
Sometimes it's not okay to use SIMD registers, and the conditions for
when it is okay have changed subtly from kernel release to kernel
release. Usually the pattern is to check for may_use_simd() and then
fall back to something slower in the unlikely case SIMD registers aren't
available. So, this patch fixes up i915's accelerated memcpy routines to
fall back to boring memcpy if may_use_simd() is false.

Cc: sta...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 drivers/gpu/drm/i915/i915_memcpy.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_memcpy.c 
b/drivers/gpu/drm/i915/i915_memcpy.c
index fdd550405fd3..7c0e022586bc 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/i915/i915_memcpy.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 
 #include "i915_memcpy.h"
 
@@ -38,6 +39,12 @@ static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
 #ifdef CONFIG_AS_MOVNTDQA
 static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 {
+   if (unlikely(!may_use_simd())) {
+   memcpy(dst, src, len);
+   return;
+   }
+
+
kernel_fpu_begin();
 
while (len >= 4) {
@@ -67,6 +74,11 @@ static void __memcpy_ntdqa(void *dst, const void *src, 
unsigned long len)
 
 static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
 {
+   if (unlikely(!may_use_simd())) {
+   memcpy(dst, src, len);
+   return;
+   }
+
kernel_fpu_begin();
 
while (len >= 4) {
-- 
2.26.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 02/23] x86/gpu: add RKL stolen memory support

2020-05-01 Thread Matt Roper
RKL re-uses the same stolen memory registers as TGL and ICL.

Bspec: 52055
Bspec: 49589
Bspec: 49636
Cc: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 arch/x86/kernel/early-quirks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 2f9ec14be3b1..a4b5af03dcc1 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -550,6 +550,7 @@ static const struct pci_device_id intel_early_ids[] 
__initconst = {
INTEL_ICL_11_IDS(&gen11_early_ops),
INTEL_EHL_IDS(&gen11_early_ops),
INTEL_TGL_12_IDS(&gen11_early_ops),
+   INTEL_RKL_IDS(&gen11_early_ops),
 };
 
 struct resource intel_graphics_stolen_res __ro_after_init = DEFINE_RES_MEM(0, 
0);
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 00/23] Introduce Rocket Lake

2020-05-01 Thread Matt Roper
Rocket Lake (RKL) is another gen12 platform, so the driver support is
mostly a straightforward evolution of our existing Tiger Lake support.

One area of this patch series that's a bit non-intuitive and warrants
some extra explanation is the output handling.  All four of RKL's output
ports use combo PHYs, but the hardware guys have recycled the naming
scheme from Tiger Lake.  The DDIs are still named "A, B, TC1, and TC2"
even though none of them are actually connected to Type-C PHYs on this
platform.  From a register offset perspective, these four DDIs are
effectively A, B, D, and E (skipping over the register range that would
usually be used for a "C" instance of the DDI).  However the PHYs
attached to those DDIs do *not* skip the "C" instance, so we wind up
with the following output relationships:

DDI-A   (port A) <-> PHY-A
DDI-B   (port B) <-> PHY-B
DDI-TC1 (port D) <-> PHY-C
DDI-TC2 (port E) <-> PHY-D

Given that most of our past platforms have straight DDI==PHY mappings,
extra care is needed to ensure we use the proper namespace (port or phy)
when programming various output-related registers.
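
As a concrete illustration, the port-to-PHY translation on RKL boils down to
something like the following (a sketch of the idea rather than the verbatim
diff in this series):

	enum phy intel_port_to_phy(struct drm_i915_private *i915, enum port port)
	{
		/*
		 * RKL DDIs are A, B, TC1 (PORT_D) and TC2 (PORT_E), but the
		 * combo PHYs behind them are numbered A..D with no gap, so
		 * everything from PORT_D up shifts down by one.
		 */
		if (IS_ROCKETLAKE(i915) && port >= PORT_D)
			return (enum phy)port - 1;

		return (enum phy)port;
	}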


Aditya Swarup (1):
  drm/i915/rkl: Don't try to read out DSI transcoders

José Roberto de Souza (1):
  drm/i915/rkl: Disable PSR2

Lucas De Marchi (1):
  drm/i915/rkl: provide port/phy mapping for vbt

Matt Roper (20):
  drm/i915/rkl: Add RKL platform info and PCI ids
  x86/gpu: add RKL stolen memory support
  drm/i915/rkl: Re-use TGL GuC/HuC firmware
  drm/i915/rkl: Load DMC firmware for Rocket Lake
  drm/i915/rkl: Add PCH support
  drm/i915/rkl: Update memory bandwidth parameters
  drm/i915/rkl: Limit number of universal planes to 5
  drm/i915/rkl: Add power well support
  drm/i915/rkl: Program BW_BUDDY0 registers instead of BW_BUDDY1/2
  drm/i915/rkl: RKL only uses PHY_MISC for PHY's A and B
  drm/i915/rkl: Add cdclk support
  drm/i915/rkl: Handle new DPCLKA_CFGCR0 layout
  drm/i915/rkl: Check proper SDEISR bits for TC1 and TC2 outputs
  drm/i915/rkl: Setup ports/phys
  drm/i915/rkl: Add DDC pin mapping
  drm/i915/rkl: Don't try to access transcoder D
  drm/i915/rkl: Handle comp master/slave relationships for PHYs
  drm/i915/rkl: Add DPLL4 support
  drm/i915/rkl: Handle HTI
  drm/i915/rkl: Add initial workarounds

 arch/x86/kernel/early-quirks.c|   1 +
 drivers/gpu/drm/i915/display/intel_bios.c |  72 --
 drivers/gpu/drm/i915/display/intel_bw.c   |  10 +-
 drivers/gpu/drm/i915/display/intel_cdclk.c|  54 -
 .../gpu/drm/i915/display/intel_combo_phy.c|  55 +++--
 drivers/gpu/drm/i915/display/intel_csr.c  |  10 +-
 drivers/gpu/drm/i915/display/intel_ddi.c  |  18 +-
 drivers/gpu/drm/i915/display/intel_display.c  |  82 +--
 .../drm/i915/display/intel_display_power.c| 229 --
 drivers/gpu/drm/i915/display/intel_dp.c   |   8 +-
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c |  50 +++-
 drivers/gpu/drm/i915/display/intel_dpll_mgr.h |   1 +
 drivers/gpu/drm/i915/display/intel_hdmi.c |  22 +-
 drivers/gpu/drm/i915/display/intel_psr.c  |  15 ++
 drivers/gpu/drm/i915/display/intel_sprite.c   |  22 +-
 drivers/gpu/drm/i915/display/intel_sprite.h   |  11 +-
 drivers/gpu/drm/i915/display/intel_vdsc.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  88 ---
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  |   3 +
 drivers/gpu/drm/i915/i915_drv.h   |  13 +
 drivers/gpu/drm/i915/i915_irq.c   |  10 +-
 drivers/gpu/drm/i915/i915_pci.c   |  13 +
 drivers/gpu/drm/i915/i915_reg.h   |  33 ++-
 drivers/gpu/drm/i915/intel_device_info.c  |   6 +-
 drivers/gpu/drm/i915/intel_device_info.h  |   2 +
 drivers/gpu/drm/i915/intel_pch.c  |   8 +-
 include/drm/i915_pciids.h |   9 +
 27 files changed, 702 insertions(+), 147 deletions(-)

-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 04/23] drm/i915/rkl: Load DMC firmware for Rocket Lake

2020-05-01 Thread Matt Roper
Cc: Anusha Srivatsa 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_csr.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_csr.c 
b/drivers/gpu/drm/i915/display/intel_csr.c
index 3112572cfb7d..319932b03e88 100644
--- a/drivers/gpu/drm/i915/display/intel_csr.c
+++ b/drivers/gpu/drm/i915/display/intel_csr.c
@@ -40,6 +40,10 @@
 
 #define GEN12_CSR_MAX_FW_SIZE  ICL_CSR_MAX_FW_SIZE
 
+#define RKL_CSR_PATH   "i915/rkl_dmc_ver2_01.bin"
+#define RKL_CSR_VERSION_REQUIRED   CSR_VERSION(2, 1)
+MODULE_FIRMWARE(RKL_CSR_PATH);
+
 #define TGL_CSR_PATH   "i915/tgl_dmc_ver2_06.bin"
 #define TGL_CSR_VERSION_REQUIRED   CSR_VERSION(2, 6)
 #define TGL_CSR_MAX_FW_SIZE0x6000
@@ -682,7 +686,11 @@ void intel_csr_ucode_init(struct drm_i915_private 
*dev_priv)
 */
intel_csr_runtime_pm_get(dev_priv);
 
-   if (INTEL_GEN(dev_priv) >= 12) {
+   if (IS_ROCKETLAKE(dev_priv)) {
+   csr->fw_path = RKL_CSR_PATH;
+   csr->required_version = RKL_CSR_VERSION_REQUIRED;
+   csr->max_fw_size = GEN12_CSR_MAX_FW_SIZE;
+   } else if (INTEL_GEN(dev_priv) >= 12) {
csr->fw_path = TGL_CSR_PATH;
csr->required_version = TGL_CSR_VERSION_REQUIRED;
/* Allow to load fw via parameter using the last known size */
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 06/23] drm/i915/rkl: Update memory bandwidth parameters

2020-05-01 Thread Matt Roper
The RKL platform has different memory characteristics from past
platforms.  Update the values used by our memory bandwidth calculations
accordingly.

Bspec: 53998
Cc: James Ausmus 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_bw.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_bw.c 
b/drivers/gpu/drm/i915/display/intel_bw.c
index 4aa54fcb0629..c23401785f4a 100644
--- a/drivers/gpu/drm/i915/display/intel_bw.c
+++ b/drivers/gpu/drm/i915/display/intel_bw.c
@@ -176,6 +176,12 @@ static const struct intel_sa_info tgl_sa_info = {
.displayrtids = 256,
 };
 
+static const struct intel_sa_info rkl_sa_info = {
+   .deburst = 16,
+   .deprogbwlimit = 20, /* GB/s */
+   .displayrtids = 128,
+};
+
 static int icl_get_bw_info(struct drm_i915_private *dev_priv, const struct 
intel_sa_info *sa)
 {
struct intel_qgv_info qi = {};
@@ -271,7 +277,9 @@ void intel_bw_init_hw(struct drm_i915_private *dev_priv)
if (!HAS_DISPLAY(dev_priv))
return;
 
-   if (IS_GEN(dev_priv, 12))
+   if (IS_ROCKETLAKE(dev_priv))
+   icl_get_bw_info(dev_priv, &rkl_sa_info);
+   else if (IS_GEN(dev_priv, 12))
icl_get_bw_info(dev_priv, &tgl_sa_info);
else if (IS_GEN(dev_priv, 11))
icl_get_bw_info(dev_priv, &icl_sa_info);
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 11/23] drm/i915/rkl: Add cdclk support

2020-05-01 Thread Matt Roper
Note that the 192000 cdclk frequency can be achieved with different
ratio+divider pairs, which is something we haven't encountered before.
If any of those ratios were shared with other legal cdclk values, we
could avoid triggering a full modeset when only the divider needs to
change.  However, at the moment no valid cdclks appear to share the same
ratio, so we can't take advantage of this and it doesn't really matter
which pair we use to achieve the 192000 cdclk.  For now our driver
functions that operate on the table will just always pick the first
entry (lower ratio + lower divider).
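
(For reference, these tables follow the existing bxt/icl encoding
cdclk = refclk * ratio / divider, where "divider" is twice the actual
CD2X divider; e.g. 19200 * 20 / 2 = 192000 and 19200 * 68 / 2 = 652800.)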

Bspec: 49202
Cc: Ville Syrjälä 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_cdclk.c | 54 +++---
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cdclk.c 
b/drivers/gpu/drm/i915/display/intel_cdclk.c
index 979a0241fdcb..4ca87260e8ba 100644
--- a/drivers/gpu/drm/i915/display/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/display/intel_cdclk.c
@@ -1230,6 +1230,40 @@ static const struct intel_cdclk_vals icl_cdclk_table[] = 
{
{}
 };
 
+/*
+ * RKL has multiple divider+ratio pairs that can hit cdclk=192000.  Our
+ * functions to read these tables will just always pick the first one which
+ * should be fine since there's no other valid cdclk value that can be achieved
+ * via the same ratio with a different divider (i.e., no opportunity to avoid a
+ * full modeset).
+ */
+static const struct intel_cdclk_vals rkl_cdclk_table[] = {
+   { .refclk = 19200, .cdclk = 172800, .divider = 3, .ratio = 27 },
+   { .refclk = 19200, .cdclk = 192000, .divider = 2, .ratio = 20 },
+   { .refclk = 19200, .cdclk = 192000, .divider = 3, .ratio = 30 },
+   { .refclk = 19200, .cdclk = 307200, .divider = 2, .ratio = 32 },
+   { .refclk = 19200, .cdclk = 326400, .divider = 4, .ratio = 68 },
+   { .refclk = 19200, .cdclk = 556800, .divider = 2, .ratio = 58 },
+   { .refclk = 19200, .cdclk = 652800, .divider = 2, .ratio = 68 },
+
+   { .refclk = 24000, .cdclk = 176000, .divider = 3, .ratio = 22 },
+   { .refclk = 24000, .cdclk = 192000, .divider = 2, .ratio = 16 },
+   { .refclk = 24000, .cdclk = 192000, .divider = 3, .ratio = 24 },
+   { .refclk = 24000, .cdclk = 312000, .divider = 2, .ratio = 26 },
+   { .refclk = 24000, .cdclk = 324000, .divider = 4, .ratio = 54 },
+   { .refclk = 24000, .cdclk = 552000, .divider = 2, .ratio = 45 },
+   { .refclk = 24000, .cdclk = 648000, .divider = 2, .ratio = 54 },
+
+   { .refclk = 38400, .cdclk = 179200, .divider = 3, .ratio = 14 },
+   { .refclk = 38400, .cdclk = 192000, .divider = 2, .ratio = 10 },
+   { .refclk = 38400, .cdclk = 192000, .divider = 3, .ratio = 15 },
+   { .refclk = 38400, .cdclk = 307200, .divider = 2, .ratio = 16 },
+   { .refclk = 38400, .cdclk = 326400, .divider = 4, .ratio = 34 },
+   { .refclk = 38400, .cdclk = 556800, .divider = 2, .ratio = 29 },
+   { .refclk = 38400, .cdclk = 652800, .divider = 2, .ratio = 34 },
+   {}
+};
+
 static int bxt_calc_cdclk(struct drm_i915_private *dev_priv, int min_cdclk)
 {
const struct intel_cdclk_vals *table = dev_priv->cdclk.table;
@@ -1405,8 +1439,8 @@ static void bxt_get_cdclk(struct drm_i915_private 
*dev_priv,
div = 2;
break;
case BXT_CDCLK_CD2X_DIV_SEL_1_5:
-   drm_WARN(&dev_priv->drm,
-IS_GEMINILAKE(dev_priv) || INTEL_GEN(dev_priv) >= 10,
+   drm_WARN(&dev_priv->drm, IS_GEMINILAKE(dev_priv) ||
+(INTEL_GEN(dev_priv) >= 10 && 
!IS_ROCKETLAKE(dev_priv)),
 "Unsupported divider\n");
div = 3;
break;
@@ -1414,7 +1448,8 @@ static void bxt_get_cdclk(struct drm_i915_private 
*dev_priv,
div = 4;
break;
case BXT_CDCLK_CD2X_DIV_SEL_4:
-   drm_WARN(&dev_priv->drm, INTEL_GEN(dev_priv) >= 10,
+   drm_WARN(&dev_priv->drm,
+INTEL_GEN(dev_priv) >= 10 && !IS_ROCKETLAKE(dev_priv),
 "Unsupported divider\n");
div = 8;
break;
@@ -1564,7 +1599,8 @@ static void bxt_set_cdclk(struct drm_i915_private 
*dev_priv,
break;
case 3:
drm_WARN(&dev_priv->drm,
-IS_GEMINILAKE(dev_priv) || INTEL_GEN(dev_priv) >= 10,
+IS_GEMINILAKE(dev_priv) ||
+(INTEL_GEN(dev_priv) >= 10 && 
!IS_ROCKETLAKE(dev_priv)),
 "Unsupported divider\n");
divider = BXT_CDCLK_CD2X_DIV_SEL_1_5;
break;
@@ -1572,7 +1608,8 @@ static void bxt_set_cdclk(struct drm_i915_private 
*dev_priv,
divider = BXT_CDCLK_CD2X_DIV_SEL_2;
break;
case 8:
-   drm_WARN(&dev_priv->drm, IN

[Intel-gfx] [PATCH 10/23] drm/i915/rkl: RKL only uses PHY_MISC for PHY's A and B

2020-05-01 Thread Matt Roper
Since the number of platforms with this restriction is growing, let's
separate out the platform logic into a has_phy_misc() function.

Bspec: 50107
Signed-off-by: Matt Roper 
---
 .../gpu/drm/i915/display/intel_combo_phy.c| 30 +++
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_combo_phy.c 
b/drivers/gpu/drm/i915/display/intel_combo_phy.c
index 9ff05ec12115..43d8784f6fa0 100644
--- a/drivers/gpu/drm/i915/display/intel_combo_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_combo_phy.c
@@ -181,11 +181,25 @@ static void cnl_combo_phys_uninit(struct drm_i915_private 
*dev_priv)
intel_de_write(dev_priv, CHICKEN_MISC_2, val);
 }
 
+static bool has_phy_misc(struct drm_i915_private *i915, enum phy phy)
+{
+   /*
+* Some platforms only expect PHY_MISC to be programmed for PHY-A and
+* PHY-B and may not even have instances of the register for the
+* other combo PHYs.
+*/
+   if (IS_ELKHARTLAKE(i915) ||
+   IS_ROCKETLAKE(i915))
+   return phy < PHY_C;
+
+   return true;
+}
+
 static bool icl_combo_phy_enabled(struct drm_i915_private *dev_priv,
  enum phy phy)
 {
/* The PHY C added by EHL has no PHY_MISC register */
-   if (IS_ELKHARTLAKE(dev_priv) && phy == PHY_C)
+   if (!has_phy_misc(dev_priv, phy))
return intel_de_read(dev_priv, ICL_PORT_COMP_DW0(phy)) & 
COMP_INIT;
else
return !(intel_de_read(dev_priv, ICL_PHY_MISC(phy)) &
@@ -317,12 +331,7 @@ static void icl_combo_phys_init(struct drm_i915_private 
*dev_priv)
continue;
}
 
-   /*
-* Although EHL adds a combo PHY C, there's no PHY_MISC
-* register for it and no need to program the
-* DE_IO_COMP_PWR_DOWN setting on PHY C.
-*/
-   if (IS_ELKHARTLAKE(dev_priv) && phy == PHY_C)
+   if (!has_phy_misc(dev_priv, phy))
goto skip_phy_misc;
 
/*
@@ -376,12 +385,7 @@ static void icl_combo_phys_uninit(struct drm_i915_private 
*dev_priv)
 "Combo PHY %c HW state changed unexpectedly\n",
 phy_name(phy));
 
-   /*
-* Although EHL adds a combo PHY C, there's no PHY_MISC
-* register for it and no need to program the
-* DE_IO_COMP_PWR_DOWN setting on PHY C.
-*/
-   if (IS_ELKHARTLAKE(dev_priv) && phy == PHY_C)
+   if (!has_phy_misc(dev_priv, phy))
goto skip_phy_misc;
 
val = intel_de_read(dev_priv, ICL_PHY_MISC(phy));
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 13/23] drm/i915/rkl: Check proper SDEISR bits for TC1 and TC2 outputs

2020-05-01 Thread Matt Roper
When Rocket Lake is paired with a TGP PCH, the last two outputs utilize
the TC1 and TC2 hpd pins, even though these are combo outputs.

Bspec: 49181
Cc: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 6952b0295096..d32bbcd99b8a 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -6172,8 +6172,12 @@ static bool bxt_digital_port_connected(struct 
intel_encoder *encoder)
 static bool intel_combo_phy_connected(struct drm_i915_private *dev_priv,
  enum phy phy)
 {
-   if (HAS_PCH_MCC(dev_priv) && phy == PHY_C)
-   return intel_de_read(dev_priv, SDEISR) & 
SDE_TC_HOTPLUG_ICP(PORT_TC1);
+   if (IS_ROCKETLAKE(dev_priv) && phy >= PHY_C)
+   return intel_de_read(dev_priv, SDEISR) &
+   SDE_TC_HOTPLUG_ICP(phy - PHY_C);
+   else if (HAS_PCH_MCC(dev_priv) && phy == PHY_C)
+   return intel_de_read(dev_priv, SDEISR) &
+   SDE_TC_HOTPLUG_ICP(PORT_TC1);
 
return intel_de_read(dev_priv, SDEISR) & SDE_DDI_HOTPLUG_ICP(phy);
 }
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 07/23] drm/i915/rkl: Limit number of universal planes to 5

2020-05-01 Thread Matt Roper
RKL only has five universal planes, plus a cursor.  Since the
bottom-most universal plane is considered the primary plane, set the
number of sprites available on this platform to 4.

In general, the plane capabilities of the remaining planes stay the same
as TGL.  However the NV12 Y-plane support moves down to the new top two
planes and now only the bottom three planes can be used for NV12 UV.

Bspec: 49181
Bspec: 49251
Cc: Ville Syrjälä 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_display.c |  6 +-
 drivers/gpu/drm/i915/display/intel_sprite.c  | 17 -
 drivers/gpu/drm/i915/display/intel_sprite.h  | 11 ++-
 drivers/gpu/drm/i915/i915_irq.c  |  4 +++-
 drivers/gpu/drm/i915/i915_reg.h  |  5 +
 drivers/gpu/drm/i915/intel_device_info.c |  5 -
 6 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index 2a17cf38d3dc..ee6d6beac241 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -12500,7 +12500,7 @@ static int icl_check_nv12_planes(struct 
intel_crtc_state *crtc_state)
continue;
 
for_each_intel_plane_on_crtc(&dev_priv->drm, crtc, linked) {
-   if (!icl_is_nv12_y_plane(linked->id))
+   if (!icl_is_nv12_y_plane(dev_priv, linked->id))
continue;
 
if (crtc_state->active_planes & BIT(linked->id))
@@ -12546,6 +12546,10 @@ static int icl_check_nv12_planes(struct 
intel_crtc_state *crtc_state)
plane_state->cus_ctl |= PLANE_CUS_PLANE_7;
else if (linked->id == PLANE_SPRITE4)
plane_state->cus_ctl |= PLANE_CUS_PLANE_6;
+   else if (linked->id == PLANE_SPRITE3)
+   plane_state->cus_ctl |= PLANE_CUS_PLANE_5_RKL;
+   else if (linked->id == PLANE_SPRITE2)
+   plane_state->cus_ctl |= PLANE_CUS_PLANE_4_RKL;
else
MISSING_CASE(linked->id);
}
diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c 
b/drivers/gpu/drm/i915/display/intel_sprite.c
index ec7055f7..571c36f929bd 100644
--- a/drivers/gpu/drm/i915/display/intel_sprite.c
+++ b/drivers/gpu/drm/i915/display/intel_sprite.c
@@ -333,6 +333,21 @@ int intel_plane_check_src_coordinates(struct 
intel_plane_state *plane_state)
return 0;
 }
 
+static u8 icl_nv12_y_plane_mask(struct drm_i915_private *i915)
+{
+   if (IS_ROCKETLAKE(i915))
+   return BIT(PLANE_SPRITE2) | BIT(PLANE_SPRITE3);
+   else
+   return BIT(PLANE_SPRITE4) | BIT(PLANE_SPRITE5);
+}
+
+bool icl_is_nv12_y_plane(struct drm_i915_private *dev_priv,
+enum plane_id plane_id)
+{
+   return INTEL_GEN(dev_priv) >= 11 &&
+   icl_nv12_y_plane_mask(dev_priv) & BIT(plane_id);
+}
+
 bool icl_is_hdr_plane(struct drm_i915_private *dev_priv, enum plane_id 
plane_id)
 {
return INTEL_GEN(dev_priv) >= 11 &&
@@ -3003,7 +3018,7 @@ static const u32 *icl_get_plane_formats(struct 
drm_i915_private *dev_priv,
if (icl_is_hdr_plane(dev_priv, plane_id)) {
*num_formats = ARRAY_SIZE(icl_hdr_plane_formats);
return icl_hdr_plane_formats;
-   } else if (icl_is_nv12_y_plane(plane_id)) {
+   } else if (icl_is_nv12_y_plane(dev_priv, plane_id)) {
*num_formats = ARRAY_SIZE(icl_sdr_y_plane_formats);
return icl_sdr_y_plane_formats;
} else {
diff --git a/drivers/gpu/drm/i915/display/intel_sprite.h 
b/drivers/gpu/drm/i915/display/intel_sprite.h
index 5eeaa92420d1..cd2104ba1ca1 100644
--- a/drivers/gpu/drm/i915/display/intel_sprite.h
+++ b/drivers/gpu/drm/i915/display/intel_sprite.h
@@ -32,21 +32,14 @@ struct intel_plane *
 skl_universal_plane_create(struct drm_i915_private *dev_priv,
   enum pipe pipe, enum plane_id plane_id);
 
-static inline bool icl_is_nv12_y_plane(enum plane_id id)
-{
-   /* Don't need to do a gen check, these planes are only available on 
gen11 */
-   if (id == PLANE_SPRITE4 || id == PLANE_SPRITE5)
-   return true;
-
-   return false;
-}
-
 static inline u8 icl_hdr_plane_mask(void)
 {
return BIT(PLANE_PRIMARY) |
BIT(PLANE_SPRITE0) | BIT(PLANE_SPRITE1);
 }
 
+bool icl_is_nv12_y_plane(struct drm_i915_private *dev_priv,
+enum plane_id plane_id);
 bool icl_is_hdr_plane(struct drm_i915_private *dev_priv, enum plane_id 
plane_id);
 
 int ivb_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index bd722d0650c8..622986759ec6 100644
--- a/drivers/gpu/drm/i91

[Intel-gfx] [PATCH 12/23] drm/i915/rkl: Handle new DPCLKA_CFGCR0 layout

2020-05-01 Thread Matt Roper
RKL uses a slightly different bit layout for the DPCLKA_CFGCR0 register.

Bspec: 50287
Cc: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_ddi.c | 18 +++---
 drivers/gpu/drm/i915/display/intel_display.c | 15 ---
 drivers/gpu/drm/i915/i915_reg.h  |  4 
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c 
b/drivers/gpu/drm/i915/display/intel_ddi.c
index 5601673c3f30..3c1f3cf42a60 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -2732,7 +2732,9 @@ hsw_set_signal_levels(struct intel_dp *intel_dp)
 static u32 icl_dpclka_cfgcr0_clk_off(struct drm_i915_private *dev_priv,
 enum phy phy)
 {
-   if (intel_phy_is_combo(dev_priv, phy)) {
+   if (IS_ROCKETLAKE(dev_priv)) {
+   return RKL_DPCLKA_CFGCR0_DDI_CLK_OFF(phy);
+   } else if (intel_phy_is_combo(dev_priv, phy)) {
return ICL_DPCLKA_CFGCR0_DDI_CLK_OFF(phy);
} else if (intel_phy_is_tc(dev_priv, phy)) {
enum tc_port tc_port = intel_port_to_tc(dev_priv,
@@ -2759,6 +2761,16 @@ static void icl_map_plls_to_ports(struct intel_encoder 
*encoder,
(val & icl_dpclka_cfgcr0_clk_off(dev_priv, phy)) == 0);
 
if (intel_phy_is_combo(dev_priv, phy)) {
+   u32 mask, sel;
+
+   if (IS_ROCKETLAKE(dev_priv)) {
+   mask = RKL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy);
+   sel = RKL_DPCLKA_CFGCR0_DDI_CLK_SEL(pll->info->id, phy);
+   } else {
+   mask = ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy);
+   sel = ICL_DPCLKA_CFGCR0_DDI_CLK_SEL(pll->info->id, phy);
+   }
+
/*
 * Even though this register references DDIs, note that we
 * want to pass the PHY rather than the port (DDI).  For
@@ -2769,8 +2781,8 @@ static void icl_map_plls_to_ports(struct intel_encoder 
*encoder,
 *   Clock Select chooses the PLL for both DDIA and DDID and
 *   drives port A in all cases."
 */
-   val &= ~ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy);
-   val |= ICL_DPCLKA_CFGCR0_DDI_CLK_SEL(pll->info->id, phy);
+   val &= ~mask;
+   val |= sel;
intel_de_write(dev_priv, ICL_DPCLKA_CFGCR0, val);
intel_de_posting_read(dev_priv, ICL_DPCLKA_CFGCR0);
}
diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index ee6d6beac241..ebbec5e5bf53 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -10773,9 +10773,18 @@ static void icl_get_ddi_pll(struct drm_i915_private 
*dev_priv, enum port port,
u32 temp;
 
if (intel_phy_is_combo(dev_priv, phy)) {
-   temp = intel_de_read(dev_priv, ICL_DPCLKA_CFGCR0) &
-   ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy);
-   id = temp >> ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy);
+   u32 mask, shift;
+
+   if (IS_ROCKETLAKE(dev_priv)) {
+   mask = RKL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy);
+   shift = RKL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy);
+   } else {
+   mask = ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy);
+   shift = ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy);
+   }
+
+   temp = intel_de_read(dev_priv, ICL_DPCLKA_CFGCR0) & mask;
+   id = temp >> shift;
port_dpll_id = ICL_PORT_DPLL_DEFAULT;
} else if (intel_phy_is_tc(dev_priv, phy)) {
u32 clk_sel = intel_de_read(dev_priv, DDI_CLK_SEL(port)) & 
DDI_CLK_SEL_MASK;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2266f9fc2d79..f392ad61f1db 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -10163,12 +10163,16 @@ enum skl_power_gate {
 
 #define ICL_DPCLKA_CFGCR0  _MMIO(0x164280)
 #define  ICL_DPCLKA_CFGCR0_DDI_CLK_OFF(phy)(1 << _PICK(phy, 10, 11, 24))
+#define  RKL_DPCLKA_CFGCR0_DDI_CLK_OFF(phy)REG_BIT(phy + 10)
 #define  ICL_DPCLKA_CFGCR0_TC_CLK_OFF(tc_port) (1 << ((tc_port) < PORT_TC4 ? \
   (tc_port) + 12 : \
   (tc_port) - PORT_TC4 + 
21))
 #define  ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy)  ((phy) * 2)
 #define  ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_MASK(phy)   (3 << 
ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy))
 #define  ICL_DPCLKA_CFGCR0_DDI_CLK_SEL(pll, phy)   ((pll) << 
ICL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy))
+#define  RKL_DPCLKA_CFGCR0_DDI_CLK_SEL_SHIFT(phy)  _PICK(phy, 0, 2, 4, 27)
+#define

[Intel-gfx] [PATCH 05/23] drm/i915/rkl: Add PCH support

2020-05-01 Thread Matt Roper
Rocket Lake can pair with either TGP or CMP.

Cc: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/intel_pch.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pch.c b/drivers/gpu/drm/i915/intel_pch.c
index 20ab9a5023b5..102b03d24f90 100644
--- a/drivers/gpu/drm/i915/intel_pch.c
+++ b/drivers/gpu/drm/i915/intel_pch.c
@@ -88,7 +88,8 @@ intel_pch_type(const struct drm_i915_private *dev_priv, 
unsigned short id)
case INTEL_PCH_CMP_DEVICE_ID_TYPE:
case INTEL_PCH_CMP2_DEVICE_ID_TYPE:
drm_dbg_kms(&dev_priv->drm, "Found Comet Lake PCH (CMP)\n");
-   drm_WARN_ON(&dev_priv->drm, !IS_COFFEELAKE(dev_priv));
+   drm_WARN_ON(&dev_priv->drm, !IS_COFFEELAKE(dev_priv) &&
+   !IS_ROCKETLAKE(dev_priv));
/* CometPoint is CNP Compatible */
return PCH_CNP;
case INTEL_PCH_CMP_V_DEVICE_ID_TYPE:
@@ -107,7 +108,8 @@ intel_pch_type(const struct drm_i915_private *dev_priv, 
unsigned short id)
case INTEL_PCH_TGP_DEVICE_ID_TYPE:
case INTEL_PCH_TGP2_DEVICE_ID_TYPE:
drm_dbg_kms(&dev_priv->drm, "Found Tiger Lake LP PCH\n");
-   drm_WARN_ON(&dev_priv->drm, !IS_TIGERLAKE(dev_priv));
+   drm_WARN_ON(&dev_priv->drm, !IS_TIGERLAKE(dev_priv) &&
+   !IS_ROCKETLAKE(dev_priv));
return PCH_TGP;
case INTEL_PCH_JSP_DEVICE_ID_TYPE:
case INTEL_PCH_JSP2_DEVICE_ID_TYPE:
@@ -141,7 +143,7 @@ intel_virt_detect_pch(const struct drm_i915_private 
*dev_priv)
 * make an educated guess as to which PCH is really there.
 */
 
-   if (IS_TIGERLAKE(dev_priv))
+   if (IS_TIGERLAKE(dev_priv) || IS_ROCKETLAKE(dev_priv))
id = INTEL_PCH_TGP_DEVICE_ID_TYPE;
else if (IS_ELKHARTLAKE(dev_priv))
id = INTEL_PCH_MCC_DEVICE_ID_TYPE;
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 19/23] drm/i915/rkl: Handle comp master/slave relationships for PHYs

2020-05-01 Thread Matt Roper
Certain combo PHYs act as a compensation master to other PHYs and need
to be initialized with a special irefgen bit in the PORT_COMP_DW8
register.  Previously PHY A was the only compensation master (for PHYs
B & C), but RKL adds a fourth PHY which is slaved to PHY C instead.

Bspec: 49291
Cc: Lucas De Marchi 
Cc: José Roberto de Souza 
Cc: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 .../gpu/drm/i915/display/intel_combo_phy.c| 25 +--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_combo_phy.c 
b/drivers/gpu/drm/i915/display/intel_combo_phy.c
index 43d8784f6fa0..77b04bb3ec62 100644
--- a/drivers/gpu/drm/i915/display/intel_combo_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_combo_phy.c
@@ -234,6 +234,27 @@ static bool ehl_vbt_ddi_d_present(struct drm_i915_private 
*i915)
return false;
 }
 
+static bool phy_is_master(struct drm_i915_private *dev_priv, enum phy phy)
+{
+   /*
+* Certain PHYs are connected to compensation resistors and act
+* as masters to other PHYs.
+*
+* ICL,TGL:
+*   A(master) -> B(slave), C(slave)
+* RKL:
+*   A(master) -> B(slave)
+*   C(master) -> D(slave)
+*
+* We must set the IREFGEN bit for any PHY acting as a master
+* to another PHY.
+*/
+   if (IS_ROCKETLAKE(dev_priv) && phy == PHY_C)
+   return true;
+
+   return phy == PHY_A;
+}
+
 static bool icl_combo_phy_verify_state(struct drm_i915_private *dev_priv,
   enum phy phy)
 {
@@ -245,7 +266,7 @@ static bool icl_combo_phy_verify_state(struct 
drm_i915_private *dev_priv,
 
ret = cnl_verify_procmon_ref_values(dev_priv, phy);
 
-   if (phy == PHY_A) {
+   if (phy_is_master(dev_priv, phy)) {
ret &= check_phy_reg(dev_priv, phy, ICL_PORT_COMP_DW8(phy),
 IREFGEN, IREFGEN);
 
@@ -356,7 +377,7 @@ static void icl_combo_phys_init(struct drm_i915_private 
*dev_priv)
 skip_phy_misc:
cnl_set_procmon_ref_values(dev_priv, phy);
 
-   if (phy == PHY_A) {
+   if (phy_is_master(dev_priv, phy)) {
val = intel_de_read(dev_priv, ICL_PORT_COMP_DW8(phy));
val |= IREFGEN;
intel_de_write(dev_priv, ICL_PORT_COMP_DW8(phy), val);
-- 
2.24.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 21/23] drm/i915/rkl: Handle HTI

2020-05-01 Thread Matt Roper
If HTI (also sometimes called HDPORT) is enabled at startup, it may be
using some of the PHYs and DPLLs, making them unavailable for general
usage.  Let's read out the HDPORT_STATE register and avoid making use of
resources that HTI is already using.

Bspec: 49189
Bspec: 53707
Cc: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_display.c  | 30 ---
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 22 ++
 drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  1 +
 drivers/gpu/drm/i915/i915_drv.h   |  3 ++
 drivers/gpu/drm/i915/i915_reg.h   |  6 
 5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index 87defd30af1f..5d2919adb5f3 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -46,6 +46,7 @@
 #include "display/intel_ddi.h"
 #include "display/intel_dp.h"
 #include "display/intel_dp_mst.h"
+#include "display/intel_dpll_mgr.h"
 #include "display/intel_dsi.h"
 #include "display/intel_dvo.h"
 #include "display/intel_gmbus.h"
@@ -16688,6 +16689,13 @@ static void intel_pps_init(struct drm_i915_private 
*dev_priv)
intel_pps_unlock_regs_wa(dev_priv);
 }
 
+static bool hti_uses_phy(u32 hdport_state, enum phy phy)
+{
+   return hdport_state & HDPORT_ENABLED &&
+   (hdport_state & HDPORT_PHY_USED_DP(phy) ||
+hdport_state & HDPORT_PHY_USED_HDMI(phy));
+}
+
 static void intel_setup_outputs(struct drm_i915_private *dev_priv)
 {
struct intel_encoder *encoder;
@@ -16699,10 +16707,22 @@ static void intel_setup_outputs(struct 
drm_i915_private *dev_priv)
return;
 
if (IS_ROCKETLAKE(dev_priv)) {
-   intel_ddi_init(dev_priv, PORT_A);
-   intel_ddi_init(dev_priv, PORT_B);
-   intel_ddi_init(dev_priv, PORT_D);   /* DDI TC1 */
-   intel_ddi_init(dev_priv, PORT_E);   /* DDI TC2 */
+   /*
+* If HTI (aka HDPORT) is enabled at boot, it may have taken
+* over some of the PHYs and made them unavailable to the
+* driver.  In that case we should skip initializing the
+* corresponding outputs.
+*/
+   u32 hdport_state = intel_de_read(dev_priv, HDPORT_STATE);
+
+   if (!hti_uses_phy(hdport_state, PHY_A))
+   intel_ddi_init(dev_priv, PORT_A);
+   if (!hti_uses_phy(hdport_state, PHY_B))
+   intel_ddi_init(dev_priv, PORT_B);
+   if (!hti_uses_phy(hdport_state, PHY_C))
+   intel_ddi_init(dev_priv, PORT_D);   /* DDI TC1 */
+   if (!hti_uses_phy(hdport_state, PHY_D))
+   intel_ddi_init(dev_priv, PORT_E);   /* DDI TC2 */
} else if (INTEL_GEN(dev_priv) >= 12) {
intel_ddi_init(dev_priv, PORT_A);
intel_ddi_init(dev_priv, PORT_B);
@@ -18221,6 +18241,8 @@ static void intel_modeset_readout_hw_state(struct 
drm_device *dev)
 
intel_dpll_readout_hw_state(dev_priv);
 
+   dev_priv->hti_pll_mask = intel_get_hti_plls(dev_priv);
+
for_each_intel_encoder(dev, encoder) {
pipe = 0;
 
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c 
b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
index 196d9eb3a77b..f8078a288379 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
@@ -265,6 +265,25 @@ void intel_disable_shared_dpll(const struct 
intel_crtc_state *crtc_state)
mutex_unlock(&dev_priv->dpll.lock);
 }
 
+/*
+ * HTI (aka HDPORT) may be using some of the platform's PLL's, making them
+ * unavailable for use.
+ */
+u32 intel_get_hti_plls(struct drm_i915_private *dev_priv)
+{
+
+   u32 hdport_state;
+
+   if (!IS_ROCKETLAKE(dev_priv))
+   return 0;
+
+   hdport_state = intel_de_read(dev_priv, HDPORT_STATE);
+   if (!(hdport_state & HDPORT_ENABLED))
+   return 0;
+
+   return REG_FIELD_GET(HDPORT_DPLL_USED_MASK, hdport_state);
+}
+
 static struct intel_shared_dpll *
 intel_find_shared_dpll(struct intel_atomic_state *state,
   const struct intel_crtc *crtc,
@@ -280,6 +299,9 @@ intel_find_shared_dpll(struct intel_atomic_state *state,
 
drm_WARN_ON(&dev_priv->drm, dpll_mask & ~(BIT(I915_NUM_PLLS) - 1));
 
+   /* Eliminate DPLLs from consideration if reserved by HTI */
+   dpll_mask &= ~dev_priv->hti_pll_mask;
+
for_each_set_bit(i, &dpll_mask, I915_NUM_PLLS) {
pll = &dev_priv->dpll.shared_dplls[i];
 
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.h 
b/drivers/gpu/drm/i915/display/intel_dpll_mgr.h
index 5d9a2bc371e7..ac2238646fe7 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.h
+++ b/drivers/gpu/dr

[Intel-gfx] [PATCH 15/23] drm/i915/rkl: provide port/phy mapping for vbt

2020-05-01 Thread Matt Roper
From: Lucas De Marchi 

RKL uses DDI A, DDI B, DDI USBC1 and DDI USBC2 from the DE point of
view, so all DDI/pipe/transcoder registers use these indexes to refer to
them. Combo phy and IO functions follow another namespace that we keep
as "enum phy". The VBT in theory would use the DE point of view, but
that does not happen in practice.

Provide a table to convert the child devices to the "correct" port
numbering we use. Now this is the output we get while reading the VBT:

DDIA:
[drm:intel_bios_port_aux_ch [i915]] using AUX A for port A (VBT)
[drm:intel_dp_init_connector [i915]] Adding DP connector on [ENCODER:275:DDI A]
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on 
[ENCODER:275:DDI A]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x1 for port A (VBT)

DDIB:
[drm:intel_bios_port_aux_ch [i915]] using AUX B for port B (platform default)
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on 
[ENCODER:291:DDI B]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x2 for port B (VBT)

DDI USBC1:
[drm:intel_bios_port_aux_ch [i915]] using AUX D for port D (VBT)
[drm:intel_dp_init_connector [i915]] Adding DP connector on [ENCODER:295:DDI D]
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on 
[ENCODER:295:DDI D]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x3 for port D (VBT)

DDI USBC2:
[drm:intel_bios_port_aux_ch [i915]] using AUX E for port E (VBT)
[drm:intel_dp_init_connector [i915]] Adding DP connector on [ENCODER:306:DDI E]
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on 
[ENCODER:306:DDI E]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x9 for port E (VBT)

Cc: Clinton Taylor 
Cc: Aditya Swarup 
Signed-off-by: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_bios.c | 72 ---
 1 file changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_bios.c 
b/drivers/gpu/drm/i915/display/intel_bios.c
index 839124647202..4f1a72a90b8f 100644
--- a/drivers/gpu/drm/i915/display/intel_bios.c
+++ b/drivers/gpu/drm/i915/display/intel_bios.c
@@ -1619,30 +1619,18 @@ static u8 map_ddc_pin(struct drm_i915_private 
*dev_priv, u8 vbt_pin)
return 0;
 }
 
-static enum port dvo_port_to_port(u8 dvo_port)
+static enum port __dvo_port_to_port(int n_ports, int n_dvo,
+   const int port_mapping[][3], u8 dvo_port)
 {
-   /*
-* Each DDI port can have more than one value on the "DVO Port" field,
-* so look for all the possible values for each port.
-*/
-   static const int dvo_ports[][3] = {
-   [PORT_A] = { DVO_PORT_HDMIA, DVO_PORT_DPA, -1},
-   [PORT_B] = { DVO_PORT_HDMIB, DVO_PORT_DPB, -1},
-   [PORT_C] = { DVO_PORT_HDMIC, DVO_PORT_DPC, -1},
-   [PORT_D] = { DVO_PORT_HDMID, DVO_PORT_DPD, -1},
-   [PORT_E] = { DVO_PORT_CRT, DVO_PORT_HDMIE, DVO_PORT_DPE},
-   [PORT_F] = { DVO_PORT_HDMIF, DVO_PORT_DPF, -1},
-   [PORT_G] = { DVO_PORT_HDMIG, DVO_PORT_DPG, -1},
-   };
enum port port;
int i;
 
-   for (port = PORT_A; port < ARRAY_SIZE(dvo_ports); port++) {
-   for (i = 0; i < ARRAY_SIZE(dvo_ports[port]); i++) {
-   if (dvo_ports[port][i] == -1)
+   for (port = PORT_A; port < n_ports; port++) {
+   for (i = 0; i < n_dvo; i++) {
+   if (port_mapping[port][i] == -1)
break;
 
-   if (dvo_port == dvo_ports[port][i])
+   if (dvo_port == port_mapping[port][i])
return port;
}
}
@@ -1650,6 +1638,48 @@ static enum port dvo_port_to_port(u8 dvo_port)
return PORT_NONE;
 }
 
+static enum port dvo_port_to_port(struct drm_i915_private *dev_priv,
+ u8 dvo_port)
+{
+   /*
+* Each DDI port can have more than one value on the "DVO Port" field,
+* so look for all the possible values for each port.
+*/
+   static const int port_mapping[][3] = {
+   [PORT_A] = { DVO_PORT_HDMIA, DVO_PORT_DPA, -1 },
+   [PORT_B] = { DVO_PORT_HDMIB, DVO_PORT_DPB, -1 },
+   [PORT_C] = { DVO_PORT_HDMIC, DVO_PORT_DPC, -1 },
+   [PORT_D] = { DVO_PORT_HDMID, DVO_PORT_DPD, -1 },
+   [PORT_E] = { DVO_PORT_CRT, DVO_PORT_HDMIE, -1 },
+   [PORT_F] = { DVO_PORT_HDMIF, DVO_PORT_DPF, -1 },
+   [PORT_G] = { DVO_PORT_HDMIG, DVO_PORT_DPG, -1 },
+   };
+   /*
+* Bspec lists the ports as A, B, C, D - however internally in our
+* driver we keep them as PORT_A, PORT_B, PORT_D and PORT_E so the
+* registers in Display Engine match the right offsets. Apply the
+* mapping here to translate from VBT to internal convention.
+*/
+   static const int rkl_port_mapping[][3] 
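A plausible completion of the RKL table, inferred only from the commit
message (internal ports A, B, D, E; the VBT's C and D remapped) and not
taken from the patch itself:

        static const int rkl_port_mapping[][3] = {
                [PORT_A] = { DVO_PORT_HDMIA, DVO_PORT_DPA, -1 },
                [PORT_B] = { DVO_PORT_HDMIB, DVO_PORT_DPB, -1 },
                [PORT_D] = { DVO_PORT_HDMIC, DVO_PORT_DPC, -1 },  /* DDI USBC1 */
                [PORT_E] = { DVO_PORT_HDMID, DVO_PORT_DPD, -1 },  /* DDI USBC2 */
        };

which lines up with the quoted log, where the VBT's AUX C / DDC pin 0x3
land on port D.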

[Intel-gfx] [PATCH 01/23] drm/i915/rkl: Add RKL platform info and PCI ids

2020-05-01 Thread Matt Roper
Introduce the basic platform definition, macros, and PCI IDs.

Bspec: 44501
Cc: Lucas De Marchi 
Cc: Caz Yokoyama 
Cc: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/i915_drv.h  |  8 
 drivers/gpu/drm/i915/i915_pci.c  | 10 ++
 drivers/gpu/drm/i915/intel_device_info.c |  1 +
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 include/drm/i915_pciids.h|  9 +
 5 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6af69555733e..1ba77283123d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1406,6 +1406,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define IS_ICELAKE(dev_priv)   IS_PLATFORM(dev_priv, INTEL_ICELAKE)
 #define IS_ELKHARTLAKE(dev_priv)   IS_PLATFORM(dev_priv, INTEL_ELKHARTLAKE)
 #define IS_TIGERLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_TIGERLAKE)
+#define IS_ROCKETLAKE(dev_priv)	IS_PLATFORM(dev_priv, INTEL_ROCKETLAKE)
 #define IS_HSW_EARLY_SDV(dev_priv) (IS_HASWELL(dev_priv) && \
(INTEL_DEVID(dev_priv) & 0xFF00) == 0x0C00)
 #define IS_BDW_ULT(dev_priv) \
@@ -1514,6 +1515,13 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define IS_TGL_REVID(p, since, until) \
(IS_TIGERLAKE(p) && IS_REVID(p, since, until))
 
+#define RKL_REVID_A0   0x0
+#define RKL_REVID_B0   0x1
+#define RKL_REVID_C0   0x4
+
+#define IS_RKL_REVID(p, since, until) \
+   (IS_ROCKETLAKE(p) && IS_REVID(p, since, until))
+
#define IS_LP(dev_priv)	(INTEL_INFO(dev_priv)->is_lp)
 #define IS_GEN9_LP(dev_priv)   (IS_GEN(dev_priv, 9) && IS_LP(dev_priv))
 #define IS_GEN9_BC(dev_priv)   (IS_GEN(dev_priv, 9) && !IS_LP(dev_priv))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 1faf9d6ec0a4..5a470bab2214 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -863,6 +863,15 @@ static const struct intel_device_info tgl_info = {
BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
 };
 
+static const struct intel_device_info rkl_info = {
+   GEN12_FEATURES,
+   PLATFORM(INTEL_ROCKETLAKE),
+   .pipe_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C),
+   .require_force_probe = 1,
+   .engine_mask =
+   BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0),
+};
+
 #define GEN12_DGFX_FEATURES \
GEN12_FEATURES, \
.is_dgfx = 1
@@ -941,6 +950,7 @@ static const struct pci_device_id pciidlist[] = {
INTEL_ICL_11_IDS(&icl_info),
INTEL_EHL_IDS(&ehl_info),
INTEL_TGL_12_IDS(&tgl_info),
+   INTEL_RKL_IDS(&rkl_info),
{0, 0, 0}
 };
 MODULE_DEVICE_TABLE(pci, pciidlist);
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 91bb7891c70c..9862c1185059 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -61,6 +61,7 @@ static const char * const platform_names[] = {
PLATFORM_NAME(ICELAKE),
PLATFORM_NAME(ELKHARTLAKE),
PLATFORM_NAME(TIGERLAKE),
+   PLATFORM_NAME(ROCKETLAKE),
 };
 #undef PLATFORM_NAME
 
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 69c9257c6c6a..a126984cef7f 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -80,6 +80,7 @@ enum intel_platform {
INTEL_ELKHARTLAKE,
/* gen12 */
INTEL_TIGERLAKE,
+   INTEL_ROCKETLAKE,
INTEL_MAX_PLATFORMS
 };
 
diff --git a/include/drm/i915_pciids.h b/include/drm/i915_pciids.h
index 662d8351c87a..bc989de2aac2 100644
--- a/include/drm/i915_pciids.h
+++ b/include/drm/i915_pciids.h
@@ -605,4 +605,13 @@
INTEL_VGA_DEVICE(0x9AD9, info), \
INTEL_VGA_DEVICE(0x9AF8, info)
 
+/* RKL */
+#define INTEL_RKL_IDS(info) \
+   INTEL_VGA_DEVICE(0x4C80, info), \
+   INTEL_VGA_DEVICE(0x4C8A, info), \
+   INTEL_VGA_DEVICE(0x4C8B, info), \
+   INTEL_VGA_DEVICE(0x4C8C, info), \
+   INTEL_VGA_DEVICE(0x4C90, info), \
+   INTEL_VGA_DEVICE(0x4C9A, info)
+
 #endif /* _I915_PCIIDS_H */
-- 
2.24.1
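
For illustration, the revid macros above are meant to gate
stepping-specific workarounds, in the same pattern the TGL code already
follows; a minimal sketch, where SOME_REG and SOME_BIT are placeholders
rather than real definitions:

        /* hypothetical workaround limited to RKL A0..B0 steppings */
        if (IS_RKL_REVID(i915, RKL_REVID_A0, RKL_REVID_B0))
                wa_write_or(wal, SOME_REG, SOME_BIT);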



[Intel-gfx] [PATCH 14/23] drm/i915/rkl: Setup ports/phys

2020-05-01 Thread Matt Roper
RKL uses DDIs A, B, TC1, and TC2, which need to map to combo PHYs A-D.

Bspec: 49181
Cc: Imre Deak 
Cc: Aditya Swarup 
Cc: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_display.c | 34 
 drivers/gpu/drm/i915/i915_reg.h  |  4 ++-
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index ebbec5e5bf53..1ed6bb03379b 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -7212,30 +7212,33 @@ bool intel_phy_is_combo(struct drm_i915_private *dev_priv, enum phy phy)
 {
if (phy == PHY_NONE)
return false;
-
-   if (IS_ELKHARTLAKE(dev_priv))
+   else if (IS_ROCKETLAKE(dev_priv))
+   return phy <= PHY_D;
+   else if (IS_ELKHARTLAKE(dev_priv))
return phy <= PHY_C;
-
-   if (INTEL_GEN(dev_priv) >= 11)
+   else if (INTEL_GEN(dev_priv) >= 11)
return phy <= PHY_B;
-
-   return false;
+   else
+   return false;
 }
 
 bool intel_phy_is_tc(struct drm_i915_private *dev_priv, enum phy phy)
 {
-   if (INTEL_GEN(dev_priv) >= 12)
+   if (IS_ROCKETLAKE(dev_priv))
+   return false;
+   else if (INTEL_GEN(dev_priv) >= 12)
return phy >= PHY_D && phy <= PHY_I;
-
-   if (INTEL_GEN(dev_priv) >= 11 && !IS_ELKHARTLAKE(dev_priv))
+   else if (INTEL_GEN(dev_priv) >= 11 && !IS_ELKHARTLAKE(dev_priv))
return phy >= PHY_C && phy <= PHY_F;
-
-   return false;
+   else
+   return false;
 }
 
 enum phy intel_port_to_phy(struct drm_i915_private *i915, enum port port)
 {
-   if (IS_ELKHARTLAKE(i915) && port == PORT_D)
+   if (IS_ROCKETLAKE(i915) && port >= PORT_D)
+   return (enum phy)port - 1;
+   else if (IS_ELKHARTLAKE(i915) && port == PORT_D)
return PHY_A;
 
return (enum phy)port;
@@ -16692,7 +16695,12 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv)
if (!HAS_DISPLAY(dev_priv) || !INTEL_DISPLAY_ENABLED(dev_priv))
return;
 
-   if (INTEL_GEN(dev_priv) >= 12) {
+   if (IS_ROCKETLAKE(dev_priv)) {
+   intel_ddi_init(dev_priv, PORT_A);
+   intel_ddi_init(dev_priv, PORT_B);
+   intel_ddi_init(dev_priv, PORT_D);   /* DDI TC1 */
+   intel_ddi_init(dev_priv, PORT_E);   /* DDI TC2 */
+   } else if (INTEL_GEN(dev_priv) >= 12) {
intel_ddi_init(dev_priv, PORT_A);
intel_ddi_init(dev_priv, PORT_B);
intel_ddi_init(dev_priv, PORT_D);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f392ad61f1db..42941b01710d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1869,9 +1869,11 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
#define _ICL_COMBOPHY_A			0x162000
#define _ICL_COMBOPHY_B			0x6C000
#define _EHL_COMBOPHY_C			0x160000
+#define _RKL_COMBOPHY_D			0x161000
 #define _ICL_COMBOPHY(phy) _PICK(phy, _ICL_COMBOPHY_A, \
  _ICL_COMBOPHY_B, \
- _EHL_COMBOPHY_C)
+ _EHL_COMBOPHY_C, \
+ _RKL_COMBOPHY_D)
 
 /* CNL/ICL Port CL_DW registers */
 #define _ICL_PORT_CL_DW(dw, phy)   (_ICL_COMBOPHY(phy) + \
-- 
2.24.1



[Intel-gfx] [PATCH 22/23] drm/i915/rkl: Disable PSR2

2020-05-01 Thread Matt Roper
From: José Roberto de Souza 

RKL doesn't have PSR2 HW tracking; it was replaced by software/manual
tracking.  The driver is required to track the areas that need updates
and program the hardware to send selective updates.

So until the software tracking is implemented, PSR2 needs to be disabled
for platforms without PSR2 HW tracking.

BSpec: 50422
BSpec: 50424

Cc: Dhinakaran Pandiyan 
Cc: Rodrigo Vivi 
Signed-off-by: José Roberto de Souza 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_psr.c | 15 +++
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/i915_pci.c  |  3 +++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 4 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c
index a0569fdfeb16..31a04570d262 100644
--- a/drivers/gpu/drm/i915/display/intel_psr.c
+++ b/drivers/gpu/drm/i915/display/intel_psr.c
@@ -678,6 +678,21 @@ static bool intel_psr2_config_valid(struct intel_dp *intel_dp,
return false;
}
 
+   /*
+* Some platforms lack PSR2 HW tracking and instead require manual
+* tracking by software.  In this case, the driver is required to track
+* the areas that need updates and program hardware to send selective
+* updates.
+*
+* So until the software tracking is implemented, PSR2 needs to be
+* disabled for platforms without PSR2 HW tracking.
+*/
+   if (!HAS_PSR_HW_TRACKING(dev_priv)) {
+   drm_dbg_kms(&dev_priv->drm,
+   "No PSR2 HW tracking in the platform\n");
+   return false;
+   }
+
/*
 * DSC and PSR2 cannot be enabled simultaneously. If a requested
 * resolution requires DSC to be enabled, priority is given to DSC
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 06802f2f1cd5..88b524399c8f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1618,6 +1618,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_DDI(dev_priv)   (INTEL_INFO(dev_priv)->display.has_ddi)
 #define HAS_FPGA_DBG_UNCLAIMED(dev_priv) (INTEL_INFO(dev_priv)->has_fpga_dbg)
 #define HAS_PSR(dev_priv)   (INTEL_INFO(dev_priv)->display.has_psr)
+#define HAS_PSR_HW_TRACKING(dev_priv) \
+   (INTEL_INFO(dev_priv)->display.has_psr_hw_tracking)
#define HAS_TRANSCODER(dev_priv, trans)	((INTEL_INFO(dev_priv)->cpu_transcoder_mask & BIT(trans)) != 0)
 
 #define HAS_RC6(dev_priv)   (INTEL_INFO(dev_priv)->has_rc6)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 5a470bab2214..2c3b0a7d577d 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -536,6 +536,7 @@ static const struct intel_device_info vlv_info = {
.display.has_ddi = 1, \
.has_fpga_dbg = 1, \
.display.has_psr = 1, \
+   .display.has_psr_hw_tracking = 1, \
.display.has_dp_mst = 1, \
.has_rc6p = 0 /* RC6p removed-by HSW */, \
HSW_PIPE_OFFSETS, \
@@ -690,6 +691,7 @@ static const struct intel_device_info skl_gt4_info = {
.display.has_fbc = 1, \
.display.has_hdcp = 1, \
.display.has_psr = 1, \
+   .display.has_psr_hw_tracking = 1, \
.has_runtime_pm = 1, \
.display.has_csr = 1, \
.has_rc6 = 1, \
@@ -868,6 +870,7 @@ static const struct intel_device_info rkl_info = {
PLATFORM(INTEL_ROCKETLAKE),
.pipe_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C),
.require_force_probe = 1,
+   .display.has_psr_hw_tracking = 0,
.engine_mask =
BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0),
 };
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index a126984cef7f..b336b50d3e0b 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -147,6 +147,7 @@ enum intel_ppgtt_type {
func(has_modular_fia); \
func(has_overlay); \
func(has_psr); \
+   func(has_psr_hw_tracking); \
func(overlay_needs_physical); \
func(supports_tv);
 
-- 
2.24.1
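
A note on the plumbing: the func(has_psr_hw_tracking) entry is the
X-macro that generates the display-flag bitfield, so that one line
produces both the struct member and its debug dump. The new check then
reduces to a plain bit test, roughly:

        if (!INTEL_INFO(dev_priv)->display.has_psr_hw_tracking)
                return false;   /* keep PSR2 off until SW tracking lands */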



[Intel-gfx] [PATCH 09/23] drm/i915/rkl: Program BW_BUDDY0 registers instead of BW_BUDDY1/2

2020-05-01 Thread Matt Roper
RKL uses the same BW_BUDDY programming table as TGL, but programs the
values into a single BUDDY0 set of registers rather than the
BUDDY1/BUDDY2 sets used by TGL.

Bspec: 49218
Cc: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 .../drm/i915/display/intel_display_power.c| 44 +++
 drivers/gpu/drm/i915/i915_reg.h   | 14 --
 2 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
index 71691919d101..a83e1bc0e3a7 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -5249,7 +5249,7 @@ static void tgl_bw_buddy_init(struct drm_i915_private *dev_priv)
enum intel_dram_type type = dev_priv->dram_info.type;
u8 num_channels = dev_priv->dram_info.num_channels;
const struct buddy_page_mask *table;
-   int i;
+   int config, min_buddy, max_buddy, i;
 
if (IS_TGL_REVID(dev_priv, TGL_REVID_A0, TGL_REVID_B0))
/* Wa_1409767108: tgl */
@@ -5257,29 +5257,35 @@ static void tgl_bw_buddy_init(struct drm_i915_private *dev_priv)
else
table = tgl_buddy_page_masks;
 
-   for (i = 0; table[i].page_mask != 0; i++)
-   if (table[i].num_channels == num_channels &&
-   table[i].type == type)
+   if (IS_ROCKETLAKE(dev_priv)) {
+   min_buddy = max_buddy = 0;
+   } else {
+   min_buddy = 1;
+   max_buddy = 2;
+   }
+
+   for (config = 0; table[config].page_mask != 0; config++)
+   if (table[config].num_channels == num_channels &&
+   table[config].type == type)
break;
 
-   if (table[i].page_mask == 0) {
+   if (table[config].page_mask == 0) {
drm_dbg(&dev_priv->drm,
"Unknown memory configuration; disabling address buddy 
logic.\n");
-   intel_de_write(dev_priv, BW_BUDDY1_CTL, BW_BUDDY_DISABLE);
-   intel_de_write(dev_priv, BW_BUDDY2_CTL, BW_BUDDY_DISABLE);
+   for (i = min_buddy; i <= max_buddy; i++)
+   intel_de_write(dev_priv, BW_BUDDY_CTL(i),
+  BW_BUDDY_DISABLE);
} else {
-   intel_de_write(dev_priv, BW_BUDDY1_PAGE_MASK,
-  table[i].page_mask);
-   intel_de_write(dev_priv, BW_BUDDY2_PAGE_MASK,
-  table[i].page_mask);
-
-   /* Wa_22010178259:tgl */
-   intel_de_rmw(dev_priv, BW_BUDDY1_CTL,
-BW_BUDDY_TLB_REQ_TIMER_MASK,
-REG_FIELD_PREP(BW_BUDDY_TLB_REQ_TIMER_MASK, 0x8));
-   intel_de_rmw(dev_priv, BW_BUDDY2_CTL,
-BW_BUDDY_TLB_REQ_TIMER_MASK,
-REG_FIELD_PREP(BW_BUDDY_TLB_REQ_TIMER_MASK, 0x8));
+   for (i = min_buddy; i <= max_buddy; i++) {
+   intel_de_write(dev_priv, BW_BUDDY_PAGE_MASK(i),
+  table[config].page_mask);
+
+   /* Wa_22010178259:tgl,rkl */
+   intel_de_rmw(dev_priv, BW_BUDDY_CTL(i),
+BW_BUDDY_TLB_REQ_TIMER_MASK,
+REG_FIELD_PREP(BW_BUDDY_TLB_REQ_TIMER_MASK,
+   0x8));
+   }
}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 59c1d527cf13..2266f9fc2d79 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7832,13 +7832,19 @@ enum {
 #define  WAIT_FOR_PCH_RESET_ACK(1 << 1)
 #define  WAIT_FOR_PCH_FLR_ACK  (1 << 0)
 
-#define BW_BUDDY1_CTL  _MMIO(0x45140)
-#define BW_BUDDY2_CTL  _MMIO(0x45150)
+#define _BW_BUDDY0_CTL 0x45130
+#define _BW_BUDDY1_CTL 0x45140
+#define BW_BUDDY_CTL(x)	_MMIO(_PICK_EVEN(x, \
+_BW_BUDDY0_CTL, \
+_BW_BUDDY1_CTL))
 #define   BW_BUDDY_DISABLE REG_BIT(31)
 #define   BW_BUDDY_TLB_REQ_TIMER_MASK  REG_GENMASK(21, 16)
 
-#define BW_BUDDY1_PAGE_MASK	_MMIO(0x45144)
-#define BW_BUDDY2_PAGE_MASK	_MMIO(0x45154)
+#define _BW_BUDDY0_PAGE_MASK   0x45134
+#define _BW_BUDDY1_PAGE_MASK   0x45144
+#define BW_BUDDY_PAGE_MASK(x)  _MMIO(_PICK_EVEN(x, \
+_BW_BUDDY0_PAGE_MASK, \
+_BW_BUDDY1_PAGE_MASK))
 
 #define HSW_NDE_RSTWRN_OPT _MMIO(0x46408)
 #define  RESET_PCH_HANDSHAKE_ENABLE(1 << 4)
-- 
2.24.1
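
_PICK_EVEN(x, a, b) expands to a + x * (b - a), so the new macros cover
all three buddy instances with evenly spaced offsets:

        BW_BUDDY_CTL(0) -> 0x45130      /* RKL's single BUDDY0 instance */
        BW_BUDDY_CTL(1) -> 0x45140      /* TGL BUDDY1 */
        BW_BUDDY_CTL(2) -> 0x45150      /* TGL BUDDY2 */

which is what lets one min_buddy..max_buddy loop serve both platforms.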


[Intel-gfx] [PATCH 23/23] drm/i915/rkl: Add initial workarounds

2020-05-01 Thread Matt Roper
RKL and TGL share some general gen12 workarounds, but each platform also
has its own platform-specific workarounds.

Cc: Matt Atwood 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_sprite.c |  5 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 88 +
 2 files changed, 59 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c
index 571c36f929bd..20eea81118da 100644
--- a/drivers/gpu/drm/i915/display/intel_sprite.c
+++ b/drivers/gpu/drm/i915/display/intel_sprite.c
@@ -2842,8 +2842,9 @@ static bool skl_plane_format_mod_supported(struct drm_plane *_plane,
 static bool gen12_plane_supports_mc_ccs(struct drm_i915_private *dev_priv,
enum plane_id plane_id)
 {
-   /* Wa_14010477008:tgl[a0..c0] */
-   if (IS_TGL_REVID(dev_priv, TGL_REVID_A0, TGL_REVID_C0))
+   /* Wa_14010477008:tgl[a0..c0],rkl[all] */
+   if (IS_ROCKETLAKE(dev_priv) ||
+   IS_TGL_REVID(dev_priv, TGL_REVID_A0, TGL_REVID_C0))
return false;
 
return plane_id < PLANE_SPRITE4;
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index adddc5c93b48..d309af394b53 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -586,8 +586,8 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
wa_masked_en(wal, GEN9_ROW_CHICKEN4, GEN11_DIS_PICK_2ND_EU);
 }
 
-static void tgl_ctx_workarounds_init(struct intel_engine_cs *engine,
-struct i915_wa_list *wal)
+static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
+  struct i915_wa_list *wal)
 {
/*
 * Wa_1409142259:tgl
@@ -597,12 +597,28 @@ static void tgl_ctx_workarounds_init(struct intel_engine_cs *engine,
 * Wa_1409207793:tgl
 * Wa_1409178076:tgl
 * Wa_1408979724:tgl
+* Wa_14010443199:rkl
+* Wa_14010698770:rkl
 */
WA_SET_BIT_MASKED(GEN11_COMMON_SLICE_CHICKEN3,
  GEN12_DISABLE_CPS_AWARE_COLOR_PIPE);
 
+   /* WaDisableGPGPUMidThreadPreemption:gen12 */
+   WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1,
+   GEN9_PREEMPT_GPGPU_LEVEL_MASK,
+   GEN9_PREEMPT_GPGPU_THREAD_GROUP_LEVEL);
+}
+
+static void tgl_ctx_workarounds_init(struct intel_engine_cs *engine,
+struct i915_wa_list *wal)
+{
+   gen12_ctx_workarounds_init(engine, wal);
+
/*
-* Wa_1604555607:gen12 and Wa_1608008084:gen12
+* Wa_1604555607:tgl
+*
+* Note that the implementation of this workaround is further modified
+* according to the FF_MODE2 guidance given by Wa_1608008084:gen12.
 * FF_MODE2 register will return the wrong value when read. The default
 * value for this register is zero for all fields and there are no bit
 * masks. So instead of doing a RMW we should just write the TDS timer
@@ -610,11 +626,6 @@ static void tgl_ctx_workarounds_init(struct intel_engine_cs *engine,
 */
wa_add(wal, FF_MODE2, FF_MODE2_TDS_TIMER_MASK,
   FF_MODE2_TDS_TIMER_128, 0);
-
-   /* WaDisableGPGPUMidThreadPreemption:tgl */
-   WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1,
-   GEN9_PREEMPT_GPGPU_LEVEL_MASK,
-   GEN9_PREEMPT_GPGPU_THREAD_GROUP_LEVEL);
 }
 
 static void
@@ -629,8 +640,10 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine,
 
wa_init_start(wal, name, engine->name);
 
-   if (IS_GEN(i915, 12))
+   if (IS_TIGERLAKE(i915))
tgl_ctx_workarounds_init(engine, wal);
+   else if (IS_GEN(i915, 12))
+   gen12_ctx_workarounds_init(engine, wal);
else if (IS_GEN(i915, 11))
icl_ctx_workarounds_init(engine, wal);
else if (IS_CANNONLAKE(i915))
@@ -941,9 +954,16 @@ icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
 }
 
 static void
-tgl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
+gen12_gt_workarounds_init(struct drm_i915_private *i915,
+ struct i915_wa_list *wal)
 {
wa_init_mcr(i915, wal);
+}
+
+static void
+tgl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
+{
+   gen12_gt_workarounds_init(i915, wal);
 
/* Wa_1409420604:tgl */
if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0))
@@ -961,8 +981,10 @@ tgl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
 static void
 gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal)
 {
-   if (IS_GEN(i915, 12))
+   if (IS_TIGERLAKE(i915))
tgl_gt_workarounds_init(i915, wal);
+	else if (IS_GEN(i915, 12))
+		gen12_gt_workarounds_init(i915, wal);

[Intel-gfx] [PATCH 08/23] drm/i915/rkl: Add power well support

2020-05-01 Thread Matt Roper
RKL power wells are similar to TGL power wells, but have some important
differences:

 * PG1 now has pipe A's VDSC (rather than sticking it in PG2)
 * PG2 no longer exists
 * DDI-C (aka TC-1) moves from PG1 -> PG3
 * PG5 no longer exists due to the lack of a fourth pipe

Also note that what we refer to as 'DDI-C' and 'DDI-D' need to actually
be programmed as TC-1 and TC-2 even though this platform doesn't have TC
outputs.

Bspec: 49234
Cc: Imre Deak 
Cc: Lucas De Marchi 
Cc: Anshuman Gupta 
Signed-off-by: Matt Roper 
---
 .../drm/i915/display/intel_display_power.c| 185 +-
 drivers/gpu/drm/i915/display/intel_vdsc.c |   4 +-
 2 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
index 49998906cc61..71691919d101 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -2913,6 +2913,53 @@ void intel_display_power_put(struct drm_i915_private *dev_priv,
BIT_ULL(POWER_DOMAIN_AUX_I_TBT) |   \
BIT_ULL(POWER_DOMAIN_TC_COLD_OFF))
 
+#define RKL_PW_4_POWER_DOMAINS (   \
+   BIT_ULL(POWER_DOMAIN_PIPE_C) |  \
+   BIT_ULL(POWER_DOMAIN_PIPE_C_PANEL_FITTER) | \
+   BIT_ULL(POWER_DOMAIN_TRANSCODER_C) |\
+   BIT_ULL(POWER_DOMAIN_INIT))
+
+#define RKL_PW_3_POWER_DOMAINS (   \
+   RKL_PW_4_POWER_DOMAINS |\
+   BIT_ULL(POWER_DOMAIN_PIPE_B) |  \
+   BIT_ULL(POWER_DOMAIN_PIPE_B_PANEL_FITTER) | \
+   BIT_ULL(POWER_DOMAIN_AUDIO) |   \
+   BIT_ULL(POWER_DOMAIN_VGA) | \
+   BIT_ULL(POWER_DOMAIN_TRANSCODER_B) |\
+   BIT_ULL(POWER_DOMAIN_PORT_DDI_D_LANES) |\
+   BIT_ULL(POWER_DOMAIN_PORT_DDI_E_LANES) |\
+   BIT_ULL(POWER_DOMAIN_AUX_D) |   \
+   BIT_ULL(POWER_DOMAIN_AUX_E) |   \
+   BIT_ULL(POWER_DOMAIN_INIT))
+
+/*
+ * There is no PW_2/PG_2 on RKL.
+ *
+ * RKL PW_1/PG_1 domains (under HW/DMC control):
+ * - DBUF function (note: registers are in PW0)
+ * - PIPE_A and its planes and VDSC/joining, except VGA
+ * - transcoder A
+ * - DDI_A and DDI_B
+ * - FBC
+ *
+ * RKL PW_0/PG_0 domains (under HW/DMC control):
+ * - PCI
+ * - clocks except port PLL
+ * - shared functions:
+ * * interrupts except pipe interrupts
+ * * MBus except PIPE_MBUS_DBOX_CTL
+ * * DBUF registers
+ * - central power except FBC
+ * - top-level GTC (DDI-level GTC is in the well associated with the DDI)
+ */
+
+#define RKL_DISPLAY_DC_OFF_POWER_DOMAINS ( \
+   RKL_PW_3_POWER_DOMAINS |\
+   BIT_ULL(POWER_DOMAIN_MODESET) | \
+   BIT_ULL(POWER_DOMAIN_AUX_A) |   \
+   BIT_ULL(POWER_DOMAIN_AUX_B) |   \
+   BIT_ULL(POWER_DOMAIN_INIT))
+
 static const struct i915_power_well_ops i9xx_always_on_power_well_ops = {
.sync_hw = i9xx_power_well_sync_hw_noop,
.enable = i9xx_always_on_power_well_noop,
@@ -4283,6 +4330,140 @@ static const struct i915_power_well_desc tgl_power_wells[] = {
},
 };
 
+static const struct i915_power_well_desc rkl_power_wells[] = {
+   {
+   .name = "always-on",
+   .always_on = true,
+   .domains = POWER_DOMAIN_MASK,
+   .ops = &i9xx_always_on_power_well_ops,
+   .id = DISP_PW_ID_NONE,
+   },
+   {
+   .name = "power well 1",
+   /* Handled by the DMC firmware */
+   .always_on = true,
+   .domains = 0,
+   .ops = &hsw_power_well_ops,
+   .id = SKL_DISP_PW_1,
+   {
+   .hsw.regs = &hsw_power_well_regs,
+   .hsw.idx = ICL_PW_CTL_IDX_PW_1,
+   .hsw.has_fuses = true,
+   },
+   },
+   {
+   .name = "DC off",
+   .domains = RKL_DISPLAY_DC_OFF_POWER_DOMAINS,
+   .ops = &gen9_dc_off_power_well_ops,
+   .id = SKL_DISP_DC_OFF,
+   },
+   {
+   .name = "power well 3",
+   .domains = RKL_PW_3_POWER_DOMAINS,
+   .ops = &hsw_power_well_ops,
+   .id = ICL_DISP_PW_3,
+   {
+   .hsw.regs = &hsw_power_well_regs,
+   .hsw.idx = ICL_PW_CTL_IDX_PW_3,
+   .hsw.irq_pipe_mask = BIT(PIPE_B),
+   .hsw.has_vga = true,
+   .hsw.has_fuses = true,
+   },
+   },
+   {
+   .name = "power well 4",
+   .domains = RKL_PW_4_POWER_DOMAINS,
+   .ops = &hsw_power_well_ops,
+   .id = DISP_PW_ID_NONE,
+   {
+   .hsw.regs = &hsw_power_well_regs,

[Intel-gfx] [PATCH 20/23] drm/i915/rkl: Add DPLL4 support

2020-05-01 Thread Matt Roper
Rocket Lake has a third DPLL (called 'DPLL4') that must be used to
enable a third display.  Unlike EHL's variant of DPLL4, the RKL variant
behaves the same as DPLL0/1.  And despite its name, the DPLL4 registers
are offset as if it were DPLL2, so no extra offset handling is needed
either.

Bspec: 49202
Bspec: 49443
Bspec: 50288
Bspec: 50289
Cc: Lucas De Marchi 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 28 +--
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
index b45185b80bec..196d9eb3a77b 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
@@ -3506,13 +3506,19 @@ static bool icl_get_combo_phy_dpll(struct intel_atomic_state *state,
return false;
}
 
-   if (IS_ELKHARTLAKE(dev_priv) && port != PORT_A)
+   if (IS_ROCKETLAKE(dev_priv)) {
dpll_mask =
BIT(DPLL_ID_EHL_DPLL4) |
BIT(DPLL_ID_ICL_DPLL1) |
BIT(DPLL_ID_ICL_DPLL0);
-   else
+   } else if (IS_ELKHARTLAKE(dev_priv) && port != PORT_A) {
+   dpll_mask =
+   BIT(DPLL_ID_EHL_DPLL4) |
+   BIT(DPLL_ID_ICL_DPLL1) |
+   BIT(DPLL_ID_ICL_DPLL0);
+   } else {
dpll_mask = BIT(DPLL_ID_ICL_DPLL1) | BIT(DPLL_ID_ICL_DPLL0);
+   }
 
port_dpll->pll = intel_find_shared_dpll(state, crtc,
&port_dpll->hw_state,
@@ -4275,6 +4281,20 @@ static const struct intel_dpll_mgr tgl_pll_mgr = {
.dump_hw_state = icl_dump_hw_state,
 };
 
+static const struct dpll_info rkl_plls[] = {
+   { "DPLL 0", &combo_pll_funcs, DPLL_ID_ICL_DPLL0, 0 },
+   { "DPLL 1", &combo_pll_funcs, DPLL_ID_ICL_DPLL1, 0 },
+   { "DPLL 4", &combo_pll_funcs, DPLL_ID_EHL_DPLL4, 0 },
+   { },
+};
+
+static const struct intel_dpll_mgr rkl_pll_mgr = {
+   .dpll_info = rkl_plls,
+   .get_dplls = icl_get_dplls,
+   .put_dplls = icl_put_dplls,
+   .dump_hw_state = icl_dump_hw_state,
+};
+
 /**
  * intel_shared_dpll_init - Initialize shared DPLLs
  * @dev: drm device
@@ -4288,7 +4308,9 @@ void intel_shared_dpll_init(struct drm_device *dev)
const struct dpll_info *dpll_info;
int i;
 
-   if (INTEL_GEN(dev_priv) >= 12)
+   if (IS_ROCKETLAKE(dev_priv))
+   dpll_mgr = &rkl_pll_mgr;
+   else if (INTEL_GEN(dev_priv) >= 12)
dpll_mgr = &tgl_pll_mgr;
else if (IS_ELKHARTLAKE(dev_priv))
dpll_mgr = &ehl_pll_mgr;
-- 
2.24.1
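
Worth noting how this meshes with the HTI change earlier in the series:
the RKL branch offers DPLL0, DPLL1, and DPLL4 as candidates, and
intel_find_shared_dpll() then drops whichever of those the HTI has
reserved. Roughly:

        dpll_mask = BIT(DPLL_ID_EHL_DPLL4) |
                    BIT(DPLL_ID_ICL_DPLL1) |
                    BIT(DPLL_ID_ICL_DPLL0);
        /* in intel_find_shared_dpll(): */
        dpll_mask &= ~dev_priv->hti_pll_mask;   /* e.g. HTI holding DPLL0
                                                   leaves DPLL1 + DPLL4 */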



[Intel-gfx] [PATCH 17/23] drm/i915/rkl: Don't try to access transcoder D

2020-05-01 Thread Matt Roper
There are a couple places in our driver that loop over transcoders A..D
for gen11+; since RKL only has three pipes/transcoders, this can lead to
unclaimed register reads/writes.  We should add checks for transcoder
existence where appropriate.

Cc: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_display.c | 3 +++
 drivers/gpu/drm/i915/i915_irq.c  | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 1ed6bb03379b..39d95dc14546 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -11007,6 +11007,9 @@ static bool bxt_get_dsi_transcoder_state(struct intel_crtc *crtc,
else
cpu_transcoder = TRANSCODER_DSI_C;
 
+   if (!HAS_TRANSCODER(dev_priv, cpu_transcoder))
+   continue;
+
power_domain = POWER_DOMAIN_TRANSCODER(cpu_transcoder);
drm_WARN_ON(dev, *power_domain_mask & BIT_ULL(power_domain));
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 622986759ec6..1381cb530c2f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2849,6 +2849,9 @@ static void gen11_display_irq_reset(struct drm_i915_private *dev_priv)
for (trans = TRANSCODER_A; trans <= TRANSCODER_D; trans++) {
enum intel_display_power_domain domain;
 
+   if (!HAS_TRANSCODER(dev_priv, trans))
+   continue;
+
domain = POWER_DOMAIN_TRANSCODER(trans);
if (!intel_display_power_is_enabled(dev_priv, domain))
continue;
@@ -3399,6 +3402,9 @@ static void gen8_de_irq_postinstall(struct drm_i915_private *dev_priv)
for (trans = TRANSCODER_A; trans <= TRANSCODER_D; trans++) {
enum intel_display_power_domain domain;
 
+   if (!HAS_TRANSCODER(dev_priv, trans))
+   continue;
+
domain = POWER_DOMAIN_TRANSCODER(trans);
if (!intel_display_power_is_enabled(dev_priv, domain))
continue;
-- 
2.24.1
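
Why the check suffices: HAS_TRANSCODER() is just a bit test against
cpu_transcoder_mask (see the definition quoted in the PSR2 patch), so on
RKL, where the mask covers only transcoders A-C, the TRANSCODER_D
iteration falls out instead of touching unclaimed registers:

        /* RKL: cpu_transcoder_mask = BIT(TRANSCODER_A) |
         *      BIT(TRANSCODER_B) | BIT(TRANSCODER_C) */
        HAS_TRANSCODER(dev_priv, TRANSCODER_D)  /* false -> iteration skipped */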



[Intel-gfx] [PATCH 18/23] drm/i915/rkl: Don't try to read out DSI transcoders

2020-05-01 Thread Matt Roper
From: Aditya Swarup 

RKL doesn't have DSI outputs, so we shouldn't try to read out the DSI
transcoder registers.

Signed-off-by: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 39d95dc14546..87defd30af1f 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -10901,7 +10901,7 @@ static bool hsw_get_transcoder_state(struct intel_crtc *crtc,
intel_wakeref_t wf;
u32 tmp;
 
-   if (INTEL_GEN(dev_priv) >= 11)
+   if (!IS_ROCKETLAKE(dev_priv) && INTEL_GEN(dev_priv) >= 11)
panel_transcoder_mask |=
BIT(TRANSCODER_DSI_0) | BIT(TRANSCODER_DSI_1);
 
-- 
2.24.1



[Intel-gfx] [PATCH 03/23] drm/i915/rkl: Re-use TGL GuC/HuC firmware

2020-05-01 Thread Matt Roper
RKL uses the same GuC and HuC as TGL and should load the same firmwares.

Bspec: 50668
Cc: Anusha Srivatsa 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index e1caae93996d..9b6218128d09 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -47,8 +47,11 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
  * TGL 35.2 is interface-compatible with 33.0 for previous Gens. The deltas
  * between 33.0 and 35.2 are only related to new additions to support new Gen12
  * features.
+ *
+ * Note that RKL uses the same firmware as TGL.
  */
 #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
+	fw_def(ROCKETLAKE,  0, guc_def(tgl, 35, 2, 0), huc_def(tgl,  7, 0, 12)) \
	fw_def(TIGERLAKE,   0, guc_def(tgl, 35, 2, 0), huc_def(tgl,  7, 0, 12)) \
fw_def(ELKHARTLAKE, 0, guc_def(ehl, 33, 0, 4), huc_def(ehl,  9, 0, 0)) \
fw_def(ICELAKE, 0, guc_def(icl, 33, 0, 0), huc_def(icl,  9, 0, 0)) \
-- 
2.24.1
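
Each fw_def entry expands, via the firmware-path helpers in this file,
into a blob name; assuming the usual "i915/<prefix>_guc_<maj>.<min>.<patch>.bin"
pattern, the new RKL line requests the same blobs TGL already loads:

        i915/tgl_guc_35.2.0.bin
        i915/tgl_huc_7.0.12.bin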



[Intel-gfx] [PATCH 16/23] drm/i915/rkl: Add DDC pin mapping

2020-05-01 Thread Matt Roper
The pin mapping for the final two outputs varies according to which PCH
is present on the platform:  with TGP the pins are remapped into the TC
range, whereas with CMP they stay in the traditional combo output range.

Bspec: 49181
Cc: Aditya Swarup 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/display/intel_hdmi.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c b/drivers/gpu/drm/i915/display/intel_hdmi.c
index 010f37240710..a31a98d26882 100644
--- a/drivers/gpu/drm/i915/display/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
@@ -3082,6 +3082,24 @@ static u8 mcc_port_to_ddc_pin(struct drm_i915_private *dev_priv, enum port port)
return ddc_pin;
 }
 
+static u8 rkl_port_to_ddc_pin(struct drm_i915_private *dev_priv, enum port port)
+{
+   enum phy phy = intel_port_to_phy(dev_priv, port);
+
+   WARN_ON(port == PORT_C);
+
+   /*
+* Pin mapping for RKL depends on which PCH is present.  With TGP, the
+* final two outputs use type-c pins, even though they're actually
+* combo outputs.  With CMP, the traditional DDI A-D pins are used for
+* all outputs.
+*/
+   if (INTEL_PCH_TYPE(dev_priv) >= PCH_TGP && phy >= PHY_C)
+   return GMBUS_PIN_9_TC1_ICP + phy - PHY_C;
+
+   return GMBUS_PIN_1_BXT + phy;
+}
+
 static u8 g4x_port_to_ddc_pin(struct drm_i915_private *dev_priv,
  enum port port)
 {
@@ -3119,7 +3137,9 @@ static u8 intel_hdmi_ddc_pin(struct intel_encoder *encoder)
return ddc_pin;
}
 
-   if (HAS_PCH_MCC(dev_priv))
+   if (IS_ROCKETLAKE(dev_priv))
+   ddc_pin = rkl_port_to_ddc_pin(dev_priv, port);
+   else if (HAS_PCH_MCC(dev_priv))
ddc_pin = mcc_port_to_ddc_pin(dev_priv, port);
else if (INTEL_PCH_TYPE(dev_priv) >= PCH_ICP)
ddc_pin = icl_port_to_ddc_pin(dev_priv, port);
-- 
2.24.1
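
Working the pin arithmetic through for the two PCH cases:

        /* TGP (INTEL_PCH_TYPE >= PCH_TGP): last two outputs use TC pins */
        rkl_port_to_ddc_pin(i915, PORT_D);  /* phy C -> GMBUS_PIN_9_TC1_ICP + 0 */
        rkl_port_to_ddc_pin(i915, PORT_E);  /* phy D -> GMBUS_PIN_9_TC1_ICP + 1 */

        /* CMP: everything stays on the combo pins */
        rkl_port_to_ddc_pin(i915, PORT_D);  /* phy C -> GMBUS_PIN_1_BXT + 2 = 0x3 */

The CMP result matches the "Using DDC pin 0x3 for port D" line in the
VBT log quoted with the port-mapping patch.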



Re: [Intel-gfx] [PATCH 03/23] drm/i915/rkl: Re-use TGL GuC/HuC firmware

2020-05-01 Thread Srivatsa, Anusha



> -----Original Message-----
> From: Roper, Matthew D 
> Sent: Friday, May 1, 2020 10:37 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Roper, Matthew D ; Srivatsa, Anusha
> 
> Subject: [PATCH 03/23] drm/i915/rkl: Re-use TGL GuC/HuC firmware
> 
> RKL uses the same GuC and HuC as TGL and should load the same firmwares.
> 
> Bspec: 50668
> Cc: Anusha Srivatsa 
> Signed-off-by: Matt Roper 
Reviewed-by: Anusha Srivatsa 

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> index e1caae93996d..9b6218128d09 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> @@ -47,8 +47,11 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
>   * TGL 35.2 is interface-compatible with 33.0 for previous Gens. The deltas
>   * between 33.0 and 35.2 are only related to new additions to support new Gen12
>   * features.
> + *
> + * Note that RKL uses the same firmware as TGL.
>   */
>  #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
> + fw_def(ROCKETLAKE,  0, guc_def(tgl, 35, 2, 0), huc_def(tgl,  7, 0, 12)) \
>   fw_def(TIGERLAKE,   0, guc_def(tgl, 35, 2, 0), huc_def(tgl,  7, 0, 12)) \
>   fw_def(ELKHARTLAKE, 0, guc_def(ehl, 33, 0, 4), huc_def(ehl,  9, 0, 0)) \
>   fw_def(ICELAKE, 0, guc_def(icl, 33, 0, 0), huc_def(icl,  9, 0, 0)) \
> --
> 2.24.1



[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-01 Thread Patchwork
== Series Details ==

Series: drm/i915: check to see if SIMD registers are available before using SIMD
URL   : https://patchwork.freedesktop.org/series/76825/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
962131944f1f drm/i915: check to see if SIMD registers are available before using SIMD
-:38: CHECK:LINE_SPACING: Please don't use multiple blank lines
#38: FILE: drivers/gpu/drm/i915/i915_memcpy.c:46:
+
+

total: 0 errors, 0 warnings, 1 checks, 30 lines checked



Re: [Intel-gfx] [PATCH 04/23] drm/i915/rkl: Load DMC firmware for Rocket Lake

2020-05-01 Thread Srivatsa, Anusha



> -----Original Message-----
> From: Roper, Matthew D 
> Sent: Friday, May 1, 2020 10:37 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Roper, Matthew D ; Srivatsa, Anusha
> 
> Subject: [PATCH 04/23] drm/i915/rkl: Load DMC firmware for Rocket Lake
> 
> Cc: Anusha Srivatsa 
> Signed-off-by: Matt Roper 
Reviewed-by: Anusha Srivatsa 

> ---
>  drivers/gpu/drm/i915/display/intel_csr.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_csr.c b/drivers/gpu/drm/i915/display/intel_csr.c
> index 3112572cfb7d..319932b03e88 100644
> --- a/drivers/gpu/drm/i915/display/intel_csr.c
> +++ b/drivers/gpu/drm/i915/display/intel_csr.c
> @@ -40,6 +40,10 @@
> 
>  #define GEN12_CSR_MAX_FW_SIZE	ICL_CSR_MAX_FW_SIZE
> 
> +#define RKL_CSR_PATH "i915/rkl_dmc_ver2_01.bin"
> +#define RKL_CSR_VERSION_REQUIRED CSR_VERSION(2, 1)
> +MODULE_FIRMWARE(RKL_CSR_PATH);
> +
>  #define TGL_CSR_PATH "i915/tgl_dmc_ver2_06.bin"
>  #define TGL_CSR_VERSION_REQUIRED CSR_VERSION(2, 6)
>  #define TGL_CSR_MAX_FW_SIZE  0x6000
> @@ -682,7 +686,11 @@ void intel_csr_ucode_init(struct drm_i915_private *dev_priv)
>*/
>   intel_csr_runtime_pm_get(dev_priv);
> 
> - if (INTEL_GEN(dev_priv) >= 12) {
> + if (IS_ROCKETLAKE(dev_priv)) {
> + csr->fw_path = RKL_CSR_PATH;
> + csr->required_version = RKL_CSR_VERSION_REQUIRED;
> + csr->max_fw_size = GEN12_CSR_MAX_FW_SIZE;
> + } else if (INTEL_GEN(dev_priv) >= 12) {
>   csr->fw_path = TGL_CSR_PATH;
>   csr->required_version = TGL_CSR_VERSION_REQUIRED;
>   /* Allow to load fw via parameter using the last known size */
> --
> 2.24.1


