Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet submits is used for the submission path.
This still needs fixing - 'a single tasklet submits is used' is not 
valid English.


It also seems that the idea of splitting all the deletes of old code 
into a separate patch didn't happen. It really does obfuscate things 
significantly having completely unrelated deletes and adds interspersed :(.


John.




Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.

In this patch the field, guc_id, has been added to intel_context and is
not assigned. Patches later in the series will assign this value.

v2:
  (John Harrison)
   - Clean up some comments

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +-
  3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 90026c177105..6d99631d19b9 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -137,6 +137,15 @@ struct intel_context {
struct intel_sseu sseu;
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */

+
+   /* GuC scheduling state flags that do not require a lock. */
+   atomic_t guc_sched_state_no_lock;
+
+   /*
+* GuC LRC descriptor ID - Not assigned in this patch but future patches
+* in the series will.
+*/
+   u16 guc_id;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 35783558d261..8c7b92f699f1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
struct intel_guc_log log;
struct intel_guc_ct ct;
  
+	/* Global engine used to submit requests to GuC */

+   struct i915_sched_engine *sched_engine;
+   struct i915_request *stalled_request;
+
/* intel_guc_recv interrupt related state */
spinlock_t irq_lock;
unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ca0717166a27 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@
  
  #define GUC_REQUEST_SIZE 64 /* bytes */
  
+/*

+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to change at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLED	BIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+   return (atomic_read(&ce->guc_sched_state_no_lock) &
+   SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+   atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+   atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+  &ce->guc_sched_state_no_lock);
+}
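[Editor's note: the lock-free flag scheme in the comment above can be sketched with userspace C11 atomics. This is a hypothetical, self-contained illustration of the pattern, not driver code; `SCHED_STATE_NO_LOCK_ENABLED`, the struct, and helper names mirror the patch but use `<stdatomic.h>` instead of the kernel's `atomic_t` API.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One atomic word holds independent state bits; because the transitions the
 * comment describes are mutually exclusive, readers and writers of different
 * bits need only atomic RMW ops, never a lock. */
#define SCHED_STATE_NO_LOCK_ENABLED (1u << 0)

struct ctx_state {
	_Atomic unsigned int sched_state_no_lock;
};

static bool context_enabled(struct ctx_state *ce)
{
	return atomic_load(&ce->sched_state_no_lock) &
	       SCHED_STATE_NO_LOCK_ENABLED;
}

static void set_context_enabled(struct ctx_state *ce)
{
	/* Equivalent of the kernel's atomic_or() */
	atomic_fetch_or(&ce->sched_state_no_lock, SCHED_STATE_NO_LOCK_ENABLED);
}

static void clr_context_enabled(struct ctx_state *ce)
{
	/* Equivalent of the kernel's atomic_and() with the inverted mask */
	atomic_fetch_and(&ce->sched_state_no_lock,
			 ~SCHED_STATE_NO_LOCK_ENABLED);
}
```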
+
  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
  {
return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct 
intel_guc *guc, u32 id,
xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
  }
  
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)

+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
  {
-   /* Leaving stub as this function will be used in future patches */
-}
+   int err;
+   struct intel_context *ce = rq->context;
+   u32 action[3];
+   int len = 0;
+   bool enabled = context_enabled(ce);
  
-/*

- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.

Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.

v2:
  (Daniel Vetter)
   - Use msleep_interruptible rather than cond_resched in busy loop
  (Michal)
   - Remove C++ style comment

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   5 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
  drivers/gpu/drm/i915/gt/intel_lrc_reg.h   |   1 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  40 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 --
  drivers/gpu/drm/i915/i915_reg.h   |   1 +
  drivers/gpu/drm/i915/i915_request.c   |   1 +
  8 files changed, 685 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index bd63813c8a80..32fd6647154b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
  
  	mutex_init(&ce->pin_mutex);
  
+	spin_lock_init(&ce->guc_state.lock);

+
+   ce->guc_id = GUC_INVALID_LRC_ID;
+   INIT_LIST_HEAD(&ce->guc_id_link);
+
i915_active_init(&ce->active,
 __intel_context_active, __intel_context_retire, 0);
  }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6d99631d19b9..606c480aec26 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -96,6 +96,7 @@ struct intel_context {
  #define CONTEXT_BANNED			6
  #define CONTEXT_FORCE_SINGLE_SUBMISSION   7
  #define CONTEXT_NOPREEMPT 8
+#define CONTEXT_LRCA_DIRTY 9
  
  	struct {

u64 timeout_us;
@@ -138,14 +139,29 @@ struct intel_context {
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
  
+	struct {

+   /** lock: protects everything in guc_state */
+   spinlock_t lock;
+   /**
+* sched_state: scheduling state of this context using GuC
+* submission
+*/
+   u8 sched_state;
+   } guc_state;
+
/* GuC scheduling state flags that do not require a lock. */
atomic_t guc_sched_state_no_lock;
  
+	/* GuC LRC descriptor ID */

+   u16 guc_id;
+
+   /* GuC LRC descriptor reference count */
+   atomic_t guc_id_ref;
+
/*
-* GuC LRC descriptor ID - Not assigned in this patch but future patches
-* in the series will.
+* GuC ID link - in list when unpinned but guc_id still valid in GuC
 */
-   u16 guc_id;
+   struct list_head guc_id_link;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h 
b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
  #define GEN11_CSB_WRITE_PTR_MASK  (GEN11_CSB_PTR_MASK << 0)
  
  #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */

-#define MAX_GUC_CONTEXT_HW_ID  (1 << 20) /* exclusive */
  #define GEN11_MAX_CONTEXT_HW_ID   (1 << 11) /* exclusive */
  /* in Gen12 ID 0x7FF is reserved to indicate idle */
  #define GEN12_MAX_CONTEXT_HW_ID   (GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8c7b92f699f1..30773cd699f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -7,6 +7,7 @@
  #define _INTEL_GUC_H_
  
  #include 

+#include 
  
  #include "intel_uncore.h"

  #include "intel_guc_fw.h"
@@ -44,6 +45,14 @@ struct intel_guc {
void (*disable)(struct intel_guc *guc);
} interrupts;
  
+	/*

+* contexts_lock protects the pool of free guc ids and a linked list of
+* guc ids available to be stolen
+*/
+   spinlock_t contexts_lock;
+   struct ida guc_ids;
+   struct list_head guc_id_list;
+
bool submission_selected;
  
  	struct i915_vma *ads_vma;
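[Editor's note: the guc_id pool described by the `contexts_lock` comment above — an ID allocator plus a list of unpinned contexts whose IDs may be stolen — can be sketched as follows. This is a hypothetical userspace illustration of the idea; the real driver uses an `ida`, walks `guc_id_list` under `contexts_lock`, and returns `-EAGAIN` on exhaustion.]

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GUC_IDS 4

struct guc_ctx {
	int guc_id;           /* -1 when unassigned */
	struct guc_ctx *next; /* link on the stealable list */
};

static bool id_used[MAX_GUC_IDS];
static struct guc_ctx *steal_list; /* unpinned but still-registered contexts */

static int alloc_guc_id(struct guc_ctx *ce)
{
	/* Fast path: take a free ID from the pool. */
	for (int i = 0; i < MAX_GUC_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			ce->guc_id = i;
			return i;
		}
	}

	/* Pool exhausted: steal the ID of an unpinned context, as the
	 * comment above describes. The victim loses its guc_id. */
	if (steal_list) {
		struct guc_ctx *victim = steal_list;

		steal_list = victim->next;
		ce->guc_id = victim->guc_id;
		victim->guc_id = -1;
		return ce->guc_id;
	}

	return -1; /* nothing free and nothing stealable */
}
```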

@@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const 
u32 *action, u32 len,
 response_buf, response_buf_size, 0);
  }
  
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,

+  const u32 *action,
+  u32 len,
+  bool loop)
+{
+   int err;
+   unsigned int sleep_period_ms = 1;
+   bool not_atomic = !in_atomic() &a

Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet

2021-07-19 Thread John Harrison

On 7/19/2021 15:55, Matthew Brost wrote:

On Mon, Jul 19, 2021 at 04:01:56PM -0700, John Harrison wrote:

On 7/16/2021 13:16, Matthew Brost wrote:

Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet submits is used for the submission path.

This still needs fixing - 'a single tasklet submits is used' is not valid
English.


Will fix.


It also seems that the idea of splitting all the deletes of old code into a
separate patch didn't happen. It really does obfuscate things significantly
having completely unrelated deletes and adds interspersed :(.


I don't recall promising to do that.

Matt

"No promises but perhaps I'll do this in the next rev."

Well, this is the next rev. So I am expressing my disappointment that it 
didn't happen. Reviewability of patches is important.


John.



Re: [PATCH 13/51] drm/i915/guc: Disable semaphores when using GuC scheduling

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Semaphores are an optimization and not required for basic GuC submission
to work properly. Disable until we have time to do the implementation to
enable semaphores and tune them for performance. Also, the long-term
direction is to delete semaphores from the i915 entirely, so that is
another reason not to enable them for GuC submission.

This patch fixes an existing bug where I915_ENGINE_HAS_SEMAPHORES was
not honored correctly.

Bugs plural. Otherwise:
Reviewed-by: John Harrison 



v2: Reword commit message
v3:
  (John H)
   - Add text to commit indicating this also fixing an existing bug

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7d6f52d8a801..64659802d4df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -799,7 +799,8 @@ static int intel_context_set_gem(struct intel_context *ce,
}
  
  	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&

-   intel_engine_has_timeslices(ce->engine))
+   intel_engine_has_timeslices(ce->engine) &&
+   intel_engine_has_semaphores(ce->engine))
__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
  
  	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&

@@ -1778,7 +1779,8 @@ static void __apply_priority(struct intel_context *ce, 
void *arg)
if (!intel_engine_has_timeslices(ce->engine))
return;
  
-	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)

+   if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
+   intel_engine_has_semaphores(ce->engine))
intel_context_set_use_semaphores(ce);
else
intel_context_clear_use_semaphores(ce);




Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.

v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negative timeout trick, move a
refactor of guc_context_unpin to earlier path (John H)

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
  drivers/gpu/drm/i915/gt/intel_gt.c| 19 +
  drivers/gpu/drm/i915/gt/intel_gt.h|  2 +
  drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
  drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  4 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +--
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |  5 ++
  drivers/gpu/drm/i915/i915_gem_evict.c |  1 +
  .../gpu/drm/i915/selftests/igt_live_test.c|  2 +-
  .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
  13 files changed, 129 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index a90f796e85c0..6fffd4d377c2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
goto insert;
  
  	/* Attempt to reap some mmap space from dead objects */

-   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+  NULL);
if (err)
goto err;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c

index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
GEM_BUG_ON(intel_gt_pm_is_awake(gt));
  }
  
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)

+{
+   long remaining_timeout;
+
+   /* If the device is asleep, we have no requests outstanding */
+   if (!intel_gt_pm_is_awake(gt))
+   return 0;
+
+   while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+  &remaining_timeout)) > 0) {
+   cond_resched();
+   if (signal_pending(current))
+   return -EINTR;
+   }
+
+   return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+ remaining_timeout);
+}
+
  int intel_gt_init(struct intel_gt *gt)
  {
int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
  
  void intel_gt_driver_late_release(struct intel_gt *gt);
  
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);

+
  void intel_gt_check_and_clear_faults(struct intel_gt *gt);
  void intel_gt_clear_error_registers(struct intel_gt *gt,
intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..edb881d75630 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs 
*engine)
GEM_BUG_ON(engine->retire);
  }
  
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)

+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+ long *remaining_timeout)
  {
	struct intel_gt_timelines *timelines = &gt->timelines;
struct intel_timeline *tl, *tn;
@@ -195,22 +196,10 @@ out_active:   spin_lock(&timelines->lock);
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
  
-	return active_count ? timeout : 0;

-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-   /* If the device is asleep, we have no requests outstanding */
-   if (!intel_gt_pm_is_awake(gt))
-   return 0;
-
-   while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-   cond_resched();
-   if (

Re: [PATCH 16/51] drm/i915/guc: Update GuC debugfs to support new GuC

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Update GuC debugfs to support the new GuC structures.

v2:
  (John Harrison)
   - Remove intel_lrc_reg.h include from i915_debugfs.c
  (Michal)
   - Rename GuC debugfs functions

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +
  .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 23 +++-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 +++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
  5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index f1cbed6b9f0a..503a78517610 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1171,3 +1171,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
  
  	ct_try_receive_message(ct);

  }
+
+void intel_guc_ct_print_info(struct intel_guc_ct *ct,
+struct drm_printer *p)
+{
+   drm_printf(p, "CT %s\n", enableddisabled(ct->enabled));
+
+   if (!ct->enabled)
+   return;
+
+   drm_printf(p, "H2G Space: %u\n",
+  atomic_read(&ct->ctbs.send.space) * 4);
+   drm_printf(p, "Head: %u\n",
+  ct->ctbs.send.desc->head);
+   drm_printf(p, "Tail: %u\n",
+  ct->ctbs.send.desc->tail);
+   drm_printf(p, "G2H Space: %u\n",
+  atomic_read(&ct->ctbs.recv.space) * 4);
+   drm_printf(p, "Head: %u\n",
+  ct->ctbs.recv.desc->head);
+   drm_printf(p, "Tail: %u\n",
+  ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 4b30a562ae63..7b34026d264a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
  
  struct i915_vma;

  struct intel_guc;
+struct drm_printer;
  
  /**

   * DOC: Command Transport (CT).
@@ -112,4 +113,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 
*action, u32 len,
  u32 *response_buf, u32 response_buf_size, u32 flags);
  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
  
+void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p);

+
  #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..7a454c91a736 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
  #include "intel_guc.h"
  #include "intel_guc_debugfs.h"
  #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
  
  static int guc_info_show(struct seq_file *m, void *data)

  {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
drm_puts(&p, "\n");
intel_guc_log_info(&guc->log, &p);
  
-	/* Add more as required ... */

+   if (!intel_guc_submission_is_used(guc))
+   return 0;
+
+   intel_guc_ct_print_info(&guc->ct, &p);
+   intel_guc_submission_print_info(guc, &p);
  
  	return 0;

  }
  DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
  
+static int guc_registered_contexts_show(struct seq_file *m, void *data)

+{
+   struct intel_guc *guc = m->private;
+   struct drm_printer p = drm_seq_file_printer(m);
+
+   if (!intel_guc_submission_is_used(guc))
+   return -ENODEV;
+
+   intel_guc_submission_print_context_info(guc, &p);
+
+   return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
  void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
  {
static const struct debugfs_gt_file files[] = {
{ "guc_info", &guc_info_fops, NULL },
+   { "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
};
  
  	if (!intel_guc_is_supported(guc))

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 088d11e2e497..a2af7e17dcc2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1602,3 +1602,58 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
  
  	return 0;

  }
+
+void intel_guc_submission_print_info(struct intel_guc *guc,
+struct drm_printer *p)
+{
+   struct i915_sched_engine *sched_engine = guc->sched_engine;
+   struct rb_node *

Re: [PATCH 17/51] drm/i915/guc: Add several request trace points

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Add trace points for request dependencies and GuC submit. Extended
existing request trace points to include submit fence value,, guc_id,

Still has misplaced commas.

Also, Tvrtko has a bunch of comments/questions on the previous version 
that need to be addressed.


John.


and ring tail value.

v2: Fix white space alignment in i915_request_add trace point

Cc: John Harrison 
Signed-off-by: Matthew Brost 
Reviewed-by: John Harrison 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
  drivers/gpu/drm/i915/i915_request.c   |  3 ++
  drivers/gpu/drm/i915/i915_trace.h | 43 +--
  3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a2af7e17dcc2..480fb2184ecf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
guc->stalled_request = last;
return false;
}
+   trace_i915_request_guc_submit(last);
}
  
  	guc->stalled_request = NULL;

@@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
ret = guc_add_request(guc, rq);
if (ret == -EBUSY)
guc->stalled_request = rq;
+   else
+   trace_i915_request_guc_submit(rq);
  
  	return ret;

  }
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 2b2b63cba06c..01aa3d1ee2b1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to,
return err;
}
  
+	trace_i915_request_dep_to(to);

+   trace_i915_request_dep_from(from);
+
/* Couple the dependency tree for PI on this exposed to->fence */
if (to->engine->sched_engine->schedule) {
err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 6778ad2a14a4..ea41d069bf7d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request,
TP_STRUCT__entry(
 __field(u32, dev)
 __field(u64, ctx)
+__field(u32, guc_id)
 __field(u16, class)
 __field(u16, instance)
 __field(u32, seqno)
+__field(u32, tail)
 ),
  
  	TP_fast_assign(

   __entry->dev = rq->engine->i915->drm.primary->index;
   __entry->class = rq->engine->uabi_class;
   __entry->instance = rq->engine->uabi_instance;
+  __entry->guc_id = rq->context->guc_id;
   __entry->ctx = rq->fence.context;
   __entry->seqno = rq->fence.seqno;
+  __entry->tail = rq->tail;
   ),
  
-	TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",

+   TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
  __entry->dev, __entry->class, __entry->instance,
- __entry->ctx, __entry->seqno)
+ __entry->guc_id, __entry->ctx, __entry->seqno,
+ __entry->tail)
  );
  
  DEFINE_EVENT(i915_request, i915_request_add,

-   TP_PROTO(struct i915_request *rq),
-   TP_ARGS(rq)
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
  );
  
  #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)

+DEFINE_EVENT(i915_request, i915_request_dep_to,
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_dep_from,
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
+);
+
  DEFINE_EVENT(i915_request, i915_request_submit,
 TP_PROTO(struct i915_request *rq),
 TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
  
  #else

  #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_dep_to(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_dep_from(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_guc_submit(struct i915_request *rq)
+{
+}
+
  static inline void
  trace_i915_request_submit(struct i915_request *rq)
  {




Re: [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines

2021-07-19 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

From: John Harrison 

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heartbeat code
thinks the physical engines are idle due to the serial number not
incrementing.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.
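[Editor's note: the decomposition described above — bumping the serial of every physical engine a virtual request might run on — can be sketched like this. A hypothetical userspace illustration only; the real patch adds a `bump_serial` vfunc per backend, and the over-count the comment mentions is exactly the "not entirely accurate" caveat from the paragraph above.]

```c
#include <stdint.h>

#define NUM_ENGINES 4

struct engine {
	unsigned int serial; /* incremented on request submission */
};

/* With GuC submission the i915 only knows the mask of candidate physical
 * engines, so bump the serial on all of them rather than on the one engine
 * the hardware actually picks. */
static void guc_bump_serial(struct engine engines[NUM_ENGINES], uint32_t mask)
{
	for (int i = 0; i < NUM_ENGINES; i++)
		if (mask & (1u << i))
			engines[i].serial++; /* may over-count: GuC runs it on one engine */
}
```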

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 

Still needs to pull in Tvrtko's updated subject and description.

John.


---
  drivers/gpu/drm/i915/gt/intel_engine_types.h |  2 ++
  .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++
  drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++
  drivers/gpu/drm/i915/gt/mock_engine.c|  6 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c| 16 
  drivers/gpu/drm/i915/i915_request.c  |  4 +++-
  6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 1cb9c3b70b29..8ad304b2f2e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
void(*park)(struct intel_engine_cs *engine);
void(*unpark)(struct intel_engine_cs *engine);
  
+	void		(*bump_serial)(struct intel_engine_cs *engine);

+
void(*set_default_submission)(struct intel_engine_cs 
*engine);
  
  	const struct intel_context_ops *cops;

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 28492cdce706..920707e22eb0 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs 
*engine)
lrc_fini_wa_ctx(engine);
  }
  
+static void execlist_bump_serial(struct intel_engine_cs *engine)

+{
+   engine->serial++;
+}
+
  static void
  logical_ring_default_vfuncs(struct intel_engine_cs *engine)
  {
@@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
  
  	engine->cops = &execlists_context_ops;

engine->request_alloc = execlists_request_alloc;
+   engine->bump_serial = execlist_bump_serial;
  
  	engine->reset.prepare = execlists_reset_prepare;

engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 5c4d204d07cc..61469c631057 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine)
}
  }
  
+static void ring_bump_serial(struct intel_engine_cs *engine)

+{
+   engine->serial++;
+}
+
  static void setup_common(struct intel_engine_cs *engine)
  {
struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine)
  
  	engine->cops = &ring_context_ops;

engine->request_alloc = ring_request_alloc;
+   engine->bump_serial = ring_bump_serial;
  
  	/*

 * Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c 
b/drivers/gpu/drm/i915/gt/mock_engine.c
index 68970398e4ef..9203c766db80 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs 
*engine)
intel_engine_fini_retire(engine);
  }
  
+static void mock_bump_serial(struct intel_engine_cs *engine)

+{
+   engine->serial++;
+}
+
  struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
const char *name,
int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private 
*i915,
  
  	engine->base.cops = &mock_context_ops;

engine->base.request_alloc = mock_request_alloc;
+   engine->base.bump_serial = mock_bump_serial;
engine->base.emit_flush = mock_emit_flush;
engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
engine->base.submit_request = mock_submit_request;
d

Re: [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

With GuC virtual engines the physical engine which a request executes
and completes on isn't known to the i915. Therefore we can't attach a
request to a physical engine's breadcrumbs. To work around this we create
a single breadcrumbs per engine class when using GuC submission and
direct all physical engine interrupts to this breadcrumbs.

v2:
  (John H)
   - Rework header file structure so intel_engine_mask_t can be in
 intel_engine_types.h
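[Editor's note: the sharing scheme described above — one refcounted breadcrumbs object per engine class instead of one per physical engine — can be sketched as follows. A hypothetical userspace illustration; the real patch uses a `kref` on `intel_breadcrumbs` and routes each engine's completion interrupt to its class's shared object.]

```c
#include <stddef.h>

#define NUM_CLASSES 2

struct breadcrumbs {
	int refcount; /* one reference per engine of the class */
	int signaled; /* stand-in for the shared signaler list */
};

static struct breadcrumbs *class_breadcrumbs[NUM_CLASSES];

/* Every engine of a given class gets the same breadcrumbs object, so a
 * completion interrupt on any one of them signals the same waiter list. */
static struct breadcrumbs *get_class_breadcrumbs(int class)
{
	static struct breadcrumbs pool[NUM_CLASSES];

	if (!class_breadcrumbs[class])
		class_breadcrumbs[class] = &pool[class];

	class_breadcrumbs[class]->refcount++;
	return class_breadcrumbs[class];
}
```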

Signed-off-by: Matthew Brost 
CC: John Harrison 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 16 -
  .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
  drivers/gpu/drm/i915/gt/intel_engine.h|  3 +
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++-
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 +-
  .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
  drivers/gpu/drm/i915/gt/mock_engine.c |  4 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +--
  9 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..2007dc6f6b99 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
  #include "intel_gt_pm.h"
  #include "intel_gt_requests.h"
  
-static bool irq_enable(struct intel_engine_cs *engine)

+static bool irq_enable(struct intel_breadcrumbs *b)
  {
-   if (!engine->irq_enable)
-   return false;
-
-   /* Caller disables interrupts */
-   spin_lock(&engine->gt->irq_lock);
-   engine->irq_enable(engine);
-   spin_unlock(&engine->gt->irq_lock);
-
-   return true;
+   return intel_engine_irq_enable(b->irq_engine);
  }
  
-static void irq_disable(struct intel_engine_cs *engine)

+static void irq_disable(struct intel_breadcrumbs *b)
  {
-   if (!engine->irq_disable)
-   return;
-
-   /* Caller disables interrupts */
-   spin_lock(&engine->gt->irq_lock);
-   engine->irq_disable(engine);
-   spin_unlock(&engine->gt->irq_lock);
+   intel_engine_irq_disable(b->irq_engine);
  }
  
  static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)

@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct 
intel_breadcrumbs *b)
WRITE_ONCE(b->irq_armed, true);
  
  	/* Requests may have completed before we could enable the interrupt. */

-   if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+   if (!b->irq_enabled++ && b->irq_enable(b))
irq_work_queue(&b->irq_work);
  }
  
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)

  {
GEM_BUG_ON(!b->irq_enabled);
if (!--b->irq_enabled)
-   irq_disable(b->irq_engine);
+   b->irq_disable(b);
  
  	WRITE_ONCE(b->irq_armed, false);

intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
if (!b)
return NULL;
  
-	b->irq_engine = irq_engine;

+   kref_init(&b->ref);
  
  	spin_lock_init(&b->signalers_lock);

INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs 
*irq_engine)
spin_lock_init(&b->irq_lock);
init_irq_work(&b->irq_work, signal_irq_work);
  
+	b->irq_engine = irq_engine;

+   b->irq_enable = irq_enable;
+   b->irq_disable = irq_disable;
+
return b;
  }
  
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)

spin_lock_irqsave(&b->irq_lock, flags);
  
  	if (b->irq_enabled)

-   irq_enable(b->irq_engine);
+   b->irq_enable(b);
else
-   irq_disable(b->irq_engine);
+   b->irq_disable(b);
  
  	spin_unlock_irqrestore(&b->irq_lock, flags);

  }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
}
  }
  
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)

+void intel_breadcrumbs_free(struct kref *kref)
  {
+   struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
irq_work_sync(&b->irq_work);
GEM_BUG_ON(!list_empty(&b->signalers));
GEM_BUG_ON(b->irq_armed);
+
kfree(b);
  }
  
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h

index 3ce5ce270b04..be0d4f379a85 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -9,7 +9,7 @@
  #include 

Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC

2021-07-20 Thread John Harrison

On 7/19/2021 18:53, Matthew Brost wrote:

On Mon, Jul 19, 2021 at 06:03:05PM -0700, John Harrison wrote:

On 7/16/2021 13:16, Matthew Brost wrote:

When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.

v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negative timeout trick, move a
refactor of guc_context_unpin to earlier path (John H)

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
   drivers/gpu/drm/i915/gt/intel_gt.c| 19 +
   drivers/gpu/drm/i915/gt/intel_gt.h|  2 +
   drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
   drivers/gpu/drm/i915/gt/uc/intel_guc.h|  4 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  1 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +--
   drivers/gpu/drm/i915/gt/uc/intel_uc.h |  5 ++
   drivers/gpu/drm/i915/i915_gem_evict.c |  1 +
   .../gpu/drm/i915/selftests/igt_live_test.c|  2 +-
   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
   13 files changed, 129 insertions(+), 28 deletions(-)
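[Editor's note: as a toy model of the wait flow in this patch — retire outstanding requests against a shrinking timeout budget, then spend whatever budget remains waiting on the uC — consider the userspace sketch below. The helpers are stand-ins, not the i915 functions; -1 stands in for -ETIMEDOUT, and the signal/cond_resched handling is omitted.]

```c
#include <assert.h>

/* Toy retire pass: each call retires one outstanding request and burns
 * one tick of the timeout budget; returns non-zero while work remains. */
static long retire_pass(long *timeout, int *outstanding)
{
	if (*outstanding == 0)
		return 0;
	(*outstanding)--;
	(*timeout)--;
	return *outstanding;
}

/* Shape of the reworked intel_gt_wait_for_idle(): drain requests first,
 * then use the remaining budget for the uC idle wait. Returns 0 on
 * success, -1 (standing in for -ETIMEDOUT) if the budget runs out. */
static int gt_wait_for_idle(long timeout, int outstanding, int uc_pinned)
{
	while (retire_pass(&timeout, &outstanding) > 0)
		if (timeout <= 0)
			return -1;

	return (uc_pinned && timeout <= 0) ? -1 : 0;
}
```

The point of threading remaining_timeout through intel_gt_retire_requests_timeout() is exactly this: the retire loop must report how much budget is left so the uC wait doesn't start over with a fresh, full timeout.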

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index a90f796e85c0..6fffd4d377c2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
goto insert;
/* Attempt to reap some mmap space from dead objects */
-   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+  NULL);
if (err)
goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
GEM_BUG_ON(intel_gt_pm_is_awake(gt));
   }
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+   long remaining_timeout;
+
+   /* If the device is asleep, we have no requests outstanding */
+   if (!intel_gt_pm_is_awake(gt))
+   return 0;
+
+   while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+  &remaining_timeout)) > 0) {
+   cond_resched();
+   if (signal_pending(current))
+   return -EINTR;
+   }
+
+   return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+ remaining_timeout);
+}
+
   int intel_gt_init(struct intel_gt *gt)
   {
int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
   void intel_gt_driver_late_release(struct intel_gt *gt);
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
   void intel_gt_check_and_clear_faults(struct intel_gt *gt);
   void intel_gt_clear_error_registers(struct intel_gt *gt,
intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..edb881d75630 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs 
*engine)
GEM_BUG_ON(engine->retire);
   }
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+ long *remaining_timeout)
   {
struct intel_gt_timelines *timelines = &gt->timelines;
struct intel_timeline *tl, *tn;
@@ -195,22 +196,10 @@ out_active:   spin_lock(&timelines->lock);
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
-   return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-   /* If the device is asleep, we have no requests outstanding */
-   if (!intel_gt_pm_is_awake(gt))
-   return 0;
-
-   while ((t

Re: [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

This help the backends clean up when the schedule engine object gets
help -> helps. Although, I would say it's more like 'this is required to 
allow backend specific cleanup'. It doesn't just make life a bit easier, 
it allows us to not leak stuff and/or dereference null pointers!


Either way...
Reviewed-by: John Harrison 


destroyed.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/i915_scheduler.c   | 3 ++-
  drivers/gpu/drm/i915/i915_scheduler.h   | 4 +---
  drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +
  3 files changed, 8 insertions(+), 4 deletions(-)
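[Editor's note: the pattern here — a kref whose release callback is stored on the object itself so each backend can hook its own teardown — can be modelled in a few lines of userspace C. All names below are invented for illustration, and a plain int stands in for struct kref.]

```c
#include <assert.h>
#include <stdlib.h>

struct sched_engine {
	int ref;
	void (*destroy)(struct sched_engine *se);
	int *backend_cleaned;	/* lets the test observe the backend hook */
};

static void default_destroy(struct sched_engine *se)
{
	free(se);
}

/* A backend override: run backend-specific teardown, then the default. */
static void guc_destroy(struct sched_engine *se)
{
	(*se->backend_cleaned)++;
	default_destroy(se);
}

static struct sched_engine *sched_engine_create(int *cleaned)
{
	struct sched_engine *se = calloc(1, sizeof(*se));

	se->ref = 1;
	se->destroy = default_destroy;	/* backends may override */
	se->backend_cleaned = cleaned;
	return se;
}

/* Mirrors i915_sched_engine_put(): the put path calls whatever destroy
 * hook the backend installed, instead of one fixed free function. */
static void sched_engine_put(struct sched_engine *se)
{
	if (--se->ref == 0)
		se->destroy(se);
}
```

i915_sched_engine_put() then simply invokes sched_engine->destroy, as in sched_engine_put() above, so the GuC backend can release its extra state before the object is freed.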

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 3a58a9130309..4fceda96deed 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
rcu_read_unlock();
  }
  
-void i915_sched_engine_free(struct kref *kref)

+static void default_destroy(struct kref *kref)
  {
struct i915_sched_engine *sched_engine =
container_of(kref, typeof(*sched_engine), ref);
@@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
  
  	sched_engine->queue = RB_ROOT_CACHED;

sched_engine->queue_priority_hint = INT_MIN;
+   sched_engine->destroy = default_destroy;
  
  	INIT_LIST_HEAD(&sched_engine->requests);

INIT_LIST_HEAD(&sched_engine->hold);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 650ab8e0db9f..3c9504e9f409 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
  struct i915_sched_engine *
  i915_sched_engine_create(unsigned int subclass);
  
-void i915_sched_engine_free(struct kref *kref);

-
  static inline struct i915_sched_engine *
  i915_sched_engine_get(struct i915_sched_engine *sched_engine)
  {
@@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
  static inline void
  i915_sched_engine_put(struct i915_sched_engine *sched_engine)
  {
-   kref_put(&sched_engine->ref, i915_sched_engine_free);
+   kref_put(&sched_engine->ref, sched_engine->destroy);
  }
  
  static inline bool

diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h 
b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 5935c3152bdc..00384e2c5273 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -163,6 +163,11 @@ struct i915_sched_engine {
 */
void *private_data;
  
+	/**

+* @destroy: destroy schedule engine / cleanup in backend
+*/
+   void(*destroy)(struct kref *kref);
+
/**
 * @kick_backend: kick backend after a request's priority has changed
 */




Re: [PATCH 25/51] drm/i915: Move active request tracking to a vfunc

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Move active request tracking to a backend vfunc rather than assuming all
backends want to do this in the maner. In the case execlists /

maner -> manner.
In the case *of* execlists

With those fixed...
Reviewed-by: John Harrison 



ring submission the tracking is on the physical engine while with GuC
submission it is on the context.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |  3 ++
  drivers/gpu/drm/i915/gt/intel_context_types.h |  7 
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +++
  .../drm/i915/gt/intel_execlists_submission.c  | 40 ++
  .../gpu/drm/i915/gt/intel_ring_submission.c   | 22 ++
  drivers/gpu/drm/i915/gt/mock_engine.c | 30 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++
  drivers/gpu/drm/i915/i915_request.c   | 41 ++-
  drivers/gpu/drm/i915/i915_request.h   |  2 +
  9 files changed, 147 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 251ff7eea22d..bfb05d8697d1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
spin_lock_init(&ce->guc_state.lock);
INIT_LIST_HEAD(&ce->guc_state.fences);
  
+	spin_lock_init(&ce->guc_active.lock);

+   INIT_LIST_HEAD(&ce->guc_active.requests);
+
ce->guc_id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id_link);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h

index 542c98418771..035108c10b2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -162,6 +162,13 @@ struct intel_context {
struct list_head fences;
} guc_state;
  
+	struct {

+   /** lock: protects everything in guc_active */
+   spinlock_t lock;
+   /** requests: active requests on this context */
+   struct list_head requests;
+   } guc_active;
+
/* GuC scheduling state flags that do not require a lock. */
atomic_t guc_sched_state_no_lock;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h

index 03a81e8d87f4..950fc73ed6af 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -420,6 +420,12 @@ struct intel_engine_cs {
  
  	void		(*release)(struct intel_engine_cs *engine);
  
+	/*

+* Add / remove request from engine active tracking
+*/
+   void(*add_active_request)(struct i915_request *rq);
+   void(*remove_active_request)(struct i915_request *rq);
+
struct intel_engine_execlists execlists;
  
  	/*

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index abe48421fd7a..f9b5f54a5abe 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs 
*engine)
cancel_timer(&engine->execlists.preempt);
  }
  
+static void add_to_engine(struct i915_request *rq)

+{
+   lockdep_assert_held(&rq->engine->sched_engine->lock);
+   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+   struct intel_engine_cs *engine, *locked;
+
+   /*
+* Virtual engines complicate acquiring the engine timeline lock,
+* as their rq->engine pointer is not stable until under that
+* engine lock. The simple ploy we use is to take the lock then
+* check that the rq still belongs to the newly locked engine.
+*/
+   locked = READ_ONCE(rq->engine);
+   spin_lock_irq(&locked->sched_engine->lock);
+   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+   spin_unlock(&locked->sched_engine->lock);
+   spin_lock(&engine->sched_engine->lock);
+   locked = engine;
+   }
+   list_del_init(&rq->sched.link);
+
+   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+   /* Prevent further __await_execution() registering a cb, then flush */
+   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+   spin_unlock_irq(&locked->sched_engine->lock);
+
+   i915_request_notify_execute_cb_imm(rq);
+}
+
  static bool can_preempt(struct intel_engine_cs *e

Re: [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.

v2:
  (Michal)
   - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
  (John H)
   - Split into a series of smaller patches
While the split happened, it doesn't look like any of the other comments
were addressed. Repeated below for clarity. Also, Tvrtko has a bunch of
outstanding comments too.




Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   6 +-
  drivers/gpu/drm/i915/gt/intel_reset.c |  18 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  13 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |  39 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |   3 +
  7 files changed, 515 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
if (intel_gt_is_wedged(gt))
intel_gt_unset_wedged(gt);
  
-	intel_uc_sanitize(&gt->uc);

-
for_each_engine(engine, gt, id)
if (engine->reset.prepare)
engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
__intel_engine_reset(engine, false);
}
  
+	intel_uc_reset(&gt->uc, false);

+
for_each_engine(engine, gt, id)
if (engine->reset.finish)
engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
goto err_wedged;
}
  
+	intel_uc_reset_finish(&gt->uc);

+
intel_rps_enable(&gt->rps);
intel_llc_enable(&gt->llc);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c

index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, 
intel_engine_mask_t stalled_mask)
__intel_engine_reset(engine, stalled_mask & engine->mask);
local_bh_enable();
  
+	intel_uc_reset(&gt->uc, true);

+
intel_ggtt_restore_fences(gt->ggtt);
  
  	return err;

@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, 
intel_engine_mask_t awake)
if (awake & engine->mask)
intel_engine_pm_put(engine);
}
+
+   intel_uc_reset_finish(&gt->uc);
  }
  
  static void nop_submit_request(struct i915_request *request)

@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
for_each_engine(engine, gt, id)
if (engine->reset.cancel)
engine->reset.cancel(engine);
+   intel_uc_cancel_requests(&gt->uc);
local_bh_enable();
  
  	reset_finish(gt, awake);

@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs 
*engine, const char *msg)
ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
  
+	if (intel_engine_uses_guc(engine))

+   return -ENODEV;
+
if (!intel_engine_pm_get_if_awake(engine))
return 0;
  
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)

   "Resetting %s for %s\n", engine->name, msg);

atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
  
-	if (intel_engine_uses_guc(engine))

-   ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-   else
-   ret = intel_gt_reset_engine(engine);
+   ret = intel_gt_reset_engine(engine);
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
-   ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+   ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
goto out;
}
  
@@ -1341,7 +1346,8 @@ void in

Re: [PATCH 30/51] drm/i915/guc: Handle context reset notification

2021-07-20 Thread John Harrison

On 7/16/2021 13:17, Matthew Brost wrote:

GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

v2:
  (John Harrison)
   - Move msg[0] lookup after length check

Cc: Matthew Brost 
Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  2 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  3 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++
  drivers/gpu/drm/i915/i915_trace.h | 10 ++
  4 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b3cfc52fe0bc..f23a3a618550 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc 
*guc,
  const u32 *msg, u32 len);
  int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 const u32 *msg, u32 len);
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+   const u32 *msg, u32 len);
  
  void intel_guc_submission_reset_prepare(struct intel_guc *guc);

  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 503a78517610..c4f9b44b9f86 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -981,6 +981,9 @@ static int ct_process_request(struct intel_guc_ct *ct, 
struct ct_incoming_msg *r
case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
ret = intel_guc_sched_done_process_msg(guc, payload, len);
break;
+   case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+   ret = intel_guc_context_reset_process_msg(guc, payload, len);
+   break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fdb17279095c..feaf1ca61eaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2196,6 +2196,42 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
return 0;
  }
  
+static void guc_context_replay(struct intel_context *ce)

+{
+   struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+   __guc_reset_context(ce, true);
+   tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+struct intel_context *ce)
+{
+   trace_intel_context_reset(ce);
+   guc_context_replay(ce);
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+   const u32 *msg, u32 len)
+{
+   struct intel_context *ce;
+   int desc_idx;
+
+   if (unlikely(len != 1)) {
+   drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);

I think we decided that these should be drm_err rather than drm_dbg?

With that updated:
Reviewed-by: John Harrison 


+   return -EPROTO;
+   }
+
+   desc_idx = msg[0];
+   ce = g2h_context_lookup(guc, desc_idx);
+   if (unlikely(!ce))
+   return -EPROTO;
+
+   guc_handle_context_reset(guc, ce);
+
+   return 0;
+}
+
  void intel_guc_submission_print_info(struct intel_guc *guc,
 struct drm_printer *p)
  {
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 97c2e83984ed..c095c4d39456 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
  __entry->guc_sched_state_no_lock)
  );
  
+DEFINE_EVENT(intel_context, intel_context_reset,

+TP_PROTO(struct intel_context *ce),
+TP_ARGS(ce)
+);
+
  DEFINE_EVENT(intel_context, intel_context_register,
 TP_PROTO(struct intel_context *ce),
 TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
  {
  }
  
+static inline void

+trace_intel_context_reset(struct intel_context *ce)
+{
+}
+
  static inline void
  trace_intel_context_register(struct intel_context *ce)
  {




Re: [PATCH 42/51] drm/i915/guc: Implement banned contexts for GuC submission

2021-07-20 Thread John Harrison

On 7/16/2021 13:17, Matthew Brost wrote:

When using GuC submission, if a context gets banned, disable scheduling
and mark all inflight requests as complete.

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_context.h   |  13 ++
  drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
  drivers/gpu/drm/i915/gt/intel_reset.c |  32 +---
  .../gpu/drm/i915/gt/intel_ring_submission.c   |  20 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 151 --
  drivers/gpu/drm/i915/i915_trace.h |  10 ++
  8 files changed, 195 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 28c62f7ccfc7..d87a4c6da5bc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1084,7 +1084,7 @@ static void kill_engines(struct i915_gem_engines 
*engines, bool ban)
for_each_gem_engine(ce, engines, it) {
struct intel_engine_cs *engine;
  
-		if (ban && intel_context_set_banned(ce))

+   if (ban && intel_context_ban(ce, NULL))
continue;
  
  		/*

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index 2ed9bf5f91a5..814d9277096a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
  #include "intel_engine_types.h"
  #include "intel_ring_types.h"
  #include "intel_timeline_types.h"
+#include "i915_trace.h"
  
  #define CE_TRACE(ce, fmt, ...) do {	\

const struct intel_context *ce__ = (ce);\
@@ -243,6 +244,18 @@ static inline bool intel_context_set_banned(struct 
intel_context *ce)
return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
  }
  
+static inline bool intel_context_ban(struct intel_context *ce,

+struct i915_request *rq)
+{
+   bool ret = intel_context_set_banned(ce);
+
+   trace_intel_context_ban(ce);
+   if (ce->ops->ban)
+   ce->ops->ban(ce, rq);
+
+   return ret;
+}
+
  static inline bool
  intel_context_force_single_submission(const struct intel_context *ce)
  {
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 035108c10b2c..57c19ee3e313 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -35,6 +35,8 @@ struct intel_context_ops {
  
  	int (*alloc)(struct intel_context *ce);
  
+	void (*ban)(struct intel_context *ce, struct i915_request *rq);

+
int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, 
void **vaddr);
int (*pin)(struct intel_context *ce, void *vaddr);
void (*unpin)(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index f3cdbf4ba5c8..3ed694cab5af 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,7 +22,6 @@
  #include "intel_reset.h"
  
  #include "uc/intel_guc.h"

-#include "uc/intel_guc_submission.h"
  
  #define RESET_MAX_RETRIES 3
  
@@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)

intel_uncore_rmw_fw(uncore, reg, clr, 0);
  }
  
-static void skip_context(struct i915_request *rq)

-{
-   struct intel_context *hung_ctx = rq->context;
-
-   list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
-   if (!i915_request_is_active(rq))
-   return;
-
-   if (rq->context == hung_ctx) {
-   i915_request_set_error_once(rq, -EIO);
-   __i915_request_skip(rq);
-   }
-   }
-}
-
  static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
  {
struct drm_i915_file_private *file_priv = ctx->file_priv;
@@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
bool banned;
int i;
  
-	if (intel_context_is_closed(rq->context)) {

-   intel_context_set_banned(rq->context);
+   if (intel_context_is_closed(rq->context))
return true;
-   }
  
  	rcu_read_lock();

ctx = rcu_dereference(rq->context->gem_context);
@@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
banned = !i915_gem_context_is_recoverable(ctx);
if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
banned = true;
-   if (banned) {
+   if (banned)
  

Re: [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests

2021-07-20 Thread John Harrison

On 7/16/2021 13:17, Matthew Brost wrote:

Requests may take slightly longer with GuC submission, so let's increase
the timeouts in live_requests.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index bd5c96a77ba3..d67710d10615 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
i915_request_add(rq);
  
  		err = 0;

-   if (i915_request_wait(rq, 0, HZ / 5) < 0)
+   if (i915_request_wait(rq, 0, HZ) < 0)
err = -ETIME;
i915_request_put(rq);
if (err)
@@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
}
igt_spinner_end(&spin);
  
-	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)

+   if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
err = -EIO;
i915_request_put(rq);
  




Re: [PATCH 04/18] drm/i915/guc: Implement GuC submission tasklet

2021-07-20 Thread John Harrison

On 7/20/2021 15:39, Matthew Brost wrote:

Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G uses a single
channel, a single tasklet is used for the submission path.

Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.

In this patch the field guc_id has been added to intel_context but is
not assigned. Patches later in the series will assign this value.

v2:
  (John Harrison)
   - Clean up some comments
v3:
  (John Harrison)
   - More comment cleanups

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +-
  3 files changed, 127 insertions(+), 117 deletions(-)
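[Editor's note: the guc_sched_state_no_lock scheme this patch introduces — state transitions are mutually exclusive, so no lock is taken, but individual bits may still flip concurrently, so the word stays atomic — maps onto C11 atomics roughly as below. This is an illustrative userspace model, not the kernel helpers.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SCHED_STATE_NO_LOCK_ENABLED (1u << 0)

/* Stands in for ce->guc_sched_state_no_lock; zero-initialized. */
static atomic_uint guc_sched_state_no_lock;

static bool context_enabled(void)
{
	return atomic_load(&guc_sched_state_no_lock) &
	       SCHED_STATE_NO_LOCK_ENABLED;
}

static void set_context_enabled(void)
{
	atomic_fetch_or(&guc_sched_state_no_lock, SCHED_STATE_NO_LOCK_ENABLED);
}

static void clr_context_enabled(void)
{
	atomic_fetch_and(&guc_sched_state_no_lock,
			 ~SCHED_STATE_NO_LOCK_ENABLED);
}
```

atomic_fetch_or()/atomic_fetch_and() are the C11 counterparts of the kernel's atomic_or()/atomic_and() used in the patch.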

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 90026c177105..6d99631d19b9 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -137,6 +137,15 @@ struct intel_context {
struct intel_sseu sseu;
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */

+
+   /* GuC scheduling state flags that do not require a lock. */
+   atomic_t guc_sched_state_no_lock;
+
+   /*
+* GuC LRC descriptor ID - Not assigned in this patch but future patches
+* in the series will.
+*/
+   u16 guc_id;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 35783558d261..8c7b92f699f1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
struct intel_guc_log log;
struct intel_guc_ct ct;
  
+	/* Global engine used to submit requests to GuC */

+   struct i915_sched_engine *sched_engine;
+   struct i915_request *stalled_request;
+
/* intel_guc_recv interrupt related state */
spinlock_t irq_lock;
unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ca0717166a27 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@
  
  #define GUC_REQUEST_SIZE 64 /* bytes */
  
+/*

+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to changing at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLEDBIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+   return (atomic_read(&ce->guc_sched_state_no_lock) &
+   SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+   atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+   atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+  &ce->guc_sched_state_no_lock);
+}
+
  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
  {
return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct 
intel_guc *guc, u32 id,
xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
  }
  
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
  {
-   /* Leaving stub as this function will be used in future patches */
-}
+   int err;
+   struct intel_context *ce = rq->context;
+   u32 action[3];
+   int len = 0;
+   bool enabled = context_enabled(ce);
  
-/*
- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma)
-{
-   if (i915_vma_is_map_and_fenceable(vma))
-   intel_uncore_posting_read_fw(vma->vm->gt->uncore,
-GUC_STATUS);

Re: [PATCH 06/18] drm/i915/guc: Implement GuC context operations for new inteface

2021-07-20 Thread John Harrison

On 7/20/2021 15:39, Matthew Brost wrote:

Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.

v2:
  (Daniel Vetter)
   - Use msleep_interruptible rather than cond_resched in busy loop
  (Michal)
   - Remove C++ style comment
v3:
  (Matthew Brost)
   - Drop GUC_ID_START
  (John Harrison)
   - Fix a bunch of typos
   - Use drm_err rather than drm_dbg for G2H errors
  (Daniele)
   - Fix ;; typo
   - Clean up sched state functions
   - Add lockdep for guc_id functions
   - Don't call __release_guc_id when guc_id is invalid
   - Use MISSING_CASE
   - Add comment in guc_context_pin
   - Use shorter path to rpm
  (Daniele / CI)
   - Don't call release_guc_id on an invalid guc_id in destroy

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   5 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
  drivers/gpu/drm/i915/gt/intel_lrc_reg.h   |   1 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  40 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 667 --
  drivers/gpu/drm/i915/i915_reg.h   |   1 +
  drivers/gpu/drm/i915/i915_request.c   |   1 +
  8 files changed, 686 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index bd63813c8a80..32fd6647154b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
  
  	mutex_init(&ce->pin_mutex);
  
+	spin_lock_init(&ce->guc_state.lock);
+
+   ce->guc_id = GUC_INVALID_LRC_ID;
+   INIT_LIST_HEAD(&ce->guc_id_link);
+
i915_active_init(&ce->active,
 __intel_context_active, __intel_context_retire, 0);
  }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6d99631d19b9..606c480aec26 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -96,6 +96,7 @@ struct intel_context {
  #define CONTEXT_BANNED6
  #define CONTEXT_FORCE_SINGLE_SUBMISSION   7
  #define CONTEXT_NOPREEMPT 8
+#define CONTEXT_LRCA_DIRTY 9
  
  	struct {

u64 timeout_us;
@@ -138,14 +139,29 @@ struct intel_context {
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
  
+	struct {
+   /** lock: protects everything in guc_state */
+   spinlock_t lock;
+   /**
+* sched_state: scheduling state of this context using GuC
+* submission
+*/
+   u8 sched_state;
+   } guc_state;
+
/* GuC scheduling state flags that do not require a lock. */
atomic_t guc_sched_state_no_lock;
  
+	/* GuC LRC descriptor ID */
+   u16 guc_id;
+
+   /* GuC LRC descriptor reference count */
+   atomic_t guc_id_ref;
+
/*
-* GuC LRC descriptor ID - Not assigned in this patch but future patches
-* in the series will.
+* GuC ID link - in list when unpinned but guc_id still valid in GuC
 */
-   u16 guc_id;
+   struct list_head guc_id_link;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h 
b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
  #define GEN11_CSB_WRITE_PTR_MASK  (GEN11_CSB_PTR_MASK << 0)
  
  #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */

-#define MAX_GUC_CONTEXT_HW_ID  (1 << 20) /* exclusive */
  #define GEN11_MAX_CONTEXT_HW_ID   (1 << 11) /* exclusive */
  /* in Gen12 ID 0x7FF is reserved to indicate idle */
  #define GEN12_MAX_CONTEXT_HW_ID   (GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8c7b92f699f1..30773cd699f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -7,6 +7,7 @@
  #define _INTEL_GUC_H_
  
  #include 

+#include 
  
  #include "intel_uncore.h"

  #include "intel_guc_fw.h"
@@ -44,6 +45,14 @@ struct intel_guc {
void (*disable)(struct intel_guc *guc);
} interrupts;
  
+	/*
+* contexts_lock protects the pool of free guc ids and a linked list of
+* guc ids available to be stolen
+*/
+   spinlock_t contexts_lock;
+   struct ida guc_ids;
+   struct list_head guc_id_list;
+
bool submission_selected;
  
  	struct i915_vma *ads_vma;

@@ -101,6 +11

Re: [Intel-gfx] [PATCH 08/33] drm/i915/guc: Reset implementation for new GuC interface

2021-07-26 Thread John Harrison

On 7/22/2021 16:54, Matthew Brost wrote:

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.

v2:
  (Michal)
   - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
  (John H)
   - Split into a series of smaller patches
v4:
  (John H)
   - Fix typo
   - Add braces around if statements in reset code

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   6 +-
  drivers/gpu/drm/i915/gt/intel_reset.c |  18 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  13 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 561 ++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |  39 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |   3 +
  7 files changed, 516 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index d86825437516..cd7b96005d29 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -170,8 +170,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
if (intel_gt_is_wedged(gt))
intel_gt_unset_wedged(gt);
  
-	intel_uc_sanitize(&gt->uc);
-
for_each_engine(engine, gt, id)
if (engine->reset.prepare)
engine->reset.prepare(engine);
@@ -187,6 +185,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
__intel_engine_reset(engine, false);
}
  
+	intel_uc_reset(&gt->uc, false);
+
for_each_engine(engine, gt, id)
if (engine->reset.finish)
engine->reset.finish(engine);
@@ -239,6 +239,8 @@ int intel_gt_resume(struct intel_gt *gt)
goto err_wedged;
}
  
+	intel_uc_reset_finish(&gt->uc);
+
intel_rps_enable(>->rps);
intel_llc_enable(>->llc);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c

index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, 
intel_engine_mask_t stalled_mask)
__intel_engine_reset(engine, stalled_mask & engine->mask);
local_bh_enable();
  
+	intel_uc_reset(&gt->uc, true);
+
intel_ggtt_restore_fences(gt->ggtt);
  
  	return err;

@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, 
intel_engine_mask_t awake)
if (awake & engine->mask)
intel_engine_pm_put(engine);
}
+
+   intel_uc_reset_finish(&gt->uc);
  }
  
  static void nop_submit_request(struct i915_request *request)

@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
for_each_engine(engine, gt, id)
if (engine->reset.cancel)
engine->reset.cancel(engine);
+   intel_uc_cancel_requests(&gt->uc);
local_bh_enable();
  
  	reset_finish(gt, awake);

@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs 
*engine, const char *msg)
ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
  
+	if (intel_engine_uses_guc(engine))
+   return -ENODEV;
+
if (!intel_engine_pm_get_if_awake(engine))
return 0;
  
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)

   "Resetting %s for %s\n", engine->name, msg);

atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
  
-	if (intel_engine_uses_guc(engine))
-   ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-   else
-   ret = intel_gt_reset_engine(engine);
+   ret = intel_gt_reset_engine(engine);
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
-   ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+   ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
goto out;
}
  
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,

 * Try engine reset when

Re: [PATCH 0/2] Add support for querying hw info that UMDs need

2021-07-27 Thread John Harrison

On 7/27/2021 02:49, Daniel Vetter wrote:

On Mon, Jul 26, 2021 at 07:21:43PM -0700, john.c.harri...@intel.com wrote:

From: John Harrison 

Various UMDs require hardware configuration information about the
current platform. A bunch of static information is available in a
fixed table that can be retrieved from the GuC.

Test-with: 20210727002812.43469-2-john.c.harri...@intel.com
UMD: https://github.com/intel/compute-runtime/pull/432/files

Signed-off-by: John Harrison 

Can you pls submit this with all the usual suspect from the umd side (so
also media-driver and mesa) cced?

Do you have a list of names that you would like included?



Also do the mesa/media-driver patches exist somewhere? Afaiui this isn't
very useful without those bits in place too.
I don't know about mesa but the media team have the support in place in 
their internal tree and (as per compute) are waiting for us to push the 
kernel side. This also comes under the headings of both new platforms 
and platforms which are POR for GuC submission. So I believe a lot of 
the UMD side changes for the config table are wrapped up in their 
support for the new platforms/GuC as a whole and thus not yet ready for 
upstream.


John.



-Daniel



John Harrison (1):
   drm/i915/guc: Add fetch of hwconfig table

Rodrigo Vivi (1):
   drm/i915/uapi: Add query for hwconfig table

  drivers/gpu/drm/i915/Makefile |   1 +
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
  .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   4 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|   3 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c   | 156 ++
  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h   |  19 +++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |   6 +
  drivers/gpu/drm/i915/i915_query.c |  23 +++
  include/uapi/drm/i915_drm.h   |   1 +
  10 files changed, 215 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h

--
2.25.1





Re: [PATCH 29/33] drm/i915/selftest: Increase some timeouts in live_requests

2021-07-27 Thread John Harrison

On 7/26/2021 17:23, Matthew Brost wrote:

Requests may take slightly longer with GuC submission, let's increase
the timeouts in live_requests.

Signed-off-by: Matthew Brost 

Was already reviewed in previous series. Repeating here for patchwork:
Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index bd5c96a77ba3..d67710d10615 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
i915_request_add(rq);
  
  		err = 0;

-   if (i915_request_wait(rq, 0, HZ / 5) < 0)
+   if (i915_request_wait(rq, 0, HZ) < 0)
err = -ETIME;
i915_request_put(rq);
if (err)
@@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
}
igt_spinner_end(&spin);
  
-	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)

+   if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
err = -EIO;
i915_request_put(rq);
  




Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-07-29 Thread John Harrison

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat, ban it
immediately. This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
It's worse than this. If the engine in question is an individual 
physical engine then sending a pulse (with sufficiently high priority) 
will pre-empt the engine and kick the context off. However, the GuC 
scheduler does not have hacks in it to check the state of the heartbeat 
or whether a context is actually a zombie or not. Thus, the context will 
get resubmitted to the hardware after the pulse completes and 
effectively nothing will have happened.


I would assume that the DRM scheduler which we are meant to be switching 
to for execlist as well as GuC submission is also unlikely to have hacks 
for zombie contexts and tests for whether the i915 specific heartbeat 
has been disabled since the context became a zombie. So when that switch 
happens, this test will also fail in execlist mode as well as GuC mode.


The choices I see here are to simply remove persistence completely (it 
is basically a bug that became UAPI because it wasn't caught soon 
enough!) or to implement it in a way that does not require hacks in the 
back end scheduler. Apparently, the DRM scheduler is expected to allow 
zombie contexts to persist until the DRM file handle is closed. So 
presumably we will have to go with option two.


That means flagging a context as being a zombie when it is closed but 
still active. The driver would then add it to a zombie list owned by the 
DRM client object. When that client object is closed, i915 would go 
through the list and genuinely kill all the contexts. No back end 
scheduler hacks required and no intimate knowledge of the i915 heartbeat 
mechanism required either.


John.




This patch also updates intel_engine_has_heartbeat to be a vfunc as we
now need to call this function on execlists virtual engines too.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
  .../drm/i915/gt/intel_execlists_submission.c  | 14 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 --
  6 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 9c3672bac0e2..b8e01c5ba9e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1090,8 +1090,9 @@ static void kill_engines(struct i915_gem_engines 
*engines, bool ban)
 */
for_each_gem_engine(ce, engines, it) {
struct intel_engine_cs *engine;
+   bool local_ban = ban || !intel_engine_has_heartbeat(ce->engine);
  
-		if (ban && intel_context_ban(ce, NULL))
+   if (local_ban && intel_context_ban(ce, NULL))
continue;
  
  		/*

@@ -1104,7 +1105,7 @@ static void kill_engines(struct i915_gem_engines 
*engines, bool ban)
engine = active_engine(ce);
  
  		/* First attempt to gracefully cancel the context */

-   if (engine && !__cancel_engine(engine) && ban)
+   if (engine && !__cancel_engine(engine) && local_ban)
/*
 * If we are unable to send a preemptive pulse to bump
 * the context from the GPU, we have to resort to a full
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e54351a170e2..65f2eb2a78e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -55,6 +55,8 @@ struct intel_context_ops {
void (*reset)(struct intel_context *ce);
void (*destroy)(struct kref *kref);
  
+	bool (*has_heartbeat)(const struct intel_engine_cs *engine);
+
/* virtual engine/context interface */
struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
unsigned int count);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index c2a5640ae055..1b11a808acc4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -283,28 +283,11 @@ struct intel_context *
  intel_engine_create_virtual(struct intel_engine_cs **siblings,
unsigned int count);
  
-static inline bool
-intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
-{
-   /*
-* For non-GuC submission we expect the back-end to look at the
-* heartbeat status of the actual physical engine that the work
-

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-07-30 Thread John Harrison

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:
If an engine associated with a context does not have a heartbeat, 
ban it

immediately. This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.


Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt 
a running normal priority context?


Why does it matter then whether or not heartbeats are enabled - when 
heartbeat just ends up sending the same engine pulse (eventually, with 
raising priority)?
The point is that the pulse is pointless. See the rest of my comments 
below, specifically "the context will get resubmitted to the hardware 
after the pulse completes". To re-iterate...


Yes, it preempts the context. Yes, it does so whether heartbeats are 
enabled or not. But so what? Who cares? You have preempted a context. It 
is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. 
The backend scheduler will just resubmit it to the hardware as soon as 
the pulse completes. The only reason this works at all is because of the 
horrid hack in the execlist scheduler's back end implementation (in 
__execlists_schedule_in):

    if (unlikely(intel_context_is_closed(ce) &&
 !intel_engine_has_heartbeat(engine)))
    intel_context_set_banned(ce);

The actual back end scheduler is saying "Is this a zombie context? Is 
the heartbeat disabled? Then ban it". No other scheduler backend is 
going to have knowledge of zombie context status or of the heartbeat 
status. Nor are they going to call back into the higher levels of the 
i915 driver to trigger a ban operation. Certainly a hardware implemented 
scheduler is not going to be looking at private i915 driver information 
to decide whether to submit a context or whether to tell the OS to kill 
it off instead.


For persistence to work with a hardware scheduler (or a non-Intel 
specific scheduler such as the DRM one), the handling of zombie 
contexts, banning, etc. *must* be done entirely in the front end. It 
cannot rely on any backend hacks. That means you can't rely on any fancy 
behaviour of pulses.


If you want to ban a context then you must explicitly ban that context. 
If you want to ban it at some later point then you need to track it at 
the top level as a zombie and then explicitly ban that zombie at 
whatever later point.





It's worse than this. If the engine in question is an individual 
physical engine then sending a pulse (with sufficiently high 
priority) will pre-empt the engine and kick the context off. However, 
the GuC 


Why it is different for physical vs virtual, aren't both just 
schedulable contexts with different engine masks for what GuC is 
concerned? Oh, is it a matter of needing to send pulses to all engines 
which comprise a virtual one?
It isn't different. It is totally broken for both. It is potentially 
more broken for virtual engines because of the question of which engine 
to pulse. But as stated above, the pulse is pointless anyway so the 
which engine question doesn't even matter.


John.




scheduler does not have hacks in it to check the state of the 
heartbeat or whether a context is actually a zombie or not. Thus, the 
context will get resubmitted to the hardware after the pulse 
completes and effectively nothing will have happened.


I would assume that the DRM scheduler which we are meant to be 
switching to for execlist as well as GuC submission is also unlikely 
to have hacks for zombie contexts and tests for whether the i915 
specific heartbeat has been disabled since the context became a 
zombie. So when that switch happens, this test will also fail in 
execlist mode as well as GuC mode.


The choices I see here are to simply remove persistence completely 
(it is a basically a bug that became UAPI because it wasn't caught 
soon enough!) or to implement it in a way that does not require hacks 
in the back end scheduler. Apparently, the DRM scheduler is expected 
to allow zombie contexts to persist until the DRM file handle is 
closed. So presumably we will have to go with option two.


That means flagging a context as being a zombie when it is closed but 
still active. The driver would then add it to a zombie list owned by 
the DRM client object. When that client object is closed, i915 would 
go through the list and genuinely kill all the contexts. No back end 
scheduler hacks required and no intimate knowledge of the i915 
heartbeat mechanism required either.


John.




This patch also updates intel_engine_has_heartbeat to be a vfunc as we
now need to call this function on execlists virtual engines too.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
  drivers/gpu/dr

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-06 Thread John Harrison

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:
If an engine associated with a context does not have a heartbeat, 
ban it
immediately. This is needed for GuC submission as a idle pulse 
doesn't

kick the context off the hardware where it then can check for a
heartbeat and ban the context.


Pulse, that is a request with I915_PRIORITY_BARRIER, does not 
preempt a running normal priority context?


Why does it matter then whether or not heartbeats are enabled - when 
heartbeat just ends up sending the same engine pulse (eventually, 
with raising priority)?
The point is that the pulse is pointless. See the rest of my comments 
below, specifically "the context will get resubmitted to the hardware 
after the pulse completes". To re-iterate...


Yes, it preempts the context. Yes, it does so whether heartbeats are 
enabled or not. But so what? Who cares? You have preempted a context. 
It is no longer running on the hardware. BUT IT IS STILL A VALID 
CONTEXT. 


It is valid yes, and it even may be the current ABI so another 
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon 
as the pulse completes. The only reason this works at all is because 
of the horrid hack in the execlist scheduler's back end 
implementation (in __execlists_schedule_in):

 if (unlikely(intel_context_is_closed(ce) &&
  !intel_engine_has_heartbeat(engine)))
 intel_context_set_banned(ce);


Right, is the above code then needed with this patch - when ban is 
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is 
the heartbeat disabled? Then ban it". No other scheduler backend is 
going to have knowledge of zombie context status or of the heartbeat 
status. Nor are they going to call back into the higher levels of the 
i915 driver to trigger a ban operation. Certainly a hardware 
implemented scheduler is not going to be looking at private i915 
driver information to decide whether to submit a context or whether 
to tell the OS to kill it off instead.


For persistence to work with a hardware scheduler (or a non-Intel 
specific scheduler such as the DRM one), the handling of zombie 
contexts, banning, etc. *must* be done entirely in the front end. It 
cannot rely on any backend hacks. That means you can't rely on any 
fancy behaviour of pulses.


If you want to ban a context then you must explicitly ban that 
context. If you want to ban it at some later point then you need to 
track it at the top level as a zombie and then explicitly ban that 
zombie at whatever later point.


I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make 
sense. Now, it seems "kick off the hardware" is meant as revoke and 
not just preempt. Which is fine, perhaps just needs to be written more 
explicitly. But the part of checking for heartbeat after idle pulse 
does not compute for me. It is the heartbeat which emits idle pulses, 
not idle pulse emitting heartbeats.
I am in agreement that the commit message is confusing and does not 
explain either the problem or the solution.






But anyway, I can buy the handling at the front end story completely. 
It makes sense. We just need to agree that a) it is okay to change the 
ABI and b) remove the backend check from execlists if it is not needed 
any longer.


And if ABI change is okay then commit message needs to talk about it 
loudly and clearly.
I don't think we have a choice. The current ABI is not and cannot ever 
be compatible with any scheduler external to i915. It cannot be 
implemented with a hardware scheduler such as the GuC and it cannot be 
implemented with an external software scheduler such as the DRM one.


My view is that any implementation involving knowledge of the heartbeat 
is fundamentally broken.


According to Daniel Vetter, the DRM ABI on this subject is that an 
actively executing context should persist until the DRM file handle is 
closed. That seems like a much more plausible and simple ABI than one 
that says 'if the heartbeat is running then a context will persist 
forever, if the heartbeat is not running then it will be killed 
immediately, if the heart was running but then stops running then the 
context will be killed on the next context switch, ...'. And if I 
understand it correctly, the current ABI allows a badly written user app 
to cause a denial of service by leaving contexts permanently running an 
infinit

Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak

2021-08-06 Thread John Harrison

On 7/30/2021 12:53, Matthew Brost wrote:

A small race exists between intel_gt_retire_requests_timeout and
intel_timeline_exit which could result in the syncmap not getting
free'd. Rather than work to hard to seal this race, simply cleanup the

free'd -> freed


syncmap on fini.

unreferenced object 0x88813bc53b18 (size 96):
   comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
   hex dump (first 32 bytes):
 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  
 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  
   backtrace:
 [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
 [<042f6959>] __sync_set+0x1bb/0x240 [i915]
 [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
 [<56a48219>] i915_request_await_object+0x222/0x360 [i915]
 [] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
 [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
 [] drm_ioctl_kernel+0xb0/0xf0 [drm]
 [] drm_ioctl+0x305/0x3c0 [drm]
 [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0
 [<76c362a4>] do_syscall_64+0x33/0x80
 [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Matthew Brost 
Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
Cc: 
---
  drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
b/drivers/gpu/drm/i915/gt/intel_timeline.c
index c4a126c8caef..1257f4f11e66 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
  
  	i915_vma_put(timeline->hwsp_ggtt);

i915_active_fini(&timeline->active);
+
+   /*
+* A small race exists between intel_gt_retire_requests_timeout and
+* intel_timeline_exit which could result in the syncmap not getting
+* free'd. Rather than work to hard to seal this race, simply cleanup
+* the syncmap on fini.
What is the race? I'm going round in circles just trying to work out how 
intel_gt_retire_requests_timeout is supposed to get to 
intel_timeline_exit in the first place.


Also, free'd -> freed.

John.



+*/
+   i915_syncmap_free(&timeline->sync);
+
kfree(timeline);
  }
  




Re: [Intel-gfx] [PATCH] drm/i915: Disable bonding on gen12+ platforms

2021-08-06 Thread John Harrison

On 7/28/2021 12:21, Matthew Brost wrote:

Disable bonding on gen12+ platforms aside from ones already supported by
the i915 - TGL, RKL, and ADL-S.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 05c3ee191710..9c3672bac0e2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -446,6 +446,13 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
u16 idx, num_bonds;
int err, n;
  
+	if (GRAPHICS_VER(i915) >= 12 && !IS_TIGERLAKE(i915) &&
+   !IS_ROCKETLAKE(i915) && !IS_ALDERLAKE_S(i915)) {
+   drm_dbg(&i915->drm,
+   "Bonding on gen12+ aside from TGL, RKL, and ADL_S not allowed\n");

I would have said not supported rather than not allowed. Either way:
Reviewed-by: John Harrison 


+   return -ENODEV;
+   }
+
if (get_user(idx, &ext->virtual_index))
return -EFAULT;
  




Re: [Intel-gfx] [PATCH 2/4] drm/i915/guc: put all guc objects in lmem when available

2021-08-06 Thread John Harrison

On 8/2/2021 22:11, Matthew Brost wrote:

From: Daniele Ceraolo Spurio 

The firmware binary has to be loaded from lmem and the recommendation is
to put all other objects in there as well. Note that we don't fall back
to system memory if the allocation in lmem fails because all objects are
allocated during driver load and if we have issues with lmem at that point
something is seriously wrong with the system, so no point in trying to
handle it.

Cc: Matthew Auld 
Cc: Abdiel Janulgue 
Cc: Michal Wajdeczko 
Cc: Vinay Belgaumkar 
Cc: Radoslaw Szwichtenberg 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_lmem.c  | 26 
  drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |  4 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  9 ++-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 11 +++-
  drivers/gpu/drm/i915/gt/uc/intel_huc.c| 14 -
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 75 +--
  6 files changed, 127 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index eb345305dc52..034226c5d4d0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -103,6 +103,32 @@ __i915_gem_object_create_lmem_with_ps(struct 
drm_i915_private *i915,
 size, page_size, flags);
  }
  
+struct drm_i915_gem_object *

+i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915,
+ const void *data, size_t size)
+{
+   struct drm_i915_gem_object *obj;
+   void *map;
+
+   obj = i915_gem_object_create_lmem(i915,
+ round_up(size, PAGE_SIZE),
+ I915_BO_ALLOC_CONTIGUOUS);
+   if (IS_ERR(obj))
+   return obj;
+
+   map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
+   if (IS_ERR(map)) {
+   i915_gem_object_put(obj);
+   return map;
+   }
+
+   memcpy(map, data, size);
+
+   i915_gem_object_unpin_map(obj);
+
+   return obj;
+}
+
  struct drm_i915_gem_object *
  i915_gem_object_create_lmem(struct drm_i915_private *i915,
resource_size_t size,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h
index 4ee81fc66302..1b88ea13435c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h
@@ -23,6 +23,10 @@ bool i915_gem_object_is_lmem(struct drm_i915_gem_object 
*obj);
  
  bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
  
+struct drm_i915_gem_object *

+i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915,
+ const void *data, size_t size);
+
  struct drm_i915_gem_object *
  __i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915,
  resource_size_t size,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 979128e28372..55160d3e401a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -3,6 +3,7 @@
   * Copyright © 2014-2019 Intel Corporation
   */
  
+#include "gem/i915_gem_lmem.h"

  #include "gt/intel_gt.h"
  #include "gt/intel_gt_irq.h"
  #include "gt/intel_gt_pm_irq.h"
@@ -630,7 +631,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc 
*guc, u32 size)
u64 flags;
int ret;
  
-	obj = i915_gem_object_create_shmem(gt->i915, size);

+   if (HAS_LMEM(gt->i915))
+   obj = i915_gem_object_create_lmem(gt->i915, size,
+ I915_BO_ALLOC_CPU_CLEAR |
+ I915_BO_ALLOC_CONTIGUOUS);
+   else
+   obj = i915_gem_object_create_shmem(gt->i915, size);
+
if (IS_ERR(obj))
return ERR_CAST(obj);
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c

index 76fe766ad1bc..962be0c12208 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
@@ -41,7 +41,7 @@ static void guc_prepare_xfer(struct intel_uncore *uncore)
  }
  
  /* Copy RSA signature from the fw image to HW for verification */

-static void guc_xfer_rsa(struct intel_uc_fw *guc_fw,
+static int guc_xfer_rsa(struct intel_uc_fw *guc_fw,
 struct intel_uncore *uncore)
  {
u32 rsa[UOS_RSA_SCRATCH_COUNT];
@@ -49,10 +49,13 @@ static void guc_xfer_rsa(struct intel_uc_fw *guc_fw,
int i;
  
  	copied = intel_uc_fw_copy_rsa(guc_fw, rsa, sizeof(rsa));

-   GEM_BUG_ON(copied < sizeof(rsa));
+   if (copied < sizeof(rsa))
+   return -ENOMEM;
  
  	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)

intel_uncore_write(u

Re: [Intel-gfx] [PATCH 3/4] drm/i915/guc: Add DG1 GuC / HuC firmware defs

2021-08-06 Thread John Harrison

On 8/2/2021 22:11, Matthew Brost wrote:

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index f8cb00ffb506..a685d563df72 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -51,6 +51,7 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
  #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
fw_def(ALDERLAKE_P, 0, guc_def(adlp, 62, 0, 3), huc_def(tgl, 7, 9, 3)) \
fw_def(ALDERLAKE_S, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl,  7, 9, 3)) \
+   fw_def(DG1, 0, guc_def(dg1, 62, 0, 0), huc_def(dg1,  7, 9, 3)) \
fw_def(ROCKETLAKE,  0, guc_def(tgl, 62, 0, 0), huc_def(tgl,  7, 9, 3)) \
fw_def(TIGERLAKE,   0, guc_def(tgl, 62, 0, 0), huc_def(tgl,  7, 9, 3)) \
fw_def(JASPERLAKE,  0, guc_def(ehl, 62, 0, 0), huc_def(ehl,  9, 0, 0)) \


Reviewed-by: John Harrison 



Re: [Intel-gfx] [PATCH 4/4] drm/i915/guc: Enable GuC submission by default on DG1

2021-08-06 Thread John Harrison

On 8/2/2021 22:11, Matthew Brost wrote:

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index da57d18d9f6b..fc2fc8d111d8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -35,7 +35,7 @@ static void uc_expand_default_options(struct intel_uc *uc)
}
  
  	/* Intermediate platforms are HuC authentication only */

-   if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+   if (IS_ALDERLAKE_S(i915)) {
i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
return;
}


Reviewed-by: John Harrison 



Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak

2021-08-06 Thread John Harrison

On 8/6/2021 11:29, Matthew Brost wrote:

On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote:

On 7/30/2021 12:53, Matthew Brost wrote:

A small race exists between intel_gt_retire_requests_timeout and
intel_timeline_exit which could result in the syncmap not getting
free'd. Rather than work to hard to seal this race, simply cleanup the

free'd -> freed


Sure.


syncmap on fini.

unreferenced object 0x88813bc53b18 (size 96):
comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
hex dump (first 32 bytes):
  01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  
  00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  
backtrace:
  [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
  [<042f6959>] __sync_set+0x1bb/0x240 [i915]
  [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
  [<56a48219>] i915_request_await_object+0x222/0x360 [i915]
  [<aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
  [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
  [<fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
  [<e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
  [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0
  [<76c362a4>] do_syscall_64+0x33/0x80
  [<eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Matthew Brost 
Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
Cc: 
---
   drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +
   1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
b/drivers/gpu/drm/i915/gt/intel_timeline.c
index c4a126c8caef..1257f4f11e66 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
i915_vma_put(timeline->hwsp_ggtt);
i915_active_fini(&timeline->active);
+
+   /*
+* A small race exists between intel_gt_retire_requests_timeout and
+* intel_timeline_exit which could result in the syncmap not getting
+* free'd. Rather than work to hard to seal this race, simply cleanup
+* the syncmap on fini.

What is the race? I'm going round in circles just trying to work out how
intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
in the first place.


intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2
intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1
intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0
i915_syncmap_free is never called, memory leak

Matt

Okay. Think I follow it now.

Seems like the syncmap free should have been in timeline_fini instead of 
timeline_exit in the first place?


Reviewed-by: John Harrison 





Also, free'd -> freed.

John.



+*/
+   i915_syncmap_free(&timeline->sync);
+
kfree(timeline);
   }




Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-09 Thread John Harrison

On 8/6/2021 12:46, Daniel Vetter wrote:

Seen this fly by and figured I dropped a few thoughts in here. At the
likely cost of looking a bit out of whack :-)

On Fri, Aug 6, 2021 at 8:01 PM John Harrison  wrote:

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat,
ban it
immediately. This is needed for GuC submission as a idle pulse
doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.

Pulse, that is a request with I915_PRIORITY_BARRIER, does not
preempt a running normal priority context?

Why does it matter then whether or not heartbeats are enabled - when
heartbeat just ends up sending the same engine pulse (eventually,
with raising priority)?

The point is that the pulse is pointless. See the rest of my comments
below, specifically "the context will get resubmitted to the hardware
after the pulse completes". To re-iterate...

Yes, it preempts the context. Yes, it does so whether heartbeats are
enabled or not. But so what? Who cares? You have preempted a context.
It is no longer running on the hardware. BUT IT IS STILL A VALID
CONTEXT.

It is valid yes, and it even may be the current ABI so another
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon
as the pulse completes. The only reason this works at all is because
of the horrid hack in the execlist scheduler's back end
implementation (in __execlists_schedule_in):
  if (unlikely(intel_context_is_closed(ce) &&
   !intel_engine_has_heartbeat(engine)))
  intel_context_set_banned(ce);

Right, is the above code then needed with this patch - when ban is
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is
the heartbeat disabled? Then ban it". No other scheduler backend is
going to have knowledge of zombie context status or of the heartbeat
status. Nor are they going to call back into the higher levels of the
i915 driver to trigger a ban operation. Certainly a hardware
implemented scheduler is not going to be looking at private i915
driver information to decide whether to submit a context or whether
to tell the OS to kill it off instead.

For persistence to work with a hardware scheduler (or a non-Intel
specific scheduler such as the DRM one), the handling of zombie
contexts, banning, etc. *must* be done entirely in the front end. It
cannot rely on any backend hacks. That means you can't rely on any
fancy behaviour of pulses.

If you want to ban a context then you must explicitly ban that
context. If you want to ban it at some later point then you need to
track it at the top level as a zombie and then explicitly ban that
zombie at whatever later point.

I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make
sense. Now, it seems "kick off the hardware" is meant as revoke and
not just preempt. Which is fine, perhaps just needs to be written more
explicitly. But the part of checking for heartbeat after idle pulse
does not compute for me. It is the heartbeat which emits idle pulses,
not idle pulse emitting heartbeats.

I am in agreement that the commit message is confusing and does not
explain either the problem or the solution.




But anyway, I can buy the handling at the front end story completely.
It makes sense. We just need to agree that a) it is okay to change the
ABI and b) remove the backend check from execlists if it is not needed
any longer.

And if ABI change is okay then commit message needs to talk about it
loudly and clearly.

I don't think we have a choice. The current ABI is not and cannot ever
be compatible with any scheduler external to i915. It cannot be
implemented with a hardware scheduler such as the GuC and it cannot be
implemented with an external software scheduler such as the DRM one.

So generally on linux we implement helper libraries, which means
massive flexibility everywhere.

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

So it shouldn't be an insurmountable problem to make this happen even
with drm/scheduler, we can patch it up.

Whether that's justified is another question.

Helper libraries won't work with a hardware scheduler.




My view is that any implementation involving knowledge of the heartbeat
is fundamentally broken.

According to Daniel Vetter, the DRM ABI on this subject is that an
actively executing cont

Re: [Intel-gfx] [PATCH 0/1] Fix gem_ctx_persistence failures with GuC submission

2021-08-17 Thread John Harrison

On 8/9/2021 23:38, Daniel Vetter wrote:

On Wed, Jul 28, 2021 at 05:33:59PM -0700, Matthew Brost wrote:

Should fix below failures with GuC submission for the following tests:
gem_exec_balancer --r noheartbeat
gem_ctx_persistence --r heartbeat-close

Not going to fix:
gem_ctx_persistence --r heartbeat-many
gem_ctx_persistence --r heartbeat-stop

After looking at that big thread and being very confused: Are we fixing an
actual use-case here, or is this another case of blindly following igts
tests just because they exist?
My understanding is that this is established behaviour and therefore 
must be maintained because the UAPI (whether documented or not) is 
inviolate. Therefore IGTs have been written to validate this past 
behaviour and now we must conform to the IGTs in order to keep the 
existing behaviour unchanged.


Whether anybody actually makes use of this behaviour or not is another 
matter entirely. I am certainly not aware of any vital use case. Others 
might have more recollection. I do know that we tell the UMD teams to 
explicitly disable persistence on every context they create.




I'm leaning towards that we should stall on this, and first document what
exactly is the actual intention behind all this, and then fix up the tests
I'm not sure there ever was an 'intention'. The rumour I heard way back 
when was that persistence was a bug on earlier platforms (or possibly we 
didn't have hardware support for doing engine resets?). But once the bug 
was realised (or the hardware support was added), it was too late to 
change the default behaviour because existing kernel behaviour must 
never change on pain of painful things. Thus the persistence flag was 
added so that people could opt out of the broken, leaky behaviour and 
have their contexts clean up properly.


Feel free to document what you believe should be the behaviour from a 
software architect point of view. Any documentation I produce is 
basically going to be created by reverse engineering the existing code. 
That is the only 'spec' that I am aware of and as I keep saying, I 
personally think it is a totally broken concept that should just be removed.



to match (if needed). And only then fix up GuC to match whatever we
actually want to do.
I also still maintain there is no 'fix up the GuC'. This is not 
behaviour we should be adding to a hardware scheduler. It is behaviour 
that should be implemented at the front end not the back end. If we 
absolutely need to do this then we need to do it solely at the context 
management level not at the back end submission level. And the solution 
should work by default on any submission back end.


John.



-Daniel


As the above tests change the heartbeat value to 0 (off) after the
context is closed and we have no way to detect that with GuC submission
unless we keep a list of closed but running contexts which seems like
overkill for a non-real world use case. We likely should just skip these
tests with GuC submission.

Signed-off-by: Matthew Brost 

Matthew Brost (1):
   drm/i915: Check if engine has heartbeat when closing a context

  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
  .../drm/i915/gt/intel_execlists_submission.c  | 14 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 --
  6 files changed, 26 insertions(+), 24 deletions(-)

--
2.28.0





Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-17 Thread John Harrison

On 8/9/2021 23:36, Daniel Vetter wrote:

On Mon, Aug 09, 2021 at 04:12:52PM -0700, John Harrison wrote:

On 8/6/2021 12:46, Daniel Vetter wrote:

Seen this fly by and figured I dropped a few thoughts in here. At the
likely cost of looking a bit out of whack :-)

On Fri, Aug 6, 2021 at 8:01 PM John Harrison  wrote:

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat,
ban it
immediately. This is needed for GuC submission as a idle pulse
doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.

Pulse, that is a request with I915_PRIORITY_BARRIER, does not
preempt a running normal priority context?

Why does it matter then whether or not heartbeats are enabled - when
heartbeat just ends up sending the same engine pulse (eventually,
with raising priority)?

The point is that the pulse is pointless. See the rest of my comments
below, specifically "the context will get resubmitted to the hardware
after the pulse completes". To re-iterate...

Yes, it preempts the context. Yes, it does so whether heartbeats are
enabled or not. But so what? Who cares? You have preempted a context.
It is no longer running on the hardware. BUT IT IS STILL A VALID
CONTEXT.

It is valid yes, and it even may be the current ABI so another
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon
as the pulse completes. The only reason this works at all is because
of the horrid hack in the execlist scheduler's back end
implementation (in __execlists_schedule_in):
   if (unlikely(intel_context_is_closed(ce) &&
!intel_engine_has_heartbeat(engine)))
   intel_context_set_banned(ce);

Right, is the above code then needed with this patch - when ban is
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is
the heartbeat disabled? Then ban it". No other scheduler backend is
going to have knowledge of zombie context status or of the heartbeat
status. Nor are they going to call back into the higher levels of the
i915 driver to trigger a ban operation. Certainly a hardware
implemented scheduler is not going to be looking at private i915
driver information to decide whether to submit a context or whether
to tell the OS to kill it off instead.

For persistence to work with a hardware scheduler (or a non-Intel
specific scheduler such as the DRM one), the handling of zombie
contexts, banning, etc. *must* be done entirely in the front end. It
cannot rely on any backend hacks. That means you can't rely on any
fancy behaviour of pulses.

If you want to ban a context then you must explicitly ban that
context. If you want to ban it at some later point then you need to
track it at the top level as a zombie and then explicitly ban that
zombie at whatever later point.

I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make
sense. Now, it seems "kick off the hardware" is meant as revoke and
not just preempt. Which is fine, perhaps just needs to be written more
explicitly. But the part of checking for heartbeat after idle pulse
does not compute for me. It is the heartbeat which emits idle pulses,
not idle pulse emitting heartbeats.

I am in agreement that the commit message is confusing and does not
explain either the problem or the solution.



But anyway, I can buy the handling at the front end story completely.
It makes sense. We just need to agree that a) it is okay to change the
ABI and b) remove the backend check from execlists if it is not needed
any longer.

And if ABI change is okay then commit message needs to talk about it
loudly and clearly.

I don't think we have a choice. The current ABI is not and cannot ever
be compatible with any scheduler external to i915. It cannot be
implemented with a hardware scheduler such as the GuC and it cannot be
implemented with an external software scheduler such as the DRM one.

So generally on linux we implement helper libraries, which means
massive flexibility everywhere.

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

So it shouldn't be an insurmountable problem to make this happen even
with drm/scheduler, we can patch it up.

Whether that's justified is another question.

Helper libraries won't work with a hardware scheduler.

Hm I guess I misunderstood then what exactly the hold-up is. This entire
discussi

Re: [PATCH 1/1] drm/i915/selftests: Increase timeout in i915_gem_contexts selftests

2021-08-19 Thread John Harrison

On 7/26/2021 20:17, Matthew Brost wrote:

Like in the case of several other selftests, generating lots of requests
in a loop takes a bit longer with GuC submission. Increase a timeout in
i915_gem_contexts selftest to take this into account.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 8eb5050f8cb3..4d2758718d21 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -94,7 +94,7 @@ static int live_nop_switch(void *arg)
rq = i915_request_get(this);
i915_request_add(this);
}
-   if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+   if (i915_request_wait(rq, 0, HZ) < 0) {
pr_err("Failed to populated %d contexts\n", nctx);
intel_gt_set_wedged(&i915->gt);
i915_request_put(rq);




Re: [Intel-gfx] [PATCH 11/27] drm/i915/guc: Copy whole golden context, set engine state size of subset

2021-08-26 Thread John Harrison

On 8/25/2021 20:23, Matthew Brost wrote:

When the GuC does a media reset, it copies a golden context state back
into the corrupted context's state. The address of the golden context
and the size of the engine state restore are passed in via the GuC ADS.
The i915 had a bug where it passed in the whole size of the golden
context, not the size of the engine state to restore resulting in a
memory corruption.

Also copy the entire golden context on init rather than just the engine
state that is restored.

Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS")
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 28 +-
  1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 6926919bcac6..df2734bfe078 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -358,6 +358,11 @@ static int guc_prep_golden_context(struct intel_guc *guc,
u8 engine_class, guc_class;
struct guc_gt_system_info *info, local_info;
  
+	/* Skip execlist and PPGTT registers + HWSP */

+   const u32 lr_hw_context_size = 80 * sizeof(u32);
+   const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
+   lr_hw_context_size;
+
/*
 * Reserve the memory for the golden contexts and point GuC at it but
 * leave it empty for now. The context data will be filled in later
@@ -396,7 +401,18 @@ static int guc_prep_golden_context(struct intel_guc *guc,
if (!blob)
continue;
  
-		blob->ads.eng_state_size[guc_class] = real_size;

+   /*
+* This interface is slightly confusing. We need to pass the
+* base address of the golden context and the engine state size
+* which is not the size of the whole golden context, it is a
+* subset that the GuC uses when doing a watchdog reset. The
+* engine state size must match the size of the golden context
+* minus the first part of the golden context that the GuC does
+* not restore during reset. Currently no real way to verify this
+* other than reading the GuC spec / code and ensuring the
+* 'skip_size' below matches the value used in the GuC code.
+*/
+   blob->ads.eng_state_size[guc_class] = real_size - skip_size;
blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
addr_ggtt += alloc_size;
}
@@ -437,8 +453,8 @@ static void guc_init_golden_context(struct intel_guc *guc)
u8 *ptr;
  
  	/* Skip execlist and PPGTT registers + HWSP */

-   const u32 lr_hw_context_size = 80 * sizeof(u32);
-   const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
+   __maybe_unused const u32 lr_hw_context_size = 80 * sizeof(u32);
+   __maybe_unused const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
lr_hw_context_size;
Not sure why the 'maybe unused'? The values are not only used in BUG_ONs 
or such that could vanish.


More importantly, you now have two sets of definitions for these magic 
numbers. That seems like a very bad idea. They should be moved into a 
helper function rather than repeated.


John.


  
  	if (!intel_uc_uses_guc_submission(&gt->uc))

@@ -476,12 +492,12 @@ static void guc_init_golden_context(struct intel_guc *guc)
continue;
}
  
-		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);

+   GEM_BUG_ON(blob->ads.eng_state_size[guc_class] !=
+  real_size - skip_size);
GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != 
addr_ggtt);
addr_ggtt += alloc_size;
  
-		shmem_read(engine->default_state, skip_size, ptr + skip_size,

-  real_size - skip_size);
+   shmem_read(engine->default_state, 0, ptr, real_size);
ptr += alloc_size;
}
  




Re: [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures

2021-06-29 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Add new GuC interface defines and structures while maintaining old ones
in parallel.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
I think there was some difference of opinion over whether these 
additions should be squashed into the specific patches that first use 
them. However, on the grounds that this is basically a patch-only style 
comment and doesn't change the final product, plus we need to get this 
stuff merged efficiently and not spend forever rebasing and refactoring...


Reviewed-by: John Harrison 



---
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 14 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 41 +++
  2 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 2d6198e63ebe..57e18babdf4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -124,10 +124,24 @@ enum intel_guc_action {
INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
+   INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
+   INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
+   INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
+   INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
+   INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
+   INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
+   INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
+   INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
+   INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
+   INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
+   INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+   INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
+   INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
+   INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
INTEL_GUC_ACTION_LIMIT
  };
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h

index 617ec601648d..28245a217a39 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,9 @@
  #include "abi/guc_communication_ctb_abi.h"
  #include "abi/guc_messages_abi.h"
  
+#define GUC_CONTEXT_DISABLE		0

+#define GUC_CONTEXT_ENABLE 1
+
  #define GUC_CLIENT_PRIORITY_KMD_HIGH  0
  #define GUC_CLIENT_PRIORITY_HIGH  1
  #define GUC_CLIENT_PRIORITY_KMD_NORMAL2
@@ -26,6 +29,9 @@
  #define GUC_MAX_STAGE_DESCRIPTORS 1024
  #define   GUC_INVALID_STAGE_IDGUC_MAX_STAGE_DESCRIPTORS
  
+#define GUC_MAX_LRC_DESCRIPTORS		65535

+#defineGUC_INVALID_LRC_ID  GUC_MAX_LRC_DESCRIPTORS
+
  #define GUC_RENDER_ENGINE 0
  #define GUC_VIDEO_ENGINE  1
  #define GUC_BLITTER_ENGINE2
@@ -237,6 +243,41 @@ struct guc_stage_desc {
u64 desc_private;
  } __packed;
  
+#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)

+
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 100
+#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 50
+
+/* Preempt to idle on quantum expiry */
+#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLEBIT(0)
+
+/*
+ * GuC Context registration descriptor.
+ * FIXME: This is only required to exist during context registration.
+ * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
+ * is not required.
+ */
+struct guc_lrc_desc {
+   u32 hw_context_desc;
+   u32 slpm_perf_mode_hint;/* SPLC v1 only */
+   u32 slpm_freq_hint;
+   u32 engine_submit_mask; /* In logical space */
+   u8 engine_class;
+   u8 reserved0[3];
+   u32 priority;
+   u32 process_desc;
+   u32 wq_addr;
+   u32 wq_size;
+   u32 context_flags;  /* CONTEXT_REGISTRATION_* */
+   /* Time for one workload to execute. (in micro seconds) */
+   u32 execution_quantum;
+   /* Time to wait for a preemption request to complete before issuing a
+* reset. (in micro seconds). */
+   u32 preemption_timeout;
+   u32 policy_flags;   /* CONTEXT_POLICY_* */
+   u32 reserved1[19];
+} __packed;
+
  #define GUC_POWER_UNSPECIFIED 0
  #define GUC_POWER_D0  1
  #define GUC_POWER_D1  2




Re: [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array

2021-06-29 Thread John Harrison

On 6/25/2021 10:26, Matthew Brost wrote:

On Fri, Jun 25, 2021 at 03:17:51PM +0200, Michal Wajdeczko wrote:

On 24.06.2021 09:04, Matthew Brost wrote:

Add lrc descriptor context lookup array which can resolve the
intel_context from the lrc descriptor index. In addition to lookup, it
can determine in the lrc descriptor context is currently registered with
the GuC by checking if an entry for a descriptor index is present.
Future patches in the series will make use of this array.

s/lrc/LRC


I guess? lrc and LRC are used interchangeably throughout the current
code base.
It is an abbreviation so LRC is technically the correct version for a 
comment. The fact that other existing comments are incorrect is not a 
valid reason to perpetuate a mistake :). Might as well fix it if you are 
going to repost the patch anyway for any other reason, but I would not 
call it a blocking issue.


Also, 'can determine in the' should be 'can determine if the'. Again, 
not exactly a blocking issue but should be fixed.



Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  5 +++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
  2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b28fa54214f2..2313d9fc087b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
  #ifndef _INTEL_GUC_H_
  #define _INTEL_GUC_H_
  
+#include "linux/xarray.h"

#include 


Yep.


+
  #include "intel_uncore.h"
  #include "intel_guc_fw.h"
  #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
struct i915_vma *lrc_desc_pool;
void *lrc_desc_pool_vaddr;
  
+	/* guc_id to intel_context lookup */

+   struct xarray context_lookup;
+
/* Control params for fw initialization */
u32 params[GUC_CTL_MAX_DWORDS];

btw, IIRC there was idea to move most struct definitions to
intel_guc_types.h, is this still a plan ?


I don't ever recall discussing this but we can certainly do this. For
what it is worth we do introduce intel_guc_submission_types.h a bit
later. I'll make a note about intel_guc_types.h though.

Matt
Yeah, my only recollection was about the submission types header. Are 
there sufficient non-submission fields in the GuC structure to warrant a 
general GuC types header?


With the commit message tweaks and #include fix mentioned above, it 
looks good to me.

Reviewed-by: John Harrison 


  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index a366890fb840..23a94a896a0b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
return rb_entry(rb, struct i915_priolist, node);
  }
  
-/* Future patches will use this function */

-__attribute__ ((unused))
  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
  {
struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc 
*guc, u32 index)
return &base[index];
  }
  
+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)

+{
+   struct intel_context *ce = xa_load(&guc->context_lookup, id);
+
+   GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+
+   return ce;
+}
+
  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
  {
u32 size;
@@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
  }
  
+static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)

+{
+   struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+
+   memset(desc, 0, sizeof(*desc));
+   xa_erase_irq(&guc->context_lookup, id);
+}
+
+static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
+{
+   return __get_context(guc, id);
+}
+
+static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+  struct intel_context *ce)
+{
+   xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+}
+
  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
  {
/* Leaving stub as this function will be used in future patches */
@@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
 */
GEM_BUG_ON(!guc->lrc_desc_pool);
  
+	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);

+
return 0;
  }
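The lookup helpers in this hunk follow a simple register/lookup/reset pattern. A userspace sketch of that pattern, with a plain pointer array standing in for the kernel xarray and illustrative names throughout:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 16 /* stands in for GUC_MAX_LRC_DESCRIPTORS */

struct ctx { int unused; }; /* stands in for intel_context */

static struct ctx *context_lookup[MAX_IDS]; /* plays the xarray's role */

static struct ctx *get_context(unsigned int id)
{
	assert(id < MAX_IDS); /* mirrors the GEM_BUG_ON() above */
	return context_lookup[id];
}

/* A descriptor is "registered" iff an entry exists for its index. */
static int desc_registered(unsigned int id)
{
	return get_context(id) != NULL;
}

static void set_desc_registered(unsigned int id, struct ctx *ce)
{
	context_lookup[id] = ce;
}

static void reset_desc(unsigned int id)
{
	/* the kernel version also memset()s the LRC descriptor itself */
	context_lookup[id] = NULL;
}
```

The point of the pattern is that presence in the map doubles as the "registered with GuC" flag, so no separate bookkeeping bit is needed.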
  





Re: [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet

2021-06-29 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet submits is used for the submission path.

Re-word? 'a single tasklet submits is used...' doesn't make sense.


Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.

In this patch the field, guc_id, has been added to intel_context and is
not assigned. Patches later in the series will assign this value.

Cc: John Harrison
Signed-off-by: Matthew Brost
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +-
  3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..bb6fef7eae52 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -136,6 +136,15 @@ struct intel_context {
struct intel_sseu sseu;
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */

+
+   /* GuC scheduling state that does not require a lock. */
Maybe 'GuC scheduling state flags that do not require a lock'? Otherwise 
it just looks like a counter or something.



+   atomic_t guc_sched_state_no_lock;
+
+   /*
+* GuC lrc descriptor ID - Not assigned in this patch but future patches
Not a blocker but s/lrc/LRC/ would keep Michal happy ;). Although 
presumably this comment is at least being amended by later patches in 
the series.



+* in the series will.
+*/
+   u16 guc_id;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 2313d9fc087b..9ba8219475b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
struct intel_guc_log log;
struct intel_guc_ct ct;
  
+	/* Global engine used to submit requests to GuC */

+   struct i915_sched_engine *sched_engine;
+   struct i915_request *stalled_request;
+
/* intel_guc_recv interrupt related state */
spinlock_t irq_lock;
unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ee933efbf0ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@
  
  #define GUC_REQUEST_SIZE 64 /* bytes */
  
+/*

+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to change at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLED	BIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+   return (atomic_read(&ce->guc_sched_state_no_lock) &
+   SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+   atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+   atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+  &ce->guc_sched_state_no_lock);
+}
+
  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
  {
return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct 
intel_guc *guc, u32 id,
xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
  }
  
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)

+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
  {
-   /* Leaving stub as this function will be used in future patches */
-}
+   int err;
+   struct intel_context *ce = rq->context;
+   u32 action[3];
+   int len = 0;
+   bool enabled = context_enabled(ce);
  
-/*

- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
-
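The set/clr/test helpers quoted above map directly onto C11 atomics. A userspace sketch under that assumption, with illustrative names:

```c
#include <assert.h>
#include <stdatomic.h>

#define SCHED_STATE_NO_LOCK_ENABLED (1u << 0)

struct context_sketch {
	/* stands in for intel_context.guc_sched_state_no_lock */
	atomic_uint sched_state_no_lock;
};

static int context_enabled(struct context_sketch *ce)
{
	return atomic_load(&ce->sched_state_no_lock) &
	       SCHED_STATE_NO_LOCK_ENABLED;
}

static void set_context_enabled(struct context_sketch *ce)
{
	/* read-modify-write is atomic, so no lock is needed even if
	 * other bits in the word change concurrently */
	atomic_fetch_or(&ce->sched_state_no_lock, SCHED_STATE_NO_LOCK_ENABLED);
}

static void clr_context_enabled(struct context_sketch *ce)
{
	atomic_fetch_and(&ce->sched_state_no_lock,
			 ~SCHED_STATE_NO_LOCK_ENABLED);
}
```

The atomic is only there to make concurrent updates of *different* bits safe; transitions of the same bit are mutually exclusive by construction, as the comment in the patch explains.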

Re: [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC

2021-06-29 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Add bypass tasklet submission path to GuC. The tasklet is only used if the
H2G channel has backpressure.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++
  1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ee933efbf0ff..38aff83ee9fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct 
i915_request *rq)
return err;
  }
  
+static inline void guc_set_lrc_tail(struct i915_request *rq)

+{
+   rq->context->lrc_reg_state[CTX_RING_TAIL] =
+   intel_ring_set_tail(rq->ring, rq->tail);
+}
+
  static inline int rq_prio(const struct i915_request *rq)
  {
return rq->sched.attr.priority;
@@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
}
  done:
if (submit) {
-   last->context->lrc_reg_state[CTX_RING_TAIL] =
-   intel_ring_set_tail(last->ring, last->tail);
+   guc_set_lrc_tail(last);
  resubmit:
/*
 * We only check for -EBUSY here even though it is possible for
@@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine 
*sched_engine,
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
  }
  
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,

+struct i915_request *rq)
+{
+   int ret;
+
+   __i915_request_submit(rq);
+
+   trace_i915_request_in(rq, 0);
+
+   guc_set_lrc_tail(rq);
+   ret = guc_add_request(guc, rq);
+   if (ret == -EBUSY)
+   guc->stalled_request = rq;
+
+   return ret;
+}
+
  static void guc_submit_request(struct i915_request *rq)
  {
struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+   struct intel_guc *guc = &rq->engine->gt->uc.guc;
unsigned long flags;
  
  	/* Will be called from irq-context when using foreign fences. */

spin_lock_irqsave(&sched_engine->lock, flags);
  
-	queue_request(sched_engine, rq, rq_prio(rq));

-
-   GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
-   GEM_BUG_ON(list_empty(&rq->sched.link));
-
-   tasklet_hi_schedule(&sched_engine->tasklet);
+   if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+   queue_request(sched_engine, rq, rq_prio(rq));
+   else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+   tasklet_hi_schedule(&sched_engine->tasklet);
  
  	spin_unlock_irqrestore(&sched_engine->lock, flags);

  }
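The rule the new submit path applies, bypass the tasklet unless something is already stalled or queued ahead, can be sketched as a small decision function (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>

enum submit_path { SUBMIT_DIRECT, SUBMIT_QUEUED };

/*
 * Sketch of the bypass rule in guc_submit_request(): go through the
 * priority queue only if a request is already stalled on the H2G
 * channel or other requests are queued ahead of us; otherwise submit
 * directly, falling back to the tasklet on backpressure (-EBUSY in
 * the kernel code).
 */
static enum submit_path choose_submit_path(bool stalled_request,
					   bool sched_engine_empty)
{
	if (stalled_request || !sched_engine_empty)
		return SUBMIT_QUEUED;

	return SUBMIT_DIRECT;
}
```

Queuing behind a stalled request preserves submission order: once one request hits backpressure, everything after it must flow through the tasklet until the stall drains.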




Re: [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+

2021-06-30 Thread John Harrison

On 6/30/2021 01:22, Martin Peres wrote:

On 24/06/2021 10:05, Matthew Brost wrote:

From: Daniele Ceraolo Spurio 

Unblock GuC submission on Gen11+ platforms.

Signed-off-by: Michal Wajdeczko 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +-
  4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h

index fae01dc8e1b9..77981788204f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -54,6 +54,7 @@ struct intel_guc {
  struct ida guc_ids;
  struct list_head guc_id_list;
  +    bool submission_supported;
  bool submission_selected;
    struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index a427336ce916..405339202280 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct 
intel_guc *guc)
  /* Note: By the time we're here, GuC may have already been 
reset */

  }
  +static bool __guc_submission_supported(struct intel_guc *guc)
+{
+    /* GuC submission is unavailable for pre-Gen11 */
+    return intel_guc_is_supported(guc) &&
+   INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
  static bool __guc_submission_selected(struct intel_guc *guc)
  {
  struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct 
intel_guc *guc)

    void intel_guc_submission_init_early(struct intel_guc *guc)
  {
+    guc->submission_supported = __guc_submission_supported(guc);
  guc->submission_selected = __guc_submission_selected(guc);
  }
  diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h

index a2a3fad72be1..be767eb6ff71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc 
*guc,
    static inline bool intel_guc_submission_is_supported(struct 
intel_guc *guc)

  {
-    /* XXX: GuC submission is unavailable for now */
-    return false;
+    return guc->submission_supported;
  }
    static inline bool intel_guc_submission_is_wanted(struct 
intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c

index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct 
intel_uc *uc)

  return;
  }
  -    /* Default: enable HuC authentication only */
-    i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+    /* Intermediate platforms are HuC authentication only */
+    if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+    drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");


This comment does not seem accurate, given that DG1 is barely out, and 
ADL is not out yet. How about:


"Disabling GuC on untested platforms"?

Just because something is not in the shops yet does not mean it is new. 
Technology is always obsolete by the time it goes on sale.


And the issue is not a lack of testing, it is a question of whether we 
are allowed to change the default on something that has already started 
being used by customers or not (including pre-release beta customers). 
I.e. it is basically a political decision not an engineering decision.




+    i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+    return;
+    }
+
+    /* Default: enable HuC authentication and GuC submission */
+    i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | 
ENABLE_GUC_SUBMISSION;


This seems to be in contradiction with the GuC submission plan which 
states:


"Not enabled by default on any current platforms but can be enabled 
via modparam enable_guc".
All current platforms have already been explicitly tested for above. 
This is setting the default on newer platforms, ADL-P and later, for 
which the official expectation is to have GuC enabled.




When you rework the patch, could you please add a warning when the 
user force-enables the GuC Command Submission? 
There already is one. If you set the module parameter then the kernel is 
tainted. That means 'here be dragons' - you have done something 
officially not supported to your kernel so all bets are off, if it blows 
up it is your own problem.



Something like:

"WARNING: The user force-enabled the experimental GuC command 
submission backend using i915.enable_guc. Please disable it if 
experie

Re: [PATCH 4/7] drm/i915/guc: Add non blocking CTB send function

2021-07-06 Thread John Harrison

On 7/1/2021 10:15, Matthew Brost wrote:

Add non blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.

The non-blocking CTB now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the
flow control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

v2:
  (Michal)
   - Use define for H2G room calculations
   - Move INTEL_GUC_SEND_NB define
  (Daniel Vetter)
   - Use msleep_interruptible rather than cond_resched
v3:
  (Michal)
   - Move includes to following patch
   - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  .../gt/uc/abi/guc_communication_ctb_abi.h |  3 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h| 11 ++-
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 +--
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +-
  4 files changed, 91 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index e933ca02d0eb..99e1fad5ca20 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -79,7 +79,8 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
   *  
+---+---+--+
   */
  
-#define GUC_CTB_MSG_MIN_LEN			1u
+#define GUC_CTB_HDR_LEN			1u
+#define GUC_CTB_MSG_MIN_LEN		GUC_CTB_HDR_LEN
  #define GUC_CTB_MSG_MAX_LEN   256u
  #define GUC_CTB_MSG_0_FENCE   (0x << 16)
  #define GUC_CTB_MSG_0_FORMAT  (0xf << 12)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4abc59f6f3cd..72e4653222e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -74,7 +74,14 @@ static inline struct intel_guc *log_to_guc(struct 
intel_guc_log *log)
  static
  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
  {
-   return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
+   return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
+   return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
+INTEL_GUC_CT_SEND_NB);
  }
  
  static inline int

@@ -82,7 +89,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 
*action, u32 len,
   u32 *response_buf, u32 response_buf_size)
  {
return intel_guc_ct_send(&guc->ct, action, len,
-response_buf, response_buf_size);
+response_buf, response_buf_size, 0);
  }
  
  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43e03aa2dde8..fb825cc1d090 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,8 @@
   * Copyright © 2016-2019 Intel Corporation
   */
  
+#include 

+
  #include "i915_drv.h"
  #include "intel_guc_ct.h"
  #include "gt/intel_gt.h"
@@ -373,7 +375,7 @@ static void write_barrier(struct intel_guc_ct *ct)
  static int ct_write(struct intel_guc_ct *ct,
const u32 *action,
u32 len /* in dwords */,
-   u32 fence)
+   u32 fence, u32 flags)
  {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -409,7 +411,7 @@ static int ct_write(struct intel_guc_ct *ct,
used = tail - head;
  
  	/* make sure there is a space including extra dw for the fence */

-   if (unlikely(used + len + 1 >= size))
+   if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
I thought the plan was to update the comment? Given that the '+1' is now 
'HDR_LEN' it would be good to update the comment to say 'header' instead 
of 'fence'.



return -ENOSPC;
  
  	/*

@@ -421,9 +423,13 @@ static int ct_write(struct intel_guc_ct *ct,
 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
  
-	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |

- FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
-
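The room check being modified in this hunk (`used + len + GUC_CTB_HDR_LEN >= size`) has to account for circular-buffer wraparound and for the fact that tail == head means empty, so one slot can never be consumed. A userspace sketch of that check, with illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_LEN 1u /* stands in for GUC_CTB_HDR_LEN, in dwords */

/*
 * Sketch of the H2G room check. tail == head indicates empty, and the
 * GuC firmware does not support filling the buffer completely (that
 * would also read as empty), hence "<" rather than "<=" against size.
 */
static bool h2g_has_room_sketch(uint32_t head, uint32_t tail,
				uint32_t size, uint32_t len)
{
	/* wrapped case: free region is split across the end of the ring */
	uint32_t used = tail < head ? (size - head) + tail : tail - head;

	return used + len + HDR_LEN < size;
}
```

The extra HDR_LEN dword reserves space for the message header (which carries the fence), so a message that "fits" never splits its header from its payload.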

Re: [PATCH 5/7] drm/i915/guc: Add stall timer to non blocking CTB send function

2021-07-06 Thread John Harrison

On 7/1/2021 10:15, Matthew Brost wrote:

Implement a stall timer which fails H2G CTBs once a period of time
with no forward progress is reached to prevent deadlock.

v2:
  (Michal)
   - Improve error message in ct_deadlock()
   - Set broken when ct_deadlock() returns true
   - Return -EPIPE on ct_deadlock()
v3:
  (Michal)
   - Add ms to stall timer comment
  (Matthew)
   - Move broken check to intel_guc_ct_send()

Signed-off-by: John Harrison 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Matthew Brost 

Looks plausible to me.

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 62 ---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 ++
  2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index fb825cc1d090..a9cb7b608520 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -4,6 +4,9 @@
   */
  
  #include 

+#include 
+#include 
+#include 
  
  #include "i915_drv.h"

  #include "intel_guc_ct.h"
@@ -316,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
goto err_deregister;
  
  	ct->enabled = true;

+   ct->stall_time = KTIME_MAX;
  
  	return 0;
  
@@ -388,9 +392,6 @@ static int ct_write(struct intel_guc_ct *ct,

u32 *cmds = ctb->cmds;
unsigned int i;
  
-	if (unlikely(ctb->broken))

-   return -EPIPE;
-
if (unlikely(desc->status))
goto corrupted;
  
@@ -506,6 +507,25 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)

return err;
  }
  
+#define GUC_CTB_TIMEOUT_MS	1500

+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+   long timeout = GUC_CTB_TIMEOUT_MS;
+   bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+   if (unlikely(ret)) {
+   struct guc_ct_buffer_desc *send = ct->ctbs.send.desc;
+   struct guc_ct_buffer_desc *recv = ct->ctbs.recv.desc;
+
+   CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
+ktime_ms_delta(ktime_get(), ct->stall_time),
+send->status, recv->status);
+   ct->ctbs.send.broken = true;
+   }
+
+   return ret;
+}
+
  static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
  {
struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -517,6 +537,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer 
*ctb, u32 len_dw)
return space >= len_dw;
  }
  
+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)

+{
+   struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+   lockdep_assert_held(&ct->ctbs.send.lock);
+
+   if (unlikely(!h2g_has_room(ctb, len_dw))) {
+   if (ct->stall_time == KTIME_MAX)
+   ct->stall_time = ktime_get();
+
+   if (unlikely(ct_deadlocked(ct)))
+   return -EPIPE;
+   else
+   return -EBUSY;
+   }
+
+   ct->stall_time = KTIME_MAX;
+   return 0;
+}
+
  static int ct_send_nb(struct intel_guc_ct *ct,
  const u32 *action,
  u32 len,
@@ -529,11 +569,9 @@ static int ct_send_nb(struct intel_guc_ct *ct,
  
  	spin_lock_irqsave(&ctb->lock, spin_flags);
  
-	ret = h2g_has_room(ctb, len + GUC_CTB_HDR_LEN);

-   if (unlikely(!ret)) {
-   ret = -EBUSY;
+   ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN);
+   if (unlikely(ret))
goto out;
-   }
  
  	fence = ct_get_next_fence(ct);

ret = ct_write(ct, action, len, fence, flags);
@@ -576,8 +614,13 @@ static int ct_send(struct intel_guc_ct *ct,
  retry:
spin_lock_irqsave(&ctb->lock, flags);
if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+   if (ct->stall_time == KTIME_MAX)
+   ct->stall_time = ktime_get();
spin_unlock_irqrestore(&ctb->lock, flags);
  
+		if (unlikely(ct_deadlocked(ct)))

+   return -EPIPE;
+
if (msleep_interruptible(sleep_period_ms))
return -EINTR;
sleep_period_ms = sleep_period_ms << 1;
@@ -585,6 +628,8 @@ static int ct_send(struct intel_guc_ct *ct,
goto retry;
}
  
+	ct->stall_time = KTIME_MAX;

+
fence = ct_get_next_fence(ct);
request.fence = fence;
request.status = 0;
@@ -647,6 +692,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 
*action, u32 len,
return -ENODEV;
}
  
+	if (unlikely(ct->ctbs.send.broken))

+   return -EPIPE;
+
if (flags & INTEL_GUC_CT_SEND_NB)
ret
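The stall-timer flow in has_room_nb()/ct_deadlocked() can be sketched with an injected timestamp so the timeout is deterministic to exercise; the 1500 ms value mirrors the patch, everything else here is illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIMEOUT_MS 1500
#define STALL_UNSET UINT64_MAX /* stands in for KTIME_MAX */

struct ct_sketch {
	uint64_t stall_time_ms; /* first time we observed no room */
	bool broken;
};

/*
 * Returns 0 when there is room, -EBUSY while stalled but inside the
 * timeout, and -EPIPE once no forward progress was made for
 * TIMEOUT_MS. "now_ms" is injected instead of ktime_get() purely to
 * keep the sketch deterministic.
 */
static int has_room_nb_sketch(struct ct_sketch *ct, bool room, uint64_t now_ms)
{
	if (!room) {
		if (ct->stall_time_ms == STALL_UNSET)
			ct->stall_time_ms = now_ms;

		if (now_ms - ct->stall_time_ms > TIMEOUT_MS) {
			ct->broken = true; /* fail all later sends fast */
			return -32;        /* -EPIPE */
		}
		return -16; /* -EBUSY: caller may retry or queue */
	}

	/* forward progress: disarm the stall clock */
	ct->stall_time_ms = STALL_UNSET;
	return 0;
}
```

Resetting the clock on every successful check is what makes this a forward-progress detector rather than a plain per-send timeout.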

Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

2021-07-06 Thread John Harrison

On 7/1/2021 10:15, Matthew Brost wrote:

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.

v2:
  (Michal)
   - Add additional sanity checks for head / tail pointers
   - Use GUC_CTB_HDR_LEN rather than magic 1

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
  2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a9cb7b608520..5b8b4ff609e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct 
guc_ct_buffer_desc *desc)
  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
  {
ctb->broken = false;
+   ctb->tail = 0;
+   ctb->head = 0;
+   ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
guc_ct_buffer_desc_init(ctb->desc);
  }
  
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,

  {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
struct guc_ct_buffer_desc *desc = ctb->desc;
-   u32 head = desc->head;
-   u32 tail = desc->tail;
+   u32 tail = ctb->tail;
u32 size = ctb->size;
-   u32 used;
u32 header;
u32 hxg;
u32 *cmds = ctb->cmds;
@@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
if (unlikely(desc->status))
goto corrupted;
  
-	if (unlikely((tail | head) >= size)) {

+   GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+   if (unlikely(tail != READ_ONCE(desc->tail))) {
+   CT_ERROR(ct, "Tail was modified %u != %u\n",
+desc->tail, ctb->tail);
+   desc->status |= GUC_CTB_STATUS_MISMATCH;
+   goto corrupted;
+   }
+   if (unlikely((desc->tail | desc->head) >= size)) {
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-head, tail, size);
+desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
-
-   /*
-* tail == head condition indicates empty. GuC FW does not support
-* using up the entire buffer to get tail == head meaning full.
-*/
-   if (tail < head)
-   used = (size - head) + tail;
-   else
-   used = tail - head;
-
-   /* make sure there is a space including extra dw for the fence */
-   if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-   return -ENOSPC;
+#endif
  
  	/*

 * dw0: CT header (including fence)
@@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
write_barrier(ct);
  
  	/* now update descriptor */

+   ctb->tail = tail;
WRITE_ONCE(desc->tail, tail);
+   ctb->space -= len + GUC_CTB_HDR_LEN;
  
  	return 0;
  
@@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,

   * @req:  pointer to pending request
   * @status:   placeholder for status
   *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
   * Our message handler will update status of tracked request once
   * response message with given fence is received. Wait here and
   * check for valid response status value.
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
return ret;
  }
  
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)

+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
  {
-   struct guc_ct_buffer_desc *desc = ctb->desc;
-   u32 head = READ_ONCE(desc->head);
+   struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+   u32 head;
u32 space;
  
-	space = CIRC_SPACE(desc->tail, head, ctb->size);

+   if (ctb->space >= len_dw)
+   return true;
+
+   head = READ_ONCE(ctb->desc->head);
+   if (unlikely(head > ctb->size)) {
+   CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+ctb->desc->head, ctb->desc->tail, ctb->size);
+   ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+   ctb->broken = true;
+   return false;
+   }
+
+   s
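The optimization this patch describes, keep a locally cached space count and only re-read the shared descriptor head when the cache runs out, can be sketched in userspace as follows. The CIRC_SPACE definition assumes a power-of-two size, as linux/circ_buf.h does; all other names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CIRC_SPACE(producer, consumer, size) for a power-of-two size,
 * matching the linux/circ_buf.h semantics used in the patch */
#define CIRC_SPACE(tail, head, size) (((head) - ((tail) + 1)) & ((size) - 1))

struct ctb_sketch {
	uint32_t desc_head;  /* stands in for the shared desc->head (PCIe) */
	uint32_t tail;       /* local shadow of our producer index */
	uint32_t space;      /* locally cached free space, in dwords */
	uint32_t size;
	uint32_t desc_reads; /* counts the "expensive" descriptor reads */
};

static bool has_room_cached(struct ctb_sketch *ctb, uint32_t len_dw)
{
	if (ctb->space >= len_dw)
		return true; /* fast path: no descriptor access at all */

	/* slow path: refresh the cache from the shared head */
	ctb->desc_reads++;
	ctb->space = CIRC_SPACE(ctb->tail, ctb->desc_head, ctb->size);
	return ctb->space >= len_dw;
}
```

In the common case of a properly sized buffer the fast path hits almost always, so submission no longer pays a cross-PCIe read per message just to learn how much room is left.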

Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

2021-07-06 Thread John Harrison

On 7/6/2021 12:12, Michal Wajdeczko wrote:

On 06.07.2021 21:00, John Harrison wrote:

On 7/1/2021 10:15, Matthew Brost wrote:

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.

v2:
   (Michal)
    - Add additional sanity checks for head / tail pointers
    - Use GUC_CTB_HDR_LEN rather than magic 1

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
   2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a9cb7b608520..5b8b4ff609e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
guc_ct_buffer_desc *desc)
   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
   {
   ctb->broken = false;
+    ctb->tail = 0;
+    ctb->head = 0;
+    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
   guc_ct_buffer_desc_init(ctb->desc);
   }
   @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
   {
   struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
   struct guc_ct_buffer_desc *desc = ctb->desc;
-    u32 head = desc->head;
-    u32 tail = desc->tail;
+    u32 tail = ctb->tail;
   u32 size = ctb->size;
-    u32 used;
   u32 header;
   u32 hxg;
   u32 *cmds = ctb->cmds;
@@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
   if (unlikely(desc->status))
   goto corrupted;
   -    if (unlikely((tail | head) >= size)) {
+    GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+    if (unlikely(tail != READ_ONCE(desc->tail))) {
+    CT_ERROR(ct, "Tail was modified %u != %u\n",
+ desc->tail, ctb->tail);
+    desc->status |= GUC_CTB_STATUS_MISMATCH;
+    goto corrupted;
+    }
+    if (unlikely((desc->tail | desc->head) >= size)) {
   CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
- head, tail, size);
+ desc->head, desc->tail, size);
   desc->status |= GUC_CTB_STATUS_OVERFLOW;
   goto corrupted;
   }
-
-    /*
- * tail == head condition indicates empty. GuC FW does not support
- * using up the entire buffer to get tail == head meaning full.
- */
-    if (tail < head)
-    used = (size - head) + tail;
-    else
-    used = tail - head;
-
-    /* make sure there is a space including extra dw for the fence */
-    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-    return -ENOSPC;
+#endif
     /*
    * dw0: CT header (including fence)
@@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
   write_barrier(ct);
     /* now update descriptor */
+    ctb->tail = tail;
   WRITE_ONCE(desc->tail, tail);
+    ctb->space -= len + GUC_CTB_HDR_LEN;
     return 0;
   @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
    * @req:    pointer to pending request
    * @status:    placeholder for status
    *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
    * Our message handler will update status of tracked request once
    * response message with given fence is received. Wait here and
    * check for valid response status value.
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
intel_guc_ct *ct)
   return ret;
   }
   -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
   {
-    struct guc_ct_buffer_desc *desc = ctb->desc;
-    u32 head = READ_ONCE(desc->head);
+    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+    u32 head;
   u32 space;
   -    space = CIRC_SPACE(desc->tail, head, ctb->size);
+    if (ctb->space >= len_dw)
+    return true;
+
+    head = READ_ONCE(ctb->desc->head);
+    if (unlikely(head > ctb->size)) {
+    CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+ ctb->desc->head, ctb->desc->tail, ctb->size);
+    ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+    ctb->broken = true;
+    return false;
+    }
+
+    space = CIRC_SPACE(ctb->tail, head, ctb->size);
+    ctb->space = space;
     return space >= len_dw;
   }
     static int 

Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

2021-07-06 Thread John Harrison

On 7/6/2021 12:33, Michal Wajdeczko wrote:

On 06.07.2021 21:19, John Harrison wrote:

On 7/6/2021 12:12, Michal Wajdeczko wrote:

On 06.07.2021 21:00, John Harrison wrote:

On 7/1/2021 10:15, Matthew Brost wrote:

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values
when
absolutely necessary. Also store the current space in each channel
locally.

v2:
    (Michal)
     - Add additional sanity checks for head / tail pointers
     - Use GUC_CTB_HDR_LEN rather than magic 1

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88
+++
    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
    2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a9cb7b608520..5b8b4ff609e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct
guc_ct_buffer_desc *desc)
    static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
    {
    ctb->broken = false;
+    ctb->tail = 0;
+    ctb->head = 0;
+    ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
    guc_ct_buffer_desc_init(ctb->desc);
    }
    @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
    {
    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
    struct guc_ct_buffer_desc *desc = ctb->desc;
-    u32 head = desc->head;
-    u32 tail = desc->tail;
+    u32 tail = ctb->tail;
    u32 size = ctb->size;
-    u32 used;
    u32 header;
    u32 hxg;
    u32 *cmds = ctb->cmds;
@@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct,
    if (unlikely(desc->status))
    goto corrupted;
    -    if (unlikely((tail | head) >= size)) {
+    GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+    if (unlikely(tail != READ_ONCE(desc->tail))) {
+    CT_ERROR(ct, "Tail was modified %u != %u\n",
+ desc->tail, ctb->tail);
+    desc->status |= GUC_CTB_STATUS_MISMATCH;
+    goto corrupted;
+    }
+    if (unlikely((desc->tail | desc->head) >= size)) {
    CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
- head, tail, size);
+ desc->head, desc->tail, size);
    desc->status |= GUC_CTB_STATUS_OVERFLOW;
    goto corrupted;
    }
-
-    /*
- * tail == head condition indicates empty. GuC FW does not support
- * using up the entire buffer to get tail == head meaning full.
- */
-    if (tail < head)
-    used = (size - head) + tail;
-    else
-    used = tail - head;
-
-    /* make sure there is a space including extra dw for the fence */
-    if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-    return -ENOSPC;
+#endif
      /*
     * dw0: CT header (including fence)
@@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct,
    write_barrier(ct);
      /* now update descriptor */
+    ctb->tail = tail;
    WRITE_ONCE(desc->tail, tail);
+    ctb->space -= len + GUC_CTB_HDR_LEN;
      return 0;
    @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct,
     * @req:    pointer to pending request
     * @status:    placeholder for status
     *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
     * Our message handler will update status of tracked request once
     * response message with given fence is received. Wait here and
     * check for valid response status value.
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct
intel_guc_ct *ct)
    return ret;
    }
    -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb,
u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
    {
-    struct guc_ct_buffer_desc *desc = ctb->desc;
-    u32 head = READ_ONCE(desc->head);
+    struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+    u32 head;
    u32 space;
    -    space = CIRC_SPACE(desc->tail, head, ctb->size);
+    if (ctb->space >= len_dw)
+    return true;
+
+    head = READ_ONCE(ctb->desc->head);
+    if (unlikely(head > ctb->size)) {
+    CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+ ctb->desc->head, ctb->desc->tail, ctb->size);
+    ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+    ctb->broken = true;
+    return false;
+    }
+
+    sp

Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

2021-07-06 Thread John Harrison

On 7/6/2021 15:20, Matthew Brost wrote:

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.

v2:
  (Michal)
   - Add additional sanity checks for head / tail pointers
   - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
  (Michal / John H)
   - Drop redundant check of head value

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
  2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..4a73a1f03a9b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct 
guc_ct_buffer_desc *desc)
  static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
  {
ctb->broken = false;
+   ctb->tail = 0;
+   ctb->head = 0;
+   ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
guc_ct_buffer_desc_init(ctb->desc);
  }
  
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,

  {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
struct guc_ct_buffer_desc *desc = ctb->desc;
-   u32 head = desc->head;
-   u32 tail = desc->tail;
+   u32 tail = ctb->tail;
u32 size = ctb->size;
-   u32 used;
u32 header;
u32 hxg;
u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
if (unlikely(desc->status))
goto corrupted;
  
-	if (unlikely((tail | head) >= size)) {

+   GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+   if (unlikely(tail != READ_ONCE(desc->tail))) {
+   CT_ERROR(ct, "Tail was modified %u != %u\n",
+desc->tail, ctb->tail);
+   desc->status |= GUC_CTB_STATUS_MISMATCH;
+   goto corrupted;
+   }
+   if (unlikely((desc->tail | desc->head) >= size)) {
Same arguments below about head apply to tail here. Also, there is no 
#else check on ctb->head?



CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-head, tail, size);
+desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
-
-   /*
-* tail == head condition indicates empty. GuC FW does not support
-* using up the entire buffer to get tail == head meaning full.
-*/
-   if (tail < head)
-   used = (size - head) + tail;
-   else
-   used = tail - head;
-
-   /* make sure there is a space including extra dw for the header */
-   if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-   return -ENOSPC;
+#endif
  
  	/*

 * dw0: CT header (including fence)
@@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
write_barrier(ct);
  
  	/* now update descriptor */

+   ctb->tail = tail;
WRITE_ONCE(desc->tail, tail);
+   ctb->space -= len + GUC_CTB_HDR_LEN;
  
  	return 0;
  
@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,

   * @req:  pointer to pending request
   * @status:   placeholder for status
   *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
   * Our message handler will update status of tracked request once
   * response message with given fence is received. Wait here and
   * check for valid response status value.
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
return ret;
  }
  
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)

+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
  {
-   struct guc_ct_buffer_desc *desc = ctb->desc;
-   u32 head = READ_ONCE(desc->head);
+   struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+   u32 head;
u32 space;
  
-	space = CIRC_SPACE(desc->tail, head, ctb->size);

+   if (ctb->space >= len_dw)
+   return true;
+
+   head = READ_ONCE(ctb->desc->head);
+   if (unlikely(head > ctb->size)) {
+   CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+ctb->desc->head, ctb->desc->tail, ctb->size);
+   

Re: [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+

2021-07-06 Thread John Harrison

On 7/3/2021 01:21, Martin Peres wrote:

On 02/07/2021 18:07, Michal Wajdeczko wrote:

On 02.07.2021 10:09, Martin Peres wrote:

On 02/07/2021 10:29, Pekka Paalanen wrote:

On Thu, 1 Jul 2021 21:28:06 +0200
Daniel Vetter  wrote:


On Thu, Jul 1, 2021 at 8:27 PM Martin Peres 
wrote:


On 01/07/2021 11:14, Pekka Paalanen wrote:

On Wed, 30 Jun 2021 11:58:25 -0700
John Harrison  wrote:

On 6/30/2021 01:22, Martin Peres wrote:

On 24/06/2021 10:05, Matthew Brost wrote:

From: Daniele Ceraolo Spurio 

Unblock GuC submission on Gen11+ platforms.

Signed-off-by: Michal Wajdeczko 
Signed-off-by: Daniele Ceraolo Spurio

Signed-off-by: Matthew Brost 
---
drivers/gpu/drm/i915/gt/uc/intel_guc.h |  1 +
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14
+-
 4 files changed, 19 insertions(+), 7 deletions(-)


...

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct
intel_uc *uc)
 return;
 }
 -    /* Default: enable HuC authentication only */
-    i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+    /* Intermediate platforms are HuC authentication only */
+    if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+    drm_dbg(&i915->drm, "Disabling GuC only due to old
platform\n");


This comment does not seem accurate, given that DG1 is barely
out, and
ADL is not out yet. How about:

"Disabling GuC on untested platforms"?

Just because something is not in the shops yet does not mean it is
new.
Technology is always obsolete by the time it goes on sale.


That is a very good reason to not use terminology like "new", 
"old",

"current", "modern" etc. at all.

End users like me definitely do not share your interpretation of
"old".


Yep, old and new is relative. In the end, what matters is the
validation
effort, which is why I was proposing "untested platforms".

Also, remember that you are not writing these messages for Intel
engineers, but instead are writing for Linux *users*.


It's drm_dbg. Users don't read this stuff, at least not users with no
clue what the driver does and stuff like that.


If I had a problem, I would read it, and I have no clue what anything
of that is.


Exactly.
I don't see how replacing 'old' for 'untested' helps anybody to 
understand anything. Untested just implies we can't be bothered to test 
stuff before publishing it. And as previously stated, this is purely a 
political decision not a technical one. Sure, change the message to be 
'Disabling GuC submission but enabling HuC loading via GuC on platform 
XXX' if that makes it clearer what is going on. Or just drop the message 
completely. It's simply explaining what the default option is for the 
current platform which you can also get by reading the code. However, I 
disagree that 'untested' is the correct message. Quite a lot of testing 
has been happening on TGL+ with GuC submission enabled.




This level of defense for what is clearly a bad *debug* message (at the
very least, the grammar) makes no sense at all!

I don't want to hear arguments like "Not my patch" from a developer
literally sending the patch to the ML and who added his SoB to the
patch, playing with words, or minimizing the problem of having such a
message.


Agree that 'not my patch' is never a good excuse, but equally we can't
blame original patch author as patch was updated few times since then.


I never wanted to blame the author here, I was only speaking about the 
handling of feedback on the patch.




Maybe to avoid confusions and simplify reviews, we could split this
patch into two smaller: first one that really unblocks GuC submission on
all Gen11+ (see __guc_submission_supported) and second one that updates
defaults for Gen12+ (see uc_expand_default_options), as original patch
(from ~2019) evolved more than what title/commit message says.


Both work for me, as long as it is a collaborative effort.
I'm not seeing how splitting the patch up fixes the complaints about the 
debug message.


And to be clear, no-one is actually arguing for a code change as such? 
The issue is just about the text of the debug message? Or did I miss 
something somewhere?


John.




Cheers,
Martin



Then we can fix all messaging and make sure it's clear and understood.

Thanks,
Michal



All of the above are just clear signals for the community to get off
your playground, which is frankly unacceptable. Your email address does
not matter.

In the spirit of collaboration, your response should have been "Good
catch, ho

Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

2021-07-07 Thread John Harrison

On 7/7/2021 10:50, Matthew Brost wrote:

On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:

On 7/6/2021 15:20, Matthew Brost wrote:

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in each channel
locally.

v2:
   (Michal)
- Add additional sanity checks for head / tail pointers
- Use GUC_CTB_HDR_LEN rather than magic 1
v3:
   (Michal / John H)
- Drop redundant check of head value

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  6 ++
   2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index db3e85b89573..4a73a1f03a9b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct 
guc_ct_buffer_desc *desc)
   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
   {
ctb->broken = false;
+   ctb->tail = 0;
+   ctb->head = 0;
+   ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
guc_ct_buffer_desc_init(ctb->desc);
   }
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
   {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
struct guc_ct_buffer_desc *desc = ctb->desc;
-   u32 head = desc->head;
-   u32 tail = desc->tail;
+   u32 tail = ctb->tail;
u32 size = ctb->size;
-   u32 used;
u32 header;
u32 hxg;
u32 type;
@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct,
if (unlikely(desc->status))
goto corrupted;
-   if (unlikely((tail | head) >= size)) {
+   GEM_BUG_ON(tail > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+   if (unlikely(tail != READ_ONCE(desc->tail))) {
+   CT_ERROR(ct, "Tail was modified %u != %u\n",
+desc->tail, ctb->tail);
+   desc->status |= GUC_CTB_STATUS_MISMATCH;
+   goto corrupted;
+   }
+   if (unlikely((desc->tail | desc->head) >= size)) {

Same arguments below about head apply to tail here. Also, there is no #else

Yes, desc->tail can be removed from this check. Same for head below. Can
you fix this when merging?


check on ctb->head?

The ctb->head variable isn't used in this path, nor is ctb->tail in the
other. In the other path desc->tail is checked because it is read there,
while desc->head doesn't need to be read here. The other path can also
likely be reworked to pull the tail check outside of the #ifdef / #else
block.


CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-head, tail, size);
+desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
-
-   /*
-* tail == head condition indicates empty. GuC FW does not support
-* using up the entire buffer to get tail == head meaning full.
-*/
-   if (tail < head)
-   used = (size - head) + tail;
-   else
-   used = tail - head;
-
-   /* make sure there is a space including extra dw for the header */
-   if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
-   return -ENOSPC;
+#endif
/*
 * dw0: CT header (including fence)
@@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct,
write_barrier(ct);
/* now update descriptor */
+   ctb->tail = tail;
WRITE_ONCE(desc->tail, tail);
+   ctb->space -= len + GUC_CTB_HDR_LEN;
return 0;
@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,
* @req: pointer to pending request
* @status:  placeholder for status
*
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
* Our message handler will update status of tracked request once
* response message with given fence is received. Wait here and
* check for valid response status value.
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
return ret;
   }
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
   {
-   struct guc_ct_buffer_desc *desc = ctb->desc;
-   u32 head = READ_ONCE(desc->he

Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

2021-07-07 Thread John Harrison

On 7/7/2021 11:56, Matthew Brost wrote:


Ok, I sent it but it looks like patchwork didn't like it. Anyway, we
should be able to review that patch.

Matt
Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it 
needs to be in reply to 0/7 or 6/7?


John.



Re: [PATCH 14/47] drm/i915/guc: Insert fence on context when deregistering

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Sometime during context pinning a context with the same guc_id is

Sometime*s*


registered with the GuC. In this a case deregister must be before before

before before -> done before


the context can be registered. A fence is inserted on all requests while
the deregister is in flight. Once the G2H is received indicating the
deregistration is complete the context is registered and the fence is
released.

Cc: John Harrison
Signed-off-by: Matthew Brost

With the above text fixed up:
Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context.c   |  1 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++-
  drivers/gpu/drm/i915/i915_request.h   |  8 +++
  4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 2b68af16222c..f750c826e19d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,7 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
mutex_init(&ce->pin_mutex);
  
  	spin_lock_init(&ce->guc_state.lock);

+   INIT_LIST_HEAD(&ce->guc_state.fences);
  
  	ce->guc_id = GUC_INVALID_LRC_ID;

INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ce7c69b34cd1..beafe55a9101 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -146,6 +146,11 @@ struct intel_context {
 * submission
 */
u8 sched_state;
+   /*
+* fences: maintains of list of requests that have a submit
+* fence related to GuC submission
+*/
+   struct list_head fences;
} guc_state;
  
  	/* GuC scheduling state that does not require a lock. */

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d39579ac2faa..49e5d460d54b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -924,6 +924,30 @@ static const struct intel_context_ops guc_context_ops = {
.destroy = guc_context_destroy,
  };
  
+static void __guc_signal_context_fence(struct intel_context *ce)

+{
+   struct i915_request *rq;
+
+   lockdep_assert_held(&ce->guc_state.lock);
+
+   list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
+   i915_sw_fence_complete(&rq->submit);
+
+   INIT_LIST_HEAD(&ce->guc_state.fences);
+}
+
+static void guc_signal_context_fence(struct intel_context *ce)
+{
+   unsigned long flags;
+
+   GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
+
+   spin_lock_irqsave(&ce->guc_state.lock, flags);
+   clr_context_wait_for_deregister_to_register(ce);
+   __guc_signal_context_fence(ce);
+   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+}
+
  static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
  {
return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
@@ -934,6 +958,7 @@ static int guc_request_alloc(struct i915_request *rq)
  {
struct intel_context *ce = rq->context;
struct intel_guc *guc = ce_to_guc(ce);
+   unsigned long flags;
int ret;
  
  	GEM_BUG_ON(!intel_context_is_pinned(rq->context));

@@ -978,7 +1003,7 @@ static int guc_request_alloc(struct i915_request *rq)
 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
 */
if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
-   return 0;
+   goto out;
  
  	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */

if (unlikely(ret < 0))
@@ -994,6 +1019,28 @@ static int guc_request_alloc(struct i915_request *rq)
  
  	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
  
+out:

+   /*
+* We block all requests on this context if a G2H is pending for a
+* context deregistration as the GuC will fail a context registration
+* while this G2H is pending. Once a G2H returns, the fence is released
+* that is blocking these requests (see guc_signal_context_fence).
+*
+* We can safely check the below field outside of the lock as it isn't
+* possible for this field to transition from being clear to set but
+* converse is possible, hence the need for the check within the lock.
+*/
+   if (likely(!context_wait_for_deregister_to_register(ce)))
+   return 0;
+
+   spin_lock_irqsave(&ce->guc_state.lock, flags);
+   if (context_wait_f

Re: [PATCH 15/47] drm/i915/guc: Defer context unpin until scheduling is disabled

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a call back is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.

Future patches may add a heuristic / delay to schedule the disable
call back to avoid thrashing on schedule enable / disable.

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context.c   |   4 +-
  drivers/gpu/drm/i915/gt/intel_context.h   |  27 +++-
  drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   3 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 145 +-
  6 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index f750c826e19d..1499b8aace2a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce)
return err;
  }
  
-void intel_context_unpin(struct intel_context *ce)

+void __intel_context_do_unpin(struct intel_context *ce, int sub)
  {
-   if (!atomic_dec_and_test(&ce->pin_count))
+   if (!atomic_sub_and_test(sub, &ce->pin_count))
return;
  
  	CE_TRACE(ce, "unpin\n");

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..8a7199afbe61 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -113,7 +113,32 @@ static inline void __intel_context_pin(struct 
intel_context *ce)
atomic_inc(&ce->pin_count);
  }
  
-void intel_context_unpin(struct intel_context *ce);

+void __intel_context_do_unpin(struct intel_context *ce, int sub);
+
+static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
+{
+   __intel_context_do_unpin(ce, 2);
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+   if (!ce->ops->sched_disable) {
+   __intel_context_do_unpin(ce, 1);
+   } else {
+   /*
+* Move ownership of this pin to the scheduling disable which is
+* an async operation. When that operation completes the above
+* intel_context_sched_disable_unpin is called potentially
+* unpinning the context.
+*/
+   while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+   if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+   ce->ops->sched_disable(ce);
+   break;
+   }
+   }
+   }
+}
  
  void intel_context_enter_engine(struct intel_context *ce);

  void intel_context_exit_engine(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index beafe55a9101..e7af6a2368f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -43,6 +43,8 @@ struct intel_context_ops {
void (*enter)(struct intel_context *ce);
void (*exit)(struct intel_context *ce);
  
+	void (*sched_disable)(struct intel_context *ce);

+
void (*reset)(struct intel_context *ce);
void (*destroy)(struct kref *kref);
  };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d44316dc914b..b43ec56986b5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -236,6 +236,8 @@ int intel_guc_reset_engine(struct intel_guc *guc,
  
  int intel_guc_deregister_done_process_msg(struct intel_guc *guc,

  const u32 *msg, u32 len);
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+const u32 *msg, u32 len);
  
  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c

index 42a7daef2ff6..7491f041859e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -905,6 +905,9 @@ static int ct_process_request(struct intel_guc_ct *ct, 
struct ct_incoming_msg *r
ret = intel_guc_deregister_done_process_msg(guc, payload,
len);
break;
+   case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:

Re: [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Disable engine barriers for unpinning with GuC. This feature isn't
needed with the GuC as it disables context scheduling before unpinning
which guarantees the HW will not reference the context. Hence it is
not necessary to defer unpinning until a kernel context request
completes on each engine in the context engine mask.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
Signed-off-by: Daniele Ceraolo Spurio 
---
  drivers/gpu/drm/i915/gt/intel_context.c|  2 +-
  drivers/gpu/drm/i915/gt/intel_context.h|  1 +
  drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++
  3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 1499b8aace2a..7f97753ab164 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
  
  	__i915_active_acquire(&ce->active);
  
-	if (intel_context_is_barrier(ce))

+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
return 0;
Would be better to have a scheduler flag to say whether barriers are 
required or not. That would prevent polluting front end code with back 
end details.


John.


  
  	/* Preallocate tracking nodes */

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index 8a7199afbe61..a592a9605dc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
  #include "intel_engine_types.h"
  #include "intel_ring_types.h"
  #include "intel_timeline_types.h"
+#include "uc/intel_guc_submission.h"
  
  #define CE_TRACE(ce, fmt, ...) do {	\

const struct intel_context *ce__ = (ce);\
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
b/drivers/gpu/drm/i915/gt/selftest_context.c
index 26685b927169..fa7b99a671dd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs 
*engine)
 * This test makes sure that the context is kept alive until a
 * subsequent idle-barrier (emitted when the engine wakeref hits 0
 * with no more outstanding requests).
+*
+* In GuC submission mode we don't use idle barriers and we instead
+* get a message from the GuC to signal that it is safe to unpin the
+* context from memory.
 */
+   if (intel_engine_uses_guc(engine))
+   return 0;
  
  	if (intel_engine_pm_is_awake(engine)) {

pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs 
*engine)
 * on the context image remotely (intel_context_prepare_remote_request),
 * which inserts foreign fences into intel_context.active, does not
 * clobber the idle-barrier.
+*
+* In GuC submission mode we don't use idle barriers.
 */
+   if (intel_engine_uses_guc(engine))
+   return 0;
  
  	if (intel_engine_pm_is_awake(engine)) {

pr_err("%s is awake before starting %s!\n",




Re: [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Extend the deregistration context fence to fence when a GuC context has
scheduling disable pending.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++
  1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0386ccd5a481..0a6ccdf32316 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context 
*ce)
goto unpin;
  
  	spin_lock_irqsave(&ce->guc_state.lock, flags);

+
+   /*
+* We have to check if the context has been pinned again as another pin
+* operation is allowed to pass this function. Checking the pin count
+* here synchronizes this function with guc_request_alloc ensuring a
+* request doesn't slip through the 'context_pending_disable' fence.
+*/
The pin count is an atomic so doesn't need the spinlock. Also the above 
comment 'checking the pin count here synchronizes ...' seems wrong. 
Isn't the point that acquiring the spinlock is what synchronises with 
guc_request_alloc? So the comment should be before the spinlock acquire 
and should mention using the spinlock for this purpose?


John.



+   if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+   return;
+   }
guc_id = prep_context_pending_disable(ce);
+
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
  
  	with_intel_runtime_pm(runtime_pm, wakeref)

@@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq)
  out:
/*
 * We block all requests on this context if a G2H is pending for a
-* context deregistration as the GuC will fail a context registration
-* while this G2H is pending. Once a G2H returns, the fence is released
-* that is blocking these requests (see guc_signal_context_fence).
+* schedule disable or context deregistration as the GuC will fail a
+* schedule enable or context registration if either G2H is pending
+* respectively. Once a G2H returns, the fence is released that is
+* blocking these requests (see guc_signal_context_fence).
 *
-* We can safely check the below field outside of the lock as it isn't
-* possible for this field to transition from being clear to set but
+* We can safely check the below fields outside of the lock as it isn't
+* possible for these fields to transition from being clear to set but
 * converse is possible, hence the need for the check within the lock.
 */
-   if (likely(!context_wait_for_deregister_to_register(ce)))
+   if (likely(!context_wait_for_deregister_to_register(ce) &&
+  !context_pending_disable(ce)))
return 0;
  
  	spin_lock_irqsave(&ce->guc_state.lock, flags);

-   if (context_wait_for_deregister_to_register(ce)) {
+   if (context_wait_for_deregister_to_register(ce) ||
+   context_pending_disable(ce)) {
i915_sw_fence_await(&rq->submit);
  
  		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);

@@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
if (context_pending_enable(ce)) {
clr_context_pending_enable(ce);
} else if (context_pending_disable(ce)) {
+   /*
+* Unpin must be done before __guc_signal_context_fence,
+* otherwise a race exists between the requests getting
+* submitted + retired before this unpin completes resulting in
+* the pin_count going to zero and the context still being
+* enabled.
+*/
intel_context_sched_disable_unpin(ce);
  
  		spin_lock_irqsave(&ce->guc_state.lock, flags);

clr_context_pending_disable(ce);
+   __guc_signal_context_fence(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
}
  




Re: [PATCH 18/47] drm/i915: Disable preempt busywait when using GuC scheduling

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Disable preempt busywait when using GuC scheduling. This isn't need as

needed


the GuC control preemption when scheduling.

controls

With the above fixed:
Reviewed-by: John Harrison 




Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 87b06572fd2e..f7aae502ec3d 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 
*cs)
*cs++ = MI_USER_INTERRUPT;
  
  	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;

-   if (intel_engine_has_semaphores(rq->engine))
+   if (intel_engine_has_semaphores(rq->engine) &&
+   !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
cs = emit_preempt_busywait(rq, cs);
  
  	rq->tail = intel_ring_offset(rq, cs);

@@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, 
u32 *cs)
*cs++ = MI_USER_INTERRUPT;
  
  	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;

-   if (intel_engine_has_semaphores(rq->engine))
+   if (intel_engine_has_semaphores(rq->engine) &&
+   !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
cs = gen12_emit_preempt_busywait(rq, cs);
  
  	rq->tail = intel_ring_offset(rq, cs);




Re: [PATCH 20/47] drm/i915/guc: Disable semaphores when using GuC scheduling

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Semaphores are an optimization and not required for basic GuC submission
to work properly. Disable until we have time to do the implementation to
enable semaphores and tune them for performance. Also, the long term
direction is to delete semaphores from the i915, so that is another
reason to not enable these for GuC submission.

v2: Reword commit message

Cc: John Harrison 
Signed-off-by: Matthew Brost 
I think the commit description does not really match the patch content. 
The description is valid but the 'disable' is done by simply not setting 
the enable flag (done in the execlist back end and presumably not done 
in the GuC back end). However, what the patch is actually doing seems to 
be fixing bugs with the 'are semaphores enabled' mechanism. I.e. 
correcting pieces of code that used semaphores without checking if they 
are enabled. And presumably this would be broken if someone tried to 
disable semaphores in execlist mode for any reason?


So I think keeping the existing comment text is fine but something 
should be added to explain the actual changes.


John.



---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7720b8c22c81..5c07e6abf16a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
ce->timeline = intel_timeline_get(ctx->timeline);
  
  	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&

-   intel_engine_has_timeslices(ce->engine))
+   intel_engine_has_timeslices(ce->engine) &&
+   intel_engine_has_semaphores(ce->engine))
__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
  
  	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);

@@ -1938,7 +1939,8 @@ static int __apply_priority(struct intel_context *ce, 
void *arg)
if (!intel_engine_has_timeslices(ce->engine))
return 0;
  
-	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)

+   if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
+   intel_engine_has_semaphores(ce->engine))
intel_context_set_use_semaphores(ce);
else
intel_context_clear_use_semaphores(ce);




Re: [PATCH 22/47] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC

2021-07-09 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.

v2: rtimeout -> remaining_timeout

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
  drivers/gpu/drm/i915/gt/intel_gt.c| 19 
  drivers/gpu/drm/i915/gt/intel_gt.h|  2 +
  drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
  drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  9 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  4 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++-
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |  5 ++
  drivers/gpu/drm/i915/i915_debugfs.c   |  1 +
  drivers/gpu/drm/i915/i915_gem_evict.c |  1 +
  .../gpu/drm/i915/selftests/igt_live_test.c|  2 +-
  .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
  14 files changed, 137 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 2fd155742bd2..335b955d5b4b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
goto insert;
  
  	/* Attempt to reap some mmap space from dead objects */

-   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+  NULL);
if (err)
goto err;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c

index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
GEM_BUG_ON(intel_gt_pm_is_awake(gt));
  }
  
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)

+{
+   long remaining_timeout;
+
+   /* If the device is asleep, we have no requests outstanding */
+   if (!intel_gt_pm_is_awake(gt))
+   return 0;
+
+   while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+  &remaining_timeout)) > 0) {
+   cond_resched();
+   if (signal_pending(current))
+   return -EINTR;
+   }
+
+   return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+ remaining_timeout);
+}
+
  int intel_gt_init(struct intel_gt *gt)
  {
int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
  
  void intel_gt_driver_late_release(struct intel_gt *gt);
  
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);

+
  void intel_gt_check_and_clear_faults(struct intel_gt *gt);
  void intel_gt_clear_error_registers(struct intel_gt *gt,
intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..39f5e824dac5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -13,6 +13,7 @@
  #include "intel_gt_pm.h"
  #include "intel_gt_requests.h"
  #include "intel_timeline.h"
+#include "uc/intel_uc.h"

Why is this needed?

  
  static bool retire_requests(struct intel_timeline *tl)

  {
@@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs 
*engine)
GEM_BUG_ON(engine->retire);
  }
  
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)

+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+ long *remaining_timeout)
  {
	struct intel_gt_timelines *timelines = &gt->timelines;
struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active:   spin_lock(&timelines->lock);
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
  
-	return active_count ? timeout : 0;

-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-   /* If the device is asleep, we have no requests outstanding */
-   if (!intel_gt_pm_is_awake(gt))
-   return 0;
-
-   whil

Re: [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable

2021-07-12 Thread John Harrison

On 7/9/2021 20:36, Matthew Brost wrote:

On Fri, Jul 09, 2021 at 03:59:11PM -0700, John Harrison wrote:

On 6/24/2021 00:04, Matthew Brost wrote:

Extend the deregistration context fence to fence when a GuC context has
scheduling disable pending.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++
   1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0386ccd5a481..0a6ccdf32316 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context 
*ce)
goto unpin;
spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+   /*
+* We have to check if the context has been pinned again as another pin
+* operation is allowed to pass this function. Checking the pin count
+* here synchronizes this function with guc_request_alloc ensuring a
+* request doesn't slip through the 'context_pending_disable' fence.
+*/

The pin count is an atomic so doesn't need the spinlock. Also the above

How about?

/*
  * We have to check if the context has been pinned again as another pin
  * operation is allowed to pass this function. Checking the pin count,
  * within ce->guc_state.lock, synchronizes this function with
  * guc_request_alloc ensuring a request doesn't slip through the
  * 'context_pending_disable' fence. Checking within the spin lock (can't
  * sleep) ensures another process doesn't pin this context and generate
  * a request before we set the 'context_pending_disable' flag here.
  */

Matt

Sounds good. With that added in:
Reviewed-by: John Harrison 




comment 'checking the pin count here synchronizes ...' seems wrong. Isn't
the point that acquiring the spinlock is what synchronises with
guc_request_alloc? So the comment should be before the spinlock acquire and
should mention using the spinlock for this purpose?

John.



+   if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+   return;
+   }
guc_id = prep_context_pending_disable(ce);
+
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq)
   out:
/*
 * We block all requests on this context if a G2H is pending for a
-* context deregistration as the GuC will fail a context registration
-* while this G2H is pending. Once a G2H returns, the fence is released
-* that is blocking these requests (see guc_signal_context_fence).
+* schedule disable or context deregistration as the GuC will fail a
+* schedule enable or context registration if either G2H is pending
+* respectively. Once a G2H returns, the fence is released that is
+* blocking these requests (see guc_signal_context_fence).
 *
-* We can safely check the below field outside of the lock as it isn't
-* possible for this field to transition from being clear to set but
+* We can safely check the below fields outside of the lock as it isn't
+* possible for these fields to transition from being clear to set but
 * converse is possible, hence the need for the check within the lock.
 */
-   if (likely(!context_wait_for_deregister_to_register(ce)))
+   if (likely(!context_wait_for_deregister_to_register(ce) &&
+  !context_pending_disable(ce)))
return 0;
spin_lock_irqsave(&ce->guc_state.lock, flags);
-   if (context_wait_for_deregister_to_register(ce)) {
+   if (context_wait_for_deregister_to_register(ce) ||
+   context_pending_disable(ce)) {
i915_sw_fence_await(&rq->submit);
list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
if (context_pending_enable(ce)) {
clr_context_pending_enable(ce);
} else if (context_pending_disable(ce)) {
+   /*
+* Unpin must be done before __guc_signal_context_fence,
+* otherwise a race exists between the requests getting
+* submitted + retired before this unpin completes resulting in
+* the pin_count going to zero and the context still being
+* enabled.
+*/
intel_context_sched_disable_unpin(ce);
spin_lock_irqsave(&ce->guc_state

Re: [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin

2021-07-12 Thread John Harrison

On 7/9/2021 20:00, Matthew Brost wrote:

On Fri, Jul 09, 2021 at 03:53:29PM -0700, John Harrison wrote:

On 6/24/2021 00:04, Matthew Brost wrote:

Disable engine barriers for unpinning with GuC. This feature isn't
needed with the GuC as it disables context scheduling before unpinning
which guarantees the HW will not reference the context. Hence it is
not necessary to defer unpinning until a kernel context request
completes on each engine in the context engine mask.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
Signed-off-by: Daniele Ceraolo Spurio 
---
   drivers/gpu/drm/i915/gt/intel_context.c|  2 +-
   drivers/gpu/drm/i915/gt/intel_context.h|  1 +
   drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++
   3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 1499b8aace2a..7f97753ab164 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
__i915_active_acquire(&ce->active);
-   if (intel_context_is_barrier(ce))
+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
return 0;

Would be better to have a scheduler flag to say whether barriers are
required or not. That would prevent polluting front end code with back end
details.


I guess an engine flag is slightly better but I still don't love that
as we have to test if the context is a barrier (kernel context) and then
call a function that is basically backend specific after. IMO we really
need to push all of this to a vfunc. If you really want me to make this
an engine flag I can, but in the end it just seems like that will
trash the code (adding an engine flag just to remove it). I think this
is just a clean up we write down, and figure out a bit later as nothing
is functionally wrong + quite clear that it is something that should be
cleaned up.

Matt
Flag, vfunc, whatever. I just mean that it would be better to abstract 
it out in some manner. Maybe a flag/vfunc on the ce object? That way it 
would swallow the 'ignore kernel contexts' test as well. But yes, 
probably best to add it to the todo list and move on as it is not going 
to be a two minute quick fix. I've added a comment to the Jira, so...


Reviewed-by: John Harrison 





John.



/* Preallocate tracking nodes */
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index 8a7199afbe61..a592a9605dc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
   #include "intel_engine_types.h"
   #include "intel_ring_types.h"
   #include "intel_timeline_types.h"
+#include "uc/intel_guc_submission.h"
   #define CE_TRACE(ce, fmt, ...) do {  \
const struct intel_context *ce__ = (ce);\
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
b/drivers/gpu/drm/i915/gt/selftest_context.c
index 26685b927169..fa7b99a671dd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs 
*engine)
 * This test makes sure that the context is kept alive until a
 * subsequent idle-barrier (emitted when the engine wakeref hits 0
 * with no more outstanding requests).
+*
+* In GuC submission mode we don't use idle barriers and we instead
+* get a message from the GuC to signal that it is safe to unpin the
+* context from memory.
 */
+   if (intel_engine_uses_guc(engine))
+   return 0;
if (intel_engine_pm_is_awake(engine)) {
pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs 
*engine)
 * on the context image remotely (intel_context_prepare_remote_request),
 * which inserts foreign fences into intel_context.active, does not
 * clobber the idle-barrier.
+*
+* In GuC submission mode we don't use idle barriers.
 */
+   if (intel_engine_uses_guc(engine))
+   return 0;
if (intel_engine_pm_is_awake(engine)) {
pr_err("%s is awake before starting %s!\n",




Re: [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Update GuC debugfs to support the new GuC structures.

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 ++
  .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 23 +++-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  4 ++
  drivers/gpu/drm/i915/i915_debugfs.c   |  1 +
  6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index e0f92e28350c..4ed074df88e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
  
  	ct_try_receive_message(ct);

  }
+
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
+  struct drm_printer *p)
+{
+   if (!ct->enabled) {
+   drm_puts(p, "CT disabled\n");
+   return;
+   }
+
+   drm_printf(p, "H2G Space: %u\n",
+  atomic_read(&ct->ctbs.send.space) * 4);
+   drm_printf(p, "Head: %u\n",
+  ct->ctbs.send.desc->head);
+   drm_printf(p, "Tail: %u\n",
+  ct->ctbs.send.desc->tail);
+   drm_printf(p, "G2H Space: %u\n",
+  atomic_read(&ct->ctbs.recv.space) * 4);
+   drm_printf(p, "Head: %u\n",
+  ct->ctbs.recv.desc->head);
+   drm_printf(p, "Tail: %u\n",
+  ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index ab1b79ab960b..f62eb06b32fc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
  
  struct i915_vma;

  struct intel_guc;
+struct drm_printer;
  
  /**

   * DOC: Command Transport (CT).
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 
*action, u32 len,
  u32 *response_buf, u32 response_buf_size, u32 flags);
  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
  
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);

+
  #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..62b9ce0fafaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
  #include "intel_guc.h"
  #include "intel_guc_debugfs.h"
  #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
  
  static int guc_info_show(struct seq_file *m, void *data)

  {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
drm_puts(&p, "\n");
intel_guc_log_info(&guc->log, &p);
  
-	/* Add more as required ... */

+   if (!intel_guc_submission_is_used(guc))
+   return 0;
+
+   intel_guc_log_ct_info(&guc->ct, &p);
+   intel_guc_log_submission_info(guc, &p);
  
  	return 0;

  }
  DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
  
+static int guc_registered_contexts_show(struct seq_file *m, void *data)

+{
+   struct intel_guc *guc = m->private;
+   struct drm_printer p = drm_seq_file_printer(m);
+
+   if (!intel_guc_submission_is_used(guc))
+   return -ENODEV;
+
+   intel_guc_log_context_info(guc, &p);
+
+   return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
  void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
  {
static const struct debugfs_gt_file files[] = {
{ "guc_info", &guc_info_fops, NULL },
{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
};
  
  	if (!intel_guc_is_supported(guc))

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d1a28283a9ae..89b3c7e5d15b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
  
  	return 0;

  }
+
+void intel_guc_log_submission_info(struct intel_guc *guc,
+  struct drm_printer *p)
+{
+   struct i915_sched_engine *sched_engine = guc->sched_engine;
+   struct rb_node *rb;
+   unsigned long flags;
+
+   drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+  

Re: [PATCH 24/47] drm/i915/guc: Add several request trace points

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Add trace points for request dependencies and GuC submit. Extended
existing request trace points to include submit fence value,, guc_id,
Excessive punctuation. Or maybe should say 'fence value, tail, guc_id'? 
With that fixed:


Reviewed-by: John Harrison 



and ring tail value.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
  drivers/gpu/drm/i915/i915_request.c   |  3 ++
  drivers/gpu/drm/i915/i915_trace.h | 39 ++-
  3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 89b3c7e5d15b..c2327eebc09c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
guc->stalled_request = last;
return false;
}
+   trace_i915_request_guc_submit(last);
}
  
  	guc->stalled_request = NULL;

@@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
ret = guc_add_request(guc, rq);
if (ret == -EBUSY)
guc->stalled_request = rq;
+   else
+   trace_i915_request_guc_submit(rq);
  
  	return ret;

  }
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index d92c9f25c9f4..7f7aa096e873 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to,
return err;
}
  
+	trace_i915_request_dep_to(to);

+   trace_i915_request_dep_from(from);
+
/* Couple the dependency tree for PI on this exposed to->fence */
if (to->engine->sched_engine->schedule) {
err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 6778ad2a14a4..b02d04b6c8f6 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request,
TP_STRUCT__entry(
 __field(u32, dev)
 __field(u64, ctx)
+__field(u32, guc_id)
 __field(u16, class)
 __field(u16, instance)
 __field(u32, seqno)
+__field(u32, tail)
 ),
  
  	TP_fast_assign(

   __entry->dev = rq->engine->i915->drm.primary->index;
   __entry->class = rq->engine->uabi_class;
   __entry->instance = rq->engine->uabi_instance;
+  __entry->guc_id = rq->context->guc_id;
   __entry->ctx = rq->fence.context;
   __entry->seqno = rq->fence.seqno;
+  __entry->tail = rq->tail;
   ),
  
-	TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",

+   TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
  __entry->dev, __entry->class, __entry->instance,
- __entry->ctx, __entry->seqno)
+ __entry->guc_id, __entry->ctx, __entry->seqno,
+ __entry->tail)
  );
  
  DEFINE_EVENT(i915_request, i915_request_add,

@@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add,
  );
  
  #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)

+DEFINE_EVENT(i915_request, i915_request_dep_to,
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_dep_from,
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
+TP_PROTO(struct i915_request *rq),
+TP_ARGS(rq)
+);
+
  DEFINE_EVENT(i915_request, i915_request_submit,
 TP_PROTO(struct i915_request *rq),
 TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
  
  #else

  #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_dep_to(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_dep_from(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_guc_submit(struct i915_request *rq)
+{
+}
+
  static inline void
  trace_i915_request_submit(struct i915_request *rq)
  {




Re: [PATCH 25/47] drm/i915: Add intel_context tracing

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Add intel_context tracing. These trace points are particularly helpful
when debugging the GuC firmware and can be enabled via the
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   6 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 ++
  drivers/gpu/drm/i915/i915_trace.h | 148 +-
  3 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 7f97753ab164..b24a1b7a3f88 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -8,6 +8,7 @@
  
  #include "i915_drv.h"

  #include "i915_globals.h"
+#include "i915_trace.h"
  
  #include "intel_context.h"

  #include "intel_engine.h"
@@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu)
  {
struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
  
+	trace_intel_context_free(ce);

kmem_cache_free(global.slab_ce, ce);
  }
  
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine)

return ERR_PTR(-ENOMEM);
  
  	intel_context_init(ce, engine);

+   trace_intel_context_create(ce);
return ce;
  }
  
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
  
  	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
  
+	trace_intel_context_do_pin(ce);

+
  err_unlock:
mutex_unlock(&ce->pin_mutex);
  err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int 
sub)
 */
intel_context_get(ce);
intel_context_active_release(ce);
+   trace_intel_context_do_unpin(ce);
intel_context_put(ce);
  }
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index c2327eebc09c..d605af0d66e6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct 
i915_request *rq)
  
  	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);

if (!enabled && !err) {
+   trace_intel_context_sched_enable(ce);
atomic_inc(&guc->outstanding_submission_g2h);
set_context_enabled(ce);
} else if (!enabled) {
@@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce)
u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
ce->guc_id * sizeof(struct guc_lrc_desc);
  
+	trace_intel_context_register(ce);

+
return __guc_action_register_context(guc, ce->guc_id, offset);
  }
  
@@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id)

  {
struct intel_guc *guc = ce_to_guc(ce);
  
+	trace_intel_context_deregister(ce);

+
return __guc_action_deregister_context(guc, guc_id);
  }
  
@@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce)

 * GuC before registering this context.
 */
if (context_registered) {
+   trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
  
@@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
  
  	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
  
+	trace_intel_context_sched_disable(ce);

intel_context_get(ce);
  
  	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),

@@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct 
intel_context *ce)
  
  	lockdep_assert_held(&ce->guc_state.lock);
  
+	if (!list_empty(&ce->guc_state.fences))

+   trace_intel_context_fence_release(ce);
+
list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
i915_sw_fence_complete(&rq->submit);
  
@@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,

if (unlikely(!ce))
return -EPROTO;
  
+	trace_intel_context_deregister_done(ce);

+
if (context_wait_for_deregister_to_register(ce)) {
struct intel_runtime_pm *runtime_pm =
&ce->engine->gt->i915->runtime_pm;
@@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
return -EPROTO;
}
  
+	trace_intel_context_sched_done(ce);

+
if (context_pending_enable(ce)) {
clr_context_pending_enable(ce);
} else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index b02d04b6c8f6..97c2e83984ed 100644
--- a/d

Re: [PATCH 27/47] drm/i915: Track 'serial' counts for virtual engines

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

From: John Harrison 

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heartbeat code
thinks the physical engines are idle due to the serial number not
incrementing.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
Need to pull in the updated subject line and commit description from 
Tvrtko in the RFC patch set review.


John.


---
  drivers/gpu/drm/i915/gt/intel_engine_types.h |  2 ++
  .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++
  drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++
  drivers/gpu/drm/i915/gt/mock_engine.c|  6 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c| 16 
  drivers/gpu/drm/i915/i915_request.c  |  4 +++-
  6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 5b91068ab277..1dc59e6c9a92 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
void(*park)(struct intel_engine_cs *engine);
void(*unpark)(struct intel_engine_cs *engine);
  
+	void		(*bump_serial)(struct intel_engine_cs *engine);

+
void(*set_default_submission)(struct intel_engine_cs 
*engine);
  
  	const struct intel_context_ops *cops;

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index bd4ced794ff9..9cfb8800a0e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3203,6 +3203,11 @@ static void execlists_release(struct intel_engine_cs 
*engine)
lrc_fini_wa_ctx(engine);
  }
  
+static void execlist_bump_serial(struct intel_engine_cs *engine)

+{
+   engine->serial++;
+}
+
  static void
  logical_ring_default_vfuncs(struct intel_engine_cs *engine)
  {
@@ -3212,6 +3217,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
  
  	engine->cops = &execlists_context_ops;

engine->request_alloc = execlists_request_alloc;
+   engine->bump_serial = execlist_bump_serial;
  
  	engine->reset.prepare = execlists_reset_prepare;

engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 5d42a12ef3d6..e1506b280df1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1044,6 +1044,11 @@ static void setup_irq(struct intel_engine_cs *engine)
}
  }
  
+static void ring_bump_serial(struct intel_engine_cs *engine)

+{
+   engine->serial++;
+}
+
  static void setup_common(struct intel_engine_cs *engine)
  {
struct drm_i915_private *i915 = engine->i915;
@@ -1063,6 +1068,7 @@ static void setup_common(struct intel_engine_cs *engine)
  
  	engine->cops = &ring_context_ops;

engine->request_alloc = ring_request_alloc;
+   engine->bump_serial = ring_bump_serial;
  
  	/*

 * Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c 
b/drivers/gpu/drm/i915/gt/mock_engine.c
index 68970398e4ef..9203c766db80 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs 
*engine)
intel_engine_fini_retire(engine);
  }
  
+static void mock_bump_serial(struct intel_engine_cs *engine)

+{
+   engine->serial++;
+}
+
  struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
const char *name,
int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private 
*i915,
  
  	engine->base.cops = &mock_context_ops;

engine->base.request_alloc = mock_request_alloc;
+   engine->base.bump_serial = mock_bump_serial;
engine->base.emit_flush = mock_emit_flush;
engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
engine->base.sub

Re: [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Hold a reference to the intel_context over life of an i915_request.
Without this an i915_request can exist after the context has been
destroyed (e.g. request retired, context closed, but user space holds a
reference to the request from an out fence). In the case of GuC
submission + virtual engine, the engine that the request references is
also destroyed which can trigger a bad pointer deref in fence ops (e.g.
Maybe quickly explain why this is different for GuC submission vs 
execlist? Presumably it is about only decomposing virtual engines to 
physical ones in execlist mode?




i915_fence_get_driver_name). We could likely change
i915_fence_get_driver_name to avoid touching the engine but let's just
be safe and hold the intel_context reference.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/i915_request.c | 54 -
  1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index de9deb95b8b1..dec5a35c9aa2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence)
i915_sw_fence_fini(&rq->semaphore);
  
  	/*

-* Keep one request on each engine for reserved use under mempressure
-*
-* We do not hold a reference to the engine here and so have to be
-* very careful in what rq->engine we poke. The virtual engine is
-* referenced via the rq->context and we released that ref during
-* i915_request_retire(), ergo we must not dereference a virtual
-* engine here. Not that we would want to, as the only consumer of
-* the reserved engine->request_pool is the power management parking,
-* which must-not-fail, and that is only run on the physical engines.
-*
-* Since the request must have been executed to be have completed,
-* we know that it will have been processed by the HW and will
-* not be unsubmitted again, so rq->engine and rq->execution_mask
-* at this point is stable. rq->execution_mask will be a single
-* bit if the last and _only_ engine it could execution on was a
-* physical engine, if it's multiple bits then it started on and
-* could still be on a virtual engine. Thus if the mask is not a
-* power-of-two we assume that rq->engine may still be a virtual
-* engine and so a dangling invalid pointer that we cannot dereference
-*
-* For example, consider the flow of a bonded request through a virtual
-* engine. The request is created with a wide engine mask (all engines
-* that we might execute on). On processing the bond, the request mask
-* is reduced to one or more engines. If the request is subsequently
-* bound to a single engine, it will then be constrained to only
-* execute on that engine and never returned to the virtual engine
-* after timeslicing away, see __unwind_incomplete_requests(). Thus we
-* know that if the rq->execution_mask is a single bit, rq->engine
-* can be a physical engine with the exact corresponding mask.
+* Keep one request on each engine for reserved use under mempressure,
+* do not use with virtual engines as this really is only needed for
+* kernel contexts.
 */
-   if (is_power_of_2(rq->execution_mask) &&
-   !cmpxchg(&rq->engine->request_pool, NULL, rq))
+   if (!intel_engine_is_virtual(rq->engine) &&
+   !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+   intel_context_put(rq->context);
return;
+   }
+
+   intel_context_put(rq->context);

The put is actually unconditional? So it could be moved before the if?

John.

  
  	kmem_cache_free(global.slab_requests, rq);

  }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
}
}
  
-	rq->context = ce;

+   /*
+* Hold a reference to the intel_context over life of an i915_request.
+* Without this an i915_request can exist after the context has been
+* destroyed (e.g. request retired, context closed, but user space holds
+* a reference to the request from an out fence). In the case of GuC
+* submission + virtual engine, the engine that the request references
+* is also destroyed which can trigger a bad pointer deref in fence ops
+* (e.g. i915_fence_get_driver_name). We could likely change these
+* functions to avoid touching the engine but let's just be safe and
+* hold the intel_context reference.
+*/
+   rq->context = intel_context_get(ce);
rq->engine = ce->engine;
rq->ring = ce->ring;
rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_

Re: [PATCH 29/47] drm/i915/guc: Disable bonding extension with GuC submission

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Update the bonding extension to return -ENODEV when using GuC submission
as this extension fundamentally will not work with the GuC submission
interface.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 8a9293e0ca92..0429aa4172bf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1674,6 +1674,11 @@ set_engines__bond(struct i915_user_extension __user 
*base, void *data)
}
virtual = set->engines->engines[idx]->engine;
  
+	if (intel_engine_uses_guc(virtual)) {

+   DRM_DEBUG("bonding extension not supported with GuC 
submission");
+   return -ENODEV;
+   }
+
err = check_user_mbz(&ext->flags);
if (err)
return err;




Re: [PATCH 30/47] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs

2021-07-12 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

With GuC virtual engines, the physical engine on which a request executes
and completes isn't known to the i915. Therefore we can't attach a
request to a physical engine's breadcrumbs. To work around this we create
a single breadcrumbs object per engine class when using GuC submission and
direct all physical engine interrupts to this breadcrumbs object.

Signed-off-by: Matthew Brost 
CC: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 14 +++-
  .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
  drivers/gpu/drm/i915/gt/intel_engine.h|  3 +
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++-
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 -
  .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
  drivers/gpu/drm/i915/gt/mock_engine.c |  4 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +--
  9 files changed, 131 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..2007dc6f6b99 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
  #include "intel_gt_pm.h"
  #include "intel_gt_requests.h"
  
-static bool irq_enable(struct intel_engine_cs *engine)

+static bool irq_enable(struct intel_breadcrumbs *b)
  {
-   if (!engine->irq_enable)
-   return false;
-
-   /* Caller disables interrupts */
-   spin_lock(&engine->gt->irq_lock);
-   engine->irq_enable(engine);
-   spin_unlock(&engine->gt->irq_lock);
-
-   return true;
+   return intel_engine_irq_enable(b->irq_engine);
  }
  
-static void irq_disable(struct intel_engine_cs *engine)

+static void irq_disable(struct intel_breadcrumbs *b)
  {
-   if (!engine->irq_disable)
-   return;
-
-   /* Caller disables interrupts */
-   spin_lock(&engine->gt->irq_lock);
-   engine->irq_disable(engine);
-   spin_unlock(&engine->gt->irq_lock);
+   intel_engine_irq_disable(b->irq_engine);
  }
  
  static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)

@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct 
intel_breadcrumbs *b)
WRITE_ONCE(b->irq_armed, true);
  
  	/* Requests may have completed before we could enable the interrupt. */

-   if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+   if (!b->irq_enabled++ && b->irq_enable(b))
irq_work_queue(&b->irq_work);
  }
  
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)

  {
GEM_BUG_ON(!b->irq_enabled);
if (!--b->irq_enabled)
-   irq_disable(b->irq_engine);
+   b->irq_disable(b);
  
  	WRITE_ONCE(b->irq_armed, false);

intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
if (!b)
return NULL;
  
-	b->irq_engine = irq_engine;

+   kref_init(&b->ref);
  
  	spin_lock_init(&b->signalers_lock);

INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs 
*irq_engine)
spin_lock_init(&b->irq_lock);
init_irq_work(&b->irq_work, signal_irq_work);
  
+	b->irq_engine = irq_engine;

+   b->irq_enable = irq_enable;
+   b->irq_disable = irq_disable;
+
return b;
  }
  
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)

spin_lock_irqsave(&b->irq_lock, flags);
  
  	if (b->irq_enabled)

-   irq_enable(b->irq_engine);
+   b->irq_enable(b);
else
-   irq_disable(b->irq_engine);
+   b->irq_disable(b);
  
  	spin_unlock_irqrestore(&b->irq_lock, flags);

  }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
}
  }
  
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)

+void intel_breadcrumbs_free(struct kref *kref)
  {
+   struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
irq_work_sync(&b->irq_work);
GEM_BUG_ON(!list_empty(&b->signalers));
GEM_BUG_ON(b->irq_armed);
+
kfree(b);
  }
  
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h

index 3ce5ce270b04..72105b74663d 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -17,7 +17,7 @@ struct intel_breadcrumbs;
  
  struct intel_breadcrumbs *

  intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
-void intel_breadc

Re: [PATCH 31/47] drm/i915/guc: Reset implementation for new GuC interface

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.
There seems to be quite a lot more code being changed in the patch than 
is described above. Sure, it's all in order to support resets but there 
is a lot happening to request/context management, support for GuC 
submission enable/disable, etc. It feels like this patch really should 
be split into a couple of prep patches followed by the actual reset 
support. Plus see a couple of minor comments below.



Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   3 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
  .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   6 +-
  drivers/gpu/drm/i915/gt/intel_reset.c |  18 +-
  .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
  drivers/gpu/drm/i915/gt/mock_engine.c |  31 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  13 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |  39 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |   3 +
  drivers/gpu/drm/i915/i915_request.c   |  41 +-
  drivers/gpu/drm/i915/i915_request.h   |   2 +
  15 files changed, 649 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
spin_lock_init(&ce->guc_state.lock);
INIT_LIST_HEAD(&ce->guc_state.fences);
  
+	spin_lock_init(&ce->guc_active.lock);

+   INIT_LIST_HEAD(&ce->guc_active.requests);
+
ce->guc_id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id_link);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h

index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
struct list_head fences;
} guc_state;
  
+	struct {

+   /** lock: protects everything in guc_active */
+   spinlock_t lock;
+   /** requests: active requests on this context */
+   struct list_head requests;
+   } guc_active;
+
/* GuC scheduling state that does not require a lock. */
atomic_t guc_sched_state_no_lock;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h

index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
  
  	void		(*release)(struct intel_engine_cs *engine);
  
+	/*

+* Add / remove request from engine active tracking
+*/
+   void(*add_active_request)(struct i915_request *rq);
+   void(*remove_active_request)(struct i915_request *rq);
+
struct intel_engine_execlists execlists;
  
  	/*

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs 
*engine)
cancel_timer(&engine->execlists.preempt);
  }
  
+static void add_to_engine(struct i915_request *rq)

+{
+   lockdep_assert_held(&rq->engine->sched_engine->lock);
+   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+   struct intel_engine_cs *engine, *locked;
+
+   /*
+* Virtual engines complicate acquiring the engine timeline lock,
+* as their rq->engine pointer is not stable until under that
+* engine lock. The simple ploy we use is to take the lock then
+* check that the rq still belongs to the newly locked engine.
+

Re: [PATCH 32/47] drm/i915: Reset GPU immediately if submission is disabled

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

If submission is disabled by the backend for any reason, reset the GPU
immediately in the heartbeat code as the backend can't be reenabled
until the GPU is reset.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 63 +++
  .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  4 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  9 +++
  drivers/gpu/drm/i915/i915_scheduler.c |  6 ++
  drivers/gpu/drm/i915/i915_scheduler.h |  6 ++
  drivers/gpu/drm/i915/i915_scheduler_types.h   |  5 ++
  6 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index b6a305e6a974..a8495364d906 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq,
  {
struct drm_printer p = drm_debug_printer("heartbeat");
  
-	intel_engine_dump(engine, &p,

- "%s heartbeat {seqno:%llx:%lld, prio:%d} not 
ticking\n",
- engine->name,
- rq->fence.context,
- rq->fence.seqno,
- rq->sched.attr.priority);
+   if (!rq) {
+   intel_engine_dump(engine, &p,
+ "%s heartbeat not ticking\n",
+ engine->name);
+   } else {
+   intel_engine_dump(engine, &p,
+ "%s heartbeat {seqno:%llx:%lld, prio:%d} not 
ticking\n",
+ engine->name,
+ rq->fence.context,
+ rq->fence.seqno,
+ rq->sched.attr.priority);
+   }
+}
+
+static void
+reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+   if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
+   show_heartbeat(rq, engine);
+
+   intel_gt_handle_error(engine->gt, engine->mask,
+ I915_ERROR_CAPTURE,
+ "stopped heartbeat on %s",
+ engine->name);
  }
  
  static void heartbeat(struct work_struct *wrk)

@@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk)
if (intel_gt_is_wedged(engine->gt))
goto out;
  
+	if (i915_sched_engine_disabled(engine->sched_engine)) {

+   reset_engine(engine, engine->heartbeat.systole);
+   goto out;
+   }
+
if (engine->heartbeat.systole) {
long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
  
@@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk)

engine->sched_engine->schedule(rq, &attr);
local_bh_enable();
} else {
-   if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-   show_heartbeat(rq, engine);
-
-   intel_gt_handle_error(engine->gt, engine->mask,
- I915_ERROR_CAPTURE,
- "stopped heartbeat on %s",
- engine->name);
+   reset_engine(engine, rq);
}
  
  		rq->emitted_jiffies = jiffies;

@@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs 
*engine)
i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
  }
  
+void intel_gt_unpark_heartbeats(struct intel_gt *gt)

+{
+   struct intel_engine_cs *engine;
+   enum intel_engine_id id;
+
+   for_each_engine(engine, gt, id)
+   if (intel_engine_pm_is_awake(engine))
+   intel_engine_unpark_heartbeat(engine);
+
+}
+
+void intel_gt_park_heartbeats(struct intel_gt *gt)
+{
+   struct intel_engine_cs *engine;
+   enum intel_engine_id id;
+
+   for_each_engine(engine, gt, id)
+   intel_engine_park_heartbeat(engine);
+}
+
  void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
  {
INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index a488ea3e84a3..5da6d809a87a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -7,6 +7,7 @@
  #define INTEL_ENGINE_HEARTBEAT_H
  
  struct intel_engine_cs;

+struct intel_gt;
  
  void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
  
@@ -16,6 +17,9 @@ int intel_e

Re: [PATCH 33/47] drm/i915/guc: Add disable interrupts to guc sanitize

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

Add disabling of GuC interrupts to intel_guc_sanitize(). Part of this
requires moving the guc_*_interrupt wrapper functions into the header
file intel_guc.h.

Signed-off-by: Matthew Brost 
Cc: Daniele Ceraolo Spurio 
Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 16 
  drivers/gpu/drm/i915/gt/uc/intel_uc.c  | 21 +++--
  2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 40c9868762d7..85ef6767f13b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -217,9 +217,25 @@ static inline bool intel_guc_is_ready(struct intel_guc 
*guc)
return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct);
  }
  
+static inline void intel_guc_reset_interrupts(struct intel_guc *guc)

+{
+   guc->interrupts.reset(guc);
+}
+
+static inline void intel_guc_enable_interrupts(struct intel_guc *guc)
+{
+   guc->interrupts.enable(guc);
+}
+
+static inline void intel_guc_disable_interrupts(struct intel_guc *guc)
+{
+   guc->interrupts.disable(guc);
+}
+
  static inline int intel_guc_sanitize(struct intel_guc *guc)
  {
intel_uc_fw_sanitize(&guc->fw);
+   intel_guc_disable_interrupts(guc);
intel_guc_ct_sanitize(&guc->ct);
guc->mmio_msg = 0;
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c

index f0b02200aa01..ab11fe731ee7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -207,21 +207,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc)
spin_unlock_irq(&guc->irq_lock);
  }
  
-static void guc_reset_interrupts(struct intel_guc *guc)

-{
-   guc->interrupts.reset(guc);
-}
-
-static void guc_enable_interrupts(struct intel_guc *guc)
-{
-   guc->interrupts.enable(guc);
-}
-
-static void guc_disable_interrupts(struct intel_guc *guc)
-{
-   guc->interrupts.disable(guc);
-}
-
  static int guc_enable_communication(struct intel_guc *guc)
  {
struct intel_gt *gt = guc_to_gt(guc);
@@ -242,7 +227,7 @@ static int guc_enable_communication(struct intel_guc *guc)
guc_get_mmio_msg(guc);
guc_handle_mmio_msg(guc);
  
-	guc_enable_interrupts(guc);

+   intel_guc_enable_interrupts(guc);
  
  	/* check for CT messages received before we enabled interrupts */

spin_lock_irq(&gt->irq_lock);
@@ -265,7 +250,7 @@ static void guc_disable_communication(struct intel_guc *guc)
 */
guc_clear_mmio_msg(guc);
  
-	guc_disable_interrupts(guc);

+   intel_guc_disable_interrupts(guc);
  
  	intel_guc_ct_disable(&guc->ct);
  
@@ -463,7 +448,7 @@ static int __uc_init_hw(struct intel_uc *uc)

if (ret)
goto err_out;
  
-	guc_reset_interrupts(guc);

+   intel_guc_reset_interrupts(guc);
  
  	/* WaEnableuKernelHeaderValidFix:skl */

/* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */




Re: [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC

2021-07-12 Thread John Harrison

On 7/12/2021 13:59, Matthew Brost wrote:

On Mon, Jul 12, 2021 at 11:05:59AM -0700, John Harrison wrote:

On 6/24/2021 00:04, Matthew Brost wrote:

Update GuC debugfs to support the new GuC structures.

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 ++
   .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 23 +++-
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++
   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  4 ++
   drivers/gpu/drm/i915/i915_debugfs.c   |  1 +
   6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index e0f92e28350c..4ed074df88e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
ct_try_receive_message(ct);
   }
+
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
+  struct drm_printer *p)
+{
+   if (!ct->enabled) {
+   drm_puts(p, "CT disabled\n");
+   return;
+   }
+
+   drm_printf(p, "H2G Space: %u\n",
+  atomic_read(&ct->ctbs.send.space) * 4);
+   drm_printf(p, "Head: %u\n",
+  ct->ctbs.send.desc->head);
+   drm_printf(p, "Tail: %u\n",
+  ct->ctbs.send.desc->tail);
+   drm_printf(p, "G2H Space: %u\n",
+  atomic_read(&ct->ctbs.recv.space) * 4);
+   drm_printf(p, "Head: %u\n",
+  ct->ctbs.recv.desc->head);
+   drm_printf(p, "Tail: %u\n",
+  ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index ab1b79ab960b..f62eb06b32fc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
   struct i915_vma;
   struct intel_guc;
+struct drm_printer;
   /**
* DOC: Command Transport (CT).
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 
*action, u32 len,
  u32 *response_buf, u32 response_buf_size, u32 flags);
   void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
+
   #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..62b9ce0fafaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
   #include "intel_guc.h"
   #include "intel_guc_debugfs.h"
   #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
   static int guc_info_show(struct seq_file *m, void *data)
   {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
drm_puts(&p, "\n");
intel_guc_log_info(&guc->log, &p);
-   /* Add more as required ... */
+   if (!intel_guc_submission_is_used(guc))
+   return 0;
+
+   intel_guc_log_ct_info(&guc->ct, &p);
+   intel_guc_log_submission_info(guc, &p);
return 0;
   }
   DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
+static int guc_registered_contexts_show(struct seq_file *m, void *data)
+{
+   struct intel_guc *guc = m->private;
+   struct drm_printer p = drm_seq_file_printer(m);
+
+   if (!intel_guc_submission_is_used(guc))
+   return -ENODEV;
+
+   intel_guc_log_context_info(guc, &p);
+
+   return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
   void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
   {
static const struct debugfs_gt_file files[] = {
{ "guc_info", &guc_info_fops, NULL },
+   { "guc_registered_contexts", &guc_registered_contexts_fops, 
NULL },
};
if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d1a28283a9ae..89b3c7e5d15b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
return 0;
   }
+
+void intel_guc_log_submission_info(struct intel_guc *guc,
+  struct drm_printer *p)
+{
+   struct i915_sched_engine *sched_engine = guc->sched_engine;
+   struct rb_node *rb;
+   u

Re: [Intel-gfx] [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request

2021-07-12 Thread John Harrison

On 7/12/2021 14:36, Matthew Brost wrote:

On Mon, Jul 12, 2021 at 08:05:30PM +, Matthew Brost wrote:

On Mon, Jul 12, 2021 at 11:23:14AM -0700, John Harrison wrote:

On 6/24/2021 00:04, Matthew Brost wrote:

Hold a reference to the intel_context over the life of an i915_request.
Without this an i915_request can exist after the context has been
destroyed (e.g. request retired, context closed, but user space holds a
reference to the request from an out fence). In the case of GuC
submission + virtual engine, the engine that the request references is
also destroyed, which can trigger a bad pointer deref in fence ops (e.g.

Maybe quickly explain why this is different for GuC submission vs
execlists? Presumably it is about only decomposing virtual engines to
physical ones in execlist mode?


Yes, it's because in execlists we always end up pointing to a physical
engine in the end, while in GuC mode we can be pointing to a dynamically
allocated virtual engine. I can update the comment.


i915_fence_get_driver_name). We could likely change
i915_fence_get_driver_name to avoid touching the engine but let's just
be safe and hold the intel_context reference.

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/i915_request.c | 54 -
   1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index de9deb95b8b1..dec5a35c9aa2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence)
i915_sw_fence_fini(&rq->semaphore);
/*
-* Keep one request on each engine for reserved use under mempressure
-*
-* We do not hold a reference to the engine here and so have to be
-* very careful in what rq->engine we poke. The virtual engine is
-* referenced via the rq->context and we released that ref during
-* i915_request_retire(), ergo we must not dereference a virtual
-* engine here. Not that we would want to, as the only consumer of
-* the reserved engine->request_pool is the power management parking,
-* which must-not-fail, and that is only run on the physical engines.
-*
-* Since the request must have been executed to be have completed,
-* we know that it will have been processed by the HW and will
-* not be unsubmitted again, so rq->engine and rq->execution_mask
-* at this point is stable. rq->execution_mask will be a single
-* bit if the last and _only_ engine it could execution on was a
-* physical engine, if it's multiple bits then it started on and
-* could still be on a virtual engine. Thus if the mask is not a
-* power-of-two we assume that rq->engine may still be a virtual
-* engine and so a dangling invalid pointer that we cannot dereference
-*
-* For example, consider the flow of a bonded request through a virtual
-* engine. The request is created with a wide engine mask (all engines
-* that we might execute on). On processing the bond, the request mask
-* is reduced to one or more engines. If the request is subsequently
-* bound to a single engine, it will then be constrained to only
-* execute on that engine and never returned to the virtual engine
-* after timeslicing away, see __unwind_incomplete_requests(). Thus we
-* know that if the rq->execution_mask is a single bit, rq->engine
-* can be a physical engine with the exact corresponding mask.
+* Keep one request on each engine for reserved use under mempressure,
+* do not use with virtual engines as this really is only needed for
+* kernel contexts.
 */
-   if (is_power_of_2(rq->execution_mask) &&
-   !cmpxchg(&rq->engine->request_pool, NULL, rq))
+   if (!intel_engine_is_virtual(rq->engine) &&
+   !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+   intel_context_put(rq->context);
return;
+   }
+
+   intel_context_put(rq->context);

The put is actually unconditional? So it could be moved before the if?


Yep, I think so.


Wait nope. We reference rq->engine which could a virtual engine and the
intel_context_put could free that engine. So we need to do the put after
we reference it.

Matt

Doh! That's a pretty good reason.

Okay, with a tweaked description to explain about virtual engines being 
different on GuC vs execlist...


Reviewed-by: John Harrison 




Matt


John.


kmem_cache_free(global.slab_requests, rq);
   }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
}
}
-   rq->context = ce;
+   /*
+* Hold a reference to the intel_contex

Re: [PATCH 25/47] drm/i915: Add intel_context tracing

2021-07-12 Thread John Harrison

On 7/12/2021 14:47, Matthew Brost wrote:

On Mon, Jul 12, 2021 at 11:10:40AM -0700, John Harrison wrote:

On 6/24/2021 00:04, Matthew Brost wrote:

Add intel_context tracing. These trace points are particularly helpful
when debugging the GuC firmware and can be enabled via
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context.c   |   6 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 ++
   drivers/gpu/drm/i915/i915_trace.h | 148 +-
   3 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 7f97753ab164..b24a1b7a3f88 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -8,6 +8,7 @@
   #include "i915_drv.h"
   #include "i915_globals.h"
+#include "i915_trace.h"
   #include "intel_context.h"
   #include "intel_engine.h"
@@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu)
   {
struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
+   trace_intel_context_free(ce);
kmem_cache_free(global.slab_ce, ce);
   }
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine)
return ERR_PTR(-ENOMEM);
intel_context_init(ce, engine);
+   trace_intel_context_create(ce);
return ce;
   }
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
+   trace_intel_context_do_pin(ce);
+
   err_unlock:
mutex_unlock(&ce->pin_mutex);
   err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int 
sub)
 */
intel_context_get(ce);
intel_context_active_release(ce);
+   trace_intel_context_do_unpin(ce);
intel_context_put(ce);
   }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c2327eebc09c..d605af0d66e6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct 
i915_request *rq)
err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
if (!enabled && !err) {
+   trace_intel_context_sched_enable(ce);
atomic_inc(&guc->outstanding_submission_g2h);
set_context_enabled(ce);
} else if (!enabled) {
@@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce)
u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
ce->guc_id * sizeof(struct guc_lrc_desc);
+   trace_intel_context_register(ce);
+
return __guc_action_register_context(guc, ce->guc_id, offset);
   }
@@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 
guc_id)
   {
struct intel_guc *guc = ce_to_guc(ce);
+   trace_intel_context_deregister(ce);
+
return __guc_action_deregister_context(guc, guc_id);
   }
@@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 * GuC before registering this context.
 */
if (context_registered) {
+   trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
@@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc 
*guc,
GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+   trace_intel_context_sched_disable(ce);
intel_context_get(ce);
guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct 
intel_context *ce)
lockdep_assert_held(&ce->guc_state.lock);
+   if (!list_empty(&ce->guc_state.fences))
+   trace_intel_context_fence_release(ce);
+
list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
i915_sw_fence_complete(&rq->submit);
@@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct 
intel_guc *guc,
if (unlikely(!ce))
return -EPROTO;
+   trace_intel_context_deregister_done(ce);
+
if (context_wait_for_deregister_to_register(ce)) {
struct intel_runtime_pm *runtime_pm =
&ce->engine->gt->i915->runtime_pm;
@@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
return -EPROTO;
}
+   trace_intel_context_sched_done(ce);
+
if (context_pending_enable(ce)) {
clr_context_pending_enable(ce);
} else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/

Re: [PATCH 34/47] drm/i915/guc: Suspend/resume implementation for new interface

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

The new GuC interface introduces an MMIO H2G command,
INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This
MMIO action tears down any active contexts, generating a context reset G2H
CTB message for each. Once that step completes, the GuC tears down the CTB
channels. It is safe to suspend once this MMIO H2G command completes
and all G2H CTBs have been processed. In practice the i915 will likely
never receive a G2H as suspend should only be called after the GPU is
idle.

Resume is implemented in the same manner as before - simply reload the
GuC firmware and reinitialize everything (e.g. CTB channels, contexts,
etc..).

Cc: John Harrison 
Signed-off-by: Matthew Brost 
Signed-off-by: Michal Wajdeczko 

Reviewed-by: John Harrison 
---
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c| 64 ---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++--
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 20 --
  5 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 57e18babdf4b..596cf4b818e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+   INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01,
INTEL_GUC_ACTION_LIMIT
  };
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c

index 9b09395b998f..68266cbffd1f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -524,51 +524,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 
rsa_offset)
   */
  int intel_guc_suspend(struct intel_guc *guc)
  {
-   struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
int ret;
-   u32 status;
u32 action[] = {
-   INTEL_GUC_ACTION_ENTER_S_STATE,
-   GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */
+   INTEL_GUC_ACTION_RESET_CLIENT,
};
  
-	/*

-* If GuC communication is enabled but submission is not supported,
-* we do not need to suspend the GuC.
-*/
-   if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc))
+   if (!intel_guc_is_ready(guc))
return 0;
  
-	/*

-* The ENTER_S_STATE action queues the save/restore operation in GuC FW
-* and then returns, so waiting on the H2G is not enough to guarantee
-* GuC is done. When all the processing is done, GuC writes
-* INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll
-* on that. Note that GuC does not ensure that the value in the register
-* is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is
-* in progress so we need to take care of that ourselves as well.
-*/
-
-   intel_uncore_write(uncore, SOFT_SCRATCH(14),
-  INTEL_GUC_SLEEP_STATE_INVALID_MASK);
-
-   ret = intel_guc_send(guc, action, ARRAY_SIZE(action));
-   if (ret)
-   return ret;
-
-   ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14),
-   INTEL_GUC_SLEEP_STATE_INVALID_MASK,
-   0, 0, 10, &status);
-   if (ret)
-   return ret;
-
-   if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
-   DRM_ERROR("GuC failed to change sleep state. "
- "action=0x%x, err=%u\n",
- action[0], status);
-   return -EIO;
+   if (intel_guc_submission_is_used(guc)) {
+   /*
+* This H2G MMIO command tears down the GuC in two steps. First 
it will
+* generate a G2H CTB for every active context indicating a 
reset. In
+* practice the i915 shouldn't ever get a G2H as suspend should 
only be
+* called when the GPU is idle. Next, it tears down the CTBs 
and this
+* H2G MMIO command completes.
+*
+* Don't abort on a failure code from the GuC. Keep going and 
do the
+* clean up in santize() and re-initialisation on resume and 
hopefully
+* the error here won't be problematic.
+*/
+   ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), 
NULL, 0);
+   if (ret)
+   DRM_ERROR("GuC suspend: RESET_CLIENT action failed with 
error %d!\n", ret);

Re: [PATCH 35/47] drm/i915/guc: Handle context reset notification

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

Cc: Matthew Brost 
Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  2 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  3 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++
  drivers/gpu/drm/i915/i915_trace.h | 10 ++
  4 files changed, 50 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 85ef6767f13b..e94b0ef733da 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc 
*guc,
  const u32 *msg, u32 len);
  int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 const u32 *msg, u32 len);
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+   const u32 *msg, u32 len);
  
  void intel_guc_submission_reset_prepare(struct intel_guc *guc);

  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4ed074df88e5..a2020373b8e8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -945,6 +945,9 @@ static int ct_process_request(struct intel_guc_ct *ct, 
struct ct_incoming_msg *r
case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
ret = intel_guc_sched_done_process_msg(guc, payload, len);
break;
+   case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+   ret = intel_guc_context_reset_process_msg(guc, payload, len);
+   break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 16b61fe71b07..9845c5bd9832 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
return 0;
  }
  
+static void guc_context_replay(struct intel_context *ce)

+{
+   struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+   __guc_reset_context(ce, true);
+   tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+struct intel_context *ce)
+{
+   trace_intel_context_reset(ce);
+   guc_context_replay(ce);
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+   const u32 *msg, u32 len)
+{
+   struct intel_context *ce;
+   int desc_idx = msg[0];
Should this dereference be done after checking the length? Or is it
guaranteed that the length cannot be zero?


John.


+
+   if (unlikely(len != 1)) {
+   drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+   return -EPROTO;
+   }
+
+   ce = g2h_context_lookup(guc, desc_idx);
+   if (unlikely(!ce))
+   return -EPROTO;
+
+   guc_handle_context_reset(guc, ce);
+
+   return 0;
+}
+
  void intel_guc_log_submission_info(struct intel_guc *guc,
   struct drm_printer *p)
  {
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 97c2e83984ed..c095c4d39456 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
  __entry->guc_sched_state_no_lock)
  );
  
+DEFINE_EVENT(intel_context, intel_context_reset,

+TP_PROTO(struct intel_context *ce),
+TP_ARGS(ce)
+);
+
  DEFINE_EVENT(intel_context, intel_context_register,
 TP_PROTO(struct intel_context *ce),
 TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
  {
  }
  
+static inline void

+trace_intel_context_reset(struct intel_context *ce)
+{
+}
+
  static inline void
  trace_intel_context_register(struct intel_context *ce)
  {




Re: [PATCH 36/47] drm/i915/guc: Handle engine reset failure notification

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

GuC will notify the driver, via G2H, if it fails to
reset an engine. We recover by resorting to a full GPU
reset.

Signed-off-by: Matthew Brost 
Signed-off-by: Fernando Pacheco 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  3 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++
  3 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index e94b0ef733da..99742625e6ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -264,6 +264,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 const u32 *msg, u32 len);
  int intel_guc_context_reset_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len);
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+const u32 *msg, u32 len);
  
  void intel_guc_submission_reset_prepare(struct intel_guc *guc);

  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a2020373b8e8..dd6177c8d75c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -948,6 +948,9 @@ static int ct_process_request(struct intel_guc_ct *ct, 
struct ct_incoming_msg *r
case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
ret = intel_guc_context_reset_process_msg(guc, payload, len);
break;
+   case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION:
+   ret = intel_guc_engine_failure_process_msg(guc, payload, len);
+   break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9845c5bd9832..c3223958dfe0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2227,6 +2227,49 @@ int intel_guc_context_reset_process_msg(struct intel_guc 
*guc,
return 0;
  }
  
+static struct intel_engine_cs *

+guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
+{
+   struct intel_gt *gt = guc_to_gt(guc);
+   u8 engine_class = guc_class_to_engine_class(guc_class);
+
+   /* Class index is checked in class converter */
+   GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
+
+   return gt->engine_class[engine_class][instance];
+}
+
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+const u32 *msg, u32 len)
+{
+   struct intel_engine_cs *engine;
+   u8 guc_class, instance;
+   u32 reason;
+
+   if (unlikely(len != 3)) {
+   drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+   return -EPROTO;
+   }
+
+   guc_class = msg[0];
+   instance = msg[1];
+   reason = msg[2];
+
+   engine = guc_lookup_engine(guc, guc_class, instance);
+   if (unlikely(!engine)) {
+   drm_dbg(&guc_to_gt(guc)->i915->drm,
+   "Invalid engine %d:%d", guc_class, instance);
+   return -EPROTO;
+   }
+
+   intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+ I915_ERROR_CAPTURE,
+ "GuC failed to reset %s (reason=0x%08x)\n",
+ engine->name, reason);
+
+   return 0;
+}
+
  void intel_guc_log_submission_info(struct intel_guc *guc,
   struct drm_printer *p)
  {




Re: [PATCH 37/47] drm/i915/guc: Enable the timer expired interrupt for GuC

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

The GuC can implement execution quanta, detect hung contexts and
other such things, but it requires the timer expired interrupt to do so.

Signed-off-by: Matthew Brost 
CC: John Harrison 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_rps.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
b/drivers/gpu/drm/i915/gt/intel_rps.c
index 06e9a8ed4e03..0c8e7f2b06f0 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps)
  
  	if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11)

rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
+
+   /* GuC needs ARAT expired interrupt unmasked */
+   if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
+   rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
  }
  
  void intel_rps_sanitize(struct intel_rps *rps)




Re: [PATCH 41/47] drm/i915/guc: Capture error state on context reset

2021-07-12 Thread John Harrison

On 6/24/2021 00:05, Matthew Brost wrote:

We receive notification of an engine reset from GuC at its
completion. Meaning GuC has potentially cleared any HW state
we may have been interested in capturing. GuC resumes scheduling
on the engine post-reset, as the resets are meant to be transparent,
further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW initiated resets in cases
where debug state is needed.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context.c   | 20 +++
  drivers/gpu/drm/i915/gt/intel_context.h   |  3 ++
  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 --
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +--
  drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++---
  7 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 2f01437056a8..3fe7794b2bfd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
return rq;
  }
  
+struct i915_request *intel_context_find_active_request(struct intel_context *ce)

+{
+   struct i915_request *rq, *active = NULL;
+   unsigned long flags;
+
+   GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
+
+   spin_lock_irqsave(&ce->guc_active.lock, flags);
+   list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+   sched.link) {
+   if (i915_request_completed(rq))
+   break;
+
+   active = rq;
+   }
+   spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+   return active;
+}
+
  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
  #include "selftest_context.c"
  #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index a592a9605dc8..3363b59c0c40 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -201,6 +201,9 @@ int intel_context_prepare_remote_request(struct 
intel_context *ce,
  
  struct i915_request *intel_context_create_request(struct intel_context *ce);
  
+struct i915_request *

+intel_context_find_active_request(struct intel_context *ce);
+
  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
  {
return u64_to_ptr(struct intel_ring, sz);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index e9e0657f847a..6ea5643a3aaa 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs 
*engine,
   ktime_t *now);
  
  struct i915_request *

-intel_engine_find_active_request(struct intel_engine_cs *engine);
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
  
  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);

  struct intel_context *
@@ -328,4 +328,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, 
unsigned int sibling)
return engine->cops->get_sibling(engine, sibling);
  }
  
+static inline void

+intel_engine_set_hung_context(struct intel_engine_cs *engine,
+ struct intel_context *ce)
+{
+   engine->hung_ce = ce;
+}
+
+static inline void
+intel_engine_clear_hung_context(struct intel_engine_cs *engine)
+{
+   intel_engine_set_hung_context(engine, NULL);
+}
+
+static inline struct intel_context *
+intel_engine_get_hung_context(struct intel_engine_cs *engine)
+{
+   return engine->hung_ce;
+}
+
  #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 69245670b8b0..1d243b83b023 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1671,7 +1671,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
drm_printf(m, "\tRequests:\n");
  
  	spin_lock_irqsave(&engine->sched_engine->lock, flags);

-   rq = intel_engine_find_active_request(engine);
+   rq = intel_engine_execlist_find_hung_request(engine);
if (rq) {
struct intel_timeline *tl = get_timeline(rq);
  
@@ -1782,10 +1782,17 @@ static bool match_ring(struct i915_request *rq)

  }
  
  struct i915_request *

-intel_engine_find_active_request(struct intel_engine_cs *engine)
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
  {
struct i915_request *request, *active = NULL;
  

Re: [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer

2021-07-13 Thread John Harrison

On 6/24/2021 00:04, Matthew Brost wrote:

Ensure the G2H response has space in the buffer before sending an H2G CTB
message, as the GuC can't handle any backpressure on the G2H interface.

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h| 13 +++-
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
  5 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b43ec56986b5..24e7a924134e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 
*action, u32 len)
  }
  
  #define INTEL_GUC_SEND_NB		BIT(31)

+#define INTEL_GUC_SEND_G2H_DW_SHIFT0
+#define INTEL_GUC_SEND_G2H_DW_MASK (0xff << INTEL_GUC_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \

Re: [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer

2021-07-14 Thread John Harrison

On 7/14/2021 17:06, Matthew Brost wrote:

On Tue, Jul 13, 2021 at 11:36:05AM -0700, John Harrison wrote:

On 6/24/2021 00:04, Matthew Brost wrote:

Ensure G2H response has space in the buffer before sending H2G CTB as
the GuC can't handle any backpressure on the G2H interface.

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc.h| 13 +++-
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +-
   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
   5 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b43ec56986b5..24e7a924134e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 
*action, u32 len)
   }
   #define INTEL_GUC_SEND_NB		BIT(31)
+#define INTEL_GUC_SEND_G2H_DW_SHIFT	0
+#define INTEL_GUC_SEND_G2H_DW_MASK (0xff << INTEL_GUC_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \
+   ({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \
+   (FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})
   static
-inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len,
+u32 g2h_len_dw)
   {
return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
-INTEL_GUC_SEND_NB);
+MAKE_SEND_FLAGS(g2h_len_dw));
   }
   static inline int
@@ -113,6 +119,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 
*action, u32 len,
   static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
   const u32 *action,
   u32 len,
+  u32 g2h_len_dw,
   bool loop)
   {
int err;
@@ -121,7 +128,7 @@ static inline int intel_guc_send_busy_loop(struct 
intel_guc* guc,
might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
   retry:
-   err = intel_guc_send_nb(guc, action, len);
+   err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
if (unlikely(err == -EBUSY && loop)) {
if (likely(!in_atomic() && !irqs_disabled()))
cond_resched();
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 7491f041859e..a60970e85635 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct 
intel_guc_ct *ct)
   #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
   #define CTB_H2G_BUFFER_SIZE  (SZ_4K)
   #define CTB_G2H_BUFFER_SIZE  (4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE   (PAGE_SIZE)

Any particular reason why PAGE_SIZE instead of SZ_4K? I'm not seeing
anything in the code that is actually related to page sizes. Seems like
'(CTB_G2H_BUFFER_SIZE / 4)' would be a more correct way to express it.
Unless I'm missing something about how it's used?


Yes, CTB_G2H_BUFFER_SIZE / 4 is better.

Matt

Okay. With that changed:

Reviewed-by: John Harrison 





John.



   struct ct_request {
struct list_head link;
@@ -129,23 +130,27 @@ static void guc_ct_buffer_desc_init(struct 
guc_ct_buffer_desc *desc)
   static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
   {
+   u32 space;
+
ctb->broken = false;
ctb->tail = 0;
ctb->head = 0;
-   ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+   space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space;
+   atomic_set(&ctb->space, space);
guc_ct_buffer_desc_init(ctb->desc);
   }
   static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
   struct guc_ct_buffer_desc *desc,
-  u32 *cmds, u32 size_in_bytes)
+  u32 *cmds, u32 size_in_bytes, u32 resv_space)
   {
GEM_BUG_ON(size_in_bytes % 4);
ctb->desc = desc;
ctb->cmds = cmds;
ctb->size = size_in_bytes / 4;
+   ctb->resv_space = resv_space / 4;
guc_ct_buffer_reset(ctb);
   }
@@ -226,6 +231,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
struct guc_ct_buffer_desc *desc;
u32 blob_size;
u32 cmds_size;
+   u32 resv_space;
void *blob;
u32 *cmds;
int err;

Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count

2021-07-15 Thread John Harrison

On 6/16/2021 03:25, Daniel Vetter wrote:

On Thu, Jun 10, 2021 at 10:46 PM  wrote:

From: John Harrison 

Various UMDs need to know the L3 bank count. So add a query API for it.

Please link to both the igt test submission for this (there's not even
a Test-with: on the cover letter)
Is there a wiki page that describes all such tags? That is not one I was 
aware of and I can't find anything in the Kernel patch submission wiki 
or DRM maintainers wiki that mentions it.




  and the merge requests for the
various UMD which uses new uapi.

Is there a particular tag to use for this?

John.


  Also as other mentioned, full uapi
kerneldoc is needed too. Please fill in any gaps in the existing docs
that relate to your addition directly (like we've e.g. done for the
extension chaining when adding lmem support).

Thanks, Daniel


Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++
  drivers/gpu/drm/i915/gt/intel_gt.h |  1 +
  drivers/gpu/drm/i915/i915_query.c  | 22 ++
  drivers/gpu/drm/i915/i915_reg.h|  1 +
  include/uapi/drm/i915_drm.h|  1 +
  5 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 2161bf01ef8b..708bb3581d83 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info,

 intel_sseu_dump(&info->sseu, p);
  }
+
+int intel_gt_get_l3bank_count(struct intel_gt *gt)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_wakeref_t wakeref;
+   u32 fuse3;
+
+   if (GRAPHICS_VER(i915) < 12)
+   return -ENODEV;
+
+   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+   fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
+
+   return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3));
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 7ec395cace69..46aa1cf4cf30 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt 
*gt)

  void intel_gt_info_print(const struct intel_gt_info *info,
  struct drm_printer *p);
+int intel_gt_get_l3bank_count(struct intel_gt *gt);

  void intel_gt_watchdog_work(struct work_struct *work);

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 96bd8fb3e895..0e92bb2d21b2 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -10,6 +10,7 @@
  #include "i915_perf.h"
  #include "i915_query.h"
  #include 
+#include "gt/intel_gt.h"

  static int copy_query_item(void *query_hdr, size_t query_sz,
u32 total_length,
@@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private 
*i915,
 return hwconfig->size;
  }

+static int query_l3banks(struct drm_i915_private *i915,
+struct drm_i915_query_item *query_item)
+{
+   u32 banks;
+
+   if (query_item->length == 0)
+   return sizeof(banks);
+
+   if (query_item->length < sizeof(banks))
+   return -EINVAL;
+
+   banks = intel_gt_get_l3bank_count(&i915->gt);
+
+   if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
+&banks, sizeof(banks)))
+   return -EFAULT;
+
+   return sizeof(banks);
+}
+
  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 struct drm_i915_query_item 
*query_item) = {
 query_topology_info,
@@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,
 query_perf_config,
 query_memregion_info,
 query_hwconfig_table,
+   query_l3banks,
  };

  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file 
*file)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index eb13c601d680..e9ba88fe3db7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  #defineGEN10_MIRROR_FUSE3  _MMIO(0x9118)
  #define GEN10_L3BANK_PAIR_COUNT 4
  #define GEN10_L3BANK_MASK   0x0F
+#define GEN12_GT_L3_MODE_MASK 0xFF

  #define GEN8_EU_DISABLE0   _MMIO(0x9134)
  #define   GEN8_EU_DIS0_S0_MASK 0xff
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 87d369cae22a..20d18cca5066 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2234,6 +2234,7 @@ struct drm_i915_query_item {
  #define DRM_I915_QUERY_PERF_CONFIG  3
  #define DRM_I915_QUERY_MEMORY_REGIONS   4
  #def

Re: [Intel-gfx] [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS

2021-07-19 Thread John Harrison

On 7/19/2021 10:24, Matthew Brost wrote:

On Fri, Jul 16, 2021 at 01:17:14PM -0700, Matthew Brost wrote:

From: John Harrison 

The media watchdog mechanism involves GuC doing a silent reset and
continue of the hung context. This requires the i915 driver to provide a
golden context to GuC in the ADS.

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_gt.c |   2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c |   5 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h |   2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++---
  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |   5 +
  drivers/gpu/drm/i915/gt/uc/intel_uc.h  |   1 +
  7 files changed, 199 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index acfdd53b2678..ceeb517ba259 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
if (err)
goto err_gt;
  
+	intel_uc_init_late(&gt->uc);

+
err = i915_inject_probe_error(gt->i915, -EIO);
if (err)
goto err_gt;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 68266cbffd1f..979128e28372 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
}
  }
  
+void intel_guc_init_late(struct intel_guc *guc)

+{
+   intel_guc_ads_init_late(guc);
+}
+
  static u32 guc_ctl_debug_flags(struct intel_guc *guc)
  {
u32 level = intel_guc_log_get_level(&guc->log);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index bc71635c70b9..dc18ac510ac8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -60,6 +60,7 @@ struct intel_guc {
struct i915_vma *ads_vma;
struct __guc_ads_blob *ads_blob;
u32 ads_regset_size;
+   u32 ads_golden_ctxt_size;
  
  	struct i915_vma *lrc_desc_pool;

void *lrc_desc_pool_vaddr;
@@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc 
*guc,
  }
  
  void intel_guc_init_early(struct intel_guc *guc);

+void intel_guc_init_late(struct intel_guc *guc);
  void intel_guc_init_send_regs(struct intel_guc *guc);
  void intel_guc_write_params(struct intel_guc *guc);
  int intel_guc_init(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 93b0ac35a508..241b3089b658 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -7,6 +7,7 @@
  
  #include "gt/intel_gt.h"

  #include "gt/intel_lrc.h"
+#include "gt/shmem_utils.h"
  #include "intel_guc_ads.h"
  #include "intel_guc_fwif.h"
  #include "intel_uc.h"
@@ -33,6 +34,10 @@
   *  +---+ <== dynamic
   *  | padding   |
   *  +---+ <== 4K aligned
+ *  | golden contexts   |
+ *  +---+
+ *  | padding   |
+ *  +---+ <== 4K aligned
   *  | private data  |
   *  +---+
   *  | padding   |
@@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
return guc->ads_regset_size;
  }
  
+static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)

+{
+   return PAGE_ALIGN(guc->ads_golden_ctxt_size);
+}
+
  static u32 guc_ads_private_data_size(struct intel_guc *guc)
  {
return PAGE_ALIGN(guc->fw.private_data_size);
@@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
return offsetof(struct __guc_ads_blob, regset);
  }
  
-static u32 guc_ads_private_data_offset(struct intel_guc *guc)

+static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
  {
u32 offset;
  
  	offset = guc_ads_regset_offset(guc) +

 guc_ads_regset_size(guc);
+
+   return PAGE_ALIGN(offset);
+}
+
+static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+{
+   u32 offset;
+
+   offset = guc_ads_golden_ctxt_offset(guc) +
+guc_ads_golden_ctxt_size(guc);
+
return PAGE_ALIGN(offset);
  }
  
@@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,

GEM_BUG_ON(temp_set.size);
  }
  
-/*

- * The first 80 dwords of the register state context, containing the
- * execlists and ppgtt registers.
- */
-#define LR_HW

Re: [Intel-gfx] [PATCH v3 1/4] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-03-09 Thread John Harrison

On 3/8/2022 01:43, Tvrtko Ursulin wrote:

On 03/03/2022 22:37, john.c.harri...@intel.com wrote:

From: John Harrison 

GuC converts the pre-emption timeout and timeslice quantum values into
clock ticks internally. That significantly reduces the point of 32bit
overflow. On current platforms, worst case scenario is approximately
110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.

v2: Add helper functions for clamping (review feedback from Tvrtko).

Signed-off-by: John Harrison 
Reviewed-by: Daniele Ceraolo Spurio  
(v1)


diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index b3a429a92c0d..8208164c25e7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2218,13 +2218,24 @@ static inline u32 
get_children_join_value(struct intel_context *ce,

 static void guc_context_policy_init(struct intel_engine_cs *engine,
    struct guc_lrc_desc *desc)
 {
+   struct drm_device *drm = &engine->i915->drm;
+
    desc->policy_flags = 0;

    if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
    desc->policy_flags |= 
CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;


    /* NB: For both of these, zero means disabled. */
+   if (overflows_type(engine->props.timeslice_duration_ms * 1000,
+  desc->execution_quantum))
+   drm_warn_once(drm, "GuC interface cannot support %lums 
timeslice!\n",

+ engine->props.timeslice_duration_ms);
    desc->execution_quantum = engine->props.timeslice_duration_ms 
* 1000;

+
+   if (overflows_type(engine->props.preempt_timeout_ms * 1000,
+  desc->preemption_timeout))
+   drm_warn_once(drm, "GuC interface cannot support %lums 
preemption timeout!\n",

+ engine->props.preempt_timeout_ms);
    desc->preemption_timeout = engine->props.preempt_timeout_ms * 
1000;

 }
As previously explained, this is wrong. If the check must be present 
then it should be a BUG_ON as it is indicative of an internal driver 
failure. There is already a top level helper function for ensuring all 
range checks are done and the value is valid. If that is broken then 
that is a bug and should have been caught in pre-merge testing or code 
review. It is not possible for a bad value to get beyond that helper 
function. That is the whole point of the helper. We do not double bag 
every other value check in the driver. Once you have passed input 
validation, the values are assumed to be correct. Otherwise we would 
have every other line of code be a value check! And if somehow a bad 
value did make it through, simply printing a one-shot warning is 
pointless. You are still going to get undefined behaviour potentially 
leading to a totally broken system. E.g. your very big timeout has 
overflowed and become extremely small, thus no batch buffer can ever 
complete because they all get reset before they have even finished the 
context switch in. That is a fundamentally broken system.


John.





With that:

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


---
  drivers/gpu/drm/i915/gt/intel_engine.h  |  6 ++
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 69 +
  drivers/gpu/drm/i915/gt/sysfs_engines.c | 25 +---
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  9 +++
  4 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h

index 1c0ab05c3c40..d7044c4e526e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -351,4 +351,10 @@ intel_engine_get_hung_context(struct 
intel_engine_cs *engine)

  return engine->hung_ce;
  }
  +u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs 
*engine, u64 value);
+u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs 
*engine, u64 value);
+u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, 
u64 value);
+u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 
value);
+u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs 
*engine, u64 value);

+
  #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index 7447411a5b26..22e70e4e007c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -442,6 +442,26 @@ static int intel_engine_setup(struct intel_gt 
*gt, enum intel_engine_id id,

  engine->flags |= I915_ENGINE_HAS_EU_PRIORITY;
  }
  +    /* Cap properties according to any system limits */
+#define CLAMP_PROP(field) \
+    do { \
+    u64 clamp = intel_clamp_##

Re: [Intel-gfx] [PATCH v3 4/4] drm/i915: Improve long running OCL w/a for GuC submission

2022-03-09 Thread John Harrison

On 3/8/2022 01:41, Tvrtko Ursulin wrote:

On 03/03/2022 22:37, john.c.harri...@intel.com wrote:

From: John Harrison 

A workaround was added to the driver to allow OpenCL workloads to run
'forever' by disabling pre-emption on the RCS engine for Gen12.
It is not totally unbound as the heartbeat will kick in eventually
and cause a reset of the hung engine.

However, this does not work well in GuC submission mode. In GuC mode,
the pre-emption timeout is how GuC detects hung contexts and triggers
a per engine reset. Thus, disabling the timeout means also losing all
per engine reset ability. A full GT reset will still occur when the
heartbeat finally expires, but that is a much more destructive and
undesirable mechanism.

The purpose of the workaround is actually to give OpenCL tasks longer
to reach a pre-emption point after a pre-emption request has been
issued. This is necessary because Gen12 does not support mid-thread
pre-emption and OpenCL can have long running threads.

So, rather than disabling the timeout completely, just set it to a
'long' value.

v2: Review feedback from Tvrtko - must hard code the 'long' value
instead of determining it algorithmically. So make it an extra CONFIG
definition. Also, remove the execlist centric comment from the
existing pre-emption timeout CONFIG option given that it applies to
more than just execlists.

Signed-off-by: John Harrison 
Reviewed-by: Daniele Ceraolo Spurio  
(v1)

Acked-by: Michal Mrozek 
---
  drivers/gpu/drm/i915/Kconfig.profile  | 26 +++
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  9 ++--
  2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile 
b/drivers/gpu/drm/i915/Kconfig.profile

index 39328567c200..7cc38d25ee5c 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -57,10 +57,28 @@ config DRM_I915_PREEMPT_TIMEOUT
  default 640 # milliseconds
  help
    How long to wait (in milliseconds) for a preemption event to 
occur
-  when submitting a new context via execlists. If the current 
context
-  does not hit an arbitration point and yield to HW before the 
timer

-  expires, the HW will be reset to allow the more important context
-  to execute.
+  when submitting a new context. If the current context does not 
hit
+  an arbitration point and yield to HW before the timer expires, 
the

+  HW will be reset to allow the more important context to execute.
+
+  This is adjustable via
+  /sys/class/drm/card?/engine/*/preempt_timeout_ms
+
+  May be 0 to disable the timeout.
+
+  The compiled in default may get overridden at driver probe 
time on
+  certain platforms and certain engines which will be reflected 
in the

+  sysfs control.
+
+config DRM_I915_PREEMPT_TIMEOUT_COMPUTE
+    int "Preempt timeout for compute engines (ms, jiffy granularity)"
+    default 7500 # milliseconds
+    help
+  How long to wait (in milliseconds) for a preemption event to 
occur

+  when submitting a new context to a compute capable engine. If the
+  current context does not hit an arbitration point and yield to HW
+  before the timer expires, the HW will be reset to allow the more
+  important context to execute.
      This is adjustable via
    /sys/class/drm/card?/engine/*/preempt_timeout_ms
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index 4185c7338581..cc0954ad836a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -438,9 +438,14 @@ static int intel_engine_setup(struct intel_gt 
*gt, enum intel_engine_id id,

  engine->props.timeslice_duration_ms =
  CONFIG_DRM_I915_TIMESLICE_DURATION;
  -    /* Override to uninterruptible for OpenCL workloads. */
+    /*
+ * Mid-thread pre-emption is not available in Gen12. Unfortunately,
+ * some OpenCL workloads run quite long threads. That means they 
get

+ * reset due to not pre-empting in a timely manner. So, bump the
+ * pre-emption timeout value to be much higher for compute engines.
+ */
  if (GRAPHICS_VER(i915) == 12 && (engine->flags & 
I915_ENGINE_HAS_RCS_REG_STATE))

-    engine->props.preempt_timeout_ms = 0;
+    engine->props.preempt_timeout_ms = 
CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE;


I wouldn't go as far as adding a config option since as it is it only 
applies to Gen12 but Kconfig text says nothing about that. And I am 
not saying you should add a Gen12 specific config option, that would 
be weird. So IMO just drop it.


You were the one arguing that the driver was illegally overriding the 
user's explicitly chosen settings, including the compile time config 
options. Just having a hardcoded magic number in the driver is the 
absolute worst kind of override there is.


And tec

Re: [Intel-gfx] [PATCH v3 4/4] drm/i915: Improve long running OCL w/a for GuC submission

2022-03-10 Thread John Harrison

On 3/10/2022 01:27, Tvrtko Ursulin wrote:

On 09/03/2022 21:16, John Harrison wrote:

On 3/8/2022 01:41, Tvrtko Ursulin wrote:

On 03/03/2022 22:37, john.c.harri...@intel.com wrote:

From: John Harrison 

A workaround was added to the driver to allow OpenCL workloads to run
'forever' by disabling pre-emption on the RCS engine for Gen12.
It is not totally unbound as the heartbeat will kick in eventually
and cause a reset of the hung engine.

However, this does not work well in GuC submission mode. In GuC mode,
the pre-emption timeout is how GuC detects hung contexts and triggers
a per engine reset. Thus, disabling the timeout means also losing all
per engine reset ability. A full GT reset will still occur when the
heartbeat finally expires, but that is a much more destructive and
undesirable mechanism.

The purpose of the workaround is actually to give OpenCL tasks longer
to reach a pre-emption point after a pre-emption request has been
issued. This is necessary because Gen12 does not support mid-thread
pre-emption and OpenCL can have long running threads.

So, rather than disabling the timeout completely, just set it to a
'long' value.

v2: Review feedback from Tvrtko - must hard code the 'long' value
instead of determining it algorithmically. So make it an extra CONFIG
definition. Also, remove the execlist centric comment from the
existing pre-emption timeout CONFIG option given that it applies to
more than just execlists.

Signed-off-by: John Harrison 
Reviewed-by: Daniele Ceraolo Spurio 
 (v1)

Acked-by: Michal Mrozek 
---
  drivers/gpu/drm/i915/Kconfig.profile  | 26 
+++

  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  9 ++--
  2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile 
b/drivers/gpu/drm/i915/Kconfig.profile

index 39328567c200..7cc38d25ee5c 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -57,10 +57,28 @@ config DRM_I915_PREEMPT_TIMEOUT
  default 640 # milliseconds
  help
    How long to wait (in milliseconds) for a preemption event 
to occur
-  when submitting a new context via execlists. If the current 
context
-  does not hit an arbitration point and yield to HW before the 
timer
-  expires, the HW will be reset to allow the more important 
context

-  to execute.
+  when submitting a new context. If the current context does 
not hit
+  an arbitration point and yield to HW before the timer 
expires, the
+  HW will be reset to allow the more important context to 
execute.

+
+  This is adjustable via
+  /sys/class/drm/card?/engine/*/preempt_timeout_ms
+
+  May be 0 to disable the timeout.
+
+  The compiled in default may get overridden at driver probe 
time on
+  certain platforms and certain engines which will be 
reflected in the

+  sysfs control.
+
+config DRM_I915_PREEMPT_TIMEOUT_COMPUTE
+    int "Preempt timeout for compute engines (ms, jiffy granularity)"
+    default 7500 # milliseconds
+    help
+  How long to wait (in milliseconds) for a preemption event to 
occur
+  when submitting a new context to a compute capable engine. 
If the
+  current context does not hit an arbitration point and yield 
to HW
+  before the timer expires, the HW will be reset to allow the 
more

+  important context to execute.
      This is adjustable via
    /sys/class/drm/card?/engine/*/preempt_timeout_ms
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index 4185c7338581..cc0954ad836a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -438,9 +438,14 @@ static int intel_engine_setup(struct intel_gt 
*gt, enum intel_engine_id id,

  engine->props.timeslice_duration_ms =
  CONFIG_DRM_I915_TIMESLICE_DURATION;
  -    /* Override to uninterruptible for OpenCL workloads. */
+    /*
+ * Mid-thread pre-emption is not available in Gen12. 
Unfortunately,
+ * some OpenCL workloads run quite long threads. That means 
they get

+ * reset due to not pre-empting in a timely manner. So, bump the
+ * pre-emption timeout value to be much higher for compute 
engines.

+ */
  if (GRAPHICS_VER(i915) == 12 && (engine->flags & 
I915_ENGINE_HAS_RCS_REG_STATE))

-    engine->props.preempt_timeout_ms = 0;
+    engine->props.preempt_timeout_ms = 
CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE;


I wouldn't go as far as adding a config option since as it is it 
only applies to Gen12 but Kconfig text says nothing about that. And 
I am not saying you should add a Gen12 specific config option, that 
would be weird. So IMO just drop it.


You were the one arguing that the driver was illegally overriding the 
user's explicitly chosen settings, including the compile time config 


This is a bit out of contex

Re: [PATCH] drm/i915/guc: Use iosys_map interface to update lrc_desc

2022-03-30 Thread John Harrison

Sorry, only just seen this patch.

Please do not do this!

The entire lrc_desc_pool entity is being dropped as part of the update 
to GuC v70. That's why there was a recent patch set to significantly 
re-organise how/where it is used. That patch set explicitly said - this 
is all in preparation for removing the desc pool entirely.


Merging this change would just cause unnecessary churn and rebase 
conflicts with the v70 update patches that I am working on. Please wait 
until that lands and then see if there is anything left that you think 
still needs to be updated.


John.


On 3/8/2022 08:47, Balasubramani Vivekanandan wrote:

This patch is a continuation of the effort to move all pointers in i915
which at any point may be pointing to device memory or system memory to
the iosys_map interface.
More details about the need for this change are explained in the patch
series which initiated this task:
https://patchwork.freedesktop.org/series/99711/

This patch converts all access to the lrc_desc through iosys_map
interfaces.

Cc: Lucas De Marchi 
Cc: John Harrison 
Cc: Matthew Brost 
Cc: Umesh Nerlige Ramappa 
Signed-off-by: Balasubramani Vivekanandan 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  2 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 68 ---
  2 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index e439e6c1ac8b..cbbc24dbaf0f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -168,7 +168,7 @@ struct intel_guc {
/** @lrc_desc_pool: object allocated to hold the GuC LRC descriptor 
pool */
struct i915_vma *lrc_desc_pool;
/** @lrc_desc_pool_vaddr: contents of the GuC LRC descriptor pool */
-   void *lrc_desc_pool_vaddr;
+   struct iosys_map lrc_desc_pool_vaddr;
  
  	/**

 * @context_lookup: used to resolve intel_context from guc_id, if a
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9ec03234d2c2..84b17ded886a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -467,13 +467,14 @@ static u32 *get_wq_pointer(struct guc_process_desc *desc,
return &__get_parent_scratch(ce)->wq[ce->parallel.guc.wqi_tail / 
sizeof(u32)];
  }
  
-static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)

+static void __write_lrc_desc(struct intel_guc *guc, u32 index,
+struct guc_lrc_desc *desc)
  {
-   struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
+   unsigned int size = sizeof(struct guc_lrc_desc);
  
  	GEM_BUG_ON(index >= GUC_MAX_CONTEXT_ID);
  
-	return &base[index];

+   iosys_map_memcpy_to(&guc->lrc_desc_pool_vaddr, index * size, desc, 
size);
  }
  
  static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)

@@ -489,20 +490,28 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
  {
u32 size;
int ret;
+   void *addr;
  
	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * GUC_MAX_CONTEXT_ID);
ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
-					     (void **)&guc->lrc_desc_pool_vaddr);
+					     &addr);
+
if (ret)
return ret;
  
+	if (i915_gem_object_is_lmem(guc->lrc_desc_pool->obj))
+		iosys_map_set_vaddr_iomem(&guc->lrc_desc_pool_vaddr,
+					  (void __iomem *)addr);
+   else
+   iosys_map_set_vaddr(&guc->lrc_desc_pool_vaddr, addr);
+
return 0;
  }
  
  static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)

  {
-   guc->lrc_desc_pool_vaddr = NULL;
+   iosys_map_clear(&guc->lrc_desc_pool_vaddr);
i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
  }
  
@@ -513,9 +522,11 @@ static inline bool guc_submission_initialized(struct intel_guc *guc)
  
  static inline void _reset_lrc_desc(struct intel_guc *guc, u32 id)

  {
-   struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+   unsigned int size = sizeof(struct guc_lrc_desc);
  
-	memset(desc, 0, sizeof(*desc));
+	GEM_BUG_ON(id >= GUC_MAX_CONTEXT_ID);
+
+   iosys_map_memset(&guc->lrc_desc_pool_vaddr, id * size, 0, size);
  }
  
  static inline bool ctx_id_mapped(struct intel_guc *guc, u32 id)

@@ -2233,7 +2244,7 @@ static void prepare_context_registration_info(struct intel_context *ce)
struct intel_engine_cs *engine = ce->engine;
struct intel_guc *guc = &engine->gt->uc.guc;
u32 ctx_id = ce->guc_id.id;
-   struct guc_lrc_desc *desc;
+   struct guc_lrc_desc

Re: [PATCH] drm/i915/guc: Initialize GuC submission locks and queues early

2022-02-18 Thread John Harrison

On 2/14/2022 17:11, Daniele Ceraolo Spurio wrote:

Move initialization of submission-related spinlock, lists and workers to
init_early. This fixes an issue where, if the GuC init fails, we might
still try to take the lock in the context cleanup code. Note that it is
safe to call the GuC context cleanup code even if the init failed,
because all contexts are initialized with an invalid GuC ID, which
causes the GuC side of the cleanup to be skipped. It is therefore easier
to make sure the variables are initialized than to special-case the
cleanup for when they're not.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932
Signed-off-by: Daniele Ceraolo Spurio 
Cc: Matthew Brost 
Cc: John Harrison 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 27 ++-
  1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b3a429a92c0da..2160da2c83cbf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1818,24 +1818,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
 */
GEM_BUG_ON(!guc->lrc_desc_pool);
  
-	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
-
-   spin_lock_init(&guc->submission_state.lock);
-   INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
-   ida_init(&guc->submission_state.guc_ids);
-   INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
-   INIT_WORK(&guc->submission_state.destroyed_worker,
- destroyed_worker_func);
-   INIT_WORK(&guc->submission_state.reset_fail_worker,
- reset_fail_worker_func);
-
guc->submission_state.guc_ids_bitmap =
bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
if (!guc->submission_state.guc_ids_bitmap)
return -ENOMEM;
  
-	spin_lock_init(&guc->timestamp.lock);
-	INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
	guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
guc->timestamp.shift = gpm_timestamp_shift(gt);
  
@@ -3831,6 +3818,20 @@ static bool __guc_submission_selected(struct intel_guc *guc)
  
  void intel_guc_submission_init_early(struct intel_guc *guc)

  {
+   xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+
+   spin_lock_init(&guc->submission_state.lock);
+   INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
+   ida_init(&guc->submission_state.guc_ids);
+   INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
+   INIT_WORK(&guc->submission_state.destroyed_worker,
+ destroyed_worker_func);
+   INIT_WORK(&guc->submission_state.reset_fail_worker,
+ reset_fail_worker_func);
+
+   spin_lock_init(&guc->timestamp.lock);
+   INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
+
guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
guc->submission_supported = __guc_submission_supported(guc);
guc->submission_selected = __guc_submission_selected(guc);




Re: [Intel-gfx] [PATCH v2] drm/i915/guc: Do not complain about stale reset notifications

2022-02-22 Thread John Harrison

On 2/22/2022 17:39, Ceraolo Spurio, Daniele wrote:

On 2/11/2022 5:04 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

It is possible for reset notifications to arrive for a context that is
in the process of being banned. So don't flag these as an error, just
report it as informational (because it is still useful to know that
resets are happening even if they are being ignored).

v2: Better wording for the message (review feedback from Tvrtko).

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index b3a429a92c0d..3afff24b8f24 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4022,10 +4022,10 @@ static void guc_handle_context_reset(struct intel_guc *guc,

  capture_error_state(guc, ce);
  guc_context_replay(ce);
  } else {
-		drm_err(&guc_to_gt(guc)->i915->drm,
-			"Invalid GuC engine reset notificaion for 0x%04X on %s: banned = %d, blocked = %d",
-			ce->guc_id.id, ce->engine->name, intel_context_is_banned(ce),
-			context_blocked(ce));
+		drm_info(&guc_to_gt(guc)->i915->drm,
+			 "Ignoring context reset notification for 0x%04X on %s: banned = %d, blocked = %d",

The if statement above checks for !banned, so if we're here we're 
banned for sure, no need to print it as if it was conditional. I'd 
reword it as something like: "Ignoring reset notification for banned 
context 0x%04X ...". With that:
Hmm. The patch was based on an older tree that had an extra term in the 
if. Seems like the patch applied cleanly and I didn't check the 
surrounding code! Will update it to drop the banned and blocked values.


John.




Reviewed-by: Daniele Ceraolo Spurio 

Daniele

+			 ce->guc_id.id, ce->engine->name, intel_context_is_banned(ce),
+			 context_blocked(ce));
  }
  }






Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-02-22 Thread John Harrison

On 2/22/2022 01:52, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

GuC converts the pre-emption timeout and timeslice quantum values into
clock ticks internally. That significantly reduces the point of 32bit
overflow. On current platforms, worst case scenario is approximately


Where does 32-bit come from, the GuC side? We already use 64-bits so 
that something to fix to start with. Yep...
Yes, the GuC API is defined as 32bits only and then does a straight 
multiply by the clock speed with no range checking. We have requested 
64bit support but there was push back on the grounds that it is not 
something the GuC timer hardware supports and such long timeouts are not 
real world usable anyway.





./gt/uc/intel_guc_fwif.h:   u32 execution_quantum;

./gt/uc/intel_guc_submission.c: desc->execution_quantum = 
engine->props.timeslice_duration_ms * 1000;


./gt/intel_engine_types.h:  unsigned long 
timeslice_duration_ms;


timeslice_store/preempt_timeout_store:
err = kstrtoull(buf, 0, &duration);

So both kconfig and sysfs can already overflow GuC, not only because 
of tick conversion internally but because at backend level nothing was 
done for assigning 64-bit into 32-bit. Or I failed to find where it is 
handled.
That's why I'm adding this range check to make sure we don't allow 
overflows.





110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.


Btw who is reviewing GuC patches these days - things have somehow 
gotten pretty quiet in activity and I don't think that's due absence 
of stuff to improve or fix? Asking since I think I noticed a few 
already which you posted and then crickets on the mailing list.

Too much work to do and not enough engineers to do it all :(.





Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 15 +++
  drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  9 +
  3 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index e53008b4dd05..2a1e9f36e6f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
  if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
  engine->props.preempt_timeout_ms = 0;
  +    /* Cap timeouts to prevent overflow inside GuC */
+    if (intel_guc_submission_is_wanted(>->uc.guc)) {
+		if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) {


Hm "wanted".. There's been too much back and forth on the GuC load 
options over the years to keep track.. intel_engine_uses_guc work 
sounds like would work and read nicer.
I'm not adding a new feature check here. I'm just using the existing 
one. If we want to rename it yet again then that would be a different 
patch set.




And limit to class instead of applying to all engines looks like a miss.

As per follow up email, the class limit is not applied here.



+			drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %d to prevent possibly overflow\n",
+				 GUC_POLICY_MAX_EXEC_QUANTUM_MS);
+			engine->props.timeslice_duration_ms = GUC_POLICY_MAX_EXEC_QUANTUM_MS;


I am not sure logging such message during driver load is useful. 
Sounds more like a confused driver which starts with one value and 
then overrides itself. I'd just silently set the value appropriate for 
the active backend. Preemption timeout kconfig text already documents 
the fact that timeouts can get overridden at runtime depending on
platform+engine. So maybe just add same text to timeslice kconfig.
The point is to make people aware if they compile with unsupported 
config options. As far as I know, there is no way to apply range 
checking or other limits to config defines. Which means that a user 
would silently get unwanted behaviour. That seems like a bad thing to 
me. If the driver is confused because the user built it in a confused 
manner then we should let them know.






+    }
+
+		if (engine->props.preempt_timeout_ms > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) {
+			drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %d to prevent possibly overflow\n",
+				 GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS);
+			engine->props.preempt_timeout_ms = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS;

+    }
+    }
+
  engine->defaults = engine->props; /* never to change again */
	engine->context_size = intel_engine_context_size(gt, engine->class);
diff --git a/drivers/gpu/dr

Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-02-22 Thread John Harrison

On 2/22/2022 16:52, Ceraolo Spurio, Daniele wrote:

On 2/18/2022 1:33 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

GuC converts the pre-emption timeout and timeslice quantum values into
clock ticks internally. That significantly reduces the point of 32bit
overflow. On current platforms, worst case scenario is approximately
110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 15 +++
  drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  9 +
  3 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index e53008b4dd05..2a1e9f36e6f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
  if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
  engine->props.preempt_timeout_ms = 0;
  +    /* Cap timeouts to prevent overflow inside GuC */
+    if (intel_guc_submission_is_wanted(>->uc.guc)) {
+		if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) {
+			drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %d to prevent possibly overflow\n",


I'd drop the word "possibly"


+				 GUC_POLICY_MAX_EXEC_QUANTUM_MS);
+			engine->props.timeslice_duration_ms = GUC_POLICY_MAX_EXEC_QUANTUM_MS;

+    }
+
+		if (engine->props.preempt_timeout_ms > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) {
+			drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %d to prevent possibly overflow\n",
+				 GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS);
+			engine->props.preempt_timeout_ms = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS;

+    }
+    }
+
  engine->defaults = engine->props; /* never to change again */
	engine->context_size = intel_engine_context_size(gt, engine->class);
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c

index 967031056202..f57efe026474 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -221,6 +221,13 @@ timeslice_store(struct kobject *kobj, struct kobj_attribute *attr,

  if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT))
  return -EINVAL;
  +    if (intel_uc_uses_guc_submission(&engine->gt->uc) &&
+    duration > GUC_POLICY_MAX_EXEC_QUANTUM_MS) {
+    duration = GUC_POLICY_MAX_EXEC_QUANTUM_MS;
+    drm_info(&engine->i915->drm, "Warning, clamping timeslice 
duration to %lld to prevent possibly overflow\n",

+ duration);
+    }
+
  WRITE_ONCE(engine->props.timeslice_duration_ms, duration);
    if (execlists_active(&engine->execlists))
@@ -325,6 +332,13 @@ preempt_timeout_store(struct kobject *kobj, struct kobj_attribute *attr,

  if (timeout > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT))
  return -EINVAL;
  +    if (intel_uc_uses_guc_submission(&engine->gt->uc) &&
+    timeout > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) {
+    timeout = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS;
+    drm_info(&engine->i915->drm, "Warning, clamping pre-emption 
timeout to %lld to prevent possibly overflow\n",

+ timeout);
+    }
+
  WRITE_ONCE(engine->props.preempt_timeout_ms, timeout);
    if (READ_ONCE(engine->execlists.pending[0]))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h

index 6a4612a852e2..ad131092f8df 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -248,6 +248,15 @@ struct guc_lrc_desc {
    #define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 50
  +/*
+ * GuC converts the timeout to clock ticks internally. Different 
platforms have
+ * different GuC clocks. Thus, the maximum value before overflow is 
platform
+ * dependent. Current worst case scenario is about 110s. So, limit 
to 100s to be

+ * safe.
+ */
+#define GUC_POLICY_MAX_EXEC_QUANTUM_MS    (100 * 1000)
+#define GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS    (100 * 1000)


Those values don't seem to be defined in the GuC interface. If I'm 
correct, IMO we need to ask the GuC team to add them in, because it 
shouldn't be our responsibility to convert from ms to GuC clocks, 
considering that the interface is in ms. Not a blocker for this patch.


As per other reply, no. GuC doesn't give us any hints or clues on any 
limits of these values.

Re: [Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads

2022-02-22 Thread John Harrison

On 2/22/2022 01:53, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherently not pre-emptible on current hardware.
Thus the pre-emption timeout was disabled as a workaround to prevent
unwanted resets. Instead, the hang detection was left to the heartbeat
and its (longer) timeout. This is undesirable with GuC submission as
the heartbeat is a full GT reset rather than a per engine reset and so
is much more destructive. Instead, just bump the pre-emption timeout


Can we have a feature request to allow asking GuC for an engine reset?

For what purpose?

GuC manages the scheduling of contexts across engines. With virtual 
engines, the KMD has no knowledge of which engine a context might be 
executing on. Even without virtual engines, the KMD still has no 
knowledge of which context is currently executing on any given engine at 
any given time.


There is a reason why hang detection should be left to the entity that 
is doing the scheduling. Any other entity is second guessing at best.


The reason for keeping the heartbeat around even when GuC submission is 
enabled is for the case where the KMD/GuC have got out of sync with 
either other somehow or GuC itself has just crashed. I.e. when no 
submission at all is working and we need to reset the GuC itself and 
start over.


John.




Regards,

Tvrtko


to a big value. Also, update the heartbeat to allow such a long
pre-emption delay in the final heartbeat period.

Signed-off-by: John Harrison 


John Harrison (3):
   drm/i915/guc: Limit scheduling properties to avoid overflow
   drm/i915/gt: Make the heartbeat play nice with long pre-emption
 timeouts
   drm/i915: Improve long running OCL w/a for GuC submission

  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 37 +--
  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 16 
  drivers/gpu/drm/i915/gt/sysfs_engines.c   | 14 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  9 +
  4 files changed, 73 insertions(+), 3 deletions(-)





Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts

2022-02-22 Thread John Harrison

On 2/22/2022 03:19, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherantly not pre-emptible for long periods on
current hardware. As a workaround for this, the pre-emption timeout
for compute capable engines was disabled. This is undesirable with GuC
submission as it prevents per engine reset of hung contexts. Hence the
next patch will re-enable the timeout but bumped up by an order of
magnititude.


(Some typos above.)

I'm spotting 'inherently' but not anything else.




However, the heartbeat might not respect that. Depending upon current
activity, a pre-emption to the heartbeat pulse might not even be
attempted until the last heartbeat period. Which means that only one


Might not be attempted, but could be if something is running with 
lower priority. In which case I think special casing the last 
heartbeat does not feel right because it can end up resetting the 
engine before it was intended.


Like if first heartbeat decides to preempt (the decision is backend 
specific, could be same prio + timeslicing), and preempt timeout has 
been set to heartbeat interval * 3, then 2nd heartbeat gets queued up, 
then 3rd, and so reset is triggered even before the first preempt 
timeout legitimately expires (or just as it is about to react).


Instead, how about preempt timeout is always considered when 
calculating when to emit the next heartbeat? End result would be 
similar to your patch, in terms of avoiding the direct problem, 
although hang detection would be overall longer (but more correct I 
think).


And it also means in the next patch you don't have to add coupling 
between preempt timeout and heartbeat to intel_engine_setup. Instead 
just some long preempt timeout would be needed. Granted, the 
decoupling argument is not super strong since then the heartbeat code 
has the coupling instead, but that still feels better to me. (Since we 
can say heartbeats only make sense on loaded engines, and so things 
like preempt timeout can legitimately be considered from there.)


Incidentally, that would be similar to a patch which Chris had a year 
ago 
(https://patchwork.freedesktop.org/patch/419783/?series=86841&rev=1) 
to fix some CI issue.



I'm not following your arguments.

Chris' patch is about not having two i915 based resets triggered 
concurrently - i915 based engine reset and i915 based GT reset. The 
purpose of this patch is to allow the GuC based engine reset to have a 
chance to occur before the i915 based GT reset kicks in.


It sounds like your argument above is about making the engine reset 
slower so that it doesn't happen before the appropriate heartbeat period 
for that potential reset scenario has expired. I don't see why that is 
at all necessary or useful.


If an early heartbeat period triggers an engine reset then the heartbeat 
pulse will go through. The heartbeat will thus see a happy system and 
not do anything further. If the given period does not trigger an engine 
reset but still does not get the pulse through (because the pulse is of 
too low a priority) then we move on to the next period and bump the 
priority. If the pre-emption has actually already been triggered anyway 
(and we are just waiting a while for it to timeout) then that's fine. 
The priority bump will have no effect because the context is already 
attempting to run. The heartbeat code doesn't care which priority level 
actually triggers the reset. It just cares whether or not the pulse 
finally makes it through. And the GuC doesn't care which heartbeat 
period the i915 is in. All it knows is that it has a request to schedule 
and whether the current context is pre-empting or not. So if period #1 
triggers the pre-emption but the timeout doesn't happen until period #3, 
who cares? The result is the same as if period #3 triggered the 
pre-emption and the timeout was shorter. The result being that the hung 
context is reset, the pulse makes it through and the heartbeat goes to 
sleep again.


The only period that really matters is the final one. At that point the 
pulse request is at highest priority and so must trigger a pre-emption 
request. We then need at least one full pre-emption period (plus some 
wiggle room for random delays in reset time, context switching, 
processing messages, etc.) to allow the GuC based timeout and reset to 
occur. Hence ensuring that the final heartbeat period is at least twice 
the pre-emption timeout (because 1.25 times is just messy when working 
with ints!).


That guarantees that GuC will get at least one complete opportunity to 
detect and recover the hang before i915 nukes the universe.


Whereas, bumping all heartbeat periods to be greater than the 
pre-emption timeout is wasteful and unnecessary. That leads to a total 
heartbeat time of about a minute. Which is a very long time to wait for 
a hang to be detected and recovered. Especi

Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-02-23 Thread John Harrison

On 2/23/2022 04:13, Tvrtko Ursulin wrote:

On 23/02/2022 02:11, John Harrison wrote:

On 2/22/2022 01:52, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

GuC converts the pre-emption timeout and timeslice quantum values into
clock ticks internally. That significantly reduces the point of 32bit
overflow. On current platforms, worst case scenario is approximately


Where does 32-bit come from, the GuC side? We already use 64-bits so 
that something to fix to start with. Yep...
Yes, the GuC API is defined as 32bits only and then does a straight 
multiply by the clock speed with no range checking. We have requested 
64bit support but there was push back on the grounds that it is not 
something the GuC timer hardware supports and such long timeouts are 
not real world usable anyway.


As long as compute are happy with 100 seconds, then it "should be
enough for everybody". :D

Compute disable all forms of reset and rely on manual kill. So yes.

But even if they aren't. That's all we can do at the moment. If there is 
a genuine customer requirement for more then we can push for full 64bit 
software implemented timers in the GuC but until that happens, we don't 
have much choice.






./gt/uc/intel_guc_fwif.h:   u32 execution_quantum;

./gt/uc/intel_guc_submission.c: desc->execution_quantum = 
engine->props.timeslice_duration_ms * 1000;


./gt/intel_engine_types.h:  unsigned long 
timeslice_duration_ms;


timeslice_store/preempt_timeout_store:
err = kstrtoull(buf, 0, &duration);

So both kconfig and sysfs can already overflow GuC, not only because 
of tick conversion internally but because at backend level nothing 
was done for assigning 64-bit into 32-bit. Or I failed to find where 
it is handled.
That's why I'm adding this range check to make sure we don't allow 
overflows.


Yes and no, this fixes it, but the first bug was not only due to the GuC
internal tick conversion. It was present ever since the u64 from i915
was shoved into the u32 sent to GuC. So even if GuC used the value without
additional multiplication, the bug would still be there. My point being,
when the GuC backend was added, timeout_ms values should have been
limited/clamped to U32_MAX. The tick discovery is an additional limit on top.
I'm not disagreeing. I'm just saying that the truncation wasn't noticed 
until I actually tried using very long timeouts to debug a particular 
problem. Now that it is noticed, we need some method of range checking 
and this simple clamp solves all the truncation problems.






110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.


Btw who is reviewing GuC patches these days - things have somehow 
gotten pretty quiet in activity and I don't think that's due absence 
of stuff to improve or fix? Asking since I think I noticed a few 
already which you posted and then crickets on the mailing list.

Too much work to do and not enough engineers to do it all :(.





Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 15 +++
  drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  9 +
  3 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index e53008b4dd05..2a1e9f36e6f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
  if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
  engine->props.preempt_timeout_ms = 0;
  +    /* Cap timeouts to prevent overflow inside GuC */
+    if (intel_guc_submission_is_wanted(>->uc.guc)) {
+		if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) {


Hm "wanted".. There's been too much back and forth on the GuC load 
options over the years to keep track.. intel_engine_uses_guc work 
sounds like would work and read nicer.
I'm not adding a new feature check here. I'm just using the existing 
one. If we want to rename it yet again then that would be a different 
patch set.


$ grep intel_engine_uses_guc . -rl
./i915_perf.c
./i915_request.c
./selftests/intel_scheduler_helpers.c
./gem/i915_gem_context.c
./gt/intel_context.c
./gt/intel_engine.h
./gt/intel_engine_cs.c
./gt/intel_engine_heartbeat.c
./gt/intel_engine_pm.c
./gt/intel_reset.c
./gt/intel_lrc.c
./gt/selftest_context.c
./gt/selftest_engine_pm.c
./gt/selftest_hangcheck.c
./gt/selftest_mocs.c
./gt/selftest_workarounds.c

Sounds better to me than intel_guc_submission_is_wanted. What does the 
reader know whether "is wanted" translates to "is actually used". 
Shrug on "is 

Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts

2022-02-23 Thread John Harrison

On 2/23/2022 05:58, Tvrtko Ursulin wrote:

On 23/02/2022 02:45, John Harrison wrote:

On 2/22/2022 03:19, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherantly not pre-emptible for long periods on
current hardware. As a workaround for this, the pre-emption timeout
for compute capable engines was disabled. This is undesirable with GuC
submission as it prevents per engine reset of hung contexts. Hence the
next patch will re-enable the timeout but bumped up by an order of
magnititude.


(Some typos above.)

I'm spotting 'inherently' but not anything else.


Magnititude! O;)

Doh!

[snip]

Whereas, bumping all heartbeat periods to be greater than the 
pre-emption timeout is wasteful and unnecessary. That leads to a 
total heartbeat time of about a minute. Which is a very long time to 
wait for a hang to be detected and recovered. Especially when the 
official limit on a context responding to an 'are you dead' query is 
only 7.5 seconds.


Not sure how you got one minute?
7.5 * 2 (to be safe) = 15. 15 * 5 (number of heartbeat periods) = 75 => 
1 minute 15 seconds


Even ignoring any safety factor and just going with 7.5 * 5 still gets 
you to 37.5 seconds which is over a half a minute and likely to race.




Regardless, crux of argument was to avoid GuC engine reset and 
heartbeat reset racing with each other, and to do that by considering 
the preempt timeout with the heartbeat interval. I was thinking about 
this scenario in this series:


[Please use fixed width font and no line wrap to view.]

A)

tP = preempt timeout
tH = heartbeat interval

tP = 3 * tH

1) Background load = I915_PRIORITY_DISPLAY

<-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 <-- [2 * tH] --> FULL RESET
                                                       |
                                                       \- preemption triggered, tP = 3 * tH --\
                                                                                               \-> preempt timeout would hit here

Here we have collateral damage due full reset, since we can't tell GuC 
to reset just one engine and we fudged tP just to "account" for 
heartbeats.
You are missing the whole point of the patch series which is that the 
last heartbeat period is '2 * tP' not '2 * tH'.

+        longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2;

By making the last period double the pre-emption timeout, it is 
guaranteed that the FULL RESET stage cannot be hit before the hardware 
has attempted and timed-out on at least one pre-emption.


[snip]


<-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 <-- [2 * tH] --> full reset would be here
                                                       |
                                                       \- preemption triggered, tP = 3 * tH --\
                                                                                               \-> Preempt timeout reset

Here it is kind of the least worst, but the question is why we fudged tP
when it gives us nothing good in this case.


The point of fudging tP(RCS) is to give compute workloads longer to 
reach a pre-emptible point (given that EU walkers are basically not 
pre-emptible). The reason for doing the fudge is not connected to the 
heartbeat at all. The fact that it causes problems for the heartbeat is 
an undesired side effect.


Note that the use of 'tP(RCS) = tH * 3' was just an arbitrary 
calculation that gave us something that all interested parties were 
vaguely happy with. It could just as easily be a fixed, hard coded value 
of 7.5s but having it based on something configurable seemed more 
sensible. The other option was 'tP(RCS) = tP * 12' but that felt more 
arbitrary than basing it on the average heartbeat timeout. As in, three 
heartbeat periods is about what a normal prio task gets before it gets 
pre-empted by the heartbeat. So using that for general purpose 
pre-emptions (e.g. time slicing between multiple user apps) seems 
reasonable.




B)

Instead, my idea to account for the preempt timeout when calculating when 
to schedule the next heartbeat would look like this:


First of all tP can be left at a large value unrelated to tH. Let's say 
tP = 640ms. tH stays 2.5s.
640ms is not 'large'. The requirement is either zero (disabled) or in the 
region of 7.5s. The 640ms figure is the default for non-compute engines. 
Anything that can run EUs needs to be 'huge'.





1) Background load = I915_PRIORITY_DISPLAY

<-- [tH + tP] --> Pulse1 <-- [tH + tP] --> Pulse2 <-- [tH + tP] --> Pulse3 <-- [tH + tP] --> full reset would be here
Sure, this works but each period is now 2.5 + 7.5 = 10s. The full five 
periods is therefore 50s, which is practically a minute.


[snip]


Am I missing some requirement or you see another problem with this idea?

On a related topic, if GuC engine resets stop working when preempt 
timeout is set to zero - I think we need to somehow let the user 
know if they try to tweak it via sysfs. Perhaps go as far as -EINVAL.

Re: [Intel-gfx] [PATCH 5/8] drm/i915/guc: Move lrc desc setup to where it is needed

2022-02-23 Thread John Harrison

On 2/22/2022 17:12, Ceraolo Spurio, Daniele wrote:

On 2/17/2022 3:52 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

The LRC descriptor was being initialised early on in the context
registration sequence. It could then be determined that the actual
registration needs to be delayed and the descriptor would be wiped
out. This is inefficient, so move the setup to later in the process
after the point of no return.

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 0ab2d1a24bf6..aa74ec74194a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2153,6 +2153,8 @@ static int __guc_action_register_context(struct 
intel_guc *guc,

   0, loop);
  }
  +static void prepare_context_registration_info(struct intel_context 
*ce);

+
  static int register_context(struct intel_context *ce, bool loop)
  {
  struct intel_guc *guc = ce_to_guc(ce);
@@ -2163,6 +2165,8 @@ static int register_context(struct 
intel_context *ce, bool loop)

  GEM_BUG_ON(intel_context_is_child(ce));
  trace_intel_context_register(ce);
  +    prepare_context_registration_info(ce);
+
  if (intel_context_is_parent(ce))
  ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
    offset, loop);
@@ -2246,7 +2250,6 @@ static void 
prepare_context_registration_info(struct intel_context *ce)

  struct intel_context *child;
    GEM_BUG_ON(!engine->mask);
-    GEM_BUG_ON(!sched_state_is_init(ce));
    /*
   * Ensure LRC + CT vmas are is same region as write barrier is 
done
@@ -2314,9 +2317,13 @@ static int try_context_registration(struct 
intel_context *ce, bool loop)

  bool context_registered;
  int ret = 0;
  +    GEM_BUG_ON(!sched_state_is_init(ce));
+
  context_registered = ctx_id_mapped(guc, desc_idx);
  -    prepare_context_registration_info(ce);
+    if (context_registered)
+    clr_ctx_id_mapping(guc, desc_idx);
+    set_ctx_id_mapping(guc, desc_idx, ce);


I think we can do the clr unconditionally. Also, should we drop the 
clr/set pair in prepare_context_registration_info? it shouldn't be 
needed, unless I'm missing a path where we don't pass through here.


Daniele

I don't believe so.

The point is that the context id might have changed (it got stolen, 
re-used, etc. - all the state machine code below can cause aborts and 
retries and such like if something is pending and the register needs to 
be delayed). So we need to clear out the old mapping and add a new one 
to be safe. Also, I'm not sure if it is safe to do a xa_store to an 
already used entry as an update or if you are supposed to clear it 
first? But that's what the code did before and I'm trying to not change 
any actual behaviour here.


John.




    /*
   * The context_lookup xarray is used to determine if the hardware






Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-02-24 Thread John Harrison

On 2/24/2022 01:59, Tvrtko Ursulin wrote:

On 23/02/2022 19:03, John Harrison wrote:

On 2/23/2022 04:13, Tvrtko Ursulin wrote:

On 23/02/2022 02:11, John Harrison wrote:

On 2/22/2022 01:52, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

GuC converts the pre-emption timeout and timeslice quantum values 
into
clock ticks internally. That significantly reduces the point of 
32bit

overflow. On current platforms, worst case scenario is approximately


Where does 32-bit come from, the GuC side? We already use 64-bits 
so that's something to fix to start with. Yep...
Yes, the GuC API is defined as 32bits only and then does a straight 
multiply by the clock speed with no range checking. We have 
requested 64bit support but there was push back on the grounds that 
it is not something the GuC timer hardware supports and such long 
timeouts are not real world usable anyway.


As long as compute are happy with 100 seconds, then it "should be 
enough for everbody". :D

Compute disable all forms of reset and rely on manual kill. So yes.

But even if they aren't. That's all we can do at the moment. If there 
is a genuine customer requirement for more then we can push for full 
64bit software implemented timers in the GuC but until that happens, 
we don't have much choice.


Yeah.







./gt/uc/intel_guc_fwif.h:   u32 execution_quantum;

./gt/uc/intel_guc_submission.c: desc->execution_quantum = 
engine->props.timeslice_duration_ms * 1000;


./gt/intel_engine_types.h:  unsigned long 
timeslice_duration_ms;


timeslice_store/preempt_timeout_store:
err = kstrtoull(buf, 0, &duration);

So both kconfig and sysfs can already overflow GuC, not only 
because of tick conversion internally but because at backend level 
nothing was done for assigning 64-bit into 32-bit. Or I failed to 
find where it is handled.
That's why I'm adding this range check to make sure we don't allow 
overflows.


Yes and no, this fixes it, but the first bug was not only due to the GuC 
internal tick conversion. It was present ever since the u64 from 
i915 was shoved into the u32 sent to GuC. So even if GuC used the value 
without additional multiplication, the bug would still be there. My point being, 
when the GuC backend was added, timeout_ms values should have been 
limited/clamped to U32_MAX. The tick discovery is an additional limit 
on top.
I'm not disagreeing. I'm just saying that the truncation wasn't 
noticed until I actually tried using very long timeouts to debug a 
particular problem. Now that it is noticed, we need some method of 
range checking and this simple clamp solves all the truncation problems.


Agreed in principle, just please mention in the commit message all 
aspects of the problem.


I think we can get away without a Fixes: tag since it requires user 
fiddling to break things in unexpected ways.


I would though put in the code a clamp which expresses both, 
something like min(u32, ..GUC LIMIT..). So the full story is 
documented forever. Or "if > u32 || > ..GUC LIMIT..) return -EINVAL". 
Just in case GuC limit one day changes but u32 stays. Perhaps internal 
ticks go away or anything and we are left with plain 1:1 millisecond 
relationship.
Can certainly add a comment along the lines of "GuC API only takes a 
32bit field but that is further reduced to GUC_LIMIT due to internal 
calculations which would otherwise overflow".


But if the GuC limit is > u32 then, by definition, that means the GuC 
API has changed to take a u64 instead of a u32. So there will no u32 
truncation any more. So I'm not seeing a need to explicitly test the 
integer size when the value check covers that.





110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.


Btw who is reviewing GuC patches these days - things have somehow 
gotten pretty quiet in activity and I don't think that's due 
absence of stuff to improve or fix? Asking since I think I noticed 
a few already which you posted and then crickets on the mailing list.

Too much work to do and not enough engineers to do it all :(.





Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 15 +++
  drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  9 +
  3 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index e53008b4dd05..2a1e9f36e6f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -389,6 +389,21 @@ static int intel_engine_setup(struct 
intel_gt *gt, enum intel_engine_id id,

  if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
  engine->props.preempt_timeout_ms = 0;
  +    

Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts

2022-02-24 Thread John Harrison

On 2/24/2022 03:41, Tvrtko Ursulin wrote:

On 23/02/2022 20:00, John Harrison wrote:

On 2/23/2022 05:58, Tvrtko Ursulin wrote:

On 23/02/2022 02:45, John Harrison wrote:

On 2/22/2022 03:19, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherantly not pre-emptible for long 
periods on

current hardware. As a workaround for this, the pre-emption timeout
for compute capable engines was disabled. This is undesirable 
with GuC
submission as it prevents per engine reset of hung contexts. 
Hence the

next patch will re-enable the timeout but bumped up by an order of
magnititude.


(Some typos above.)

I'm spotting 'inherently' but not anything else.


Magnititude! O;)

Doh!

[snip]

Whereas, bumping all heartbeat periods to be greater than the 
pre-emption timeout is wasteful and unnecessary. That leads to a 
total heartbeat time of about a minute. Which is a very long time 
to wait for a hang to be detected and recovered. Especially when 
the official limit on a context responding to an 'are you dead' 
query is only 7.5 seconds.


Not sure how you got to one minute?
7.5 * 2 (to be safe) = 15. 15 * 5 (number of heartbeat periods) = 75 
=> 1 minute 15 seconds


Even ignoring any safety factor and just going with 7.5 * 5 still 
gets you to 37.5 seconds which is over a half a minute and likely to 
race.


Ah, because my starting point is that there should be no 'preempt timeout = 
heartbeat * 3'; I just think that's too ugly.
Then complain at the hardware designers to give us mid-thread 
pre-emption back. The heartbeat is only one source of pre-emption 
events. For example, a user can be running multiple contexts in parallel 
and expecting them to time slice on a single engine. Or maybe a user is 
just running one compute task in the background but is doing render work 
in the foreground. Etc.


There was a reason the original hack was to disable pre-emption rather 
than increase the heartbeat. This is simply a slightly less ugly version 
of the same hack. And unfortunately, the basic idea of the hack is 
non-negotiable.


As per other comments, 'tP(RCS) = tH *3' or 'tP(RCS) = tP(default) * 12' 
or 'tP(RCS) = 7500' are the available options. Given that the heartbeat 
is the ever present hard limit, it seems most plausible to base the hack 
on that. Any of the others works, though. Although I think an explicit 
hardcoded value is the most ugly. I guess the other option is to add 
CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE and default that to 7500.


Take your pick. But 640ms is not allowed.



Regardless, crux of argument was to avoid GuC engine reset and 
heartbeat reset racing with each other, and to do that by 
considering the preempt timeout with the heartbeat interval. I was 
thinking about this scenario in this series:


[Please use fixed width font and no line wrap to view.]

A)

tP = preempt timeout
tH = heartbeat interval

tP = 3 * tH

1) Background load = I915_PRIORITY_DISPLAY

<-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 <-- [2 * tH] --> FULL RESET
               |
               \- preemption triggered, tP = 3 * tH --\
                                                       \-> preempt timeout would hit here

Here we have collateral damage due to full reset, since we can't tell 
GuC to reset just one engine and we fudged tP just to "account" for 
heartbeats.
You are missing the whole point of the patch series which is that the 
last heartbeat period is '2 * tP' not '2 * tH'.

+        longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2;

By making the last period double the pre-emption timeout, it is 
guaranteed that the FULL RESET stage cannot be hit before the 
hardware has attempted and timed-out on at least one pre-emption.


Oh well :) that probably means the overall scheme is too odd for me. 
tP = 3 * tH and last pulse after 2 * tP, I mean.
To be accurate, it is 'tP(RCS) = 3 * tH(default); tH(final) = 
tP(current) * 2;'. Seems fairly straight forward to me. It's not a 
recursive definition or anything like that. It gives us a total 
heartbeat timeout that is close to the original version but still allows 
at least one pre-emption event.





[snip]


<-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 <-- [2 * tH] --> full reset would be here
               |
               \- preemption triggered, tP = 3 * tH --\
                                                       \-> Preempt timeout reset

Here it is kind of the least bad option, but the question is why we fudged tP 
when it gives us nothing good in this case.


The point of fudging tP(RCS) is to give compute workloads longer to 
reach a pre-emptible point (given that EU walkers are basically not 
pre-emptible). The reason for doing the fudge is not connected to the 
heartbeat at all. The fact that it causes problems for the heartbeat is 
an undesired side effect.

Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-02-24 Thread John Harrison

On 2/24/2022 11:19, John Harrison wrote:

[snip]

I'll change it to _uses_ and repost, then.


[    7.683149] kernel BUG at drivers/gpu/drm/i915/gt/uc/intel_guc.h:367!

Told you that one went bang.

John.



Re: [Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads

2022-02-24 Thread John Harrison

On 2/23/2022 04:00, Tvrtko Ursulin wrote:

On 23/02/2022 02:22, John Harrison wrote:

On 2/22/2022 01:53, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherently not pre-emptible on current hardware.
Thus the pre-emption timeout was disabled as a workaround to prevent
unwanted resets. Instead, the hang detection was left to the heartbeat
and its (longer) timeout. This is undesirable with GuC submission as
the heartbeat is a full GT reset rather than a per engine reset and so
is much more destructive. Instead, just bump the pre-emption timeout


Can we have a feature request to allow asking GuC for an engine reset?

For what purpose?


To allow "stopped heartbeat" to reset the engine, however..

GuC manages the scheduling of contexts across engines. With virtual 
engines, the KMD has no knowledge of which engine a context might be 
executing on. Even without virtual engines, the KMD still has no 
knowledge of which context is currently executing on any given engine 
at any given time.


There is a reason why hang detection should be left to the entity 
that is doing the scheduling. Any other entity is second guessing at 
best.


The reason for keeping the heartbeat around even when GuC submission 
is enabled is for the case where the KMD/GuC have got out of sync 
with each other somehow or GuC itself has just crashed. I.e. when 
no submission at all is working and we need to reset the GuC itself 
and start over.


.. I wasn't really up to speed to know/remember heartbeats are nerfed 
already in GuC mode.
Not sure what you mean by that claim. Engine resets are handled by GuC 
because GuC handles the scheduling. You can't do the former if you 
aren't doing the latter. However, the heartbeat is still present and is 
still the watchdog by which engine resets are triggered. As per the rest 
of the submission process, the hang detection and recovery is split 
between i915 and GuC.





I am not sure it was the best way since full reset penalizes everyone 
for one hanging engine. Better question would be why leave heartbeats 
around to start with with GuC? If you want to use it to health check 
GuC, as you say, maybe just send a CT message and expect replies? Then 
full reset would make sense. It also achieves the goal of not 
seconding guessing the submission backend you raise.
Adding yet another hang detection mechanism is yet more complication and 
is unnecessary when we already have one that works well enough. As 
above, the heartbeat is still required for sending the pulses that cause 
pre-emptions and so let GuC detect hangs. It also provides a fallback 
against a dead GuC by default. So why invent yet another wheel?




Like it is now, and the need for this series demonstrates it, the 
whole thing has a pretty poor "impedance" match. Not even sure what 
intel_guc_find_hung_context is doing in intel_engine_hearbeat.c - why 
is that not in intel_gt_handle_error at least? Why is hearbeat code 
special and other callers of intel_gt_handle_error don't need it?
There is no guilty context if the reset was triggered via debugfs or 
similar. And as stated ad nauseam, i915 is no longer handling the 
scheduling and so cannot make assumptions about what is or is not 
running on the hardware at any given time. And obviously, if the reset 
initiated by GuC itself then i915 should not be second guessing the 
guilty context as the GuC notification has already told us who was 
responsible.


And to be clear, the 'poor impedance match' is purely because we don't 
have mid-thread pre-emption and so need a stupidly huge timeout on 
compute capable engines. Whereas, we don't want a total heartbeat timeout 
of a minute or more. That is the impedance mis-match. If the 640ms was 
acceptable for RCS then none of this hacky timeout algorithm mush would 
be needed.


John.




Regards,

Tvrtko




Re: [Intel-gfx] [PATCH 5/8] drm/i915/guc: Move lrc desc setup to where it is needed

2022-02-24 Thread John Harrison

On 2/23/2022 18:03, Ceraolo Spurio, Daniele wrote:

On 2/23/2022 12:23 PM, John Harrison wrote:

On 2/22/2022 17:12, Ceraolo Spurio, Daniele wrote:

On 2/17/2022 3:52 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

The LRC descriptor was being initialised early on in the context
registration sequence. It could then be determined that the actual
registration needs to be delayed and the descriptor would be wiped
out. This is inefficient, so move the setup to later in the process
after the point of no return.

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 0ab2d1a24bf6..aa74ec74194a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2153,6 +2153,8 @@ static int 
__guc_action_register_context(struct intel_guc *guc,

   0, loop);
  }
  +static void prepare_context_registration_info(struct 
intel_context *ce);

+
  static int register_context(struct intel_context *ce, bool loop)
  {
  struct intel_guc *guc = ce_to_guc(ce);
@@ -2163,6 +2165,8 @@ static int register_context(struct 
intel_context *ce, bool loop)

  GEM_BUG_ON(intel_context_is_child(ce));
  trace_intel_context_register(ce);
  +    prepare_context_registration_info(ce);
+
  if (intel_context_is_parent(ce))
  ret = __guc_action_register_multi_lrc(guc, ce, 
ce->guc_id.id,

    offset, loop);
@@ -2246,7 +2250,6 @@ static void 
prepare_context_registration_info(struct intel_context *ce)

  struct intel_context *child;
    GEM_BUG_ON(!engine->mask);
-    GEM_BUG_ON(!sched_state_is_init(ce));
    /*
   * Ensure LRC + CT vmas are is same region as write barrier 
is done
@@ -2314,9 +2317,13 @@ static int try_context_registration(struct 
intel_context *ce, bool loop)

  bool context_registered;
  int ret = 0;
  +    GEM_BUG_ON(!sched_state_is_init(ce));
+
  context_registered = ctx_id_mapped(guc, desc_idx);
  -    prepare_context_registration_info(ce);
+    if (context_registered)
+    clr_ctx_id_mapping(guc, desc_idx);
+    set_ctx_id_mapping(guc, desc_idx, ce);


I think we can do the clr unconditionally. Also, should we drop the 
clr/set pair in prepare_context_registration_info? it shouldn't be 
needed, unless I'm missing a path where we don't pass through here.


Daniele

I don't believe so.

The point is that the context id might have changed (it got stolen, 
re-used, etc. - all the state machine code below can cause aborts and 
retries and such like if something is pending and the register needs 
to be delayed). So we need to clear out the old mapping and add a new 
one to be safe. Also, I'm not sure if it is safe to do a xa_store to 
an already used entry as an update or if you are supposed to clear it 
first? But that's what the code did before and I'm trying to not 
change any actual behaviour here.


I was comparing with previous behavior. before this patch, we only do 
the setting of the ctx_id here (inside 
prepare_context_registration_info) and you're not changing any of the 
abort/retry behavior, so if it was enough before it should be enough now.
Hmm, I think I must have confused myself with the intermediate steps 
along the way. Yes, it looks like the clr/set in prepare is redundant by 
the end.




Regarding the xa ops, we did an unconditional clear before, so it 
should be ok to just do the same and have the clear and set back to 
back without checking if the context ID was already in use or not.
Actually, I was thinking you meant to drop the clr completely rather 
than just drop the condition. Yeah, that sounds fine.


Will post an update.

John.



Daniele



John.




    /*
   * The context_lookup xarray is used to determine if the 
hardware










Re: [PATCH v5 1/4] drm/i915/guc: Add fetch of hwconfig table

2022-02-24 Thread John Harrison

On 2/22/2022 02:36, Jordan Justen wrote:

From: John Harrison 

Implement support for fetching the hardware description table from the
GuC. The call is made twice - once without a destination buffer to
query the size and then a second time to fill in the buffer.

Note that the table is only available on ADL-P and later platforms.

v5 (of Jordan's posting):
  * Various changes made by Jordan and recommended by Michal
- Makefile ordering
- Adjust "struct intel_guc_hwconfig hwconfig" comment
- Set Copyright year to 2022 in intel_guc_hwconfig.c/.h
- Drop inline from hwconfig_to_guc()
- Replace hwconfig param with guc in __guc_action_get_hwconfig()
- Move zero size check into guc_hwconfig_discover_size()
- Change comment to say zero size offset/size is needed to get size
- Add has_guc_hwconfig to devinfo and drop has_table()
- Change drm_err to notice in __uc_init_hw() and use %pe

Cc: Michal Wajdeczko 
Signed-off-by: Rodrigo Vivi 
Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
Acked-by: Jon Bloomfield 
Signed-off-by: Jordan Justen 
---
  drivers/gpu/drm/i915/Makefile |   1 +
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
  .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   4 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   3 +
  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c   | 145 ++
  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h   |  19 +++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |   7 +
  drivers/gpu/drm/i915/i915_pci.c   |   1 +
  drivers/gpu/drm/i915/intel_device_info.h  |   1 +
  9 files changed, 182 insertions(+)
  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e9ce09620eb5..661f1afb51d7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -188,6 +188,7 @@ i915-y += gt/uc/intel_uc.o \
  gt/uc/intel_guc_ct.o \
  gt/uc/intel_guc_debugfs.o \
  gt/uc/intel_guc_fw.o \
+ gt/uc/intel_guc_hwconfig.o \
  gt/uc/intel_guc_log.o \
  gt/uc/intel_guc_log_debugfs.o \
  gt/uc/intel_guc_rc.o \
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index fe5d7d261797..4a61c819f32b 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -137,6 +137,7 @@ enum intel_guc_action {
INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
INTEL_GUC_ACTION_SETUP_PC_GUCRC = 0x3004,
INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+   INTEL_GUC_ACTION_GET_HWCONFIG = 0x4100,
INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
index 488b6061ee89..f9e2a6aaef4a 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
@@ -8,6 +8,10 @@
  
  enum intel_guc_response_status {

INTEL_GUC_RESPONSE_STATUS_SUCCESS = 0x0,
+   INTEL_GUC_RESPONSE_NOT_SUPPORTED = 0x20,
+   INTEL_GUC_RESPONSE_NO_ATTRIBUTE_TABLE = 0x201,
+   INTEL_GUC_RESPONSE_NO_DECRYPTION_KEY = 0x202,
+   INTEL_GUC_RESPONSE_DECRYPTION_FAILED = 0x204,
INTEL_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000,
  };
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h

index f9240d4baa69..2058eb8c3d0c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -13,6 +13,7 @@
  #include "intel_guc_fw.h"
  #include "intel_guc_fwif.h"
  #include "intel_guc_ct.h"
+#include "intel_guc_hwconfig.h"
  #include "intel_guc_log.h"
  #include "intel_guc_reg.h"
  #include "intel_guc_slpc_types.h"
@@ -37,6 +38,8 @@ struct intel_guc {
struct intel_guc_ct ct;
/** @slpc: sub-structure containing SLPC related data and objects */
struct intel_guc_slpc slpc;
+   /** @hwconfig: data related to hardware configuration KLV blob */
+   struct intel_guc_hwconfig hwconfig;
  
  	/** @sched_engine: Global engine used to submit requests to GuC */

struct i915_sched_engine *sched_engine;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
new file mode 100644
index ..ad289603460c
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "gt/intel_gt.h"
+#include "i915_drv.h"
+#in

Re: [Intel-gfx] [PATCH v5 1/4] drm/i915/guc: Add fetch of hwconfig table

2022-02-25 Thread John Harrison

On 2/25/2022 05:26, Tvrtko Ursulin wrote:

On 25/02/2022 09:44, Michal Wajdeczko wrote:

On 25.02.2022 06:03, Jordan Justen wrote:

John Harrison  writes:


On 2/22/2022 02:36, Jordan Justen wrote:

From: John Harrison 

Implement support for fetching the hardware description table from 
the

GuC. The call is made twice - once without a destination buffer to
query the size and then a second time to fill in the buffer.

Note that the table is only available on ADL-P and later platforms.

v5 (of Jordan's posting):
   * Various changes made by Jordan and recommended by Michal
 - Makefile ordering
 - Adjust "struct intel_guc_hwconfig hwconfig" comment
 - Set Copyright year to 2022 in intel_guc_hwconfig.c/.h
 - Drop inline from hwconfig_to_guc()
 - Replace hwconfig param with guc in __guc_action_get_hwconfig()
 - Move zero size check into guc_hwconfig_discover_size()
 - Change comment to say zero size offset/size is needed to 
get size

 - Add has_guc_hwconfig to devinfo and drop has_table()
 - Change drm_err to notice in __uc_init_hw() and use %pe

Cc: Michal Wajdeczko 
Signed-off-by: Rodrigo Vivi 
Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
Acked-by: Jon Bloomfield 
Signed-off-by: Jordan Justen 
---
   +    ret = intel_guc_hwconfig_init(&guc->hwconfig);
+    if (ret)
+    drm_notice(&i915->drm, "Failed to retrieve hwconfig 
table: %pe\n",

Why only drm_notice? As you are keen to point out, the UMDs won't work
if the table is not available. All the failure paths in your own
verification function are 'drm_err'. So why is it only a 'notice' if
there is no table at all?


This was requested by Michal in my v3 posting:

https://patchwork.freedesktop.org/patch/472936/?series=99787&rev=3

I don't think that it should be a failure for i915 if it is unable to
read the table, or if the table read is invalid. I think it should 
be up

to the UMD to react to the missing hwconfig however they think is
appropriate, but I would like the i915 to guarantee & document the
format returned to userspace to whatever extent is feasible.

As you point out there is a discrepancy, and I think I should be
consistent with whatever is used here in my "drm/i915/guc: Verify
hwconfig blob matches supported format" patch.

I guess I'd tend to agree with Michal that "maybe drm_notice since we
continue probe", but I would go along with either if you two want to
discuss further.


having a consistent message level is a clear benefit but on the other hand
these other 'errors' may indicate more serious problems related to use
of wrong/incompatible firmware that returns corrupted HWconfig (or we
use wrong actions), while since we are not using this HWconfig in the
As stated ad nauseam, you can rule out 'corrupted' hwconfig. The GuC 
firmware is signed and will not load if it has become corrupted somehow. 
Likewise, a 'wrong/incompatible' firmware will refuse to load. So it is 
physically impossible for the later verification stage to ever find an 
error.




driver we don't care that much that we failed to load HWconfig and
'notice' is enough.

but I'm fine with all messages being drm_err (as we will not have to
change that once again after HWconfig will be mandatory for the driver
as well)


I would be against drm_err.

#define KERN_EMERG  KERN_SOH "0"    /* system is unusable */
#define KERN_ALERT  KERN_SOH "1"    /* action must be taken 
immediately */

#define KERN_CRIT   KERN_SOH "2"    /* critical conditions */
#define KERN_ERR    KERN_SOH "3"    /* error conditions */
#define KERN_WARNING    KERN_SOH "4"    /* warning conditions */
#define KERN_NOTICE KERN_SOH "5"    /* normal but significant 
condition */

#define KERN_INFO   KERN_SOH "6"    /* informational */
#define KERN_DEBUG  KERN_SOH "7"    /* debug-level messages */

From the point of view of the kernel driver, this is not an error to 
its operation. It can at most be a warning, but notice is also fine by 
me. One could argue when reading "normal but significant condition" 
that it is not normal, when it is in fact unexpected, so if people 
prefer warning that is also okay by me. I still lean towards notice 
because of the hands-off nature i915 has with the pass-through of this 
blob.
From the point of view of the KMD, i915 will load and be 'functional' even 
if it can't talk to the hardware at all. The UMDs won't work at all but 
the driver load will be 'fine'. That's a requirement to be able to get 
the user to a software fallback desktop in order to work out why the 
hardware isn't working (e.g. no GuC firmware file). I would view this as 
similar. The KMD might have loaded but the UMDs are not functional. That 
is definitely an 
