Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines

2021-06-01 Thread John Harrison

On 6/1/2021 02:31, Tvrtko Ursulin wrote:

On 27/05/2021 18:01, John Harrison wrote:

On 5/27/2021 01:53, Tvrtko Ursulin wrote:

On 26/05/2021 19:45, John Harrison wrote:

On 5/26/2021 01:40, Tvrtko Ursulin wrote:

On 25/05/2021 18:52, Matthew Brost wrote:

On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote:


On 06/05/2021 20:14, Matthew Brost wrote:

From: John Harrison 

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heartbeat code
thinks the physical engines are idle due to the serial number not
incrementing.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.


Commit text sounds a bit defeatist. I think instead of making up the
serial counts, which has downsides (could you please document in the
commit what they are), we should think how to design things properly.



IMO, I don't think fixing serial counts is in the scope of this
series. We should focus on getting GuC submission in, not cleaning up
all the crap that is in the i915. Let's make a note of this though so
we can revisit later.


I will say again - commit message implies it is introducing an 
unspecified downside by not fully fixing an also unspecified 
issue. It is completely reasonable, and customary even, to ask for 
both to be documented in the commit message.
Not sure what exactly is 'unspecified'. I thought the commit 
message described both the problem (heartbeat not running when 
using virtual engines) and the result (heartbeat running on more 
engines than strictly necessary). But in greater detail...


The serial number tracking is a hack for the heartbeat code to know 
whether an engine is busy or idle, and therefore whether it should 
be pinged for aliveness. Whenever a submission is made to an 
engine, the serial number is incremented. The heartbeat code keeps 
a copy of the value. If the value has changed, the engine is busy 
and needs to be pinged.


This works fine for execlist mode where virtual engine 
decomposition is done inside i915. It fails miserably for GuC mode 
where the decomposition is done by the hardware. The reason being 
that the heartbeat code only looks at physical engines but the 
serial count is only incremented on the virtual engine. Thus, the 
heartbeat sees everything as idle and does not ping.


So hangcheck does not work. Or it works because GuC does it anyway. 
Either way, that's one thing to explicitly state in the commit message.


This patch decomposes the virtual engines for the sake of 
incrementing the serial count on each sub-engine in order to keep 
the heartbeat code happy. The downside is that now the heartbeat 
sees all sub-engines as busy rather than only the one the 
submission actually ends up on. There really isn't much that can be 
done about that. The heartbeat code is in i915 not GuC, the 
scheduler is in GuC not i915. The only way to improve it is to 
either move the heartbeat code into GuC as well and completely 
disable the i915 side, or add some way for i915 to interrogate GuC 
as to which engines are or are not active. Technically, we do have 
both. GuC has (or at least had) an option to force a context switch 
on every execution quantum pre-emption. However, that is much, 
much, more heavy weight than the heartbeat. For the latter, we do 
(almost) have the engine usage statistics for PMU and such like. 
I'm not sure how much effort it would be to wire that up to the 
heartbeat code instead of using the serial count.


In short, the serial count is ever so slightly inefficient in that 
it causes heartbeat pings on engines which are idle. On the other 
hand, it is way more efficient and simpler than the current 
alternatives.


And the hack to make hangcheck work creates this inefficiency where
heartbeats are sent to idle engines. Which is probably fine; it just
needs to be explained.



Does that answer the questions?


With the two points I re-raise clearly explained, possibly even the
patch title changed, yeah. I just want it to be more easily obvious
to the patch reader what it is functionally about - not just what
implementation details have been changed but why as well.


My understanding is that we don't explain every piece of code in 
minute detail in every checkin email that touches it. I thought my 
description was already pretty verbose. I've certainly seen way less 
informative check
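The serial/heartbeat scheme being discussed in this thread can be modelled in a few lines. This is a standalone sketch with made-up names (`record_submission`, `heartbeat_should_ping`), not the actual i915 code:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENGINES 4

struct engine {
    uint32_t serial;         /* bumped on every submission */
    uint32_t heartbeat_seen; /* snapshot kept by the heartbeat worker */
};

/* Decompose a virtual engine's mask and bump the serial of every
 * physical constituent - the behaviour this patch adds for GuC mode. */
static void record_submission(struct engine *phys, uint32_t virtual_mask)
{
    for (int i = 0; i < MAX_ENGINES; i++)
        if (virtual_mask & (1u << i))
            phys[i].serial++;
}

/* The heartbeat only pings engines whose serial moved since the last
 * check. If only the virtual engine's serial were bumped, every
 * physical engine would look idle here - the bug being fixed. */
static int heartbeat_should_ping(struct engine *e)
{
    if (e->serial == e->heartbeat_seen)
        return 0; /* looks idle, skip */
    e->heartbeat_seen = e->serial;
    return 1;
}
```

The "slightly inefficient" part of the thread is visible in the model: a submission on mask 0x6 marks both engines 1 and 2 busy even though only one will actually run the request.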

Re: [Intel-gfx] [PATCH 07/13] drm/i915/guc: New definition of the CTB registration action

2021-06-09 Thread John Harrison

On 6/7/2021 18:23, Daniele Ceraolo Spurio wrote:

On 6/7/2021 11:03 AM, Matthew Brost wrote:

From: Michal Wajdeczko 

Definition of the CTB registration action has changed.
Add some ABI documentation and implement required changes.

Signed-off-by: Michal Wajdeczko 
Signed-off-by: Matthew Brost 
Cc: Piotr Piórkowski  #4
---
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 107 ++
  .../gt/uc/abi/guc_communication_ctb_abi.h |   4 -
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  76 -
  3 files changed, 152 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h

index 90efef8a73e4..6426fc183692 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -6,6 +6,113 @@
  #ifndef _ABI_GUC_ACTIONS_ABI_H
  #define _ABI_GUC_ACTIONS_ABI_H
+/**
+ * DOC: HOST2GUC_REGISTER_CTB
+ *
+ * This message is used as part of the `CTB based communication`_ setup.
+ *
+ * This message must be sent as `MMIO HXG Message`_.
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN = GUC_HXG_ORIGIN_HOST_                                |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 27:16 | DATA0 = MBZ                                                  |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  15:0 | ACTION = _`GUC_ACTION_HOST2GUC_REGISTER_CTB` = 0x5200        |


Spec says 4505

+ *  +---+-------+--------------------------------------------------------------+
+ *  | 1 | 31:12 | RESERVED = MBZ                                               |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  11:8 | **TYPE** - type for the `CT Buffer`_                         |
+ *  |   |       |                                                              |
+ *  |   |       |   - _`GUC_CTB_TYPE_HOST2GUC` = 0                             |
+ *  |   |       |   - _`GUC_CTB_TYPE_GUC2HOST` = 1                             |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |   7:0 | **SIZE** - size of the `CT Buffer`_ in 4K units minus 1      |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 2 |  31:0 | **DESC_ADDR** - GGTT address of the `CTB Descriptor`_        |
+ *  +---+-------+--------------------------------------------------------------+
+ *  | 3 |  31:0 | **BUFF_ADDF** - GGTT address of the `CT Buffer`_             |
+ *  +---+-------+--------------------------------------------------------------+
+ *
+ *  +---+-------+--------------------------------------------------------------+
+ *  |   | Bits  | Description                                                  |
+ *  +===+=======+==============================================================+
+ *  | 0 |    31 | ORIGIN = GUC_HXG_ORIGIN_GUC_                                 |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_                        |
+ *  |   +-------+--------------------------------------------------------------+
+ *  |   |  27:0 | DATA0 = MBZ                                                  |
+ *  +---+-------+--------------------------------------------------------------+
+ */
+#define GUC_ACTION_HOST2GUC_REGISTER_CTB    0x4505 // FIXME 0x5200


Why FIXME? AFAICS the spec still says 4505; even if we plan to update
it at some point, I don't think this deserves a FIXME since nothing is
incorrect.



+
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_LEN        (GUC_HXG_REQUEST_MSG_MIN_LEN + 3u)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_0_MBZ      GUC_HXG_REQUEST_MSG_0_DATA0
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_MBZ      (0xf << 12)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_TYPE     (0xf << 8)
+#define   GUC_CTB_TYPE_HOST2GUC                      0u
+#define   GUC_CTB_TYPE_GUC2HOST                      1u
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_SIZE     (0xff << 0)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_2_DESC_ADDR GUC_HXG_REQUEST_MSG_n_DATAn
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_3_BUFF_ADDR GUC_HXG_REQUEST_MSG_n_DATAn


The full mask still seems like overkill to me and I still think we 
should use BIT()/GENMASK() and a _MASK prefix, but not going to block 
on it.
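The BIT()/GENMASK() suggestion can be illustrated with a standalone model. These are userspace stand-ins for the kernel's GENMASK()/FIELD_PREP()/FIELD_GET() helpers (the real ones live in <linux/bits.h> and <linux/bitfield.h>); the `_MASK` definition shown in the comment is hypothetical:

```c
#include <assert.h>

/* Userspace re-implementations for illustration only (32-bit variants). */
#define GENMASK(h, l)         (((~0u) << (l)) & (~0u >> (31 - (h))))
#define FIELD_PREP(mask, val) (((val) << __builtin_ctz(mask)) & (mask))
#define FIELD_GET(mask, reg)  (((reg) & (mask)) >> __builtin_ctz(mask))

/* With these, the hand-rolled mask above could read, e.g.:
 *   #define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_TYPE_MASK GENMASK(11, 8)
 * which is identical to (0xf << 8) but self-documents the bit range. */
```

The advantage being argued for is purely readability: the bit positions 11:8 from the ABI table appear literally in the definition instead of being encoded as a shift of a width mask.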



+
+#define HOST2GUC_REGISTER_CTB_RESPONSE_MSG_LEN       GUC_HXG_RESPONSE_MSG_MIN_LEN
+#define HOST2GUC_REGISTER_CTB_RESPONSE_MSG_0_MBZ     GUC_HXG

Re: [Intel-gfx] [PATCH 01/26] drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct

2021-10-06 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Move guc_id allocation under submission state sub-struct as a future
patch will reuse the spin lock as a global submission state lock. Moving
this into sub-struct makes ownership of fields / lock clear.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |  6 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h| 26 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++-
  3 files changed, 47 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 12252c411159..e7e3984aab78 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -197,18 +197,18 @@ struct intel_context {
struct {
/**
 * @id: handle which is used to uniquely identify this context
-* with the GuC, protected by guc->contexts_lock
+* with the GuC, protected by guc->submission_state.lock
 */
u16 id;
/**
 * @ref: the number of references to the guc_id, when
 * transitioning in and out of zero protected by
-* guc->contexts_lock
+* guc->submission_state.lock
 */
atomic_t ref;
/**
 * @link: in guc->guc_id_list when the guc_id has no refs but is
-* still valid, protected by guc->contexts_lock
+* still valid, protected by guc->submission_state.lock
 */
struct list_head link;
} guc_id;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 5dd174babf7a..65b5e8eeef96 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -70,17 +70,21 @@ struct intel_guc {
void (*disable)(struct intel_guc *guc);
} interrupts;
  
-	/**

-* @contexts_lock: protects guc_ids, guc_id_list, ce->guc_id.id, and
-* ce->guc_id.ref when transitioning in and out of zero
-*/
-   spinlock_t contexts_lock;
-   /** @guc_ids: used to allocate unique ce->guc_id.id values */
-   struct ida guc_ids;
-   /**
-* @guc_id_list: list of intel_context with valid guc_ids but no refs
-*/
-   struct list_head guc_id_list;
+   struct {
+   /**
+* @lock: protects everything in submission_state
+*/
+   spinlock_t lock;
The old version also mentioned 'ce->guc_id.ref'. Should this not also
mention that transition? Or was the old comment inaccurate? I'm not
seeing any actual behaviour changes in the patch.




+   /**
+* @guc_ids: used to allocate new guc_ids
+*/
+   struct ida guc_ids;
+   /**
+* @guc_id_list: list of intel_context with valid guc_ids but no
+* refs
+*/
+   struct list_head guc_id_list;
+   } submission_state;
  
  	/**

 * @submission_supported: tracks whether we support GuC submission on
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ba0de35f6323..ad5c18119d92 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -68,16 +68,16 @@
   * fence is used to stall all requests associated with this guc_id until the
   * corresponding G2H returns indicating the guc_id has been deregistered.
   *
- * guc_ids:
+ * submission_state.guc_ids:
   * Unique number associated with private GuC context data passed in during
   * context registration / submission / deregistration. 64k available. Simple 
ida
   * is used for allocation.
   *
- * Stealing guc_ids:
- * If no guc_ids are available they can be stolen from another context at
- * request creation time if that context is unpinned. If a guc_id can't be 
found
- * we punt this problem to the user as we believe this is near impossible to 
hit
- * during normal use cases.
+ * Stealing submission_state.guc_ids:
+ * If no submission_state.guc_ids are available they can be stolen from another
I would abbreviate this instance as well, submission_state.guc_id is 
quite the mouthful. Unless this somehow magically links back to the 
structure entry in the kerneldoc output?


John.


+ * context at request creation time if that context is unpinned. If a guc_id
+ * can't be found we punt this problem to the user as we believe this is near
+ * impossible to hit during normal use cases.
   *
   * Locking:
   * In the GuC submission code we have 3 basic spin locks which protect
@@ -89,7 +89,7 @@
   * sched_engine can be submitting at a time. Currently only one sched_engine 
is
   * used for all of 
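The guc_id allocation and stealing scheme documented above can be modelled as a toy allocator. All names here are illustrative (the real pool is a 64k-entry ida, not a 4-entry bitmap):

```c
#include <assert.h>

#define NUM_IDS 4 /* tiny pool for illustration; i915 has 64k */

struct ctx {
    int id;       /* valid guc_id, or -1 */
    int unpinned; /* no refs -> on the guc_id_list, id may be stolen */
};

static unsigned char id_used[NUM_IDS];

/* Allocate a fresh id; if none are free, steal one from an unpinned
 * context; if that also fails, "punt the problem to the user" as the
 * kerneldoc puts it. */
static int alloc_id(struct ctx *list, int n)
{
    for (int i = 0; i < NUM_IDS; i++)
        if (!id_used[i]) {
            id_used[i] = 1;
            return i;
        }
    for (int i = 0; i < n; i++)
        if (list[i].unpinned && list[i].id >= 0) {
            int id = list[i].id;
            list[i].id = -1; /* victim context loses its guc_id */
            return id;
        }
    return -1;
}
```

The point of the patch under review is only that the lock guarding this pool and list moves under a `submission_state` sub-struct; the allocation logic itself is unchanged.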

Re: [Intel-gfx] [PATCH 02/26] drm/i915/guc: Take GT PM ref when deregistering context

2021-10-06 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a deregister context H2G is in flight. To do this we
must issue the deregister H2G from a worker, as the context can be
destroyed from an atomic context and taking a GT PM ref there blows up.
Previously we took a runtime PM ref from this atomic context, which
worked but will stop working once runtime pm autosuspend is enabled.

So this patch is twofold: stop intel_gt_wait_for_idle from short
circuiting and fix runtime pm autosuspend.

v2:
  (John Harrison)
   - Split structure changes out in different patch
  (Tvrtko)
   - Don't drop lock in deregister_destroyed_contexts

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
  drivers/gpu/drm/i915/gt/intel_engine_pm.h |   5 +
  drivers/gpu/drm/i915/gt/intel_gt_pm.h |   4 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  11 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 146 +++---
  6 files changed, 121 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index e9a0cad5c34d..1076066f41e0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -399,6 +399,8 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
ce->guc_id.id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id.link);
  
+	INIT_LIST_HEAD(&ce->destroyed_link);

+
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e7e3984aab78..4613d027cbc3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -213,6 +213,13 @@ struct intel_context {
struct list_head link;
} guc_id;
  
+	/**

+* @destroyed_link: link in guc->submission_state.destroyed_contexts, in
+* list when context is pending to be destroyed (deregistered with the
+* GuC), protected by guc->submission_state.lock
+*/
+   struct list_head destroyed_link;
+
  #ifdef CONFIG_DRM_I915_SELFTEST
/**
 * @drop_schedule_enable: Force drop of schedule enable G2H for selftest
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 8520c595f5e1..6fdeae668e6e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -16,6 +16,11 @@ intel_engine_pm_is_awake(const struct intel_engine_cs 
*engine)
return intel_wakeref_is_active(&engine->wakeref);
  }
  
+static inline void __intel_engine_pm_get(struct intel_engine_cs *engine)

+{
+   __intel_wakeref_get(&engine->wakeref);
+}
+
  static inline void intel_engine_pm_get(struct intel_engine_cs *engine)
  {
intel_wakeref_get(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index d0588d8aaa44..05de6c1af25b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -41,6 +41,10 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
	intel_wakeref_put_async(&gt->wakeref);
  }
  
+#define with_intel_gt_pm(gt, tmp) \

+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)
+
  static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
  {
	return intel_wakeref_wait_for_idle(&gt->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 65b5e8eeef96..25a598e2b6e8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -84,6 +84,17 @@ struct intel_guc {
 * refs
 */
struct list_head guc_id_list;
+   /**
+* @destroyed_contexts: list of contexts waiting to be destroyed
+* (deregistered with the GuC)
+*/
+   struct list_head destroyed_contexts;
+   /**
+* @destroyed_worker: worker to deregister contexts, need as we
+* need to take a GT PM reference and can't from destroy
+* function as it might be in an atomic context (no sleeping)
+*/
+   struct work_struct destroyed_worker;
} submission_state;
  
  	/**

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ad5c18119d92..17da2fea1bff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc
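The with_intel_gt_pm() macro added in the intel_gt_pm.h hunk above uses the common one-shot for-loop idiom for scoped acquire/release. A minimal userspace model of the same shape (names are made up):

```c
#include <assert.h>

static int wakeref;

static void gt_pm_get(void) { wakeref++; }
static void gt_pm_put(void) { wakeref--; }

/* Same shape as with_intel_gt_pm(): a for-loop that runs its body
 * exactly once, taking the PM reference before the body and dropping
 * it in the increment clause afterwards. */
#define with_gt_pm(tmp) \
    for ((tmp) = 1, gt_pm_get(); (tmp); gt_pm_put(), (tmp) = 0)
```

One caveat of the idiom, shared by the kernel macro: a `break` out of the body skips the increment clause, so the put would never run; the body must fall out normally.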

Re: [Intel-gfx] [PATCH 03/26] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-10-06 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a scheduling of user context could be enabled.
I'm not sure what 'while a scheduling of user context could be enabled' 
means.


John.


Returning GT idle when it is not can cause all sorts of issues
throughout the stack.

v2:
  (Daniel Vetter)
   - Add might_lock annotations to pin / unpin function
v3:
  (CI)
   - Drop intel_engine_pm_might_put from unpin path as an async put is
     used
v4:
  (John Harrison)
   - Make intel_engine_pm_might_get/put work with GuC virtual engines
   - Update commit message

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine_pm.h | 32 +
  drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
  drivers/gpu/drm/i915/intel_wakeref.h  | 12 +++
  5 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 1076066f41e0..f601323b939f 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -240,6 +240,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
if (err)
goto err_post_unpin;
  
+	intel_engine_pm_might_get(ce->engine);

+
if (unlikely(intel_context_is_closed(ce))) {
err = -ENOENT;
goto err_unlock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 6fdeae668e6e..d68675925b79 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -6,9 +6,11 @@
  #ifndef INTEL_ENGINE_PM_H
  #define INTEL_ENGINE_PM_H
  
+#include "i915_drv.h"

  #include "i915_request.h"
  #include "intel_engine_types.h"
  #include "intel_wakeref.h"
+#include "intel_gt_pm.h"
  
  static inline bool

  intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
@@ -31,6 +33,21 @@ static inline bool intel_engine_pm_get_if_awake(struct 
intel_engine_cs *engine)
return intel_wakeref_get_if_active(&engine->wakeref);
  }
  
+static inline void intel_engine_pm_might_get(struct intel_engine_cs *engine)

+{
+   if (!intel_engine_is_virtual(engine)) {
+   intel_wakeref_might_get(&engine->wakeref);
+   } else {
+   struct intel_gt *gt = engine->gt;
+   struct intel_engine_cs *tengine;
+   intel_engine_mask_t tmp, mask = engine->mask;
+
+   for_each_engine_masked(tengine, gt, mask, tmp)
+   intel_wakeref_might_get(&tengine->wakeref);
+   }
+   intel_gt_pm_might_get(engine->gt);
+}
+
  static inline void intel_engine_pm_put(struct intel_engine_cs *engine)
  {
intel_wakeref_put(&engine->wakeref);
@@ -52,6 +69,21 @@ static inline void intel_engine_pm_flush(struct 
intel_engine_cs *engine)
intel_wakeref_unlock_wait(&engine->wakeref);
  }
  
+static inline void intel_engine_pm_might_put(struct intel_engine_cs *engine)

+{
+   if (!intel_engine_is_virtual(engine)) {
+   intel_wakeref_might_put(&engine->wakeref);
+   } else {
+   struct intel_gt *gt = engine->gt;
+   struct intel_engine_cs *tengine;
+   intel_engine_mask_t tmp, mask = engine->mask;
+
+   for_each_engine_masked(tengine, gt, mask, tmp)
+   intel_wakeref_might_put(&tengine->wakeref);
+   }
+   intel_gt_pm_might_put(engine->gt);
+}
+
  static inline struct i915_request *
  intel_engine_create_kernel_request(struct intel_engine_cs *engine)
  {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index 05de6c1af25b..bc898df7a48c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -31,6 +31,11 @@ static inline bool intel_gt_pm_get_if_awake(struct intel_gt 
*gt)
	return intel_wakeref_get_if_active(&gt->wakeref);
  }
  
+static inline void intel_gt_pm_might_get(struct intel_gt *gt)

+{
	intel_wakeref_might_get(&gt->wakeref);
+}
+
  static inline void intel_gt_pm_put(struct intel_gt *gt)
  {
	intel_wakeref_put(&gt->wakeref);
@@ -41,6 +46,11 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
	intel_wakeref_put_async(&gt->wakeref);
  }
  
+static inline void intel_gt_pm_might_put(struct intel_gt *gt)

+{
	intel_wakeref_might_put(&gt->wakeref);
+}
+
  #define with_intel_gt_pm(gt, tmp) \
for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 intel_gt_pm_put(gt), tmp = 0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/d
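The virtual-engine branches in intel_engine_pm_might_get/put above walk every physical sibling via for_each_engine_masked(). The underlying set-bit iteration can be sketched with a simplified stand-in (not the kernel macro, which also maps bit positions to engine structs):

```c
#include <assert.h>

typedef unsigned int engine_mask_t;

/* Visit each set bit of the mask, peeling off the lowest set bit on
 * every iteration: bit = tmp & -tmp isolates it, tmp &= tmp - 1
 * clears it. */
#define for_each_bit_masked(bit, tmp, mask) \
    for ((tmp) = (mask); \
         (tmp) && (((bit) = (tmp) & -(tmp)), 1); \
         (tmp) &= (tmp) - 1)
```

For a virtual engine whose mask covers several physical engines, this is how the might_get/might_put lockdep annotations get applied to each sibling's wakeref in turn.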

Re: [Intel-gfx] [PATCH 04/26] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

2021-10-06 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Calling switch_to_kernel_context isn't needed if the engine PM
reference is taken while all user contexts are pinned, because then
not holding a PM ref guarantees that scheduling is disabled for all
user contexts. By not calling switch_to_kernel_context we save on
issuing a request to the engine.

v2:
  (Daniel Vetter)
   - Add FIXME comment about pushing switch_to_kernel_context to backend
v3:
  (John Harrison)
   - Update commit message
   - Fix wording of comment

Signed-off-by: Matthew Brost 
Reviewed-by: Daniel Vetter 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index dacd62773735..a1334b48dde7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -162,6 +162,19 @@ static bool switch_to_kernel_context(struct 
intel_engine_cs *engine)
unsigned long flags;
bool result = true;
  
+	/*

+* This is execlist specific behaviour intended to ensure the GPU is
+* idle by switching to a known 'safe' context. With GuC submission, the
+* same idle guarantee is achieved by other means (disabling
+* scheduling). Further, switching to a 'safe' context has no effect
+* with GuC submission as the scheduler can just switch back again.
+*
+* FIXME: Move this backend scheduler specific behaviour into the
+* scheduler backend.
+*/
+   if (intel_engine_uses_guc(engine))
+   return true;
+
/* GPU is pointing to the void, as good as in the kernel context. */
if (intel_gt_is_wedged(engine->gt))
return true;




Re: [Intel-gfx] [PATCH 01/26] drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct

2021-10-07 Thread John Harrison

On 10/7/2021 08:05, Matthew Brost wrote:

On Wed, Oct 06, 2021 at 08:06:41PM -0700, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

Move guc_id allocation under submission state sub-struct as a future
patch will reuse the spin lock as a global submission state lock. Moving
this into sub-struct makes ownership of fields / lock clear.

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context_types.h |  6 +-
   drivers/gpu/drm/i915/gt/uc/intel_guc.h| 26 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++-
   3 files changed, 47 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 12252c411159..e7e3984aab78 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -197,18 +197,18 @@ struct intel_context {
struct {
/**
 * @id: handle which is used to uniquely identify this context
-* with the GuC, protected by guc->contexts_lock
+* with the GuC, protected by guc->submission_state.lock
 */
u16 id;
/**
 * @ref: the number of references to the guc_id, when
 * transitioning in and out of zero protected by
-* guc->contexts_lock
+* guc->submission_state.lock
 */
atomic_t ref;
/**
 * @link: in guc->guc_id_list when the guc_id has no refs but is
-* still valid, protected by guc->contexts_lock
+* still valid, protected by guc->submission_state.lock
 */
struct list_head link;
} guc_id;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 5dd174babf7a..65b5e8eeef96 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -70,17 +70,21 @@ struct intel_guc {
void (*disable)(struct intel_guc *guc);
} interrupts;
-   /**
-* @contexts_lock: protects guc_ids, guc_id_list, ce->guc_id.id, and
-* ce->guc_id.ref when transitioning in and out of zero
-*/
-   spinlock_t contexts_lock;
-   /** @guc_ids: used to allocate unique ce->guc_id.id values */
-   struct ida guc_ids;
-   /**
-* @guc_id_list: list of intel_context with valid guc_ids but no refs
-*/
-   struct list_head guc_id_list;
+   struct {
+   /**
+* @lock: protects everything in submission_state
+*/
+   spinlock_t lock;

The old version also mentioned 'ce->guc_id.ref'. Should this not also
mention that transition? Or was the old comment inaccurate? I'm not
seeing any actual behaviour changes in the patch.



Can add that back in.


+   /**
+* @guc_ids: used to allocate new guc_ids
+*/
+   struct ida guc_ids;
+   /**
+* @guc_id_list: list of intel_context with valid guc_ids but no
+* refs
+*/
+   struct list_head guc_id_list;
+   } submission_state;
/**
 * @submission_supported: tracks whether we support GuC submission on
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ba0de35f6323..ad5c18119d92 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -68,16 +68,16 @@
* fence is used to stall all requests associated with this guc_id until the
* corresponding G2H returns indicating the guc_id has been deregistered.
*
- * guc_ids:
+ * submission_state.guc_ids:
* Unique number associated with private GuC context data passed in during
* context registration / submission / deregistration. 64k available. Simple 
ida
* is used for allocation.
*
- * Stealing guc_ids:
- * If no guc_ids are available they can be stolen from another context at
- * request creation time if that context is unpinned. If a guc_id can't be 
found
- * we punt this problem to the user as we believe this is near impossible to 
hit
- * during normal use cases.
+ * Stealing submission_state.guc_ids:
+ * If no submission_state.guc_ids are available they can be stolen from another

I would abbreviate this instance as well, submission_state.guc_id is quite
the mouthful. Unless this somehow magically links back to the structure
entry in the kerneldoc output?


It might, I'm not really sure, but I agree the submission_state prefix
should be dropped. I think it changed because of a global find & replace.

Matt

Okay. With those nits fixed:
Reviewed-by: John Harrison 


John.


+ * context at request creation ti

Re: [Intel-gfx] [PATCH 05/26] drm/i915: Add logical engine mapping

2021-10-07 Thread John Harrison
ted to user space via
+* query IOCTL and used to communicate with the GuC in logical space.
+* The logical instance of a physical engine can change based on product
+* / fusing and defined in the bspec.
I would use 'and' rather than '/' when it line wraps like that. 
Otherwise, it looks like you tried to end the comment, but failed and 
then kept typing!


Also, not sure about 'and defined in the bspec'. I would just drop that 
line. I think 'based on product and fusing' is sufficient. Otherwise, 
you should be including the bspec link.


With that tweaked:
Reviewed-by: John Harrison 

John.


+*/
+   intel_engine_mask_t logical_mask;
  
  	u8 class;

u8 instance;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 7147fe80919e..5ed1e222c308 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3877,6 +3877,7 @@ execlists_create_virtual(struct intel_engine_cs 
**siblings, unsigned int count)
  
  		ve->siblings[ve->num_siblings++] = sibling;

ve->base.mask |= sibling->mask;
+   ve->base.logical_mask |= sibling->logical_mask;
  
  		/*

 * All physical engines must be compatible for their emission
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 2c6ea64af7ec..621c893a009f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -176,7 +176,7 @@ static void guc_mapping_table_init(struct intel_gt *gt,
for_each_engine(engine, gt, id) {
u8 guc_class = engine_class_to_guc_class(engine->class);
  
-		system_info->mapping_table[guc_class][engine->instance] =

+   
system_info->mapping_table[guc_class][ilog2(engine->logical_mask)] =
engine->instance;
}
  }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 8b82da50c2bc..451d9ae861a6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1423,23 +1423,6 @@ static int deregister_context(struct intel_context *ce, 
u32 guc_id)
return __guc_action_deregister_context(guc, guc_id);
  }
  
-static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)

-{
-   switch (class) {
-   case RENDER_CLASS:
-   return mask >> RCS0;
-   case VIDEO_ENHANCEMENT_CLASS:
-   return mask >> VECS0;
-   case VIDEO_DECODE_CLASS:
-   return mask >> VCS0;
-   case COPY_ENGINE_CLASS:
-   return mask >> BCS0;
-   default:
-   MISSING_CASE(class);
-   return 0;
-   }
-}
-
  static void guc_context_policy_init(struct intel_engine_cs *engine,
struct guc_lrc_desc *desc)
  {
@@ -1481,8 +1464,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, 
bool loop)
  
  	desc = __get_lrc_desc(guc, desc_idx);

desc->engine_class = engine_class_to_guc_class(engine->class);
-   desc->engine_submit_mask = adjust_engine_mask(engine->class,
- engine->mask);
+   desc->engine_submit_mask = engine->logical_mask;
desc->hw_context_desc = ce->lrc.lrca;
desc->priority = ce->guc_state.prio;
desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
@@ -3271,6 +3253,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, 
unsigned int count)
}
  
  		ve->base.mask |= sibling->mask;

+   ve->base.logical_mask |= sibling->logical_mask;
  
  		if (n != 0 && ve->base.class != sibling->class) {

			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",




Re: [Intel-gfx] [PATCH 07/26] drm/i915/guc: Introduce context parent-child relationship

2021-10-07 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Introduce context parent-child relationship. Once this relationship is
created all pinning / unpinning operations are directed to the parent
context. The parent context is responsible for pinning all of its'

No need for an apostrophe.


children and itself.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - a single H2G is used to
register / deregister all of the contexts simultaneously.

Subsequent patches in the series will implement the pinning / unpinning
operations for parent / child contexts.

v2:
  (Daniel Vetter)
   - Add kernel doc, add wrapper to access parent to ensure safety
v3:
  (John Harrison)
   - Fix comment explaining GEM_BUG_ON in to_parent()
   - Make variable names generic (non-GuC specific)

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   | 29 +
  drivers/gpu/drm/i915/gt/intel_context.h   | 41 +++
  drivers/gpu/drm/i915/gt/intel_context_types.h | 21 ++
  3 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index f601323b939f..c5bb7ccfb3f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -403,6 +403,8 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
  
  	INIT_LIST_HEAD(&ce->destroyed_link);
  
+	INIT_LIST_HEAD(&ce->parallel.child_list);

+
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
@@ -417,10 +419,17 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
  
  void intel_context_fini(struct intel_context *ce)

  {
+   struct intel_context *child, *next;
+
if (ce->timeline)
intel_timeline_put(ce->timeline);
i915_vm_put(ce->vm);
  
+	/* Need to put the creation ref for the children */

+   if (intel_context_is_parent(ce))
+   for_each_child_safe(ce, child, next)
+   intel_context_put(child);
+
mutex_destroy(&ce->pin_mutex);
i915_active_fini(&ce->active);
i915_sw_fence_fini(&ce->guc_state.blocked);
@@ -537,6 +546,26 @@ struct i915_request 
*intel_context_find_active_request(struct intel_context *ce)
return active;
  }
  
+void intel_context_bind_parent_child(struct intel_context *parent,

+struct intel_context *child)
+{
+   /*
+	 * Caller's responsibility to validate that this function is used
+	 * correctly, but we use GEM_BUG_ONs here to ensure that they do.
+*/
+   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
+   GEM_BUG_ON(intel_context_is_pinned(parent));
+   GEM_BUG_ON(intel_context_is_child(parent));
+   GEM_BUG_ON(intel_context_is_pinned(child));
+   GEM_BUG_ON(intel_context_is_child(child));
+   GEM_BUG_ON(intel_context_is_parent(child));
+
+   parent->parallel.number_children++;
+   list_add_tail(&child->parallel.child_link,
+ &parent->parallel.child_list);
+   child->parallel.parent = parent;
+}
+
  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
  #include "selftest_context.c"
  #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index c41098950746..b63c10a144af 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -44,6 +44,47 @@ void intel_context_free(struct intel_context *ce);
  int intel_context_reconfigure_sseu(struct intel_context *ce,
   const struct intel_sseu sseu);
  
+static inline bool intel_context_is_child(struct intel_context *ce)

+{
+   return !!ce->parallel.parent;
+}
+
+static inline bool intel_context_is_parent(struct intel_context *ce)
+{
+   return !!ce->parallel.number_children;
+}
+
+static inline bool intel_context_is_pinned(struct intel_context *ce);
+
+static inline struct intel_context *
+intel_context_to_parent(struct intel_context *ce)
+{
+   if (intel_context_is_child(ce)) {
+   /*
+* The parent holds ref count to the child so it is always safe
+* for the parent to access the child, but the child has a
+* pointer to the parent without a ref. To ensure this is safe
+* the child should only access the parent pointer while the
+* parent is pinned.
+*/
+   GEM_BUG_ON(!intel_context_is_pinned(ce->parallel.parent));
+
+   return ce->parallel.parent;
+   } else {
+   return ce;
+   }
+}
+
+void intel_context_bind_parent_child(struct intel_context *parent,
+

Re: [Intel-gfx] [PATCH 08/26] drm/i915/guc: Add multi-lrc context registration

2021-10-07 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

v2:
  (John Harrison)
   - Move GuC specific fields into sub-struct
   - Clean up WQ defines
   - Add comment explaining math to derive WQ / PD address

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 -
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 +-
  5 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 76dfca57cb45..48decb5ee954 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -239,6 +239,18 @@ struct intel_context {
struct intel_context *parent;
/** @number_children: number of children if parent */
u8 number_children;
+   /** @guc: GuC specific members for parallel submission */
+   struct {
+   /** @wqi_head: head pointer in work queue */
+   u16 wqi_head;
+   /** @wqi_tail: tail pointer in work queue */
+   u16 wqi_tail;
+   /**
+* @parent_page: page in context state (ce->state) used
+* by parent for work queue, process descriptor
+*/
+   u8 parent_page;
+   } guc;
} parallel;
  
  #ifdef CONFIG_DRM_I915_SELFTEST

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3ef9eaf8c50e..57339d5c1fc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,6 +942,11 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
  
+	if (intel_context_is_parent(ce) && intel_engine_uses_guc(engine)) {

+   ce->parallel.guc.parent_page = context_size / PAGE_SIZE;
+   context_size += PAGE_SIZE;
+   }
+
obj = i915_gem_object_create_lmem(engine->i915, context_size,
  I915_BO_ALLOC_PM_VOLATILE);
if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 8ff58aff..ba10bd374cee 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+   INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
INTEL_GUC_ACTION_LIMIT
  };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0eeb2a9feeed 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,8 +52,6 @@
  
  #define GUC_DOORBELL_INVALID		256
  
-#define GUC_WQ_SIZE			(PAGE_SIZE * 2)

-
  /* Work queue item header definitions */
  #define WQ_STATUS_ACTIVE  1
  #define WQ_STATUS_SUSPENDED   2
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 451d9ae861a6..ab6d7fc1b0b1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -344,6 +344,45 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
return rb_entry(rb, struct i915_priolist, node);
  }
  
+/*

+ * When using multi-lrc submission an extra page in the context state is
+ * reserved for the process descriptor and work queue.
+ *
+ * The layout of this page is below:
+ * 0   guc_process_desc
+ * ... unused
+ * PAGE_SIZE / 2   work queue start
+ * ... work queue
+ * PAGE_SIZE - 1   work queue end
+ */
+#define WQ_SIZE			(PAGE_SIZE / 2)
+#define WQ_OFFSET		(PAGE_SIZE - WQ_SIZE)
I thought you were going with '#define PARENT_SCRATCH_SIZE PAGE_SIZE' 
and then using that everywhere else? Unless there is a fundamental 
reason why the above must be exactly a page in size then I think the 
size should 

Re: [Intel-gfx] [PATCH 09/26] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts

2021-10-07 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

In GuC parent-child contexts the parent context controls the scheduling,
ensure only the parent does the scheduling operations.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ab6d7fc1b0b1..1f2809187513 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -324,6 +324,12 @@ static inline void decr_context_committed_requests(struct 
intel_context *ce)
GEM_BUG_ON(ce->guc_state.number_committed_requests < 0);
  }
  
+static struct intel_context *

+request_to_scheduling_context(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
  static inline bool context_guc_id_invalid(struct intel_context *ce)
  {
return ce->guc_id.id == GUC_INVALID_LRC_ID;
@@ -1710,6 +1716,7 @@ static void __guc_context_sched_disable(struct intel_guc 
*guc,
  
  	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
  
+	GEM_BUG_ON(intel_context_is_child(ce));

trace_intel_context_sched_disable(ce);
  
  	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),

@@ -1935,6 +1942,8 @@ static void guc_context_sched_disable(struct 
intel_context *ce)
intel_wakeref_t wakeref;
u16 guc_id;
  
+	GEM_BUG_ON(intel_context_is_child(ce));

+
spin_lock_irqsave(&ce->guc_state.lock, flags);
  
  	/*

@@ -2303,6 +2312,8 @@ static void guc_signal_context_fence(struct intel_context 
*ce)
  {
unsigned long flags;
  
+	GEM_BUG_ON(intel_context_is_child(ce));

+
spin_lock_irqsave(&ce->guc_state.lock, flags);
clr_context_wait_for_deregister_to_register(ce);
__guc_signal_context_fence(ce);
@@ -2333,7 +2344,7 @@ static void guc_context_init(struct intel_context *ce)
  
  static int guc_request_alloc(struct i915_request *rq)

  {
-   struct intel_context *ce = rq->context;
+   struct intel_context *ce = request_to_scheduling_context(rq);
struct intel_guc *guc = ce_to_guc(ce);
unsigned long flags;
int ret;




Re: [Intel-gfx] [PATCH 10/26] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-10-07 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.

v2:
  (Daniel Vetter)
   - Explicitly state why we assign consecutive guc_ids
v3:
  (John Harrison)
   - Bring back in spin lock

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 104 ++
  2 files changed, 86 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 25a598e2b6e8..a9f4ec972bfb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -76,9 +76,13 @@ struct intel_guc {
 */
spinlock_t lock;
/**
-* @guc_ids: used to allocate new guc_ids
+* @guc_ids: used to allocate new guc_ids, single-lrc
 */
struct ida guc_ids;
+   /**
+* @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
+*/
+   unsigned long *guc_ids_bitmap;
/**
 * @guc_id_list: list of intel_context with valid guc_ids but no
 * refs
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1f2809187513..79e7732e83b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -128,6 +128,16 @@ guc_create_virtual(struct intel_engine_cs **siblings, 
unsigned int count);
  
  #define GUC_REQUEST_SIZE 64 /* bytes */
  
+/*

+ * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
+ * per the GuC submission interface. A different allocation algorithm is used
+ * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to
+ * partition the guc_id space. We believe the number of multi-lrc contexts in
+ * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
+ * multi-lrc.
+ */
+#define NUMBER_MULTI_LRC_GUC_ID	(GUC_MAX_LRC_DESCRIPTORS / 16)
+
  /*
   * Below is a set of functions which control the GuC scheduling state which
   * require a lock.
@@ -1206,6 +1216,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
INIT_WORK(&guc->submission_state.destroyed_worker,
  destroyed_worker_func);
  
+	guc->submission_state.guc_ids_bitmap =

+   bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL);
+   if (!guc->submission_state.guc_ids_bitmap)
+   return -ENOMEM;
+
return 0;
  }
  
@@ -1217,6 +1232,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)

guc_lrc_desc_pool_destroy(guc);
guc_flush_destroyed_contexts(guc);
i915_sched_engine_put(guc->sched_engine);
+   bitmap_free(guc->submission_state.guc_ids_bitmap);
  }
  
  static inline void queue_request(struct i915_sched_engine *sched_engine,

@@ -1268,18 +1284,43 @@ static void guc_submit_request(struct i915_request *rq)
spin_unlock_irqrestore(&sched_engine->lock, flags);
  }
  
-static int new_guc_id(struct intel_guc *guc)

+static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
  {
-   return ida_simple_get(&guc->submission_state.guc_ids, 0,
- GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
- __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+   int ret;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
+
+   if (intel_context_is_parent(ce))
+		ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
+					      NUMBER_MULTI_LRC_GUC_ID,
+					      order_base_2(ce->parallel.number_children + 1));
+   else
+   ret = ida_simple_get(&guc->submission_state.guc_ids,
+NUMBER_MULTI_LRC_GUC_ID,
+GUC_MAX_LRC_DESCRIPTORS,
+GFP_KERNEL | __GFP_RETRY_MAYFAIL |
+__GFP_NOWARN);
+   if (unlikely(ret < 0))
+   return ret;
+
+   ce->guc_id.id = ret;
+   return 0;
  }
  
  static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)

  {
+   GEM_BUG_ON(intel_context_is_child(ce));
+
if (!context_guc_id_i

Re: [Intel-gfx] [PATCH 03/26] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-10-07 Thread John Harrison

On 10/7/2021 08:19, Matthew Brost wrote:

On Wed, Oct 06, 2021 at 08:45:42PM -0700, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a scheduling of user context could be enabled.

I'm not sure what 'while a scheduling of user context could be enabled'
means.


Not really sure how this isn't clear.

It means if a user context has scheduling enabled this function cannot
short circuit returning idle.

Matt
Okay. The 'a scheduling' was throwing me off. And I was reading 'could 
be enabled' as saying something that might happen in the future. English 
is great at being ambiguous ;). Maybe 'while any user context has 
scheduling enabled' would be simpler?


John.

  

John.


Returning GT idle when it is not can cause all sorts of issues
throughout the stack.

v2:
   (Daniel Vetter)
- Add might_lock annotations to pin / unpin function
v3:
   (CI)
- Drop intel_engine_pm_might_put from unpin path as an async put is
  used
v4:
   (John Harrison)
- Make intel_engine_pm_might_get/put work with GuC virtual engines
- Update commit message

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context.c   |  2 ++
   drivers/gpu/drm/i915/gt/intel_engine_pm.h | 32 +
   drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 ++
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
   drivers/gpu/drm/i915/intel_wakeref.h  | 12 +++
   5 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 1076066f41e0..f601323b939f 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -240,6 +240,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
if (err)
goto err_post_unpin;
+   intel_engine_pm_might_get(ce->engine);
+
if (unlikely(intel_context_is_closed(ce))) {
err = -ENOENT;
goto err_unlock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 6fdeae668e6e..d68675925b79 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -6,9 +6,11 @@
   #ifndef INTEL_ENGINE_PM_H
   #define INTEL_ENGINE_PM_H
+#include "i915_drv.h"
   #include "i915_request.h"
   #include "intel_engine_types.h"
   #include "intel_wakeref.h"
+#include "intel_gt_pm.h"
   static inline bool
   intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
@@ -31,6 +33,21 @@ static inline bool intel_engine_pm_get_if_awake(struct 
intel_engine_cs *engine)
return intel_wakeref_get_if_active(&engine->wakeref);
   }
+static inline void intel_engine_pm_might_get(struct intel_engine_cs *engine)
+{
+   if (!intel_engine_is_virtual(engine)) {
+   intel_wakeref_might_get(&engine->wakeref);
+   } else {
+   struct intel_gt *gt = engine->gt;
+   struct intel_engine_cs *tengine;
+   intel_engine_mask_t tmp, mask = engine->mask;
+
+   for_each_engine_masked(tengine, gt, mask, tmp)
+   intel_wakeref_might_get(&tengine->wakeref);
+   }
+   intel_gt_pm_might_get(engine->gt);
+}
+
   static inline void intel_engine_pm_put(struct intel_engine_cs *engine)
   {
intel_wakeref_put(&engine->wakeref);
@@ -52,6 +69,21 @@ static inline void intel_engine_pm_flush(struct 
intel_engine_cs *engine)
intel_wakeref_unlock_wait(&engine->wakeref);
   }
+static inline void intel_engine_pm_might_put(struct intel_engine_cs *engine)
+{
+   if (!intel_engine_is_virtual(engine)) {
+   intel_wakeref_might_put(&engine->wakeref);
+   } else {
+   struct intel_gt *gt = engine->gt;
+   struct intel_engine_cs *tengine;
+   intel_engine_mask_t tmp, mask = engine->mask;
+
+   for_each_engine_masked(tengine, gt, mask, tmp)
+   intel_wakeref_might_put(&tengine->wakeref);
+   }
+   intel_gt_pm_might_put(engine->gt);
+}
+
   static inline struct i915_request *
   intel_engine_create_kernel_request(struct intel_engine_cs *engine)
   {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index 05de6c1af25b..bc898df7a48c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -31,6 +31,11 @@ static inline bool intel_gt_pm_get_if_awake(struct intel_gt 
*gt)
	return intel_wakeref_get_if_active(&gt->wakeref);
   }
+static inline void intel_gt_pm_might_get(struct intel_gt *gt)
+{
+	intel_wakeref_might_get(&gt->wakeref);
+}
+

Re: [Intel-gfx] [PATCH 10/26] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-10-08 Thread John Harrison

On 10/7/2021 18:21, Matthew Brost wrote:

On Thu, Oct 07, 2021 at 03:03:04PM -0700, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.

v2:
   (Daniel Vetter)
- Explicitly state why we assign consecutive guc_ids
v3:
   (John Harrison)
- Bring back in spin lock

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc.h|   6 +-
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 104 ++
   2 files changed, 86 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 25a598e2b6e8..a9f4ec972bfb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -76,9 +76,13 @@ struct intel_guc {
 */
spinlock_t lock;
/**
-* @guc_ids: used to allocate new guc_ids
+* @guc_ids: used to allocate new guc_ids, single-lrc
 */
struct ida guc_ids;
+   /**
+* @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
+*/
+   unsigned long *guc_ids_bitmap;
/**
 * @guc_id_list: list of intel_context with valid guc_ids but no
 * refs
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1f2809187513..79e7732e83b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -128,6 +128,16 @@ guc_create_virtual(struct intel_engine_cs **siblings, 
unsigned int count);
   #define GUC_REQUEST_SIZE 64 /* bytes */
+/*
+ * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
+ * per the GuC submission interface. A different allocation algorithm is used
+ * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to
+ * partition the guc_id space. We believe the number of multi-lrc contexts in
+ * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
+ * multi-lrc.
+ */
+#define NUMBER_MULTI_LRC_GUC_ID	(GUC_MAX_LRC_DESCRIPTORS / 16)
+
   /*
* Below is a set of functions which control the GuC scheduling state which
* require a lock.
@@ -1206,6 +1216,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
INIT_WORK(&guc->submission_state.destroyed_worker,
  destroyed_worker_func);
+   guc->submission_state.guc_ids_bitmap =
+   bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL);
+   if (!guc->submission_state.guc_ids_bitmap)
+   return -ENOMEM;
+
return 0;
   }
@@ -1217,6 +1232,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
guc_lrc_desc_pool_destroy(guc);
guc_flush_destroyed_contexts(guc);
i915_sched_engine_put(guc->sched_engine);
+   bitmap_free(guc->submission_state.guc_ids_bitmap);
   }
   static inline void queue_request(struct i915_sched_engine *sched_engine,
@@ -1268,18 +1284,43 @@ static void guc_submit_request(struct i915_request *rq)
spin_unlock_irqrestore(&sched_engine->lock, flags);
   }
-static int new_guc_id(struct intel_guc *guc)
+static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
   {
-   return ida_simple_get(&guc->submission_state.guc_ids, 0,
- GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
- __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+   int ret;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
+
+   if (intel_context_is_parent(ce))
+		ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
+					      NUMBER_MULTI_LRC_GUC_ID,
+					      order_base_2(ce->parallel.number_children + 1));
+   else
+   ret = ida_simple_get(&guc->submission_state.guc_ids,
+NUMBER_MULTI_LRC_GUC_ID,
+GUC_MAX_LRC_DESCRIPTORS,
+GFP_KERNEL | __GFP_RETRY_MAYFAIL |
+__GFP_NOWARN);
+   if (unlikely(ret < 0))
+   return ret;
+
+   ce->guc_id.id = ret;
+   return 0;
   }
   static void __release_guc_id(struct intel_guc *gu

Re: [Intel-gfx] [PATCH 08/26] drm/i915/guc: Add multi-lrc context registration

2021-10-08 Thread John Harrison

On 10/7/2021 12:50, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

v2:
  (John Harrison)
   - Move GuC specific fields into sub-struct
   - Clean up WQ defines
   - Add comment explaining math to derive WQ / PD address

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 -
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 +-
  5 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h

index 76dfca57cb45..48decb5ee954 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -239,6 +239,18 @@ struct intel_context {
  struct intel_context *parent;
  /** @number_children: number of children if parent */
  u8 number_children;
+    /** @guc: GuC specific members for parallel submission */
+    struct {
+    /** @wqi_head: head pointer in work queue */
+    u16 wqi_head;
+    /** @wqi_tail: tail pointer in work queue */
+    u16 wqi_tail;
PS: As per comments on previous rev, something somewhere needs to 
explicitly state what WQI means. One suggestion was to do that here, 
ideally with maybe a brief description of what the queue is, how it is 
used, etc. Although probably it would be better kept in a GuC specific 
file. E.g. added to guc_fwif.h in patch #12.


John.


+    /**
+ * @parent_page: page in context state (ce->state) used
+ * by parent for work queue, process descriptor
+ */
+    u8 parent_page;
+    } guc;
  } parallel;
    #ifdef CONFIG_DRM_I915_SELFTEST
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c

index 3ef9eaf8c50e..57339d5c1fc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,6 +942,11 @@ __lrc_alloc_state(struct intel_context *ce, 
struct intel_engine_cs *engine)

  context_size += PAGE_SIZE;
  }
  +    if (intel_context_is_parent(ce) && 
intel_engine_uses_guc(engine)) {

+    ce->parallel.guc.parent_page = context_size / PAGE_SIZE;
+    context_size += PAGE_SIZE;
+    }
+
  obj = i915_gem_object_create_lmem(engine->i915, context_size,
    I915_BO_ALLOC_PM_VOLATILE);
  if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h

index 8ff58aff..ba10bd374cee 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
  INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
  INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
  INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+    INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
  INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
  INTEL_GUC_ACTION_LIMIT
  };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h

index fa4be13c8854..0eeb2a9feeed 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,8 +52,6 @@
    #define GUC_DOORBELL_INVALID    256
  -#define GUC_WQ_SIZE    (PAGE_SIZE * 2)
-
  /* Work queue item header definitions */
  #define WQ_STATUS_ACTIVE    1
  #define WQ_STATUS_SUSPENDED    2
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 451d9ae861a6..ab6d7fc1b0b1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -344,6 +344,45 @@ static inline struct i915_priolist 
*to_priolist(struct rb_node *rb)

  return rb_entry(rb, struct i915_priolist, node);
  }
  +/*
+ * When using multi-lrc submission an extra page in the context 
state is

+ * reserved for the process descriptor and work queue.
+ *
+ * The layout of this page is below:
+ * 0    guc_process_desc
+ * ...    unused
+ * PAGE_SIZE / 2    work queue start
+ * ...    work queue
+ * PAGE_SIZE - 1    work queue end
+ */
+#define WQ_SIZE    (PAGE_SIZE / 2)
+#define WQ_OFFSET    (PAGE_SIZE - WQ_SIZE)
I thought you were going with '#define PARENT_SCRATCH_SIZE PAGE_SIZE' 
and then using that everywhere else? Unless there is a fundamental

Re: [Intel-gfx] [PATCH 12/26] drm/i915/guc: Implement multi-lrc submission

2021-10-08 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

v2:
  (John Harrison)
   - s/wqe/wqi
   - Use FIELD_PREP macros
   - Add GEM_BUG_ONs ensures length fits within field
   - Add comment / white space to intel_guc_write_barrier
  (Kernel test robot)
   - Make need_tasklet a static function

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  26 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  24 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  23 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 319 --
  drivers/gpu/drm/i915/i915_request.h   |   8 +
  6 files changed, 335 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 8f8182bf7c11..7191e8439290 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -756,3 +756,29 @@ void intel_guc_load_status(struct intel_guc *guc, struct 
drm_printer *p)
}
}
  }
+
+void intel_guc_write_barrier(struct intel_guc *guc)
+{
+   struct intel_gt *gt = guc_to_gt(guc);
+
+   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+   /*
+* Ensure intel_uncore_write_fw can be used rather than
+* intel_uncore_write.
+*/
+   GEM_BUG_ON(guc->send_regs.fw_domains);
+
+   /*
+* This register is used by the i915 and GuC for MMIO based
+* communication. Once we are in this code CTBs are the only
+* method the i915 uses to communicate with the GuC so it is
+* safe to write to this register (a value of 0 is NOP for MMIO
+* communication). If we ever start mixing CTBs and MMIOs a new
+* register will have to be chosen.
+*/
Hmm, missed it before but this comment is very CTB centric and the 
barrier function is now being used for parallel submission work queues. 
Seems like an extra comment should be added to cover that case. Just 
something simple noting that WQ usage is also guaranteed to be post-CTB 
switch over.



+   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+   } else {
+   /* wmb() sufficient for a barrier if in smem */
+   wmb();
+   }
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index a9f4ec972bfb..147f39cc0f2f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -46,6 +46,12 @@ struct intel_guc {
 * submitted until the stalled request is processed.
 */
struct i915_request *stalled_request;
+   enum {
+   STALL_NONE,
+   STALL_REGISTER_CONTEXT,
+   STALL_MOVE_LRC_TAIL,
+   STALL_ADD_REQUEST,
+   } submission_stall_reason;
  
  	/* intel_guc_recv interrupt related state */

/** @irq_lock: protects GuC irq state */
@@ -361,4 +367,6 @@ void intel_guc_submission_cancel_requests(struct intel_guc 
*guc);
  
  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
  
+void intel_guc_write_barrier(struct intel_guc *guc);

+
  #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 20c710a74498..10d1878d2826 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -377,28 +377,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
return ++ct->requests.last_fence;
  }
  
-static void write_barrier(struct intel_guc_ct *ct)

-{
-   struct intel_guc *guc = ct_to_guc(ct);
-   struct intel_gt *gt = guc_to_gt(guc);
-
-   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
-   GEM_BUG_ON(guc->send_regs.fw_domains);
-   /*
-* This register is used by the i915 and GuC for MMIO based
-* communication. Once we are in this code CTBs are the only
-* method the i915 uses to communicate with the GuC so it is
-* safe to write to this register (a value of 0 is NOP for MMIO
-* communication). If we ever start mixing CTBs and MMIOs a new
-* register will have to be chosen.
-*/
-   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
-   } else {
-   /* wmb() sufficient for a barrier if in smem */
-   wmb();
-   }
-}
-
 

Re: [Intel-gfx] [PATCH 14/26] drm/i915/guc: Implement multi-lrc reset

2021-10-08 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Update context and full GPU reset to work with multi-lrc. The idea is
parent context tracks all the active requests inflight for itself and
its' children. The parent context owns the reset replaying / canceling

Still its' should be its.


requests as needed.

v2:
  (John Harrison)
   - Simply loop in find active request
   - Add comments to find active request / reset loop

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   | 15 +++-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 69 ++-
  2 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index c5bb7ccfb3f8..3b340eb59ada 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -528,20 +528,29 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
  
  struct i915_request *intel_context_find_active_request(struct intel_context *ce)

  {
+   struct intel_context *parent = intel_context_to_parent(ce);
struct i915_request *rq, *active = NULL;
unsigned long flags;
  
  	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
  
-	spin_lock_irqsave(&ce->guc_state.lock, flags);

-   list_for_each_entry_reverse(rq, &ce->guc_state.requests,
+   /*
+* We search the parent list to find an active request on the submitted
+* context. The parent list contains the requests for all the contexts
+* in the relationship so we have to do a compare of each request's
+* context must be done.

"have to do ... must be done" - no need for both.


+*/
+   spin_lock_irqsave(&parent->guc_state.lock, flags);
+   list_for_each_entry_reverse(rq, &parent->guc_state.requests,
sched.link) {
+   if (rq->context != ce)
+   continue;
if (i915_request_completed(rq))
break;
  
  		active = rq;

}
-   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+   spin_unlock_irqrestore(&parent->guc_state.lock, flags);
  
  	return active;

  }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6be7adf89e4f..d661a69ef4f7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -681,6 +681,11 @@ static inline int rq_prio(const struct i915_request *rq)
return rq->sched.attr.priority;
  }
  
+static inline bool is_multi_lrc(struct intel_context *ce)

+{
+   return intel_context_is_parallel(ce);
+}
+
  static bool is_multi_lrc_rq(struct i915_request *rq)
  {
return intel_context_is_parallel(rq->context);
@@ -1214,10 +1219,15 @@ __unwind_incomplete_requests(struct intel_context *ce)
  
  static void __guc_reset_context(struct intel_context *ce, bool stalled)

  {
+   bool local_stalled;
struct i915_request *rq;
unsigned long flags;
u32 head;
+   int i, number_children = ce->parallel.number_children;
bool skip = false;
+   struct intel_context *parent = ce;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
  
  	intel_context_get(ce);
  
@@ -1243,25 +1253,38 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)

if (unlikely(skip))
goto out_put;
  
-	rq = intel_context_find_active_request(ce);

-   if (!rq) {
-   head = ce->ring->tail;
-   stalled = false;
-   goto out_replay;
-   }
+   /*
+* For each context in the relationship find the hanging request
+* resetting each context / request as needed
+*/
+   for (i = 0; i < number_children + 1; ++i) {
+   if (!intel_context_is_pinned(ce))
+   goto next_context;
+
+   local_stalled = false;
+   rq = intel_context_find_active_request(ce);
+   if (!rq) {
+   head = ce->ring->tail;
+   goto out_replay;
+   }
  
-	if (!i915_request_started(rq))

-   stalled = false;
+   GEM_BUG_ON(i915_active_is_idle(&ce->active));
+   head = intel_ring_wrap(ce->ring, rq->head);
  
-	GEM_BUG_ON(i915_active_is_idle(&ce->active));

-   head = intel_ring_wrap(ce->ring, rq->head);
-   __i915_request_reset(rq, stalled);
+   if (i915_request_started(rq))
I didn't see an answer as to why the started test and the wrap call need 
to be reversed?


John.


+   local_stalled = true;
  
+		__i915_request_reset(rq, local_stalled && stalled);

  out_replay:
-   guc_reset_state(ce, head, stalled);
-   __u

Re: [Intel-gfx] [PATCH 15/26] drm/i915/guc: Update debugfs for GuC multi-lrc

2021-10-08 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Display the workqueue status in debugfs for GuC contexts that are in
parent-child relationship.

v2:
  (John Harrison)
   - Output number children in debugfs

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 53 ++-
  1 file changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d661a69ef4f7..f69e984683aa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3704,6 +3704,26 @@ static inline void guc_log_context_priority(struct 
drm_printer *p,
drm_printf(p, "\n");
  }
  
+

+static inline void guc_log_context(struct drm_printer *p,
+  struct intel_context *ce)
+{
+   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
+   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+  ce->ring->head,
+  ce->lrc_reg_state[CTX_RING_HEAD]);
+   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+  ce->ring->tail,
+  ce->lrc_reg_state[CTX_RING_TAIL]);
+   drm_printf(p, "\t\tContext Pin Count: %u\n",
+  atomic_read(&ce->pin_count));
+   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+  atomic_read(&ce->guc_id.ref));
+   drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
+  ce->guc_state.sched_state);
+}
+
  void intel_guc_submission_print_context_info(struct intel_guc *guc,
 struct drm_printer *p)
  {
@@ -3713,22 +3733,27 @@ void intel_guc_submission_print_context_info(struct 
intel_guc *guc,
  
  	xa_lock_irqsave(&guc->context_lookup, flags);

xa_for_each(&guc->context_lookup, index, ce) {
-   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
-   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
-   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
-  ce->ring->head,
-  ce->lrc_reg_state[CTX_RING_HEAD]);
-   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
-  ce->ring->tail,
-  ce->lrc_reg_state[CTX_RING_TAIL]);
-   drm_printf(p, "\t\tContext Pin Count: %u\n",
-  atomic_read(&ce->pin_count));
-   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
-  atomic_read(&ce->guc_id.ref));
-   drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
-  ce->guc_state.sched_state);
+   GEM_BUG_ON(intel_context_is_child(ce));
  
+		guc_log_context(p, ce);

guc_log_context_priority(p, ce);
+
+   if (intel_context_is_parent(ce)) {
+   struct guc_process_desc *desc = __get_process_desc(ce);
+   struct intel_context *child;
+
+   drm_printf(p, "\t\tNumber children: %u\n",
+  ce->parallel.number_children);
+   drm_printf(p, "\t\tWQI Head: %u\n",
+  READ_ONCE(desc->head));
+   drm_printf(p, "\t\tWQI Tail: %u\n",
+  READ_ONCE(desc->tail));
+   drm_printf(p, "\t\tWQI Status: %u\n\n",
+  READ_ONCE(desc->wq_status));
+
+   for_each_child(ce, child)
+   guc_log_context(p, child);
+   }
}
xa_unlock_irqrestore(&guc->context_lookup, flags);
  }




Re: [Intel-gfx] [PATCH 16/26] drm/i915: Fix bug in user proto-context creation that leaked contexts

2021-10-08 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Set number of engines before attempting to create contexts so the
function free_engines can clean up properly. Also check return of
alloc_engines for NULL.

v2:
  (Tvrtko)
   - Send as stand alone patch
  (John Harrison)
   - Check for alloc_engines returning NULL
v3:
  (Checkpatch / Tvrtko)
   - Remove braces around single line if statement

Cc: Jason Ekstrand 
Fixes: d4433c7600f7 ("drm/i915/gem: Use the proto-context to handle create 
parameters (v5)")
Reviewed-by: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
Cc: 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 8208fd5b72c3..8c7ea6e56262 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -898,6 +898,10 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
unsigned int n;
  
  	e = alloc_engines(num_engines);

+   if (!e)
+   return ERR_PTR(-ENOMEM);
+   e->num_engines = num_engines;
+
for (n = 0; n < num_engines; n++) {
struct intel_context *ce;
int ret;
@@ -931,7 +935,6 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
goto free_engines;
}
}
-   e->num_engines = num_engines;
  
  	return e;
  




Re: [Intel-gfx] [PATCH 17/26] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-10-11 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
  (Daniel Vetter)
   - Add IGT link and placeholder for media UMD link
v3:
  (Kernel test robot)
   - Fix warning in unpin engines call
  (John Harrison)
   - Reword a bunch of the kernel doc

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 221 +-
  .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
  drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
  .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
  include/uapi/drm/i915_drm.h   | 131 +++
  9 files changed, 489 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 8c7ea6e56262..6290bc20ccb1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -522,9 +522,150 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
  }
  
+static int

+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   /* FIXME: This is NIY for execlists */
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;
+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAULT;
+   goto out_err;
+   }
+
+   siblings[n] =
+   intel_engine_lookup_user(i915, ci.engine_class,
+ci.engine_instance);
+   if (!siblings[n]) {
+   drm_dbg(&i915->drm,
+   "Invali

Re: [Intel-gfx] [PATCH 20/26] drm/i915/guc: Implement no mid batch preemption for multi-lrc

2021-10-11 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between to parent and child is needed. This is implemented via custom

between to parent -> between parent

emit_bb_start & emit_fini_breadcrumb functions and enabled via by

via by -> by

I'm also not seeing any mention of the forced re-group behavioural 
change in either the comments or commit description.



default if a context is configured by set parallel extension.

v2:
  (John Harrison)
   - Fix a few comments wording
   - Add structure for parent page layout

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 330 +-
  4 files changed, 324 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 3b340eb59ada..ee84259959d0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -569,7 +569,7 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
GEM_BUG_ON(intel_context_is_child(child));
GEM_BUG_ON(intel_context_is_parent(child));
  
-	parent->parallel.number_children++;

+   parent->parallel.child_index = parent->parallel.number_children++;
list_add_tail(&child->parallel.child_link,
  &parent->parallel.child_list);
child->parallel.parent = parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 1d880303a7e4..95a5b94b4ece 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -250,6 +250,8 @@ struct intel_context {
struct i915_request *last_rq;
/** @number_children: number of children if parent */
u8 number_children;
+   /** @child_index: index into child_list if child */
+   u8 child_index;
/** @guc: GuC specific members for parallel submission */
struct {
/** @wqi_head: head pointer in work queue */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index a00eeddc1449..663950d3badc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -181,7 +181,7 @@ struct guc_process_desc {
u32 wq_status;
u32 engine_presence;
u32 priority;
-   u32 reserved[30];
+   u32 reserved[36];

Not seeing the promised explanation of this bug fix.


  } __packed;
  
  #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 12ee8ca76249..f28e36aa77c2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
  #include "gt/intel_context.h"
  #include "gt/intel_engine_pm.h"
  #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_gpu_commands.h"
  #include "gt/intel_gt.h"
  #include "gt/intel_gt_irq.h"
  #include "gt/intel_gt_pm.h"
@@ -368,10 +369,16 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
  
  /*

   * When using multi-lrc submission an extra page in the context state is
- * reserved for the process descriptor and work queue.
+ * reserved for the process descriptor, work queue, and handshake between the
+ * parent + children contexts to insert safe preemption points between each set
+ * of BBs.
   *
   * The layout of this page is below:
   * 0  guc_process_desc
+ * + sizeof(struct guc_process_desc)   child go
+ * + CACHELINE_BYTES   child join[0]
+ * ...
+ * + CACHELINE_BYTES   child join[n - 1]
   * ...unused
   * PAGE_SIZE / 2  work queue start
   * ...work queue
@@ -379,7 +386,25 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
   */
  #define WQ_SIZE   (PAGE_SIZE / 2)
  #define WQ_OFFSET (PAGE_SIZE - WQ_SIZE)
-static u32 __get_process_desc_offset(struct intel_context *ce)
+
+struct parent_page {
+   struct guc_process_desc pdesc;
+
+   u32 child_go_memory;
+   u8 unused0[CACHELINE_BYTES - sizeof(u32)];
+
+   struct {
+   u32 child_join_memory;
+   u8 unused1[CACHELINE_BYTES - sizeof(u32)];
+   } j

Re: [Intel-gfx] [PATCH 21/26] drm/i915: Multi-BB execbuf

2021-10-12 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

Allow multiple batch buffers to be submitted in a single execbuf IOCTL
after a context has been configured with the 'set_parallel' extension.
The number of batches is implicit based on the context's configuration.

This is implemented with a series of loops. First a loop is used to find
all the batches, a loop to pin all the HW contexts, a loop to create all
the requests, a loop to submit (emit BB start, etc...) all the requests,
a loop to tie the requests to the VMAs they touch, and finally a loop to
commit the requests to the backend.

A composite fence is also created for the generated requests to return
to the user and to stick in dma resv slots.

No behavior from the existing IOCTL should be changed aside from when
throttling because the ring for a context is full, wait on the request

throttling because the ring for -> throttling the ring because

full, wait -> full. In this situation, i915 will now wait


while holding the object locks.

, previously it would have dropped the locks for the wait.

And maybe explain why this change is necessary?




IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
  (Matthew Brost)
   - Return proper error value if i915_request_create fails
v3:
  (John Harrison)
   - Add comment explaining create / add order loops + locking
   - Update commit message explaining different in IOCTL behavior
   - Line wrap some comments
   - eb_add_request returns void
   - Return -EINVAL rather triggering BUG_ON if cmd parser used
  (Checkpatch)
   - Check eb->batch_len[*current_batch]

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 793 --
  drivers/gpu/drm/i915/gt/intel_context.h   |   8 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |  10 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   2 +
  drivers/gpu/drm/i915/i915_request.h   |   9 +
  drivers/gpu/drm/i915/i915_vma.c   |  21 +-
  drivers/gpu/drm/i915/i915_vma.h   |  13 +-
  7 files changed, 599 insertions(+), 257 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 2f2434b52317..5c7fb6f68bbb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -244,17 +244,25 @@ struct i915_execbuffer {
struct drm_i915_gem_exec_object2 *exec; /** ioctl execobj[] */
struct eb_vma *vma;
  
-	struct intel_engine_cs *engine; /** engine to queue the request to */

+   struct intel_gt *gt; /* gt for the execbuf */
struct intel_context *context; /* logical state for the request */
struct i915_gem_context *gem_context; /** caller's context */
  
-	struct i915_request *request; /** our request to build */

-   struct eb_vma *batch; /** identity of the batch obj/vma */
+   /** our requests to build */
+   struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+   /** identity of the batch obj/vma */
+   struct eb_vma *batches[MAX_ENGINE_INSTANCE + 1];
struct i915_vma *trampoline; /** trampoline used for chaining */
  
+	/** used for excl fence in dma_resv objects when > 1 BB submitted */

+   struct dma_fence *composite_fence;
+
/** actual size of execobj[] as we may extend it for the cmdparser */
unsigned int buffer_count;
  
+	/* number of batches in execbuf IOCTL */

+   unsigned int num_batches;
+
/** list of vma not yet bound during reservation phase */
struct list_head unbound;
  
@@ -281,7 +289,8 @@ struct i915_execbuffer {
  
  	u64 invalid_flags; /** Set of execobj.flags that are invalid */
  
-	u64 batch_len; /** Length of batch within object */

+   /** Length of batch within object */
+   u64 batch_len[MAX_ENGINE_INSTANCE + 1];
u32 batch_start_offset; /** Location within object of batch */
u32 batch_flags; /** Flags composed for emit_bb_start() */
struct intel_gt_buffer_pool_node *batch_pool; /** pool node for batch 
buffer */
@@ -299,14 +308,13 @@ struct i915_execbuffer {
  };
  
  static int eb_parse(struct i915_execbuffer *eb);

-static struct i915_request *eb_pin_engine(struct i915_execbuffer *eb,
- bool throttle);
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
  static void eb_unpin_engine(struct i915_execbuffer *eb);
  
  static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)

  {
-   return intel_engine_requires_cmd_parser(eb->engine) ||
-   (intel_engine_using_cmd_parser(eb->engine) &&
+   return intel_engine_requires_cmd_parser(eb->context->engine) ||
+   (intel_engine_using_cmd_parser(eb->context->engine) &&
 eb->args->batch_len);
  }
  
@@ -544,11 +

Re: [Intel-gfx] [PATCH 22/26] drm/i915/guc: Handle errors in multi-lrc requests

2021-10-12 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issue arises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forward progress. If all the requests are not present this handshake
doesn't work. To work around this, if multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

v2:
  (John Harrison)
   - Add comment explaining the skipping of the handshake logic
   - Fix typos in the commit message

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 71 ++-
  1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83b0d2a114af..05e8b199e4ce 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4072,8 +4072,8 @@ static int 
emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
  }
  
  static u32 *

-emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
-u32 *cs)
+__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+  u32 *cs)
  {
struct intel_context *ce = rq->context;
u8 i;
@@ -4101,6 +4101,46 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  get_children_go_addr(ce),
  0);
  
+	return cs;

+}
+
+/*
+ * If this is true, a submission of multi-lrc requests had an error and the
+ * requests need to be skipped. The front end (execbuf IOCTL) should've called
+ * i915_request_skip which squashes the BB but we still need to emit the fini
+ * breadcrumbs seqno write. At this point we don't know how many of the
+ * requests in the multi-lrc submission were generated so we can't do the
+ * handshake between the parent and children (e.g. if 4 requests should be
+ * generated but 2nd hit an error only 1 would be seen by the GuC backend).
+ * Simply skip the handshake, but still emit the breadcrumb seqno, if an error
+ * has occurred on any of the requests in submission / relationship.
+ */
+static inline bool skip_handshake(struct i915_request *rq)
+{
+   return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   /*
+* NOP everything in
+* __emit_fini_breadcrumb_parent_no_preempt_mid_batch, the -6
The line wrapping makes this look confusing. It seems like the function 
name should fit on the line before. Even if it is a few characters over 
(although the limit is now 100 not 80, I think), the checkpatch warning 
is worth the readability of the code.



+* comes of the length emission below.

-> comes from the length of the emits below.

John.


+*/
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+   } else {
+   cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
+   }
+
/* Emit fini breadcrumb */
cs = gen8_emit_ggtt_write(cs,
  rq->fence.seqno,
@@ -4117,7 +4157,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  }
  
  static u32 *

-emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 
*cs)
+__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+ u32 *cs)
  {
struct intel_context *ce = rq->context;
struct intel_context *parent = intel_context_to_parent(ce);
@@ -4144,6 +4185,30 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct 
i915_request *rq, u32 *cs
*cs++ = get_children_go_addr(parent);
*cs++ = 0;
  
+	return cs;

+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+   u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_child(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   /*
+* NOP everything in
+* __emit_fini_breadcrumb_child_no_preempt_mid_batch, the -6
+* comes from the length the emis

Re: [Intel-gfx] [PATCH 23/26] drm/i915: Make request conflict tracking understand parallel submits

2021-10-12 Thread John Harrison

On 10/4/2021 15:06, Matthew Brost wrote:

If an object in the excl or shared slot is a composite fence from a
parallel submit and the current request in the conflict tracking is from
the same parallel context there is no need to enforce ordering as the
ordering already implicit. Make the request conflict tracking understand

ordering already -> ordering is already


this by comparing the parents parallel fence values and skipping the

parents -> parent's


conflict insertion if the values match.
Presumably, this is to cope with the fact that the parallel submit 
fences do not look like regular submission fences. And hence the 
existing code that says 'new fence belongs to same context as old fence, 
so safe to ignore' does not work with parallel submission. However, this 
change does not appear to be adding parallel submit support to an 
existing 'same context' check. It seems to be a brand new check that 
does not exist for single submission. What makes parallel submit 
different? If we aren't skipping same context fences for single submits, 
why do we need it for parallel? Conversely, if we need it for parallel 
then why don't we need it for single?


And if the single submission version is simply somewhere else in the 
code, why do the parallel version here instead of at the same place?


John.



Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/i915_request.c | 43 +++--
  1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index e9bfa32f9270..cf89624020ad 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1325,6 +1325,25 @@ i915_request_await_external(struct i915_request *rq, 
struct dma_fence *fence)
return err;
  }
  
+static inline bool is_parallel_rq(struct i915_request *rq)

+{
+   return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
+static bool is_same_parallel_context(struct i915_request *to,
+struct i915_request *from)
+{
+   if (is_parallel_rq(to))

Should this not say '&& is_parallel_rq(from)'?


+   return request_to_parent(to) == request_to_parent(from);
+
+   return false;
+}
+
  int
  i915_request_await_execution(struct i915_request *rq,
 struct dma_fence *fence)
@@ -1356,11 +1375,14 @@ i915_request_await_execution(struct i915_request *rq,
 * want to run our callback in all cases.
 */
  
-		if (dma_fence_is_i915(fence))

+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = __i915_request_await_execution(rq,
 to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
} while (--nchild);
@@ -1461,10 +1483,13 @@ i915_request_await_dma_fence(struct i915_request *rq, 
struct dma_fence *fence)
 fence))
continue;
  
-		if (dma_fence_is_i915(fence))

+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = i915_request_await_request(rq, to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
  
@@ -1539,16 +1564,6 @@ i915_request_await_object(struct i915_request *to,

return ret;
  }
  
-static inline bool is_parallel_rq(struct i915_request *rq)

-{
-   return intel_context_is_parallel(rq->context);
-}
-
-static inline struct intel_context *request_to_parent(struct i915_request *rq)
-{
-   return intel_context_to_parent(rq->context);
-}
-
  static struct i915_request *
  __i915_request_ensure_parallel_ordering(struct i915_request *rq,
struct intel_timeline *timeline)




Re: [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Parallel submission aka multi-bb execbuf (rev4)

2021-10-12 Thread John Harrison

On 10/4/2021 15:21, Patchwork wrote:

== Series Details ==

Series: Parallel submission aka multi-bb execbuf (rev4)
URL   : https://patchwork.freedesktop.org/series/92789/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
e2a47a99bf9d drm/i915/guc: Move GuC guc_id allocation under submission state 
sub-struct
f83d8f1539fa drm/i915/guc: Take GT PM ref when deregistering context
-:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gt' - possible side-effects?
#79: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:44:
+#define with_intel_gt_pm(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)

-:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'tmp' - possible side-effects?
#79: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:44:
+#define with_intel_gt_pm(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)
Not sure what these two are complaining about? But 'gt' and 'tmp' should 
be wrapped with parentheses when used?




total: 0 errors, 0 warnings, 2 checks, 290 lines checked
93e5284929b3 drm/i915/guc: Take engine PM when a context is pinned with GuC 
submission
4dd6554d994d drm/i915/guc: Don't call switch_to_kernel_context with GuC 
submission
8629b55f536c drm/i915: Add logical engine mapping
8117ec0a1ca7 drm/i915: Expose logical engine instance to user
aa8e1eb4dd4e drm/i915/guc: Introduce context parent-child relationship
aaf50eacc2fd drm/i915/guc: Add multi-lrc context registration
e5f6f50e66d1 drm/i915/guc: Ensure GuC schedule operations do not operate on 
child contexts
adf21ba138f3 drm/i915/guc: Assign contexts in parent-child relationship 
consecutive guc_ids
40ef33318b81 drm/i915/guc: Implement parallel context pin / unpin functions
1ad560c70346 drm/i915/guc: Implement multi-lrc submission
-:364: CHECK:SPACING: spaces preferred around that '*' (ctx:ExV)
#364: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:771:
+   *wqi++ = child->ring->tail / sizeof(u64);
^

This seems like a bogus warning.



total: 0 errors, 0 warnings, 1 checks, 570 lines checked
466c01457dec drm/i915/guc: Insert submit fences between requests in 
parent-child relationship
2ece815c1f18 drm/i915/guc: Implement multi-lrc reset
7add5784199f drm/i915/guc: Update debugfs for GuC multi-lrc
-:23: CHECK:LINE_SPACING: Please don't use multiple blank lines
#23: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3707:
  
+

This should be fixed.



total: 0 errors, 0 warnings, 1 checks, 67 lines checked
966991d7bbed drm/i915: Fix bug in user proto-context creation that leaked 
contexts
0eb3d3bf0c84 drm/i915/guc: Connect UAPI to GuC multi-lrc interface
68c6596b649a drm/i915/doc: Update parallel submit doc to point to i915_drm.h
-:13: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does 
MAINTAINERS need updating?
#13:
deleted file mode 100644

total: 0 errors, 1 warnings, 0 checks, 10 lines checked
8290f5d15ca2 drm/i915/guc: Add basic GuC multi-lrc selftest
-:22: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does 
MAINTAINERS need updating?
#22:
new file mode 100644

These two can be ignored.


total: 0 errors, 1 warnings, 0 checks, 190 lines checked
ade3768c42d5 drm/i915/guc: Implement no mid batch preemption for multi-lrc
57882939d788 drm/i915: Multi-BB execbuf
-:369: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects?
#369: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1854:
+#define for_each_batch_create_order(_eb, _i) \
+   for (_i = 0; _i < (_eb)->num_batches; ++_i)

Again, not sure what the 'reuse' comment means but should it also use '(_i)'?



-:371: ERROR:MULTISTATEMENT_MACRO_USE_DO_WHILE: Macros with multiple statements 
should be enclosed in a do - while loop
#371: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1856:
+#define for_each_batch_add_order(_eb, _i) \
+   BUILD_BUG_ON(!typecheck(int, _i)); \
+   for (_i = (_eb)->num_batches - 1; _i >= 0; --_i)

This seems bogus. Wrapping it in a do/while will break the purpose!



-:371: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects?
#371: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1856:
+#define for_each_batch_add_order(_eb, _i) \
+   BUILD_BUG_ON(!typecheck(int, _i)); \
+   for (_i = (_eb)->num_batches - 1; _i >= 0; --_i)

As above.



total: 1 errors, 0 warnings, 2 checks, 1298 lines checked
28b699ece289 drm/i915/guc: Handle errors in multi-lrc requests
962e6b3dce59 drm/i915: Make request conflict tracking understand parallel 
submits
368ab12f5205 drm/i915: Update I915_GEM_BUSY IOCTL to understand composite fences
b52570f01859 drm/i915: Enable multi-bb execbuf
8766155832d7 drm/i915/execlists: Weak parallel submission support for execlists






Re: [Intel-gfx] ✗ Fi.CI.DOCS: warning for Parallel submission aka multi-bb execbuf (rev4)

2021-10-12 Thread John Harrison

On 10/4/2021 15:26, Patchwork wrote:

== Series Details ==

Series: Parallel submission aka multi-bb execbuf (rev4)
URL   : https://patchwork.freedesktop.org/series/92789/
State : warning

== Summary ==

$ make htmldocs 2>&1 > /dev/null | grep i915
./drivers/gpu/drm/i915/gt/uc/intel_guc.h:166: warning: Function parameter or 
member 'submission_stall_reason' not described in 'intel_guc'
./drivers/gpu/drm/i915/gt/uc/intel_guc.h:166: warning: Function parameter or 
member 'submission_state' not described in 'intel_guc'



These seem like valid things that need to be fixed.

John.



Re: [Intel-gfx] [PATCH 10/26] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-10-13 Thread John Harrison

On 10/13/2021 11:03, Matthew Brost wrote:

On Fri, Oct 08, 2021 at 09:40:43AM -0700, John Harrison wrote:

On 10/7/2021 18:21, Matthew Brost wrote:

On Thu, Oct 07, 2021 at 03:03:04PM -0700, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.

This is a precursor to the full GuC multi-lrc implementation but aligns
with how the GuC multi-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.

v2:
(Daniel Vetter)
 - Explicitly state why we assign consecutive guc_ids
v3:
(John Harrison)
 - Bring back in spin lock

Signed-off-by: Matthew Brost 
---
drivers/gpu/drm/i915/gt/uc/intel_guc.h|   6 +-
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 104 ++
2 files changed, 86 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 25a598e2b6e8..a9f4ec972bfb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -76,9 +76,13 @@ struct intel_guc {
 */
spinlock_t lock;
/**
-* @guc_ids: used to allocate new guc_ids
+* @guc_ids: used to allocate new guc_ids, single-lrc
 */
struct ida guc_ids;
+   /**
+* @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
+*/
+   unsigned long *guc_ids_bitmap;
/**
 * @guc_id_list: list of intel_context with valid guc_ids but no
 * refs
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1f2809187513..79e7732e83b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -128,6 +128,16 @@ guc_create_virtual(struct intel_engine_cs **siblings, 
unsigned int count);
#define GUC_REQUEST_SIZE 64 /* bytes */
+/*
+ * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
+ * per the GuC submission interface. A different allocation algorithm is used
+ * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to
+ * partition the guc_id space. We believe the number of multi-lrc contexts in
+ * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
+ * multi-lrc.
+ */
+#define NUMBER_MULTI_LRC_GUC_ID	(GUC_MAX_LRC_DESCRIPTORS / 16)
+
/*
 * Below is a set of functions which control the GuC scheduling state which
 * require a lock.
@@ -1206,6 +1216,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
INIT_WORK(&guc->submission_state.destroyed_worker,
  destroyed_worker_func);
+   guc->submission_state.guc_ids_bitmap =
+   bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL);
+   if (!guc->submission_state.guc_ids_bitmap)
+   return -ENOMEM;
+
return 0;
}
@@ -1217,6 +1232,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
guc_lrc_desc_pool_destroy(guc);
guc_flush_destroyed_contexts(guc);
i915_sched_engine_put(guc->sched_engine);
+   bitmap_free(guc->submission_state.guc_ids_bitmap);
}
static inline void queue_request(struct i915_sched_engine *sched_engine,
@@ -1268,18 +1284,43 @@ static void guc_submit_request(struct i915_request *rq)
spin_unlock_irqrestore(&sched_engine->lock, flags);
}
-static int new_guc_id(struct intel_guc *guc)
+static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
-   return ida_simple_get(&guc->submission_state.guc_ids, 0,
- GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
- __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+   int ret;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
+
+   if (intel_context_is_parent(ce))
+   ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
+ NUMBER_MULTI_LRC_GUC_ID,
+ order_base_2(ce->parallel.number_children
+  + 1));
+   else
+   ret = ida_simple_get(&guc->submission_state.guc_ids,
+NUMBER_MULTI_LRC_GUC_ID,
+GUC_MAX_LRC_DESCRIPTORS,
+GFP_KERNEL | __GFP_RETRY_MAYFAIL |
+__GFP_NOWARN);
+   if (unlikely(ret < 0))
+   return ret

Re: [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Parallel submission aka multi-bb execbuf (rev4)

2021-10-13 Thread John Harrison

On 10/12/2021 17:15, Matthew Brost wrote:

On Tue, Oct 12, 2021 at 03:15:00PM -0700, John Harrison wrote:

On 10/4/2021 15:21, Patchwork wrote:

== Series Details ==

Series: Parallel submission aka multi-bb execbuf (rev4)
URL   : https://patchwork.freedesktop.org/series/92789/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
e2a47a99bf9d drm/i915/guc: Move GuC guc_id allocation under submission state 
sub-struct
f83d8f1539fa drm/i915/guc: Take GT PM ref when deregistering context
-:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gt' - possible side-effects?
#79: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:44:
+#define with_intel_gt_pm(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)

-:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'tmp' - possible side-effects?
#79: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:44:
+#define with_intel_gt_pm(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)

Not sure what these two are complaining about? But 'gt' and 'tmp' should be
wrapped with parentheses when used?


Not sure, but I think this one is fine.


total: 0 errors, 0 warnings, 2 checks, 290 lines checked
93e5284929b3 drm/i915/guc: Take engine PM when a context is pinned with GuC 
submission
4dd6554d994d drm/i915/guc: Don't call switch_to_kernel_context with GuC 
submission
8629b55f536c drm/i915: Add logical engine mapping
8117ec0a1ca7 drm/i915: Expose logical engine instance to user
aa8e1eb4dd4e drm/i915/guc: Introduce context parent-child relationship
aaf50eacc2fd drm/i915/guc: Add multi-lrc context registration
e5f6f50e66d1 drm/i915/guc: Ensure GuC schedule operations do not operate on 
child contexts
adf21ba138f3 drm/i915/guc: Assign contexts in parent-child relationship 
consecutive guc_ids
40ef33318b81 drm/i915/guc: Implement parallel context pin / unpin functions
1ad560c70346 drm/i915/guc: Implement multi-lrc submission
-:364: CHECK:SPACING: spaces preferred around that '*' (ctx:ExV)
#364: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:771:
+   *wqi++ = child->ring->tail / sizeof(u64);
^

This seems like a bogus warning.


Agree.


total: 0 errors, 0 warnings, 1 checks, 570 lines checked
466c01457dec drm/i915/guc: Insert submit fences between requests in 
parent-child relationship
2ece815c1f18 drm/i915/guc: Implement multi-lrc reset
7add5784199f drm/i915/guc: Update debugfs for GuC multi-lrc
-:23: CHECK:LINE_SPACING: Please don't use multiple blank lines
#23: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3707:
+

This should be fixed.


Done.
  

total: 0 errors, 0 warnings, 1 checks, 67 lines checked
966991d7bbed drm/i915: Fix bug in user proto-context creation that leaked 
contexts
0eb3d3bf0c84 drm/i915/guc: Connect UAPI to GuC multi-lrc interface
68c6596b649a drm/i915/doc: Update parallel submit doc to point to i915_drm.h
-:13: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does 
MAINTAINERS need updating?
#13:
deleted file mode 100644

total: 0 errors, 1 warnings, 0 checks, 10 lines checked
8290f5d15ca2 drm/i915/guc: Add basic GuC multi-lrc selftest
-:22: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does 
MAINTAINERS need updating?
#22:
new file mode 100644

These two can be ignored.

Agree.


total: 0 errors, 1 warnings, 0 checks, 190 lines checked
ade3768c42d5 drm/i915/guc: Implement no mid batch preemption for multi-lrc
57882939d788 drm/i915: Multi-BB execbuf
-:369: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects?
#369: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1854:
+#define for_each_batch_create_order(_eb, _i) \
+   for (_i = 0; _i < (_eb)->num_batches; ++_i)

Again, not sure what the 'reuse' comment means but should it also use '(_i)'?


I haven't been able to figure out how to fix these ones. I think you
only need () if you deref the variable.
The () is to prevent any kind of operator precedence confusion when 
passing in something more exciting than a simple variable. Doesn't have 
to be a deref, it could be any operator. Granted, extremely unlikely for 
this particular macro but generally good practice just in case. E.g. 
someone passes in weird things like 'a, func()' as '_i'.


John.

  

-:371: ERROR:MULTISTATEMENT_MACRO_USE_DO_WHILE: Macros with multiple statements 
should be enclosed in a do - while loop
#371: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1856:
+#define for_each_batch_add_order(_eb, _i) \
+   BUILD_BUG_ON(!typecheck(int, _i)); \
+   for (_i = (_eb)->num_batches - 1; _i >= 0; --_i)

This seems bogus. Wrapping it in a do/while will break the purpose!


Right. Added the BUILD_BUG_ON here because I did have a bug where I used
an unsigned with this macro and that breaks the macro.

Matt


-:371:

Re: [Intel-gfx] [PATCH 23/26] drm/i915: Make request conflict tracking understand parallel submits

2021-10-13 Thread John Harrison

On 10/13/2021 10:51, Matthew Brost wrote:

On Tue, Oct 12, 2021 at 03:08:05PM -0700, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

If an object in the excl or shared slot is a composite fence from a
parallel submit and the current request in the conflict tracking is from
the same parallel context there is no need to enforce ordering as the
ordering already implicit. Make the request conflict tracking understand

ordering already -> ordering is already


this by comparing the parents parallel fence values and skipping the

parents -> parent's


conflict insertion if the values match.

Presumably, this is to cope with the fact that the parallel submit fences do
not look like regular submission fences. And hence the existing code that
says 'new fence belongs to same context as old fence, so safe to ignore'
does not work with parallel submission. However, this change does not appear
to be adding parallel submit support to an existing 'same context' check. It
seems to be a brand new check that does not exist for single submission.
What makes parallel submit different? If we aren't skipping same context
fences for single submits, why do we need it for parallel? Conversely, if we
need it for parallel then why don't we need it for single?

And if the single submission version is simply somewhere else in the code,
why do the parallel version here instead of at the same place?

John.


Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/i915_request.c | 43 +++--
   1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index e9bfa32f9270..cf89624020ad 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1325,6 +1325,25 @@ i915_request_await_external(struct i915_request *rq, 
struct dma_fence *fence)
return err;
   }
+static inline bool is_parallel_rq(struct i915_request *rq)
+{
+   return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
+static bool is_same_parallel_context(struct i915_request *to,
+struct i915_request *from)
+{
+   if (is_parallel_rq(to))

Should this not say '&& is_parallel_rq(from)'?


Missed this one. That isn't necessary as, if from is not a parallel
submit, the following compare of parents will always return false. I
could add it if you insist as either way works.

Matt
It was more a question of whether req_to_parent() works fine 
irrespective of whether the rq is a parent, child or single?


John.




+   return request_to_parent(to) == request_to_parent(from);
+
+   return false;
+}
+
   int
   i915_request_await_execution(struct i915_request *rq,
 struct dma_fence *fence)
@@ -1356,11 +1375,14 @@ i915_request_await_execution(struct i915_request *rq,
 * want to run our callback in all cases.
 */
-   if (dma_fence_is_i915(fence))
+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = __i915_request_await_execution(rq,
 to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
} while (--nchild);
@@ -1461,10 +1483,13 @@ i915_request_await_dma_fence(struct i915_request *rq, 
struct dma_fence *fence)
 fence))
continue;
-   if (dma_fence_is_i915(fence))
+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = i915_request_await_request(rq, to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
@@ -1539,16 +1564,6 @@ i915_request_await_object(struct i915_request *to,
return ret;
   }
-static inline bool is_parallel_rq(struct i915_request *rq)
-{
-   return intel_context_is_parallel(rq->context);
-}
-
-static inline struct intel_context *request_to_parent(struct i915_request *rq)
-{
-   return intel_context_to_parent(rq->context);
-}
-
   static struct i915_request *
   __i915_request_ensure_parallel_ordering(struct i915_request *rq,
struct intel_timeline *timeline)




Re: [Intel-gfx] [PATCH 23/26] drm/i915: Make request conflict tracking understand parallel submits

2021-10-13 Thread John Harrison

On 10/12/2021 17:32, Matthew Brost wrote:

On Tue, Oct 12, 2021 at 03:08:05PM -0700, John Harrison wrote:

On 10/4/2021 15:06, Matthew Brost wrote:

If an object in the excl or shared slot is a composite fence from a
parallel submit and the current request in the conflict tracking is from
the same parallel context there is no need to enforce ordering as the
ordering already implicit. Make the request conflict tracking understand

ordering already -> ordering is already


Yep.


this by comparing the parents parallel fence values and skipping the

parents -> parent's


Yep.


conflict insertion if the values match.

Presumably, this is to cope with the fact that the parallel submit fences do
not look like regular submission fences. And hence the existing code that
says 'new fence belongs to same context as old fence, so safe to ignore'
does not work with parallel submission. However, this change does not appear

Yes. The check for 'if (fence->context == rq->fence.context)' doesn't
work with parallel submission as each rq->fence.context corresponds to a
timeline. With parallel submission each intel_context in the parallel
submit has its own timeline (seqno) so the compare fails for different
intel_context within the same parallel submit. This is the reason for
the additional compare of parallel submits' parents: if they have the
same parent, it is the same parallel submission and there is no need to
enforce additional ordering.


to be adding parallel submit support to an existing 'same context' check. It
seems to be a brand new check that does not exist for single submission.
What makes parallel submit different? If we aren't skipping same context
fences for single submits, why do we need it for parallel? Conversely, if we
need it for parallel then why don't we need it for single?


I'm confused by what you are asking here. The existing same context
check is fine for parallel submits - it will just return true when we
compare requests with the same intel_context, and the new additional
check is only true for parallel submissions with the same parent.


And if the single submission version is simply somewhere else in the code,
why do the parallel version here instead of at the same place?


Again I'm confused by what you are asking. We might just need to sync on
a quick call.

That's okay. I think I had partly confused myself ;).

I was just meaning that the parallel compliant version of the 'ctxtA == 
ctxtB -> skip' test should be coded adjacent to the single submission 
version of the same test. I had somehow completely missed that the 
single submission version is indeed the line above in 
i915_request_await_execution(). So the two are indeed very definitely 
next to each other.


It's all good :).

John.




Matt
  

John.


Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/i915_request.c | 43 +++--
   1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index e9bfa32f9270..cf89624020ad 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1325,6 +1325,25 @@ i915_request_await_external(struct i915_request *rq, 
struct dma_fence *fence)
return err;
   }
+static inline bool is_parallel_rq(struct i915_request *rq)
+{
+   return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
+static bool is_same_parallel_context(struct i915_request *to,
+struct i915_request *from)
+{
+   if (is_parallel_rq(to))

Should this not say '&& is_parallel_rq(from)'?


+   return request_to_parent(to) == request_to_parent(from);
+
+   return false;
+}
+
   int
   i915_request_await_execution(struct i915_request *rq,
 struct dma_fence *fence)
@@ -1356,11 +1375,14 @@ i915_request_await_execution(struct i915_request *rq,
 * want to run our callback in all cases.
 */
-   if (dma_fence_is_i915(fence))
+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = __i915_request_await_execution(rq,
 to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
} while (--nchild);
@@ -1461,10 +1483,13 @@ i915_request_await_dma_fence(struct i915_request *rq, 
struct dma_fence *fence)
 fence)

Re: [Intel-gfx] [PATCH 01/25] drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Move guc_id allocation under the submission state sub-struct as a future
patch will reuse the spin lock as a global submission state lock. Moving
this into a sub-struct makes ownership of the fields / lock clear.

v2:
  (Docs)
   - Add comment for submission_state sub-structure
v3:
  (John Harrison)
   - Fixup a few comments

Signed-off-by: Matthew Brost 
Reviewed-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |  6 +--
  drivers/gpu/drm/i915/gt/uc/intel_guc.h| 28 +++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 48 ++-
  3 files changed, 47 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 12252c411159..e7e3984aab78 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -197,18 +197,18 @@ struct intel_context {
struct {
/**
 * @id: handle which is used to uniquely identify this context
-* with the GuC, protected by guc->contexts_lock
+* with the GuC, protected by guc->submission_state.lock
 */
u16 id;
/**
 * @ref: the number of references to the guc_id, when
 * transitioning in and out of zero protected by
-* guc->contexts_lock
+* guc->submission_state.lock
 */
atomic_t ref;
/**
 * @link: in guc->guc_id_list when the guc_id has no refs but is
-* still valid, protected by guc->contexts_lock
+* still valid, protected by guc->submission_state.lock
 */
struct list_head link;
} guc_id;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 5dd174babf7a..82e248c2290c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -71,16 +71,26 @@ struct intel_guc {
} interrupts;
  
  	/**

-* @contexts_lock: protects guc_ids, guc_id_list, ce->guc_id.id, and
-* ce->guc_id.ref when transitioning in and out of zero
+* @submission_state: sub-structure for submission state protected by
+* single lock
 */
-   spinlock_t contexts_lock;
-   /** @guc_ids: used to allocate unique ce->guc_id.id values */
-   struct ida guc_ids;
-   /**
-* @guc_id_list: list of intel_context with valid guc_ids but no refs
-*/
-   struct list_head guc_id_list;
+   struct {
+   /**
+* @lock: protects everything in submission_state,
+* ce->guc_id.id, and ce->guc_id.ref when transitioning in and
+* out of zero
+*/
+   spinlock_t lock;
+   /**
+* @guc_ids: used to allocate new guc_ids
+*/
+   struct ida guc_ids;
+   /**
+* @guc_id_list: list of intel_context with valid guc_ids but no
+* refs
+*/
+   struct list_head guc_id_list;
+   } submission_state;
  
  	/**

 * @submission_supported: tracks whether we support GuC submission on
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ba0de35f6323..b2646b088c7f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -68,14 +68,14 @@
   * fence is used to stall all requests associated with this guc_id until the
   * corresponding G2H returns indicating the guc_id has been deregistered.
   *
- * guc_ids:
+ * submission_state.guc_ids:
   * Unique number associated with private GuC context data passed in during
   * context registration / submission / deregistration. 64k available. Simple 
ida
   * is used for allocation.
   *
   * Stealing guc_ids:
   * If no guc_ids are available they can be stolen from another context at
- * request creation time if that context is unpinned. If a guc_id can't be 
found
+ * request creation time if that context is unpinned. If a guc_id an't be found

Oops?

John.



   * we punt this problem to the user as we believe this is near impossible to 
hit
   * during normal use cases.
   *
@@ -89,7 +89,7 @@
   * sched_engine can be submitting at a time. Currently only one sched_engine 
is
   * used for all of GuC submission but that could change in the future.
   *
- * guc->contexts_lock
+ * guc->submission_state.lock
   * Protects guc_id allocation for the given GuC, i.e. only one context can be
   * doing guc_id allocation operations at a time for each GuC in the system.
   *
@@ -103,7 +103,7 @@
   *
   * Lock ordering rules:
 

Re: [Intel-gfx] [PATCH 02/25] drm/i915/guc: Take GT PM ref when deregistering context

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a deregister context H2G is in flight. To do this we must
issue the deregister H2G from a worker, as a context can be destroyed from
an atomic context and taking a GT PM ref there blows up. Previously we took
a runtime PM ref from this atomic context, which worked but will stop
working once runtime PM autosuspend is enabled.

So this patch is twofold: stop intel_gt_wait_for_idle from short
circuiting and fix runtime PM autosuspend.

v2:
  (John Harrison)
   - Split structure changes out in different patch
  (Tvrtko)
   - Don't drop lock in deregister_destroyed_contexts
v3:
  (John Harrison)
   - Flush destroyed contexts before destroying context reg pool

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
  drivers/gpu/drm/i915/gt/intel_engine_pm.h |   5 +
  drivers/gpu/drm/i915/gt/intel_gt_pm.h |   4 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  11 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 146 +++---
  6 files changed, 121 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 35babd02ddfe..d008ef8623ce 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -400,6 +400,8 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
ce->guc_id.id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id.link);
  
+	INIT_LIST_HEAD(&ce->destroyed_link);
+
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e7e3984aab78..4613d027cbc3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -213,6 +213,13 @@ struct intel_context {
struct list_head link;
} guc_id;
  
+	/**
+* @destroyed_link: link in guc->submission_state.destroyed_contexts, in
+* list when context is pending to be destroyed (deregistered with the
+* GuC), protected by guc->submission_state.lock
+*/
+   struct list_head destroyed_link;
+
  #ifdef CONFIG_DRM_I915_SELFTEST
/**
 * @drop_schedule_enable: Force drop of schedule enable G2H for selftest
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 8520c595f5e1..6fdeae668e6e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -16,6 +16,11 @@ intel_engine_pm_is_awake(const struct intel_engine_cs 
*engine)
return intel_wakeref_is_active(&engine->wakeref);
  }
  
+static inline void __intel_engine_pm_get(struct intel_engine_cs *engine)
+{
+   __intel_wakeref_get(&engine->wakeref);
+}
+
  static inline void intel_engine_pm_get(struct intel_engine_cs *engine)
  {
intel_wakeref_get(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index d0588d8aaa44..05de6c1af25b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -41,6 +41,10 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
	intel_wakeref_put_async(&gt->wakeref);
  }
  
+#define with_intel_gt_pm(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)
+
  static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
  {
	return intel_wakeref_wait_for_idle(&gt->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 82e248c2290c..74f071a0b6d5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -90,6 +90,17 @@ struct intel_guc {
 * refs
 */
struct list_head guc_id_list;
+   /**
+* @destroyed_contexts: list of contexts waiting to be destroyed
+* (deregistered with the GuC)
+*/
+   struct list_head destroyed_contexts;
+   /**
+* @destroyed_worker: worker to deregister contexts, needed as we
+* must take a GT PM reference and can't do so from the destroy
+* function as it might be in an atomic context (no sleeping)
+*/
+   struct work_struct destroyed_worker;
} submission_state;
  
  	/**

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 

Re: [Intel-gfx] [PATCH 03/25] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Take a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while any user context has scheduling enabled. Returning GT
idle when it is not can cause all sorts of issues throughout the stack.

v2:
  (Daniel Vetter)
   - Add might_lock annotations to pin / unpin function
v3:
  (CI)
   - Drop intel_engine_pm_might_put from unpin path as an async put is
 used
v4:
  (John Harrison)
   - Make intel_engine_pm_might_get/put work with GuC virtual engines
   - Update commit message
v5:
   - Update commit message again

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context.c   |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine_pm.h | 32 +
  drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
  drivers/gpu/drm/i915/intel_wakeref.h  | 12 +++
  5 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index d008ef8623ce..f98c9f470ba1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -240,6 +240,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
if (err)
goto err_post_unpin;
  
+	intel_engine_pm_might_get(ce->engine);

+
if (unlikely(intel_context_is_closed(ce))) {
err = -ENOENT;
goto err_unlock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 6fdeae668e6e..d68675925b79 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -6,9 +6,11 @@
  #ifndef INTEL_ENGINE_PM_H
  #define INTEL_ENGINE_PM_H
  
+#include "i915_drv.h"

  #include "i915_request.h"
  #include "intel_engine_types.h"
  #include "intel_wakeref.h"
+#include "intel_gt_pm.h"
  
  static inline bool

  intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
@@ -31,6 +33,21 @@ static inline bool intel_engine_pm_get_if_awake(struct 
intel_engine_cs *engine)
return intel_wakeref_get_if_active(&engine->wakeref);
  }
  
+static inline void intel_engine_pm_might_get(struct intel_engine_cs *engine)

+{
+   if (!intel_engine_is_virtual(engine)) {
+   intel_wakeref_might_get(&engine->wakeref);
+   } else {
+   struct intel_gt *gt = engine->gt;
+   struct intel_engine_cs *tengine;
+   intel_engine_mask_t tmp, mask = engine->mask;
+
+   for_each_engine_masked(tengine, gt, mask, tmp)
+   intel_wakeref_might_get(&tengine->wakeref);
+   }
+   intel_gt_pm_might_get(engine->gt);
+}
+
  static inline void intel_engine_pm_put(struct intel_engine_cs *engine)
  {
intel_wakeref_put(&engine->wakeref);
@@ -52,6 +69,21 @@ static inline void intel_engine_pm_flush(struct 
intel_engine_cs *engine)
intel_wakeref_unlock_wait(&engine->wakeref);
  }
  
+static inline void intel_engine_pm_might_put(struct intel_engine_cs *engine)

+{
+   if (!intel_engine_is_virtual(engine)) {
+   intel_wakeref_might_put(&engine->wakeref);
+   } else {
+   struct intel_gt *gt = engine->gt;
+   struct intel_engine_cs *tengine;
+   intel_engine_mask_t tmp, mask = engine->mask;
+
+   for_each_engine_masked(tengine, gt, mask, tmp)
+   intel_wakeref_might_put(&tengine->wakeref);
+   }
+   intel_gt_pm_might_put(engine->gt);
+}
+
  static inline struct i915_request *
  intel_engine_create_kernel_request(struct intel_engine_cs *engine)
  {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index 05de6c1af25b..bc898df7a48c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -31,6 +31,11 @@ static inline bool intel_gt_pm_get_if_awake(struct intel_gt 
*gt)
	return intel_wakeref_get_if_active(&gt->wakeref);
  }
  
+static inline void intel_gt_pm_might_get(struct intel_gt *gt)

+{
+   intel_wakeref_might_get(&gt->wakeref);
+}
+
  static inline void intel_gt_pm_put(struct intel_gt *gt)
  {
	intel_wakeref_put(&gt->wakeref);
@@ -41,6 +46,11 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
	intel_wakeref_put_async(&gt->wakeref);
  }
  
+static inline void intel_gt_pm_might_put(struct intel_gt *gt)

+{
+   intel_wakeref_might_put(&gt->wakeref);
+}
+
  #define with_intel_gt_pm(gt, tmp) \
for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 intel_gt_pm_put(gt), tmp = 0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
i

Re: [Intel-gfx] [PATCH 08/25] drm/i915/guc: Add multi-lrc context registration

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are set up during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

v2:
  (John Harrison)
   - Move GuC specific fields into sub-struct
   - Clean up WQ defines
   - Add comment explaining math to derive WQ / PD address
v3:
  (John Harrison)
   - Add PARENT_SCRATCH_SIZE define
   - Update comment explaining multi-lrc register

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 -
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 116 +-
  5 files changed, 133 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 76dfca57cb45..48decb5ee954 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -239,6 +239,18 @@ struct intel_context {
struct intel_context *parent;
/** @number_children: number of children if parent */
u8 number_children;
+   /** @guc: GuC specific members for parallel submission */
+   struct {
+   /** @wqi_head: head pointer in work queue */
+   u16 wqi_head;
+   /** @wqi_tail: tail pointer in work queue */
+   u16 wqi_tail;
+   /**
+* @parent_page: page in context state (ce->state) used
+* by parent for work queue, process descriptor
+*/
+   u8 parent_page;
+   } guc;
} parallel;
  
  #ifdef CONFIG_DRM_I915_SELFTEST

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3ef9eaf8c50e..57339d5c1fc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,6 +942,11 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
  
+	if (intel_context_is_parent(ce) && intel_engine_uses_guc(engine)) {

+   ce->parallel.guc.parent_page = context_size / PAGE_SIZE;
+   context_size += PAGE_SIZE;

This needs to be += PARENT_SCRATCH_SIZE.

John.


+   }
+
obj = i915_gem_object_create_lmem(engine->i915, context_size,
  I915_BO_ALLOC_PM_VOLATILE);
if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 8ff58aff..ba10bd374cee 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+   INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
INTEL_GUC_ACTION_LIMIT
  };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0eeb2a9feeed 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,8 +52,6 @@
  
  #define GUC_DOORBELL_INVALID		256
  
-#define GUC_WQ_SIZE			(PAGE_SIZE * 2)

-
  /* Work queue item header definitions */
  #define WQ_STATUS_ACTIVE  1
  #define WQ_STATUS_SUSPENDED   2
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 84b8e64b148f..58a6f494be8f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -344,6 +344,47 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
return rb_entry(rb, struct i915_priolist, node);
  }
  
+/*

+ * When using multi-lrc submission a scratch memory area is reserved in the
+ * parent's context state for the process descriptor and work queue. Currently
+ * the scratch area is sized to a page.
+ *
+ * The layout of this scratch area is below:
+ * 0   guc_process_desc
+ * ... unused
+ * PARENT_SCRATCH_SIZE / 2 work queue start
+ * ... work queue
+ * PARENT_SCRATCH_SIZE - 1 work queue end
+ */
+#define PARENT_SCRATCH_SIZE	PAGE_SIZE
+#define WQ_SIZE(PARENT_SCRA

Re: [Intel-gfx] [PATCH 12/25] drm/i915/guc: Implement multi-lrc submission

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

v2:
  (John Harrison)
   - s/wqe/wqi
   - Use FIELD_PREP macros
   - Add GEM_BUG_ONs to ensure length fits within field
   - Add comment / white space to intel_guc_write_barrier
  (Kernel test robot)
   - Make need_tasklet a static function
v3:
  (Docs)
   - A comment for submission_stall_reason
v4:
  (Kernel test robot)
   - Initialize return value in bypass tasklet submit function
  (John Harrison)
   - Add comment near work queue defs
   - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2
   - Update write_barrier comment to talk about work queue

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  29 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  11 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  24 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  30 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 323 +++---
  drivers/gpu/drm/i915/i915_request.h   |   8 +
  6 files changed, 350 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 8f8182bf7c11..6e228343e8cb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -756,3 +756,32 @@ void intel_guc_load_status(struct intel_guc *guc, struct 
drm_printer *p)
}
}
  }
+
+void intel_guc_write_barrier(struct intel_guc *guc)
+{
+   struct intel_gt *gt = guc_to_gt(guc);
+
+   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+   /*
+* Ensure intel_uncore_write_fw can be used rather than
+* intel_uncore_write.
+*/
+   GEM_BUG_ON(guc->send_regs.fw_domains);
+
+   /*
+* This register is used by the i915 and GuC for MMIO based
+* communication. Once we are in this code CTBs are the only
+* method the i915 uses to communicate with the GuC so it is
+* safe to write to this register (a value of 0 is NOP for MMIO
+* communication). If we ever start mixing CTBs and MMIOs a new
+* register will have to be chosen. This function is also used
+* to enforce ordering of a work queue item write and an update
+* to the process descriptor. When a work queue is being used,
+* CTBs are also the only mechanism of communication.
+*/
+   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+   } else {
+   /* wmb() sufficient for a barrier if in smem */
+   wmb();
+   }
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4ca197f400ba..31cf9fb48c7e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -46,6 +46,15 @@ struct intel_guc {
 * submitted until the stalled request is processed.
 */
struct i915_request *stalled_request;
+   /**
+* @submission_stall_reason: reason why submission is stalled
+*/
+   enum {
+   STALL_NONE,
+   STALL_REGISTER_CONTEXT,
+   STALL_MOVE_LRC_TAIL,
+   STALL_ADD_REQUEST,
+   } submission_stall_reason;
  
  	/* intel_guc_recv interrupt related state */

/** @irq_lock: protects GuC irq state */
@@ -367,4 +376,6 @@ void intel_guc_submission_cancel_requests(struct intel_guc 
*guc);
  
  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
  
+void intel_guc_write_barrier(struct intel_guc *guc);

+
  #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 0a3504bc0b61..a0cc34be7b56 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -383,28 +383,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
return ++ct->requests.last_fence;
  }
  
-static void write_barrier(struct intel_guc_ct *ct)

-{
-   struct intel_guc *guc = ct_to_guc(ct);
-   struct intel_gt *gt = guc_to_gt(guc);
-
-   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
-   GEM_BUG_ON(guc->send_regs.fw_domains);
-   /*
-* This register is used by the i915 and GuC for MMIO based
-* communication. Once we are in this code CTBs are the only
-* method the i915 uses to communicate with the GuC so it is
-* safe to write to this register (a va

Re: [Intel-gfx] [PATCH 14/25] drm/i915/guc: Implement multi-lrc reset

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Update context and full GPU reset to work with multi-lrc. The idea is
that the parent context tracks all the active requests in flight for
itself and its children. The parent context owns the reset, replaying /
canceling requests as needed.

v2:
  (John Harrison)
   - Simply loop in find active request
   - Add comments to find active request / reset loop
v3:
  (John Harrison)
   - s/its'/its/g
   - Fix comment when searching for active request
   - Reorder if state in __guc_reset_context

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context.c   | 15 +++-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 69 ++-
  2 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 79f321c6c008..6aab60584ee5 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -529,20 +529,29 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
  
  struct i915_request *intel_context_find_active_request(struct intel_context *ce)

  {
+   struct intel_context *parent = intel_context_to_parent(ce);
struct i915_request *rq, *active = NULL;
unsigned long flags;
  
  	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
  
-	spin_lock_irqsave(&ce->guc_state.lock, flags);

-   list_for_each_entry_reverse(rq, &ce->guc_state.requests,
+   /*
+* We search the parent list to find an active request on the submitted
+* context. The parent list contains the requests for all the contexts
+* in the relationship so we have to do a compare of each request's
+* context.
+*/
+   spin_lock_irqsave(&parent->guc_state.lock, flags);
+   list_for_each_entry_reverse(rq, &parent->guc_state.requests,
sched.link) {
+   if (rq->context != ce)
+   continue;
if (i915_request_completed(rq))
break;
  
  		active = rq;

}
-   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+   spin_unlock_irqrestore(&parent->guc_state.lock, flags);
  
  	return active;

  }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f690b7c2b295..bc052d206861 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -683,6 +683,11 @@ static inline int rq_prio(const struct i915_request *rq)
return rq->sched.attr.priority;
  }
  
+static inline bool is_multi_lrc(struct intel_context *ce)

+{
+   return intel_context_is_parallel(ce);
+}
+
  static bool is_multi_lrc_rq(struct i915_request *rq)
  {
return intel_context_is_parallel(rq->context);
@@ -1218,10 +1223,15 @@ __unwind_incomplete_requests(struct intel_context *ce)
  
  static void __guc_reset_context(struct intel_context *ce, bool stalled)

  {
+   bool local_stalled;
struct i915_request *rq;
unsigned long flags;
u32 head;
+   int i, number_children = ce->parallel.number_children;
bool skip = false;
+   struct intel_context *parent = ce;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
  
  	intel_context_get(ce);
  
@@ -1247,25 +1257,38 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)

if (unlikely(skip))
goto out_put;
  
-	rq = intel_context_find_active_request(ce);

-   if (!rq) {
-   head = ce->ring->tail;
-   stalled = false;
-   goto out_replay;
-   }
+   /*
+* For each context in the relationship find the hanging request
+* resetting each context / request as needed
+*/
+   for (i = 0; i < number_children + 1; ++i) {
+   if (!intel_context_is_pinned(ce))
+   goto next_context;
+
+   local_stalled = false;
+   rq = intel_context_find_active_request(ce);
+   if (!rq) {
+   head = ce->ring->tail;
+   goto out_replay;
+   }
  
-	if (!i915_request_started(rq))

-   stalled = false;
+   if (i915_request_started(rq))
+   local_stalled = true;
  
-	GEM_BUG_ON(i915_active_is_idle(&ce->active));

-   head = intel_ring_wrap(ce->ring, rq->head);
-   __i915_request_reset(rq, stalled);
+   GEM_BUG_ON(i915_active_is_idle(&ce->active));
+   head = intel_ring_wrap(ce->ring, rq->head);
  
+		__i915_request_reset(rq, local_stalled && stalled);

  out_replay:
-   guc_reset_state(ce, head, stalled);
-   __unwind_incomplete_requests(ce);
+   

Re: [Intel-gfx] [PATCH 19/25] drm/i915/guc: Implement no mid batch preemption for multi-lrc

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between parent and child is needed, syncing the set of BBs at the
beginning and end of each batch. This is implemented via custom
emit_bb_start & emit_fini_breadcrumb functions and enabled by default if
a context is configured via the set parallel extension.

Lastly, this patch updates the process descriptor to the correct size as
the memory used in the handshake is directly after the process
descriptor.

v2:
  (John Harrison)
   - Fix a few comments wording
   - Add structure for parent page layout
v3:
  (John Harrison)
   - A structure for sync semaphore
   - Use offsetof to calc address
   - Update commit message

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 333 +-
  4 files changed, 326 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 6aab60584ee5..5634d14052bc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -570,7 +570,7 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
GEM_BUG_ON(intel_context_is_child(child));
GEM_BUG_ON(intel_context_is_parent(child));
  
-	parent->parallel.number_children++;

+   parent->parallel.child_index = parent->parallel.number_children++;
list_add_tail(&child->parallel.child_link,
  &parent->parallel.child_list);
child->parallel.parent = parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 1d880303a7e4..95a5b94b4ece 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -250,6 +250,8 @@ struct intel_context {
struct i915_request *last_rq;
/** @number_children: number of children if parent */
u8 number_children;
+   /** @child_index: index into child_list if child */
+   u8 child_index;
/** @guc: GuC specific members for parallel submission */
struct {
/** @wqi_head: head pointer in work queue */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index c14fc15dd3a8..2eba6b598e66 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -186,7 +186,7 @@ struct guc_process_desc {
u32 wq_status;
u32 engine_presence;
u32 priority;
-   u32 reserved[30];
+   u32 reserved[36];
  } __packed;
  
  #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 09a3a9dd7ff6..ae08a196ba0a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
  #include "gt/intel_context.h"
  #include "gt/intel_engine_pm.h"
  #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_gpu_commands.h"
  #include "gt/intel_gt.h"
  #include "gt/intel_gt_irq.h"
  #include "gt/intel_gt_pm.h"
@@ -368,11 +369,16 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
  
  /*

   * When using multi-lrc submission a scratch memory area is reserved in the
- * parent's context state for the process descriptor and work queue. Currently
- * the scratch area is sized to a page.
+ * parent's context state for the process descriptor, work queue, and handhake

handhake -> handshake


+ * between the parent + children contexts to insert safe preemption points
+ * between each of BBs. Currently the scratch area is sized to a page.

of BBs -> of the BBs

With those fixed:
Reviewed-by: John Harrison 



   *
   * The layout of this scratch area is below:
   * 0  guc_process_desc
+ * + sizeof(struct guc_process_desc)   child go
+ * + CACHELINE_BYTES   child join[0]
+ * ...
+ * + CACHELINE_BYTES   child join[n - 1]
   * ...unused
   * PARENT_SCRATCH_SIZE / 2work queue start
   * ...work queue
@@ -381,7 +387,25 @@ static inline struct i915_priolist *to_priolist(struct 
rb_node *rb)
  #define PARENT_SCRATCH_SIZE   PAGE_SIZE
  #define WQ_SIZE   (PARENT_SCRATCH_SIZE / 2)
  #define WQ_OFF

Re: [Intel-gfx] [PATCH 20/25] drm/i915: Multi-BB execbuf

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Allow multiple batch buffers to be submitted in a single execbuf IOCTL
after a context has been configured with the 'set_parallel' extension.
The number of batches is implicit based on the context's configuration.

This is implemented with a series of loops. First a loop is used to find
all the batches, a loop to pin all the HW contexts, a loop to create all
the requests, a loop to submit (emit BB start, etc...) all the requests,
a loop to tie the requests to the VMAs they touch, and finally a loop to
commit the requests to the backend.

A composite fence is also created for the generated requests to return
to the user and to stick in dma resv slots.

No behavior from the existing IOCTL should be changed aside from when
throttling because the ring for a context is full. In this situation,
i915 will now wait while holding the object locks. This change was made
because the code is much simpler if it waits while holding the locks, and
we believe there isn't a huge benefit to dropping them. If this
proves false we can restructure the code to drop the locks during the
wait.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
  (Matthew Brost)
   - Return proper error value if i915_request_create fails
v3:
  (John Harrison)
   - Add comment explaining create / add order loops + locking
   - Update commit message explaining difference in IOCTL behavior
   - Line wrap some comments
   - eb_add_request returns void
   - Return -EINVAL rather than triggering BUG_ON if cmd parser used
  (Checkpatch)
   - Check eb->batch_len[*current_batch]
v4:
  (CI)
   - Set batch len if passed in via execbuf args
   - Call __i915_request_skip after __i915_request_commit
  (Kernel test robot)
   - Initialize rq to NULL in eb_pin_timeline

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 783 --
  drivers/gpu/drm/i915/gt/intel_context.h   |   8 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |  10 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   2 +
  drivers/gpu/drm/i915/i915_request.h   |   9 +
  drivers/gpu/drm/i915/i915_vma.c   |  21 +-
  drivers/gpu/drm/i915/i915_vma.h   |  13 +-
  7 files changed, 595 insertions(+), 251 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c75afc8784e3..6509c9d8c298 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -246,17 +246,25 @@ struct i915_execbuffer {
struct drm_i915_gem_exec_object2 *exec; /** ioctl execobj[] */
struct eb_vma *vma;
  
-	struct intel_engine_cs *engine; /** engine to queue the request to */

+   struct intel_gt *gt; /* gt for the execbuf */
struct intel_context *context; /* logical state for the request */
struct i915_gem_context *gem_context; /** caller's context */
  
-	struct i915_request *request; /** our request to build */

-   struct eb_vma *batch; /** identity of the batch obj/vma */
+   /** our requests to build */
+   struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+   /** identity of the batch obj/vma */
+   struct eb_vma *batches[MAX_ENGINE_INSTANCE + 1];
struct i915_vma *trampoline; /** trampoline used for chaining */
  
+	/** used for excl fence in dma_resv objects when > 1 BB submitted */

+   struct dma_fence *composite_fence;
+
/** actual size of execobj[] as we may extend it for the cmdparser */
unsigned int buffer_count;
  
+	/* number of batches in execbuf IOCTL */

+   unsigned int num_batches;
+
/** list of vma not yet bound during reservation phase */
struct list_head unbound;
  
@@ -283,7 +291,8 @@ struct i915_execbuffer {
  
  	u64 invalid_flags; /** Set of execobj.flags that are invalid */
  
-	u64 batch_len; /** Length of batch within object */

+   /** Length of batch within object */
+   u64 batch_len[MAX_ENGINE_INSTANCE + 1];
u32 batch_start_offset; /** Location within object of batch */
u32 batch_flags; /** Flags composed for emit_bb_start() */
struct intel_gt_buffer_pool_node *batch_pool; /** pool node for batch 
buffer */
@@ -301,14 +310,13 @@ struct i915_execbuffer {
  };
  
  static int eb_parse(struct i915_execbuffer *eb);

-static struct i915_request *eb_pin_engine(struct i915_execbuffer *eb,
- bool throttle);
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
  static void eb_unpin_engine(struct i915_execbuffer *eb);
  
  static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)

  {
-   return intel_engine_requires_cmd_parser(eb->engine) ||
-   (intel_engine_using_cmd_parser(eb->engine) &&
+   return intel_en

Re: [Intel-gfx] [PATCH 21/25] drm/i915/guc: Handle errors in multi-lrc requests

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

If an error occurs in the front end when multi-lrc requests are getting
generated, we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issue arises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forward progress. If all the requests are not present this handshake
doesn't work. To work around this, if a multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

v2:
  (John Harrison)
   - Add comment explaining the skipping of the handshake logic
   - Fix typos in the commit message
v3:
  (John Harrison)
   - Fix up some comments about the math to NOP the ring

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 69 ++-
  1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index bfafe996e2d2..80d8ce68ff59 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4076,8 +4076,8 @@ static int 
emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
  }
  
  static u32 *

-emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
-u32 *cs)
+__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+  u32 *cs)
  {
struct intel_context *ce = rq->context;
u8 i;
@@ -4105,6 +4105,45 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  get_children_go_addr(ce),
  0);
  
+	return cs;

+}
+
+/*
+ * If this is true, a submission of multi-lrc requests had an error and the
+ * requests need to be skipped. The front end (execbuf IOCTL) should've called
+ * i915_request_skip which squashes the BB but we still need to emit the fini
+ * breadcrumb seqno write. At this point we don't know how many of the
+ * requests in the multi-lrc submission were generated so we can't do the
+ * handshake between the parent and children (e.g. if 4 requests should be
+ * generated but the 2nd hit an error, only 1 would be seen by the GuC backend).
+ * Simply skip the handshake, but still emit the breadcrumb seqno, if an error
+ * has occurred on any of the requests in submission / relationship.
+ */
+static inline bool skip_handshake(struct i915_request *rq)
+{
+   return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   /*
+* NOP everything in 
__emit_fini_breadcrumb_parent_no_preempt_mid_batch,
+* the -6 comes from the length of the emits below.
+*/
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+   } else {
+   cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
+   }
+
/* Emit fini breadcrumb */
cs = gen8_emit_ggtt_write(cs,
  rq->fence.seqno,
@@ -4121,7 +4160,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  }
  
  static u32 *

-emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 
*cs)
+__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+ u32 *cs)
  {
struct intel_context *ce = rq->context;
struct intel_context *parent = intel_context_to_parent(ce);
@@ -4148,6 +4188,29 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct 
i915_request *rq, u32 *cs
*cs++ = get_children_go_addr(parent);
*cs++ = 0;
  
+	return cs;

+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+   u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_child(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   /*
+* NOP everything in __emit_fini_breadcrumb_child_no_preempt_mid_batch,
+* the -6 comes from the length of the emits below.
+*/
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;

Re: [Intel-gfx] [PATCH 22/25] drm/i915: Make request conflict tracking understand parallel submits

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

If an object in the excl or shared slot is a composite fence from a
parallel submit and the current request in the conflict tracking is from
the same parallel context there is no need to enforce ordering as the
ordering is already implicit. Make the request conflict tracking
understand this by comparing a parallel submit's parent context and
skipping conflict insertion if the values match.
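A minimal sketch of that check (standalone C with hypothetical `ctx`/`req` stand-ins, not the real i915 structures): two requests have implicit ordering exactly when both resolve to the same parallel parent, so the conflict insertion can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for intel_context / i915_request */
struct ctx { struct ctx *parent; bool parallel; };
struct req { struct ctx *context; };

/* Resolve a request to its parallel parent (or itself if it is a parent) */
static struct ctx *request_to_parent(const struct req *rq)
{
	return rq->context->parent ? rq->context->parent : rq->context;
}

/* Ordering is implicit iff both requests share the same parallel parent */
static bool is_same_parallel_context(const struct req *to,
				     const struct req *from)
{
	if (request_to_parent(to)->parallel)
		return request_to_parent(to) == request_to_parent(from);
	return false;
}

/* Fixtures: one parallel parent with two children, plus an unrelated ctx */
static struct ctx parent = { NULL, true };
static struct ctx child0 = { &parent, false };
static struct ctx child1 = { &parent, false };
static struct ctx other  = { NULL, false };
static struct req rq_a = { &child0 };
static struct req rq_b = { &child1 };
static struct req rq_c = { &other };
```

Requests on `child0` and `child1` skip the await; a request on the unrelated context still gets ordered normally.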

v2:
  (John Harrison)
   - Reword commit message

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/i915_request.c | 43 +++--
  1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 8bdf9f2f9b90..820a1f38b271 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1335,6 +1335,25 @@ i915_request_await_external(struct i915_request *rq, 
struct dma_fence *fence)
return err;
  }
  
+static inline bool is_parallel_rq(struct i915_request *rq)

+{
+   return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
+static bool is_same_parallel_context(struct i915_request *to,
+struct i915_request *from)
+{
+   if (is_parallel_rq(to))
+   return request_to_parent(to) == request_to_parent(from);
+
+   return false;
+}
+
  int
  i915_request_await_execution(struct i915_request *rq,
 struct dma_fence *fence)
@@ -1366,11 +1385,14 @@ i915_request_await_execution(struct i915_request *rq,
 * want to run our callback in all cases.
 */
  
-		if (dma_fence_is_i915(fence))

+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = __i915_request_await_execution(rq,
 to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
} while (--nchild);
@@ -1471,10 +1493,13 @@ i915_request_await_dma_fence(struct i915_request *rq, 
struct dma_fence *fence)
 fence))
continue;
  
-		if (dma_fence_is_i915(fence))

+   if (dma_fence_is_i915(fence)) {
+   if (is_same_parallel_context(rq, to_request(fence)))
+   continue;
ret = i915_request_await_request(rq, to_request(fence));
-   else
+   } else {
ret = i915_request_await_external(rq, fence);
+   }
if (ret < 0)
return ret;
  
@@ -1525,16 +1550,6 @@ i915_request_await_object(struct i915_request *to,

return ret;
  }
  
-static inline bool is_parallel_rq(struct i915_request *rq)

-{
-   return intel_context_is_parallel(rq->context);
-}
-
-static inline struct intel_context *request_to_parent(struct i915_request *rq)
-{
-   return intel_context_to_parent(rq->context);
-}
-
  static struct i915_request *
  __i915_request_ensure_parallel_ordering(struct i915_request *rq,
struct intel_timeline *timeline)




Re: [Intel-gfx] [PATCH 16/25] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-10-13 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
  (Daniel Vetter)
   - Add IGT link and placeholder for media UMD link
v3:
  (Kernel test robot)
   - Fix warning in unpin engines call
  (John Harrison)
   - Reword a bunch of the kernel doc
v4:
  (John Harrison)
   - Add comment why perma-pin is done after setting gem context
   - Update some comments / docs for proto contexts

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 228 +-
  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
  drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
  .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
  include/uapi/drm/i915_drm.h   | 131 ++
  9 files changed, 503 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d225d3dd0b40..6f23aff6e642 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -556,9 +556,150 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
  }
  
+static int

+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   /* FIXME: This is NIY for execlists */
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;
+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAULT;
+   goto out_err;
+   }
+
+   siblings[n] =
+   intel_engine_lookup_user(i915, ci.engine_class,
+ci.engine

Re: [Intel-gfx] [PATCH i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-13 Thread John Harrison

On 10/13/2021 15:53, Dixit, Ashutosh wrote:

On Wed, 13 Oct 2021 15:43:17 -0700,  wrote:

From: John Harrison 

The gem_exec_fair test is specifically testing scheduler algorithm
performance. However, GuC does not implement the same algorithm as
execlist mode and this test is not applicable. So, until sw arch
approves a new algorithm and it is implemented in GuC, stop running
the test.

Signed-off-by: John Harrison 
---
  tests/i915/gem_exec_fair.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/tests/i915/gem_exec_fair.c b/tests/i915/gem_exec_fair.c
index ef5a450f6..ca9c73c6e 100644
--- a/tests/i915/gem_exec_fair.c
+++ b/tests/i915/gem_exec_fair.c
@@ -1314,6 +1314,12 @@ igt_main
igt_require(gem_scheduler_enabled(i915));
igt_require(gem_scheduler_has_ctx_priority(i915));

+   /*
+* These tests are for a specific scheduling model which is
+* not currently implemented by GuC. So skip on GuC platforms.
+*/
+   igt_require(intel_gen(intel_get_drm_devid(i915)) < 12);

Probably a feature check rather than a version check is better? Can we use
say gem_has_guc_submission() instead?

Though appears gem_has_guc_submission() only checks if guc submission is
available, not if it is actually in use (unless guc will used when
available automatically)? Is it possible to add the check if guc submission
is actually in use? Or a check for guc scheduler?

I believe this has come up a few times before. My understanding is that 
no, there is no current official/safe way for userland to check if GuC 
submission is enabled (you can read some of the debugfs files and make 
an educated guess but that isn't exactly an official interface). And the 
answer was that it isn't worth adding a UAPI specifically for it. Not 
least because it would be a UAPI solely for use by IGT which is not allowed.


John.





+
cfg = intel_ctx_cfg_all_physical(i915);

igt_info("CS timestamp frequency: %d\n",
--
2.25.1





Re: [Intel-gfx] [PATCH 16/25] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-10-14 Thread John Harrison

On 10/14/2021 08:32, Matthew Brost wrote:

On Wed, Oct 13, 2021 at 06:02:42PM -0700, John Harrison wrote:

On 10/13/2021 13:42, Matthew Brost wrote:

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
   (Daniel Vetter)
- Add IGT link and placeholder for media UMD link
v3:
   (Kernel test robot)
- Fix warning in unpin engines call
   (John Harrison)
- Reword a bunch of the kernel doc
v4:
   (John Harrison)
- Add comment why perma-pin is done after setting gem context
- Update some comments / docs for proto contexts

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gem/i915_gem_context.c   | 228 +-
   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
   drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
   drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
   .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
   drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
   include/uapi/drm/i915_drm.h   | 131 ++
   9 files changed, 503 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d225d3dd0b40..6f23aff6e642 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -556,9 +556,150 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
   }
+static int
+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   /* FIXME: This is NIY for execlists */
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;
+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAULT;
+   goto out_err;
+   }
+
+   siblings[n] =
+   intel_engine_lo

Re: [Intel-gfx] [PATCH 11/25] drm/i915/guc: Implement parallel context pin / unpin functions

2021-10-14 Thread John Harrison

On 10/13/2021 13:42, Matthew Brost wrote:

Parallel contexts are perma-pinned by the upper layers which makes the
backend implementation rather simple. The parent pins the guc_id and
children increment the parent's pin count on pin to ensure all the
contexts are unpinned before we disable scheduling with the GuC / or
deregister the context.
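The pin scheme described above can be sketched with plain counters (a hypothetical standalone sketch, not the real intel_context API): each child pin takes an extra pin on the parent, so the parent's pin count cannot reach zero, and scheduling cannot be disabled, while any child remains pinned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an intel_context pin count */
struct context { int pin_count; struct context *parent; };

static void child_pin(struct context *child)
{
	child->parent->pin_count++;	/* keep parent pinned while child is */
	child->pin_count++;
}

static void child_unpin(struct context *child)
{
	child->pin_count--;
	child->parent->pin_count--;	/* release the extra parent pin */
}

/* Pin two children, record the parent's peak pin count, unpin both */
static int demo_pin_counts(void)
{
	struct context parent = { 1, NULL };	/* parent pinned itself */
	struct context c0 = { 0, &parent }, c1 = { 0, &parent };
	int peak;

	child_pin(&c0);
	child_pin(&c1);
	peak = parent.pin_count;	/* 3: own pin + one per child */
	child_unpin(&c0);
	child_unpin(&c1);
	return peak * 10 + parent.pin_count;	/* parent back to 1 */
}
```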

v2:
  (Daniel Vetter)
   - Perma-pin parallel contexts

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 70 +++
  1 file changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c4d7a5c3b558..9fc40e3c1794 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2585,6 +2585,76 @@ static const struct intel_context_ops 
virtual_guc_context_ops = {
.get_sibling = guc_virtual_get_sibling,
  };
  
+/* Future patches will use this function */

+__maybe_unused
+static int guc_parent_context_pin(struct intel_context *ce, void *vaddr)
+{
+   struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+   struct intel_guc *guc = ce_to_guc(ce);
+   int ret;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   ret = pin_guc_id(guc, ce);
+   if (unlikely(ret < 0))
+   return ret;
+
+   return __guc_context_pin(ce, engine, vaddr);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static int guc_child_context_pin(struct intel_context *ce, void *vaddr)
+{
+   struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+   GEM_BUG_ON(!intel_context_is_child(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   __intel_context_pin(ce->parallel.parent);
+   return __guc_context_pin(ce, engine, vaddr);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_parent_context_unpin(struct intel_context *ce)
+{
+   struct intel_guc *guc = ce_to_guc(ce);
+
+   GEM_BUG_ON(context_enabled(ce));
+   GEM_BUG_ON(intel_context_is_barrier(ce));
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   unpin_guc_id(guc, ce);
+   lrc_unpin(ce);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_child_context_unpin(struct intel_context *ce)
+{
+   GEM_BUG_ON(context_enabled(ce));
+   GEM_BUG_ON(intel_context_is_barrier(ce));
+   GEM_BUG_ON(!intel_context_is_child(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   lrc_unpin(ce);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_child_context_post_unpin(struct intel_context *ce)
+{
+   GEM_BUG_ON(!intel_context_is_child(ce));
+   GEM_BUG_ON(!intel_context_is_pinned(ce->parallel.parent));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   lrc_post_unpin(ce);
+   intel_context_unpin(ce->parallel.parent);
+}
+
  static bool
  guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
  {




Re: [Intel-gfx] [PATCH 16/25] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-10-14 Thread John Harrison

On 10/14/2021 09:41, Matthew Brost wrote:

On Thu, Oct 14, 2021 at 09:43:36AM -0700, John Harrison wrote:

On 10/14/2021 08:32, Matthew Brost wrote:

On Wed, Oct 13, 2021 at 06:02:42PM -0700, John Harrison wrote:

On 10/13/2021 13:42, Matthew Brost wrote:

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
(Daniel Vetter)
 - Add IGT link and placeholder for media UMD link
v3:
(Kernel test robot)
 - Fix warning in unpin engines call
(John Harrison)
 - Reword a bunch of the kernel doc
v4:
(John Harrison)
 - Add comment why perma-pin is done after setting gem context
 - Update some comments / docs for proto contexts

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
drivers/gpu/drm/i915/gem/i915_gem_context.c   | 228 +-
.../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
.../drm/i915/gt/intel_execlists_submission.c  |   6 +-
drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
include/uapi/drm/i915_drm.h   | 131 ++
9 files changed, 503 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d225d3dd0b40..6f23aff6e642 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -556,9 +556,150 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
}
+static int
+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   /* FIXME: This is NIY for execlists */
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;
+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAU

Re: [Intel-gfx] [PATCH 08/25] drm/i915/guc: Add multi-lrc context registration

2021-10-14 Thread John Harrison

On 10/14/2021 10:19, Matthew Brost wrote:

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

v2:
  (John Harrison)
   - Move GuC specific fields into sub-struct
   - Clean up WQ defines
   - Add comment explaining math to derive WQ / PD address
v3:
  (John Harrison)
   - Add PARENT_SCRATCH_SIZE define
   - Update comment explaining multi-lrc register
v4:
  (John Harrison)
   - Move PARENT_SCRATCH_SIZE to common file

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.h   |   2 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 -
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 115 +-
  6 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index b63c10a144af..9f0995150a7a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -44,6 +44,8 @@ void intel_context_free(struct intel_context *ce);
  int intel_context_reconfigure_sseu(struct intel_context *ce,
   const struct intel_sseu sseu);
  
+#define PARENT_SCRATCH_SIZE	PAGE_SIZE

Would have been nice to have a comment. At least something like 'For 
multi-LRC submission, see uc/intel_guc_submission.c for details'. But 
the description is there in the other file for those who want to look. 
So either way:

Reviewed-by: John Harrison 



+
  static inline bool intel_context_is_child(struct intel_context *ce)
  {
return !!ce->parallel.parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 76dfca57cb45..48decb5ee954 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -239,6 +239,18 @@ struct intel_context {
struct intel_context *parent;
/** @number_children: number of children if parent */
u8 number_children;
+   /** @guc: GuC specific members for parallel submission */
+   struct {
+   /** @wqi_head: head pointer in work queue */
+   u16 wqi_head;
+   /** @wqi_tail: tail pointer in work queue */
+   u16 wqi_tail;
+   /**
+* @parent_page: page in context state (ce->state) used
+* by parent for work queue, process descriptor
+*/
+   u8 parent_page;
+   } guc;
} parallel;
  
  #ifdef CONFIG_DRM_I915_SELFTEST

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3ef9eaf8c50e..56156cf18c41 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,6 +942,11 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
  
+	if (intel_context_is_parent(ce) && intel_engine_uses_guc(engine)) {

+   ce->parallel.guc.parent_page = context_size / PAGE_SIZE;
+   context_size += PARENT_SCRATCH_SIZE;
+   }
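The parent-page arithmetic in the hunk above can be checked in isolation (a standalone sketch with the page size hardcoded to 4096 for the example): the scratch page for the work queue / process descriptor is appended after the existing context state, and `parent_page` records its page index within the object.

```c
#include <assert.h>

#define EX_PAGE_SIZE 4096u
#define EX_PARENT_SCRATCH_SIZE EX_PAGE_SIZE	/* one page of scratch */

/* Append the parent scratch page; return its page index in the object */
static unsigned int place_parent_page(unsigned int *context_size)
{
	unsigned int parent_page = *context_size / EX_PAGE_SIZE;

	*context_size += EX_PARENT_SCRATCH_SIZE;
	return parent_page;
}

/* A 5-page context state grows to 6 pages; scratch lands at index 5 */
static unsigned int demo_parent_page(void)
{
	unsigned int size = 5 * EX_PAGE_SIZE;
	unsigned int page = place_parent_page(&size);

	return page * 100 + size / EX_PAGE_SIZE;
}
```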
+
obj = i915_gem_object_create_lmem(engine->i915, context_size,
  I915_BO_ALLOC_PM_VOLATILE);
if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 8ff58aff..ba10bd374cee 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+   INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
INTEL_GUC_ACTION_LIMIT
  };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0eeb2a9feeed 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,8 +52,6 @@
  
  #define GUC_DOORBELL_INVALID		256
  
-#define GUC_WQ_SIZE			(PAGE_SIZE * 2)

-
  /* Work queue item header definitions */
  #define WQ_STATUS_ACTIVE  1
  #define WQ_STATUS_SUSPENDED   2
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fd6594bc1b96..d9f5be00e586 100644
--

Re: [Intel-gfx] [PATCH 16/25] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-10-14 Thread John Harrison

On 10/14/2021 10:19, Matthew Brost wrote:

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
  (Daniel Vetter)
   - Add IGT link and placeholder for media UMD link
v3:
  (Kernel test robot)
   - Fix warning in unpin engines call
  (John Harrison)
   - Reword a bunch of the kernel doc
v4:
  (John Harrison)
   - Add comment why perma-pin is done after setting gem context
   - Update some comments / docs for proto contexts
v5:
  (John Harrison)
   - Rework perma-pin comment
   - Add BUG_IN if context is pinned when setting gem context

IN?



Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 230 +-
  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
  drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
  .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
  include/uapi/drm/i915_drm.h   | 131 ++
  9 files changed, 505 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d225d3dd0b40..9a00f11fef46 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -556,9 +556,150 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
  }
  
+static int

+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   /* FIXME: This is NIY for execlists */
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;
+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAULT;
+   goto out_err;
+   }
+
+   siblings[n] =
+  

Re: [Intel-gfx] [PATCH 20/25] drm/i915: Multi-BB execbuf

2021-10-14 Thread John Harrison

On 10/14/2021 10:20, Matthew Brost wrote:

Allow multiple batch buffers to be submitted in a single execbuf IOCTL
after a context has been configured with the 'set_parallel' extension.
The number of batches is implicit based on the context's configuration.

This is implemented with a series of loops. First a loop is used to find
all the batches, a loop to pin all the HW contexts, a loop to create all
the requests, a loop to submit (emit BB start, etc...) all the requests,
a loop to tie the requests to the VMAs they touch, and finally a loop to
commit the requests to the backend.

A composite fence is also created for the generated requests to return
to the user and to stick in dma resv slots.

No behavior from the existing IOCTL should be changed aside from when
throttling because the ring for a context is full. In this situation,
i915 will now wait while holding the object locks. This change was done
because the code is much simpler to wait while holding the locks and we
believe there isn't a huge benefit of dropping these locks. If this
proves false we can restructure the code to drop the locks during the
wait.
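The loop ordering above can be sketched in miniature (standalone C with hypothetical types, not the real execbuf code): requests are created for every batch before any is committed to the backend, so a failure during creation aborts the whole relationship with nothing committed.

```c
#include <assert.h>

struct ex_request { int created, committed; };

/* Two-phase submit: create all requests, then commit them all */
static int submit_all(struct ex_request *rqs, int num_batches, int fail_at)
{
	int i;

	for (i = 0; i < num_batches; i++) {	/* phase 1: create */
		if (i == fail_at)
			return -1;		/* abort: nothing committed */
		rqs[i].created = 1;
	}
	for (i = 0; i < num_batches; i++)	/* phase 2: commit */
		rqs[i].committed = 1;
	return 0;
}

/* Success path: all three batches end up committed */
static int demo_commits(void)
{
	struct ex_request rqs[3] = { { 0, 0 } };

	if (submit_all(rqs, 3, -1))
		return -1;
	return rqs[0].committed + rqs[1].committed + rqs[2].committed;
}

/* Failure during creation: no batch is committed */
static int demo_abort(void)
{
	struct ex_request rqs[3] = { { 0, 0 } };

	(void)submit_all(rqs, 3, 1);
	return rqs[0].committed + rqs[1].committed + rqs[2].committed;
}
```

The same create-then-commit split is what makes the composite fence possible: it can only be built once every request in the submission exists.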

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
  (Matthew Brost)
   - Return proper error value if i915_request_create fails
v3:
  (John Harrison)
   - Add comment explaining create / add order loops + locking
   - Update commit message explaining difference in IOCTL behavior
   - Line wrap some comments
   - eb_add_request returns void
   - Return -EINVAL rather than triggering BUG_ON if cmd parser used
  (Checkpatch)
   - Check eb->batch_len[*current_batch]
v4:
  (CI)
   - Set batch len if passed if via execbuf args
   - Call __i915_request_skip after __i915_request_commit
  (Kernel test robot)
   - Initialize rq to NULL in eb_pin_timeline
v5:
  (John Harrison)
   - Fix typo in comments near bb order loops

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 783 --
  drivers/gpu/drm/i915/gt/intel_context.h   |   8 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |  10 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   2 +
  drivers/gpu/drm/i915/i915_request.h   |   9 +
  drivers/gpu/drm/i915/i915_vma.c   |  21 +-
  drivers/gpu/drm/i915/i915_vma.h   |  13 +-
  7 files changed, 595 insertions(+), 251 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c75afc8784e3..fc30856e81fa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -246,17 +246,25 @@ struct i915_execbuffer {
struct drm_i915_gem_exec_object2 *exec; /** ioctl execobj[] */
struct eb_vma *vma;
  
-	struct intel_engine_cs *engine; /** engine to queue the request to */

+   struct intel_gt *gt; /* gt for the execbuf */
struct intel_context *context; /* logical state for the request */
struct i915_gem_context *gem_context; /** caller's context */
  
-	struct i915_request *request; /** our request to build */

-   struct eb_vma *batch; /** identity of the batch obj/vma */
+   /** our requests to build */
+   struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
+   /** identity of the batch obj/vma */
+   struct eb_vma *batches[MAX_ENGINE_INSTANCE + 1];
struct i915_vma *trampoline; /** trampoline used for chaining */
  
+	/** used for excl fence in dma_resv objects when > 1 BB submitted */

+   struct dma_fence *composite_fence;
+
/** actual size of execobj[] as we may extend it for the cmdparser */
unsigned int buffer_count;
  
+	/* number of batches in execbuf IOCTL */

+   unsigned int num_batches;
+
/** list of vma not yet bound during reservation phase */
struct list_head unbound;
  
@@ -283,7 +291,8 @@ struct i915_execbuffer {
  
  	u64 invalid_flags; /** Set of execobj.flags that are invalid */
  
-	u64 batch_len; /** Length of batch within object */

+   /** Length of batch within object */
+   u64 batch_len[MAX_ENGINE_INSTANCE + 1];
u32 batch_start_offset; /** Location within object of batch */
u32 batch_flags; /** Flags composed for emit_bb_start() */
struct intel_gt_buffer_pool_node *batch_pool; /** pool node for batch 
buffer */
@@ -301,14 +310,13 @@ struct i915_execbuffer {
  };
  
  static int eb_parse(struct i915_execbuffer *eb);

-static struct i915_request *eb_pin_engine(struct i915_execbuffer *eb,
- bool throttle);
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
  static void eb_unpin_engine(struct i915_execbuffer *eb);
  
  static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)

  {
-   return intel_engine_

Re: [Intel-gfx] [PATCH 24/25] drm/i915: Enable multi-bb execbuf

2021-10-14 Thread John Harrison

On 10/14/2021 10:20, Matthew Brost wrote:

Enable multi-bb execbuf by enabling the set_parallel extension.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 9a00f11fef46..fb33d0322960 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -570,9 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
  
-	/* Disabling for now */

-   return -ENODEV;
-
/* FIXME: This is NIY for execlists */
if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
return -ENODEV;




Re: [Intel-gfx] [PATCH 25/25] drm/i915/execlists: Weak parallel submission support for execlists

2021-10-14 Thread John Harrison

On 10/14/2021 10:20, Matthew Brost wrote:

A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between each request
generated and virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon deprecated) bonding interface.

We perma-pin these execlists contexts to align with GuC implementation.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 10 ++--
  drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
  .../drm/i915/gt/intel_execlists_submission.c  | 56 ++-
  drivers/gpu/drm/i915/gt/intel_lrc.c   |  2 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
  5 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fb33d0322960..35e87a7d0ea9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -570,10 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
  
-	/* FIXME: This is NIY for execlists */

-   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
-   return -ENODEV;
-
if (get_user(slot, &ext->engine_index))
return -EFAULT;
  
@@ -583,6 +579,12 @@ set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,

if (get_user(num_siblings, &ext->num_siblings))
return -EFAULT;
  
+	if (!intel_uc_uses_guc_submission(&i915->gt.uc) && num_siblings != 1) {

+   drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC 
mode\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
if (slot >= set->num_engines) {
drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5634d14052bc..1bec92e1d8e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
  
  	__i915_active_acquire(&ce->active);
  
-	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))

+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
+   intel_context_is_parallel(ce))
return 0;
  
  	/* Preallocate tracking nodes */

@@ -563,7 +564,6 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
 * Callers responsibility to validate that this function is used
 * correctly but we use GEM_BUG_ON here ensure that they do.
 */
-   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
GEM_BUG_ON(intel_context_is_pinned(parent));
GEM_BUG_ON(intel_context_is_child(parent));
GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index bedb80057046..8cd986bdf26c 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -927,8 +927,7 @@ static void execlists_submit_ports(struct intel_engine_cs 
*engine)
  
  static bool ctx_single_port_submission(const struct intel_context *ce)

  {
-   return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
-   intel_context_force_single_submission(ce));
+   return intel_context_force_single_submission(ce);
Does this change not affect all execlist operation rather than just 
parallel submission?



  }
  
  static bool can_merge_ctx(const struct intel_context *prev,

@@ -2598,6 +2597,58 @@ static void execlists_context_cancel_request(struct 
intel_context *ce,
  current->comm);
  }
  
+static struct intel_context *

+execlists_create_parallel(struct intel_engine_cs **engines,
+ unsigned int num_siblings,
+ unsigned int width)
+{
+   struct intel_engine_cs **siblings = NULL;
+   struct intel_context *parent = NULL, *ce, *err;
+   int i, j;
+
+   GEM_BUG_ON(num_siblings != 1);
+
+   siblings = kmalloc_array(num_siblings,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return ERR_PTR(-ENOMEM);
+
+   for (i = 0; i < width; ++i) {
+   for (j = 0; j < num_siblings; ++j)
+   siblings[j] = engines[i * num_siblings + j];
What is the purpose of this array? The only usage that I can see is 
siblings[0] on the line below. The rest of the entries never se

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-10-15 Thread John Harrison

On 10/15/2021 07:52, Tvrtko Ursulin wrote:

On 04/10/2021 08:36, Jani Nikula wrote:
On Fri, 24 Sep 2021, Ville Syrjälä  
wrote:

On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote:

From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
  (Jani Nikula)
   - Change BUG_ON to WARN_ON
v3:
  (Jani / Tvrtko)
   - Short circuit __i915_sw_fence_init on WARN_ON
v4:
  (Lucas)
   - Break WARN_ON changes out in a different patch

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c

index ff637147b1a9..e7f78bc7ebfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,8 +362,8 @@ static int __intel_context_active(struct 
i915_active *active)

  return 0;
  }
  -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
- enum i915_sw_fence_notify state)
+static int __i915_sw_fence_call
+sw_fence_dummy_notify(struct i915_sw_fence *sf, enum 
i915_sw_fence_notify state)

  {
  return NOTIFY_DONE;
  }


This thing seems broken beyond just this alignment stuff. I'm getting
this spew from DEBUG_OBJECTS all the time on a glk here:


Nobody followed through with this, so:

https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home

and

cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes")


John you pushed this yesterday? Will this cause a problem now if we 
have two commits for the same bug:

I'm thoroughly confused.

I finally got far enough down my backlog to reach this and it did not 
appear to be in the tree yet so I tried pushing it. The DIM tool gave me 
a bunch of errors that didn't seem to make any sense. It certainly gave 
me the impression that it did not actually do anything. So I gave up on 
it. But now it seems like it did actually push something? And it was 
already merged after all?


John.



commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e
Author: Hugh Dickins 
AuthorDate: Tue Sep 21 18:50:39 2021 -0700
Commit: John Harrison 
CommitDate: Thu Oct 14 18:29:01 2021 -0700

    drm/i915: fix blank screen booting crashes



commit cdc1e6e225e3256d56dc6648411630e71d7c776b
Author: Hugh Dickins 
AuthorDate: Sat Oct 2 03:17:29 2021 -0700
Commit: Linus Torvalds 
CommitDate: Sat Oct 2 09:39:15 2021 -0700

    drm/i915: fix blank screen booting crashes

Regards,

Tvrtko




BR,
Jani.




[   48.122629] [ cut here ]
[   48.122640] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[   48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
[   48.122976] Modules linked in: i915 i2c_algo_bit ttm 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat 
fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal 
r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci 
cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class 
wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4
[   48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 
5.15.0-rc2-hsw+ #131
[   48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, 
BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018

[   48.123129] RIP: 0010:debug_print_object+0x6e/0x90
[   48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 
a0 19 0a 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 
2e 3c 00 <0f> 0b 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 
c0 0c 01

[   48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282
[   48.123150] RAX:  RBX: 88810004f848 RCX: 

[   48.123154] RDX: 8001 RSI: 8112673f RDI: 
8112673f
[   48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 
0009fffb
[   48.123163] R10: fffe R11: 3fff R12: 
88810a04d100
[   48.123167] R13: 88810a07d308 R14: 888109990800 R15: 
88810997b800
[   48.123171] FS:  7624b9c0() GS:888276e0() 
knlGS:

[   48.1

Re: [Intel-gfx] [igt-dev] [PATCH v2 i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-15 Thread John Harrison

On 10/15/2021 07:52, Dixit, Ashutosh wrote:

On Thu, 14 Oct 2021 12:42:38 -0700,  wrote:

+   /*
+* These tests are for a specific scheduling model which is
+* not currently implemented by GuC. So skip on GuC platforms.
+*/
+   devid = intel_get_drm_devid(i915);
+   igt_require((intel_gen(devid) < 12) || IS_TIGERLAKE(devid) ||
+   IS_ROCKETLAKE(devid) || IS_ALDERLAKE_S(devid));

As I hinted on v1 let's just do this here:

igt_require(gem_has_guc_submission(i915));

So that we can have a single unified way of detecting if GuC is being
used throughout IGT. Today it is gem_has_guc_submission() and it works with
the current kernel.
Earlier, you were saying that 'has' was only checking for capability not 
usage. Which would be pretty useless for this situation. Looking at the 
code, though, it sort of does work. It checks the live value of the 
enable_guc module parameter. If that says that GuC submission is enabled 
then either we are using GuC submission or we have no engines (because a 
failure to start the submission backend is terminal, there is no 
fallback to execlist mode if GuC didn't work). So it can be used.


I say sort of, though, because the code also sets 'has_execlists' when 
it sets 'has_guc'. Which means that the gem_has_execlists() test is not 
usable as an indication that the execlist back end is being used. So 
gem_has_execlists() and gem_has_guc_submission() are basically very 
non-orthogonal. One is a test of hardware presence irrespective of use, 
the other is a test of usage irrespective of presence. The comment in 
the code is 'query whether the driver is using execlists as a hardware 
submission method'. So it seems like that was the original intention. 
Whether it has been broken since or was just broken from the beginning 
is unclear.


John.



Re: [Intel-gfx] [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest

2021-10-20 Thread John Harrison

On 10/11/2021 10:57, Matthew Brost wrote:

perf_parallel_engines is a micro benchmark to test i915 request
scheduling. The test creates a thread per physical engine and submits
NOP requests and waits for the requests to complete in a loop. In execlists
mode this works perfectly fine as a powerful CPU has enough cores to feed
each engine and process the CSBs. With GuC submission the uC gets
overwhelmed as all threads feed into a single CTB channel and the GuC
gets bombarded with CSBs as contexts are immediately switched in and out
on the engines due to the zero runtime of the requests. When the GuC is
overwhelmed, scheduling of contexts is unfair due to the nature of the
GuC scheduling algorithm. This behavior is understood and deemed
acceptable as this micro benchmark isn't close to a real world use case.
Increase the timeout of the wait period for requests to complete. This
makes the test accept that it is ok for contexts to get starved in this
scenario.

A future patch / cleanup may just delete these micro benchmark tests as
they basically mean nothing. We care about real workloads not made up
ones.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index d67710d10615..6496671a113c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
i915_request_add(rq);
  
  		err = 0;

-   if (i915_request_wait(rq, 0, HZ / 5) < 0)
+   if (i915_request_wait(rq, 0, HZ) < 0)
err = -ETIME;
i915_request_put(rq);
if (err)
@@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
i915_request_add(rq);
  
  		err = 0;

-   if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
+   if (prev && i915_request_wait(prev, 0, HZ) < 0)
err = -ETIME;
i915_request_put(prev);
prev = rq;




Re: [Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

2021-10-22 Thread John Harrison

On 10/22/2021 10:03, Matthew Brost wrote:

On Fri, Oct 22, 2021 at 08:23:55AM +0200, Thomas Hellström wrote:

On 10/21/21 22:37, Matthew Brost wrote:

On Thu, Oct 21, 2021 at 08:15:49AM +0200, Thomas Hellström wrote:

Hi, Matthew,

On Mon, 2021-10-11 at 16:47 -0700, Matthew Brost wrote:

The hangcheck selftest blocks per engine resets by setting magic bits
in
the reset flags. This is incorrect for GuC submission because if the
GuC
fails to reset an engine we would like to do a full GT reset. Do not
set
these magic bits when using GuC submission.

Side note this lockless algorithm with magic bits to block resets
really
should be ripped out.


Lockless algorithm aside, from a quick look at the code in
intel_reset.c it appears to me like the interface that falls back to a
full GT reset is intel_gt_handle_error() whereas intel_engine_reset()
is explicitly intended to not do that, so is there a discrepancy
between GuC and non-GuC here?


With GuC submission when an engine reset fails, we get an engine reset
failure notification which triggers a full GT reset
(intel_guc_engine_failure_process_msg in intel_guc_submission.c). That
reset is blocking by setting these magic bits. Clearing the bits in this
function doesn't seem to unblock that reset either, the driver tries to
unload with a worker blocked, and results in the blow up. Something with
this lockless algorithm could be wrong as clearing the bit should
unblock the reset but it doesn't. We can look into that but in the
meantime we need to fix this test to be able to fail gracefully and not
crash CI.

Yeah, for that lockless algorithm if needed, we might want to use a ww_mutex
per engine or something,

Do ww_mutex sleep? From what I can tell this lockless algorithm was
added because even though resets are protected by mutex, there are some
places in the IRQ context where we need to prevent resets from
happening, hence the lockless protection + the mutex - what a mess. Long
term this needs to rethought.


but point was that AFAICT at least one of the tests that set those flags
explicitly tested the functionality that no other engines than the intended
one was reset when the intel_engine_reset() function was used, and then if
GuC submission doesn't honor that, wouldn't a better approach be to make a

No. In execlists this test explicitly calls the engine reset function and
explicitly prevents other parts of the i915 from calling the engine reset
function - this is why it sets that bit.

In GuC submission the i915 can't do engine resets, the GuC does. In this
case the engine reset fails which triggers a G2H message which tells the
i915 to do a GT reset. If this bit is set the worker blocks on this bit
in the GT reset and the driver blows up on unload as this worker isn't
complete (believe it has a PM ref or something).


code comment around intel_engine_reset() to explain the differences and

intel_engine_reset() returns -ENODEV in GuC submission as the i915 isn't
allowed to do engine resets.


disable that particular test for GuC?. Also wouldn't we for example we see a
duplicated full GT reset with GuC if intel_engine_reset() fails as part of
the intel_gt_handle_error() function?


Yes, but the GT reset in this test is done after clearing the bits by
the test. In the case of the GuC the GT reset is async operation done by
a worker that receives the G2H message saying the engine reset failed.


I guess we could live with the hangcheck test being disabled for guc
submission until this is sorted out?


Wouldn't help. See above this an async operation from G2H message. We
can't disable the async G2H handler as without other G2H messages the
world breaks. The only other possible fix would be add an IGT only
variable that if set skips the handling this G2H only.
And to be clear, the engine reset is not supposed to fail. Whether 
issued by GuC or i915, the GDRST register is supposed to self clear 
according to the bspec. If we are being sent the G2H notification for an 
engine reset failure then the assumption is that the hardware is broken. 
This is not a situation that is ever intended to occur in a production 
system. Therefore, it is not something we should spend huge amounts of 
effort on making a perfect selftest for.


The current theory is that the timeout in GuC is not quite long enough 
for DG1. Given that the bspec does not specify any kind of timeout, it 
is only a best guess anyway! Once that has been tuned correctly, we 
should never hit this case again. Not ever, not in a selftest, not in an 
end user use case, just not ever.


John.




I can assure with this patch, if the test fails, it fails gracefully
which is what we want.

Matt


/Thomas



Matt


/Thomas



Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 12 
   1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7e2d99dd012d..90a03c60c80c 100644

Re: [Intel-gfx] [PATCH 00/47] GuC submission support

2021-10-25 Thread John Harrison

On 10/25/2021 02:37, Joonas Lahtinen wrote:

Quoting Matthew Brost (2021-10-22 19:42:19)

On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:

Hi Matt & John,

Can you please queue patches with the right Fixes: references to convert
all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS
protection for now. Please do so before next Wednesday so we get it
queued in drm-intel-next-fixes.


Don't we already do that? I checked i915_trace.h and every tracepoint I
added (intel_context class, i915_request_guc_submit) is protected by
LOW_LEVEL_TRACEPOINTS.

The only thing I changed outside of that protection is adding the guc_id
field to existing i915_request class tracepoints.

It's the first search hit for "guc" inside the i915_trace.h file :)


Without the guc_id in
those tracepoints these are basically useless with GuC submission. We
could revert that if it is a huge deal but as I said then they are
useless...

Let's eliminate it for now and restore the tracepoint exactly as it was.

For what purpose?

Your request above was about not adding new tracepoints outside of a low 
level CONFIG setting. I can understand that on the grounds of not 
swamping high level tracing with low level details that are not 
important to the general developer.


But this is not about adding extra tracepoints, this is about making the 
existing tracepoints usable. With GuC submission, the GuC id is a vital 
piece of information. Without that, you cannot correlate anything that 
is happening between i915, GuC and the hardware. Which basically means 
that for a GuC submission based platform, those tracepoints are useless 
without this information. And GuC submission is POR for all platforms 
from ADL-P/DG1 onwards. So by not allowing this update, you are 
preventing any kind of meaningful debug of any scheduling/execution type 
issues.


Again, if you are wanting to reduce spam in higher level debug then 
sure, make the entire set of scheduling tracepoints LOW_LEVEL only. But 
keeping them around in a censored manner is pointless. They are not ABI, 
they are allowed to change as and when necessary. And now, it is 
necessary to update them to match the new POR submission model for 
current and all future platforms.





If there is an immediate need, we should instead have an auxilary tracepoint
which is enabled only through LOW_LEVEL_TRACEPOINTS and that amends the
information of the basic tracepoint.

For the longer term solution we should align towards the dma fence
tracepoints. When those are combined with the OA information, one should
be able to get a good understanding of both the software and hardware
scheduling decisions.
I don't follow this. OA information does not tell you any details of 
what the GuC is doing. DRM/DMA generic tracepoints certainly won't tell 
you any hardware/firmware or even i915 specific information.


And that is a much longer term goal than being able to debug current 
platforms with the current driver.


John.




Regards, Joonas


Matt


There's the orthogonal track to discuss what would be the stable set of
tracepoints we could expose. However, before that discussion is closed,
let's keep a rather strict line to avoid potential maintenance burned.

We can then relax in the future as needed.

Regards, Joonas

Quoting Matthew Brost (2021-06-24 10:04:29)

As discussed in [1], [2] we are enabling GuC submission support in the
i915. This is a subset of the patches in step 5 described in [1],
basically it is the absolute minimum needed to enable CI with GuC submission on gen11+
platforms.

This series itself will likely be broken down into smaller patch sets to
merge. Likely into CTBs changes, basic submission, virtual engines, and
resets.

A following series will address the missing patches remaining from [1].

Locally tested on TGL machine and basic tests seem to be passing.

Signed-off-by: Matthew Brost 

[1] https://patchwork.freedesktop.org/series/89844/
[2] https://patchwork.freedesktop.org/series/91417/

Daniele Ceraolo Spurio (1):
   drm/i915/guc: Unblock GuC submission on Gen11+

John Harrison (10):
   drm/i915/guc: Module load failure test for CT buffer creation
   drm/i915: Track 'serial' counts for virtual engines
   drm/i915/guc: Provide mmio list to be saved/restored on engine reset
   drm/i915/guc: Don't complain about reset races
   drm/i915/guc: Enable GuC engine reset
   drm/i915/guc: Fix for error capture after full GPU reset with GuC
   drm/i915/guc: Hook GuC scheduling policies up
   drm/i915/guc: Connect reset modparam updates to GuC policy flags
   drm/i915/guc: Include scheduling policies in the debugfs state dump
   drm/i915/guc: Add golden context to GuC ADS

Matthew Brost (36):
   drm/i915/guc: Relax CTB response timeout
   drm/i915/guc: Improve error message for unsolicited CT response
   drm/i915/guc: Increase size of CTB buffers
   drm/i915/guc: Add non blocking CTB send function
   drm/i915/guc: A

Re: [Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

2021-10-25 Thread John Harrison

On 10/23/2021 11:36, Thomas Hellström wrote:

On 10/23/21 20:18, Matthew Brost wrote:

On Sat, Oct 23, 2021 at 07:46:48PM +0200, Thomas Hellström wrote:

On 10/22/21 20:09, John Harrison wrote:

And to be clear, the engine reset is not supposed to fail. Whether
issued by GuC or i915, the GDRST register is supposed to self clear
according to the bspec. If we are being sent the G2H notification 
for an
engine reset failure then the assumption is that the hardware is 
broken.

This is not a situation that is ever intended to occur in a production
system. Therefore, it is not something we should spend huge amounts of
effort on making a perfect selftest for.

I don't agree. Selftests are there to verify that assumptions made and
contracts in the code hold and that hardware behaves as intended / 
assumed.
No selftest should ideally trigger in a production driver / system. 
That
doesn't mean we can remove all selftests or ignore updating them for 
altered
assumptions / contracts. I think it's important here to acknowledge 
the fact

that this and the perf selftest have found two problems that need
consideration for fixing for a production system.


I'm confused - we are going down the rabbit hole here.

Back to this patch. This test was written for very specific execlists
behavior. It was updated to also support the GuC. In that update we
missed fixing the failure path, well because it always passed. Now it
has failed, we see that it doesn't fail gracefully, and takes down the
machine. This patch fixes that. It also opened my eyes to the horror
show reset locking that needs to be fixed long term.


Well the email above wasn't really about the correctness of this 
particular patch (I should probably have altered the subject to 
reflect that) but rather about the assumption that failures that 
should never occur in a production system are not worth spending time 
on selftests for.
My point is that we have to make assumptions that the hardware is 
basically functional. Otherwise, where do you stop? Do you write a 
selftest for every conceivable operation of the hardware and prove that 
it still works every single day? No. That is pointless and we don't have 
the resources to test everything that the hardware can possibly do. We 
have to cope as gracefully as possible in the case where the hardware 
does not behave as intended, such as not killing the entire OS when a 
selftest fails. But I don't think we should be spending time on writing 
a perfect test for something that is supposed to be impossible at the 
hardware level. The purpose of the selftests is to test the driver 
behaviour, not the hardware.


John.



For the patch itself, I'll take a deeper look at the patch and get back.

/Thomas






Re: [Intel-gfx] [PATCH] drm/i915/trace: Hide backend specific fields behind Kconfig

2021-10-25 Thread John Harrison

On 10/25/2021 09:34, Matthew Brost wrote:

Hide the guc_id and tail fields, for request trace points, behind
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option. Trace points
are ABI (maybe?) so don't change them without kernel developers Kconfig
options.
The i915 sw arch team have previously hard blocked requests for changes 
to trace points from user land tool developers on the grounds that trace 
points are not ABI and are free to change at whim as and when the i915 
internal implementation changes. They are purely for use of developers 
to debug the i915 driver as the i915 driver currently stands at any 
given instant.


So I don't see how it can be argued that we must not update any trace 
points to allow for debugging of i915 scheduling issues on current 
platforms. And having to enable extra config options just to keep 
existing higher level trace points usable seems broken.


John.




Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/i915_trace.h | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 9795f456cccf..4f5238d02b51 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -787,6 +787,7 @@ TRACE_EVENT(i915_request_queue,
  __entry->ctx, __entry->seqno, __entry->flags)
  );
  
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)

  DECLARE_EVENT_CLASS(i915_request,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq),
@@ -816,6 +817,32 @@ DECLARE_EVENT_CLASS(i915_request,
  __entry->guc_id, __entry->ctx, __entry->seqno,
  __entry->tail)
  );
+#else
+DECLARE_EVENT_CLASS(i915_request,
+   TP_PROTO(struct i915_request *rq),
+   TP_ARGS(rq),
+
+   TP_STRUCT__entry(
+__field(u32, dev)
+__field(u64, ctx)
+__field(u16, class)
+__field(u16, instance)
+__field(u32, seqno)
+),
+
+   TP_fast_assign(
+  __entry->dev = rq->engine->i915->drm.primary->index;
+  __entry->class = rq->engine->uabi_class;
+  __entry->instance = rq->engine->uabi_instance;
+  __entry->ctx = rq->fence.context;
+  __entry->seqno = rq->fence.seqno;
+  ),
+
+   TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
+ __entry->dev, __entry->class, __entry->instance,
+ __entry->ctx, __entry->seqno)
+);
+#endif
  
  DEFINE_EVENT(i915_request, i915_request_add,

 TP_PROTO(struct i915_request *rq),




Re: [Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

2021-10-26 Thread John Harrison

On 10/11/2021 16:47, Matthew Brost wrote:

The hangcheck selftest blocks per engine resets by setting magic bits in
the reset flags. This is incorrect for GuC submission because if the GuC
fails to reset an engine we would like to do a full GT reset. Do not
set these magic bits when using GuC submission.

Side note this lockless algorithm with magic bits to block resets really
should be ripped out.
As a first step, I am seeing a pointless BUILD_BUG_ON but no BUILD_BUG_ON 
at all for what really needs to be verified. Specifically, in 
intel_gt_handle_error, inside the engine reset loop, there is:
    BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);


Given that the above two values are explicit #defines of '1' and '2', 
I'm not seeing any value to this assert. On the other hand, what I am 
not seeing anywhere is an assert that 'I915_RESET_ENGINE + max_engine_id 
< I915_WEDGED_ON_INIT'. That being the thing that would actually go 
horribly wrong if the engine count increased too far. Seems like there 
should be one of those in intel_engines_init_mmio, using 
ARRAY_SIZE(intel_engines) as the max id.



It looks like 'busy-reset' and 'reset-idle' parts of 'igt_ctx_sseu' in 
gem/selftests/i915_gem_context.c also mess around with these flags and 
then try to issue a manual engine reset. Presumably those tests are also 
going to have issues with GuC submission.


The workarounds, mocs and reset selftests also do strange things. Those 
ones did get updated to support GuC submission by not attempting manual 
engine resets but using the GuC based hang detection instead. However, 
it seems like they would also suffer the same deadlock scenario if the 
GuC based reset were to fail.


I'm wondering if the better fix is to remove the use of the 
I915_RESET_ENGINE flags completely when using GuC submission. So far as 
I can tell, they are only used (outside of selftest hackery) in 
intel_gt_handle_error to guard against multiple concurrent resets within 
i915. Guarding against multiple concurrent resets in GuC is the GuC's 
job. That leaves GuC based engine reset concurrent with i915 based full 
GT reset. But that is fine because the full GT reset toasts all engines 
AND the GuC. So it doesn't matter what GuC might or might not have been 
doing at the time. The only other issue is multiple concurrent full GT 
resets, but that is protected against by I915_RESET_BACKOFF.


So my thought is to add an 'if(!guc_submission)' wrapper around the set 
and clear of the reset flags immediately before/after the call to 
intel_gt_reset_global().


Fixing it there means the selftests can do what they like with the flags 
without causing problems for GuC submission. It also means being one 
step closer to removing the dodgy lockless locking completely, at least 
when using GuC submission.


John.




Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c 
b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7e2d99dd012d..90a03c60c80c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -734,7 +734,8 @@ static int __igt_reset_engine(struct intel_gt *gt, bool 
active)
reset_engine_count = i915_reset_engine_count(global, engine);
  
  		st_engine_heartbeat_disable(engine);

-   set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
+   if (!using_guc)
+   set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
count = 0;
do {
struct i915_request *rq = NULL;
@@ -824,7 +825,8 @@ static int __igt_reset_engine(struct intel_gt *gt, bool 
active)
if (err)
break;
} while (time_before(jiffies, end_time));
-   clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
+   if (!using_guc)
+   clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
st_engine_heartbeat_enable(engine);
pr_info("%s: Completed %lu %s resets\n",
engine->name, count, active ? "active" : "idle");
@@ -1042,7 +1044,8 @@ static int __igt_reset_engines(struct intel_gt *gt,
yield(); /* start all threads before we begin */
  
  		st_engine_heartbeat_disable_no_pm(engine);

-   set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
+   if (!using_guc)
+   set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
do {
struct i915_request *rq = NULL;
struct intel_selftest_saved_policy saved;
@@ -1165,7 +1168,8 @@ static int __igt_reset_engines(struct intel_gt *gt,
if (err)
break;
} while (time_before(jiffies, end_time)

Re: [Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

2021-10-26 Thread John Harrison

On 10/21/2021 23:23, Thomas Hellström wrote:

On 10/21/21 22:37, Matthew Brost wrote:

On Thu, Oct 21, 2021 at 08:15:49AM +0200, Thomas Hellström wrote:

Hi, Matthew,

On Mon, 2021-10-11 at 16:47 -0700, Matthew Brost wrote:

The hangcheck selftest blocks per engine resets by setting magic bits
in
the reset flags. This is incorrect for GuC submission because if the
GuC
fails to reset an engine we would like to do a full GT reset. Do not
set these magic bits when using GuC submission.

Side note this lockless algorithm with magic bits to block resets
really
should be ripped out.


Lockless algorithm aside, from a quick look at the code in
intel_reset.c it appears to me like the interface that falls back to a
full GT reset is intel_gt_handle_error() whereas intel_engine_reset()
is explicitly intended to not do that, so is there a discrepancy
between GuC and non-GuC here?


With GuC submission, when an engine reset fails we get an engine reset
failure notification which triggers a full GT reset
(intel_guc_engine_failure_process_msg in intel_guc_submission.c). That
reset is blocked by these magic bits being set. Clearing the bits in this
function doesn't seem to unblock that reset either; the driver tries to
unload with a worker blocked, and results in the blow up. Something in
this lockless algorithm could be wrong, as clearing the bit should
unblock the reset but it doesn't. We can look into that, but in the
meantime we need to fix this test to fail gracefully and not crash CI.


Yeah, for that lockless algorithm, if needed, we might want to use a 
ww_mutex per engine or something. But the point was that, AFAICT, at 
least one of the tests that set those flags explicitly tested that no 
engine other than the intended one was reset when the 
intel_engine_reset() function was used. If GuC submission doesn't honor 
that, wouldn't a better approach be to add a code comment around 
intel_engine_reset() to explain the differences and disable that 
particular test for GuC? Also, wouldn't we, for example, see a 
duplicated full GT reset with GuC if intel_engine_reset() fails as part 
of the intel_gt_handle_error() function?

Re-reading this thread, I think there is a misunderstanding.

The selftests themselves have already been updated to support GuC based 
engine resets. That is done by submitting a hanging context and letting 
the GuC detect the hang and issue a reset. There is no mechanism 
available for i915 to directly issue or request an engine based reset 
(because i915 does not know what is running on any given engine at any 
given time, being disconnected from the scheduler).


So the tests are already correctly testing per engine resets and do not 
go anywhere near either intel_engine_reset() or intel_gt_handle_error() 
when GuC submission is used. The problem is what happens if the engine 
reset fails (which supposedly can only happen with broken hardware). In 
that scenario, there is an asynchronous message from GuC to i915 to 
notify us of the failure. The KMD receives that notification and then 
(eventually) calls intel_gt_handle_error() to issue a full GT reset. 
However, that is blocked because the selftest is not expecting it and 
has vetoed the possibility. A fix is required to allow that full GT 
reset to proceed and recover the hardware. At that point, the selftest 
should indeed fail because the reset was larger than expected. That 
should be handled by the fact the selftest issued work to other engines 
beside the target and expects those requests to complete successfully. 
In the case of the escalated GT reset, all those requests will be killed 
off as well. Thus the test will correctly fail.


John.




I guess we could live with the hangcheck test being disabled for guc 
submission until this is sorted out?


/Thomas




Matt


/Thomas



Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7e2d99dd012d..90a03c60c80c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -734,7 +734,8 @@ static int __igt_reset_engine(struct intel_gt
*gt, bool active)
 reset_engine_count = i915_reset_engine_count(global,
engine);
   st_engine_heartbeat_disable(engine);
-   set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
+   if (!using_guc)
+   set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);

 count = 0;
 do {
 struct i915_request *rq = NULL;
@@ -824,7 +825,8 @@ static int __igt_reset_engine(struct intel_gt
*gt, bool active)
 if (err)
 break;
 } while (time_before(jiffies, end_time));
-   clear_bit(I

Re: [Intel-gfx] [PATCH] drm/i915/execlists: Weak parallel submission support for execlists

2021-10-26 Thread John Harrison

On 10/20/2021 14:47, Matthew Brost wrote:

A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between each request
generated and virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon deprecated) bonding interface.

We perma-pin these execlists contexts to align with GuC implementation.

v2:
  (John Harrison)
   - Drop siblings array as num_siblings must be 1

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 10 +++--
  drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
  .../drm/i915/gt/intel_execlists_submission.c  | 44 ++-
  drivers/gpu/drm/i915/gt/intel_lrc.c   |  2 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
  5 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fb33d0322960..35e87a7d0ea9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -570,10 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
  
-	/* FIXME: This is NIY for execlists */

-   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
-   return -ENODEV;
-
if (get_user(slot, &ext->engine_index))
return -EFAULT;
  
@@ -583,6 +579,12 @@ set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,

if (get_user(num_siblings, &ext->num_siblings))
return -EFAULT;
  
+	if (!intel_uc_uses_guc_submission(&i915->gt.uc) && num_siblings != 1) {

+   drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC mode\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
if (slot >= set->num_engines) {
drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5634d14052bc..1bec92e1d8e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
  
  	__i915_active_acquire(&ce->active);
  
-	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))

+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
+   intel_context_is_parallel(ce))
return 0;
  
  	/* Preallocate tracking nodes */

@@ -563,7 +564,6 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
 * Callers responsibility to validate that this function is used
 * correctly but we use GEM_BUG_ON here ensure that they do.
 */
-   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
GEM_BUG_ON(intel_context_is_pinned(parent));
GEM_BUG_ON(intel_context_is_child(parent));
GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index bedb80057046..2865b422300d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -927,8 +927,7 @@ static void execlists_submit_ports(struct intel_engine_cs 
*engine)
  
  static bool ctx_single_port_submission(const struct intel_context *ce)

  {
-   return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
-   intel_context_force_single_submission(ce));
+   return intel_context_force_single_submission(ce);

I think this is actually going to break GVT.

Not so much this change here but the whole use of single submission 
outside of GVT. It looks like the GVT driver overloads the single 
submission flag to tag requests that it owns. If we start using that 
flag elsewhere when GVT is active, I think that will cause much 
confusion within the GVT code.


The correct fix would be to create a new flag just for GVT usage 
alongside the single submission one. GVT would then set both but only 
check for its own private flag. The parallel code would obviously only 
set the existing single submission flag.




  }
  
  static bool can_merge_ctx(const struct intel_context *prev,

@@ -2598,6 +2597,46 @@ static void execlists_context_cancel_request(struct 
intel_context *ce,
  current->comm);
  }
  
+static struct intel_context *

+execlists_create_parallel(struct intel_engine_cs **engines,
+ unsigned int num_siblings,
+

Re: [Intel-gfx] [PATCH] drm/i915/execlists: Weak parallel submission support for execlists

2021-10-27 Thread John Harrison

On 10/27/2021 12:17, Matthew Brost wrote:

On Tue, Oct 26, 2021 at 02:58:00PM -0700, John Harrison wrote:

On 10/20/2021 14:47, Matthew Brost wrote:

A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between each request
generated and virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon deprecated) bonding interface.

We perma-pin these execlists contexts to align with GuC implementation.

v2:
   (John Harrison)
- Drop siblings array as num_siblings must be 1

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gem/i915_gem_context.c   | 10 +++--
   drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
   .../drm/i915/gt/intel_execlists_submission.c  | 44 ++-
   drivers/gpu/drm/i915/gt/intel_lrc.c   |  2 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
   5 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fb33d0322960..35e87a7d0ea9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -570,10 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
-   /* FIXME: This is NIY for execlists */
-   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
-   return -ENODEV;
-
if (get_user(slot, &ext->engine_index))
return -EFAULT;
@@ -583,6 +579,12 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
if (get_user(num_siblings, &ext->num_siblings))
return -EFAULT;
+   if (!intel_uc_uses_guc_submission(&i915->gt.uc) && num_siblings != 1) {
+   drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC mode\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
if (slot >= set->num_engines) {
drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5634d14052bc..1bec92e1d8e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
__i915_active_acquire(&ce->active);
-   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
+   intel_context_is_parallel(ce))
return 0;
/* Preallocate tracking nodes */
@@ -563,7 +564,6 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
 * Callers responsibility to validate that this function is used
 * correctly but we use GEM_BUG_ON here ensure that they do.
 */
-   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
GEM_BUG_ON(intel_context_is_pinned(parent));
GEM_BUG_ON(intel_context_is_child(parent));
GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index bedb80057046..2865b422300d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -927,8 +927,7 @@ static void execlists_submit_ports(struct intel_engine_cs 
*engine)
   static bool ctx_single_port_submission(const struct intel_context *ce)
   {
-   return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
-   intel_context_force_single_submission(ce));
+   return intel_context_force_single_submission(ce);

I think this is actually going to break GVT.

Not so much this change here but the whole use of single submission outside
of GVT. It looks like the GVT driver overloads the single submission flag to
tag requests that it owns. If we start using that flag elsewhere when GVT is
active, I think that will cause much confusion within the GVT code.

The correct fix would be to create a new flag just for GVT usage alongside
the single submission one. GVT would then set both but only check for its
own private flag. The parallel code would obviously only set the existing
single submission flag.


Ok, see below.


   }
   static bool can_merge_ctx(const struct intel_context *prev,
@@ -2598,6 +2597,46 @@ static void execlists_context_cancel_request(struct 
intel_context *ce,
  current->comm);
   }
+static struct intel_context *
+ex

Re: [Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

2021-10-27 Thread John Harrison

On 10/26/2021 23:36, Thomas Hellström wrote:

Hi, John,

On 10/26/21 21:55, John Harrison wrote:

On 10/21/2021 23:23, Thomas Hellström wrote:

On 10/21/21 22:37, Matthew Brost wrote:

On Thu, Oct 21, 2021 at 08:15:49AM +0200, Thomas Hellström wrote:

Hi, Matthew,

On Mon, 2021-10-11 at 16:47 -0700, Matthew Brost wrote:
The hangcheck selftest blocks per engine resets by setting magic bits in
the reset flags. This is incorrect for GuC submission because if the GuC
fails to reset an engine we would like to do a full GT reset. Do not set
these magic bits when using GuC submission.

Side note this lockless algorithm with magic bits to block resets
really
should be ripped out.


Lockless algorithm aside, from a quick look at the code in
intel_reset.c it appears to me like the interface that falls back 
to a

full GT reset is intel_gt_handle_error() whereas intel_engine_reset()
is explicitly intended to not do that, so is there a discrepancy
between GuC and non-GuC here?


With GuC submission, when an engine reset fails we get an engine reset
failure notification which triggers a full GT reset
(intel_guc_engine_failure_process_msg in intel_guc_submission.c). That
reset is blocked by these magic bits being set. Clearing the bits in this
function doesn't seem to unblock that reset either; the driver tries to
unload with a worker blocked, and results in the blow up. Something in
this lockless algorithm could be wrong, as clearing the bit should
unblock the reset but it doesn't. We can look into that, but in the
meantime we need to fix this test to fail gracefully and not crash CI.

crash CI.


Yeah, for that lockless algorithm, if needed, we might want to use a 
ww_mutex per engine or something. But the point was that, AFAICT, at 
least one of the tests that set those flags explicitly tested that no 
engine other than the intended one was reset when the 
intel_engine_reset() function was used. If GuC submission doesn't honor 
that, wouldn't a better approach be to add a code comment around 
intel_engine_reset() to explain the differences and disable that 
particular test for GuC? Also, wouldn't we, for example, see a 
duplicated full GT reset with GuC if intel_engine_reset() fails as part 
of the intel_gt_handle_error() function?

Re-reading this thread, I think there is a misunderstanding.

The selftests themselves have already been updated to support GuC 
based engine resets. That is done by submitting a hanging context and 
letting the GuC detect the hang and issue a reset. There is no 
mechanism available for i915 to directly issue or request an engine 
based reset (because i915 does not know what is running on any given 
engine at any given time, being disconnected from the scheduler).


So the tests are already correctly testing per engine resets and do 
not go anywhere near either intel_engine_reset() or 
intel_gt_handle_error() when GuC submission is used. The problem is 
what happens if the engine reset fails (which supposedly can only 
happen with broken hardware). In that scenario, there is an 
asynchronous message from GuC to i915 to notify us of the failure. 
The KMD receives that notification and then (eventually) calls 
intel_gt_handle_error() to issue a full GT reset. However, that is 
blocked because the selftest is not expecting it and has vetoed the 
possibility.


This is where my understanding of the discussion differs. According to 
Matthew, the selftest actually proceeds to clear the bits, but the 
worker that calls into intel_gt_handle_error() never wakes up. (and 
that's probably due to clear_bit() being used instead of 
clear_and_wake_up_bit()).
Hmm, missed that point. Yeah, sounds like the missing wake_up suffix is 
what is causing the deadlock. I can't see any other reason why the reset 
handler would not proceed once the flags are cleared. And it looks like 
the selftest should time out waiting for the request and continue on 
to clear the bits just fine.





And my problem with this particular patch is that it adds even more 
"if (!guc_submission)" which is already sprinkled all over the place 
in the selftests to the point that it becomes difficult to see what 
(if anything) the tests are really testing.
I agree with this. Fixing the problem at source seems like a better 
solution than hacking lots of different bits in different tests.



For example igt_reset_nop_engine() from a cursory look looks like it's 
doing something but inside the engine loop it becomes clear that the 
test doesn't do *anything* except iterate over engines. Same for 
igt_reset_engines() in the !TEST_ACTIVE case and for 
igt_reset_idle_engine(). For some other tests the reset_count checks 
are gone, leaving only a test that we actually do a reset.
The nop_engine test is meant to be a no-op. It is testing what happens 
when you reset an idle engine. That is not something we can do with GuC 
based engine resets - there 

Re: [Intel-gfx] [PATCH] drm/i915/resets: Don't set / test for per-engine reset bits with GuC submission

2021-10-28 Thread John Harrison

On 10/28/2021 15:42, Matthew Brost wrote:

Don't set, test for, or clear per-engine reset bits with GuC submission
as the GuC owns the per engine resets not the i915. Setting, testing
for, and clearing these bits is causing issues with the hangcheck
selftest. Rather than change to test to not use these bits, rip the use
of these bits out from the reset code.
To be clear, there are other tests poking these bits in addition to 
hangcheck. Not sure if they would suffer from the same problems but I 
don't see why they wouldn't.


Reviewed-by: John Harrison 




Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_reset.c | 27 +--
  1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index 91200c43951f..51b56b8e5003 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1367,20 +1367,27 @@ void intel_gt_handle_error(struct intel_gt *gt,
/* Make sure i915_reset_trylock() sees the I915_RESET_BACKOFF */
synchronize_rcu_expedited();
  
-	/* Prevent any other reset-engine attempt. */

-   for_each_engine(engine, gt, tmp) {
-   while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-   &gt->reset.flags))
-   wait_on_bit(&gt->reset.flags,
-   I915_RESET_ENGINE + engine->id,
-   TASK_UNINTERRUPTIBLE);
+   /*
+* Prevent any other reset-engine attempt. We don't do this for GuC
+* submission as the GuC owns the per-engine reset, not the i915.
+*/
+   if (!intel_uc_uses_guc_submission(&gt->uc)) {
+   for_each_engine(engine, gt, tmp) {
+   while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
+   &gt->reset.flags))
+   wait_on_bit(&gt->reset.flags,
+   I915_RESET_ENGINE + engine->id,
+   TASK_UNINTERRUPTIBLE);
+   }
}
  
  	intel_gt_reset_global(gt, engine_mask, msg);
  
-   for_each_engine(engine, gt, tmp)
-   clear_bit_unlock(I915_RESET_ENGINE + engine->id,
-   &gt->reset.flags);
+   if (!intel_uc_uses_guc_submission(&gt->uc)) {
+   for_each_engine(engine, gt, tmp)
+   clear_bit_unlock(I915_RESET_ENGINE + engine->id,
+   &gt->reset.flags);
+   }
clear_bit_unlock(I915_RESET_BACKOFF, &gt->reset.flags);
smp_mb__after_atomic();
wake_up_all(&gt->reset.queue);




Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/8] tests/i915/gem_exec_capture: Cope with larger page sizes

2021-10-29 Thread John Harrison

On 10/29/2021 10:39, Matthew Brost wrote:

On Thu, Oct 21, 2021 at 04:40:38PM -0700, john.c.harri...@intel.com wrote:

From: John Harrison 

At some point, larger than 4KB page sizes were added to the i915
driver. This included adding an informational line to the buffer
entries in error capture logs. However, the error capture test was not
updated to skip this string, thus it would silently abort processing.

Signed-off-by: John Harrison 
---
  tests/i915/gem_exec_capture.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 53649cdb2..47ca64dd6 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -484,6 +484,12 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
addr |= strtoul(str + 1, &str, 16);
igt_assert(*str++ == '\n');
  
+		/* gtt_page_sizes = 0x0001 */

+   if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
+   str += 19 + 8;
+   igt_assert(*str++ == '\n');
+   }

Can you explain this logic to me, for the life of me I can't figure out
what this is doing. That probably warrants a more detailed comment too.
It's no different to the rest of the processing that this code was 
already doing.


if( start_of_current_line == "gtt_page_sizes = 0x") {
    current_line += strlen(above_string) + strlen(8-digit hex string);
    assert( next_character_of_current_line == end_of_line);
}

I.e. skip over any line that just contains the page size message.

John.



Matt


+
if (!(*str == ':' || *str == '~'))
continue;
  
--

2.25.1





Re: [Intel-gfx] [PATCH 06/27] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-09-13 Thread John Harrison

On 9/9/2021 17:41, Matthew Brost wrote:

On Thu, Sep 09, 2021 at 03:46:43PM -0700, John Harrison wrote:

On 8/20/2021 15:44, Matthew Brost wrote:

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while scheduling of a user context could be enabled.

As with the earlier PM patch, this needs more explanation of what the problem
is and why it is only now a problem.



Same explanation, will add here.


v2:
   (Daniel Vetter)
- Add might_lock annotations to pin / unpin function

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context.c   |  3 ++
   drivers/gpu/drm/i915/gt/intel_engine_pm.h | 15 
   drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 ++
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
   drivers/gpu/drm/i915/intel_wakeref.h  | 12 +++
   5 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index c8595da64ad8..508cfe5770c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -240,6 +240,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
if (err)
goto err_post_unpin;
+   intel_engine_pm_might_get(ce->engine);
+
if (unlikely(intel_context_is_closed(ce))) {
err = -ENOENT;
goto err_unlock;
@@ -313,6 +315,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int 
sub)
return;
CE_TRACE(ce, "unpin\n");
+   intel_engine_pm_might_put(ce->engine);
ce->ops->unpin(ce);
ce->ops->post_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 17a5028ea177..3fe2ae1bcc26 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -9,6 +9,7 @@
   #include "i915_request.h"
   #include "intel_engine_types.h"
   #include "intel_wakeref.h"
+#include "intel_gt_pm.h"
   static inline bool
   intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
@@ -31,6 +32,13 @@ static inline bool intel_engine_pm_get_if_awake(struct 
intel_engine_cs *engine)
return intel_wakeref_get_if_active(&engine->wakeref);
   }
+static inline void intel_engine_pm_might_get(struct intel_engine_cs *engine)
+{
+   if (!intel_engine_is_virtual(engine))
+   intel_wakeref_might_get(&engine->wakeref);

Why doesn't this need to iterate through the physical engines of the virtual
engine?


Yea, technically it should. This is just an annotation though to check
if we do something horribly wrong in our code. If we use any physical
engine in our stack this annotation should pop and we can fix it. I just
don't see what making this 100% correct for virtual engines buys us. If
you want I can fix this, but I'm thinking the more complex we make this
annotation, the less likely it is to just get compiled out with lockdep
off, which is what we are aiming for.
But if the annotation is missing a fundamental lock then it is surely 
not actually going to do any good? Not sure if you need to iterate over 
all child engines + parent but it seems like you should be calling 
might_lock() on at least one engine's mutex to feed the lockdep annotation.


John.


Matt


John.


+   intel_gt_pm_might_get(engine->gt);
+}
+
   static inline void intel_engine_pm_put(struct intel_engine_cs *engine)
   {
intel_wakeref_put(&engine->wakeref);
@@ -52,6 +60,13 @@ static inline void intel_engine_pm_flush(struct 
intel_engine_cs *engine)
intel_wakeref_unlock_wait(&engine->wakeref);
   }
+static inline void intel_engine_pm_might_put(struct intel_engine_cs *engine)
+{
+   if (!intel_engine_is_virtual(engine))
+   intel_wakeref_might_put(&engine->wakeref);
+   intel_gt_pm_might_put(engine->gt);
+}
+
   static inline struct i915_request *
   intel_engine_create_kernel_request(struct intel_engine_cs *engine)
   {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a17bf0d4592b..3c173033ce23 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -31,6 +31,11 @@ static inline bool intel_gt_pm_get_if_awake(struct intel_gt 
*gt)
	return intel_wakeref_get_if_active(&gt->wakeref);
   }
+static inline void intel_gt_pm_might_get(struct intel_gt *gt)
+{
+   intel_wakeref_might_get(&gt->wakeref);
+}
+
   static inline void intel_gt_pm_put(struct intel_gt *gt)
   {
	intel_wakeref_put(&gt->wakeref);
@@ -41,6 +46,11 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
	intel_wakeref_put_async(&gt->wakeref);
   }
+static inline void intel_gt_pm_might_put(struct intel_gt *gt)
+{
+   intel_wakeref_might_put(&gt->wakeref);
+}

Re: [Intel-gfx] [PATCH 07/27] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

2021-09-13 Thread John Harrison

On 9/13/2021 09:54, Matthew Brost wrote:

On Thu, Sep 09, 2021 at 03:51:27PM -0700, John Harrison wrote:

On 8/20/2021 15:44, Matthew Brost wrote:

Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all contexts are pinned. By not calling
switch_to_kernel_context we save on issuing a request to the engine.

I thought the intention of the switch_to_kernel was to ensure that the GPU
is not touching any user context and is basically idle. That is not a valid
assumption with an external scheduler such as GuC. So why is the description
above only mentioning PM references? What is the connection between the PM
ref and the switch_to_kernel?

Also, the comment in the code does not mention anything about PM references,
it just says 'not necessary with GuC' but no explanation at all.


Yea, this needs to be explained better. How about this?

Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all user contexts have scheduling enabled. Once scheduling
is disabled on all user contexts the GuC is guaranteed to not touch any
user context state, which is effectively the same as pointing to a kernel
context.

Matt

I'm still not seeing how the PM reference is involved?

Also, IMHO the focus is wrong in the above text. The fundamental 
requirement is the ensure the hardware is idle. Execlist achieves this 
by switching to a safe context. GuC achieves it by disabling scheduling. 
Indeed, switching to a 'safe' context really has no effect with GuC 
submission. So 'effectively the same as pointing to a kernel context' is 
an incorrect description. I would go with something like:


   "This is execlist specific behaviour intended to ensure the GPU is
   idle by switching to a known 'safe' context. With GuC submission,
   the same idle guarantee is achieved by other means (disabling
   scheduling). Further, switching to a 'safe' context has no effect
   with GuC submission as the scheduler can just switch back again.
   FIXME: Move this backend scheduler specific behaviour into the
   scheduler backend."


John.





v2:
   (Daniel Vetter)
- Add FIXME comment about pushing switch_to_kernel_context to backend

Signed-off-by: Matthew Brost 
Reviewed-by: Daniel Vetter 
---
   drivers/gpu/drm/i915/gt/intel_engine_pm.c | 9 +
   1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 1f07ac4e0672..11fee66daf60 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -162,6 +162,15 @@ static bool switch_to_kernel_context(struct 
intel_engine_cs *engine)
unsigned long flags;
bool result = true;
+   /*
+* No need to switch_to_kernel_context if GuC submission
+*
+* FIXME: This execlists specific backend behavior in generic code, this

"This execlists" -> "This is execlist"

"this should be" -> "it should be"

John.


+* should be pushed to the backend.
+*/
+   if (intel_engine_uses_guc(engine))
+   return true;
+
/* GPU is pointing to the void, as good as in the kernel context. */
if (intel_gt_is_wedged(engine->gt))
return true;




Re: [Intel-gfx] [PATCH 09/27] drm/i915: Expose logical engine instance to user

2021-09-13 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Expose logical engine instance to user via query engine info IOCTL. This
is required for split-frame workloads as these need to be placed on
engines in a logically contiguous order. The logical mapping can change
based on fusing. Rather than requiring the user to have knowledge of the
fusing, we simply expose the logical mapping with the existing query
engine info IOCTL.

IGT: https://patchwork.freedesktop.org/patch/445637/?series=92854&rev=1
media UMD: link coming soon

v2:
  (Daniel Vetter)
   - Add IGT link, placeholder for media UMD

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/i915_query.c | 2 ++
  include/uapi/drm/i915_drm.h   | 8 +++-
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index e49da36c62fb..8a72923fbdba 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,7 +124,9 @@ query_engine_info(struct drm_i915_private *i915,
for_each_uabi_engine(engine, i915) {
info.engine.engine_class = engine->uabi_class;
info.engine.engine_instance = engine->uabi_instance;
+   info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
info.capabilities = engine->uabi_capabilities;
+   info.logical_instance = ilog2(engine->logical_mask);
  
  		if (copy_to_user(info_ptr, &info, sizeof(info)))

return -EFAULT;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bde5860b3686..b1248a67b4f8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2726,14 +2726,20 @@ struct drm_i915_engine_info {
  
  	/** @flags: Engine flags. */

__u64 flags;
+#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE  (1 << 0)
  
  	/** @capabilities: Capabilities of this engine. */

__u64 capabilities;
  #define I915_VIDEO_CLASS_CAPABILITY_HEVC  (1 << 0)
  #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC   (1 << 1)
  
+	/** @logical_instance: Logical instance of engine */

+   __u16 logical_instance;
+
/** @rsvd1: Reserved fields. */
-   __u64 rsvd1[4];
+   __u16 rsvd1[3];
+   /** @rsvd2: Reserved fields. */
+   __u64 rsvd2[3];
  };
  
  /**
Any idea why the padding? Would be useful if the comment said 'this 
structure must be at least/exactly X bytes in size / a multiple of X 
bytes in size because ...' or whatever.


However, not really anything to do with this patch as such, so either way:
Reviewed-by: John Harrison 



Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Introduce context parent-child relationship

2021-09-13 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Introduce context parent-child relationship. Once this relationship is
created all pinning / unpinning operations are directed to the parent
context. The parent context is responsible for pinning all of its
children and itself.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - a single H2G is used to
register / deregister all of the contexts simultaneously.

Subsequent patches in the series will implement the pinning / unpinning
operations for parent / child contexts.

v2:
  (Daniel Vetter)
   - Add kernel doc, add wrapper to access parent to ensure safety

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   | 29 ++
  drivers/gpu/drm/i915/gt/intel_context.h   | 39 +++
  drivers/gpu/drm/i915/gt/intel_context_types.h | 23 +++
  3 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 508cfe5770c0..00d1aee6d199 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -404,6 +404,8 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
  
  	INIT_LIST_HEAD(&ce->destroyed_link);
  

No need for this blank line?


+   INIT_LIST_HEAD(&ce->guc_child_list);
+
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
@@ -418,10 +420,17 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
  
  void intel_context_fini(struct intel_context *ce)

  {
+   struct intel_context *child, *next;
+
if (ce->timeline)
intel_timeline_put(ce->timeline);
i915_vm_put(ce->vm);
  
+	/* Need to put the creation ref for the children */

+   if (intel_context_is_parent(ce))
+   for_each_child_safe(ce, child, next)
+   intel_context_put(child);
+
mutex_destroy(&ce->pin_mutex);
i915_active_fini(&ce->active);
  }
@@ -537,6 +546,26 @@ struct i915_request 
*intel_context_find_active_request(struct intel_context *ce)
return active;
  }
  
+void intel_context_bind_parent_child(struct intel_context *parent,

+struct intel_context *child)
+{
+   /*
+* Caller's responsibility to validate that this function is used
+* correctly, but we use GEM_BUG_ON here to ensure that they do.
+*/
+   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
+   GEM_BUG_ON(intel_context_is_pinned(parent));
+   GEM_BUG_ON(intel_context_is_child(parent));
+   GEM_BUG_ON(intel_context_is_pinned(child));
+   GEM_BUG_ON(intel_context_is_child(child));
+   GEM_BUG_ON(intel_context_is_parent(child));
+
+   parent->guc_number_children++;
+   list_add_tail(&child->guc_child_link,
+ &parent->guc_child_list);
+   child->parent = parent;
+}
+
  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
  #include "selftest_context.c"
  #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index c41098950746..c2985822ab74 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -44,6 +44,45 @@ void intel_context_free(struct intel_context *ce);
  int intel_context_reconfigure_sseu(struct intel_context *ce,
   const struct intel_sseu sseu);
  
+static inline bool intel_context_is_child(struct intel_context *ce)

+{
+   return !!ce->parent;
+}
+
+static inline bool intel_context_is_parent(struct intel_context *ce)
+{
+   return !!ce->guc_number_children;
+}
+
+static inline bool intel_context_is_pinned(struct intel_context *ce);

No point declaring 'static inline' if there is no function body?


+
+static inline struct intel_context *
+intel_context_to_parent(struct intel_context *ce)
+{
+if (intel_context_is_child(ce)) {
+   /*
+* The parent holds ref count to the child so it is always safe
+* for the parent to access the child, but the child has pointer

has pointer -> has a pointer


+* to the parent without a ref. To ensure this is safe the child
+* should only access the parent pointer while the parent is
+* pinned.
+*/
+GEM_BUG_ON(!intel_context_is_pinned(ce->parent));
+
+return ce->parent;
+} else {
+return ce;
+}
+}
+
+void intel_context_bind_parent_child(struct intel_context *parent,
+struct intel_context *child);
+
+#define for_each_child(parent, ce)\
+   list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
+#define for_each_child_safe(parent, ce, cn)\
	list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, guc_child_link)

Re: [Intel-gfx] [PATCH 4/4] drm/i915/guc: Refcount context during error capture

2021-09-14 Thread John Harrison

On 9/14/2021 07:29, Daniel Vetter wrote:

On Mon, Sep 13, 2021 at 10:09:56PM -0700, Matthew Brost wrote:

From: John Harrison 

When i915 receives a context reset notification from GuC, it triggers
an error capture before resetting any outstanding requests of that
context. Unfortunately, the error capture is not a time bound
operation. In certain situations it can take a long time, particularly
when multiple large LMEM buffers must be read back and encoded. If
this delay is longer than other timeouts (heartbeat, test recovery,
etc.) then a full GT reset can be triggered in the middle.

That can result in the context being reset by GuC actually being
destroyed before the error capture completes and the GuC submission
code resumes. Thus, the GuC side can start dereferencing stale
pointers and Bad Things ensue.

So add a refcount get of the context during the entire reset
operation. That way, the context can't be destroyed part way through
no matter what other resets or user interactions occur.

v2:
  (Matthew Brost)
   - Update patch to work with async error capture

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 

This sounds like a fundamental issue in our reset/scheduler design. If we
have multiple timeout-things working in parallel, then there's going to be
an endless whack-a-mole fireworks show.

Reset is not a perf critical path (aside from media timeout, which guc
handles internally anyway). Simplicity trumps everything else. The fix
here is to guarantee that anything related to reset cannot happen in
parallel with anything else related to reset/timeout. At least on a
per-engine (and really on a per-reset domain) basis.

The fix we've developed for drm/sched is that the driver can allocate a
single-thread work queue, pass it to each drm/sched instance, and all
timeout handling is run in there.

For i915 it's more of a mess since we have a ton of random things that
time out/reset potentially going on in parallel. But that's the design we
should head towards.

_not_ sprinkling random refcounts all over the place until most of the
oops/splats disappear. That's cargo-culting, not engineering.
-Daniel

Not sure I follow this.

The code pulls an intel_context object out of a structure and proceeds 
to dereference it in what can be a slow piece of code that is running in 
a worker thread and is therefore already asynchronous to other activity. 
Acquiring a reference count on that object while holding its pointer is 
standard practice, I thought. That's the whole point of reference counting!


To be clear, this is not adding a brand new reference count object. It 
is merely taking the correct lock on an object while accessing that object.


It uses the xarray's lock while accessing the xarray and then the ce's 
lock while accessing the ce and makes sure to overlap the two to prevent 
any race conditions. To me, that seems like a) correct object access 
practice and b) it should have been there in the first place.


John.





---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 24 +--
  1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1986a57b52cc..02917fc4d4a8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2888,6 +2888,8 @@ static void capture_worker_func(struct work_struct *w)
intel_engine_set_hung_context(engine, ce);
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
i915_capture_error_state(gt, ce->engine->mask);
+
+   intel_context_put(ce);
  }
  
  static void capture_error_state(struct intel_guc *guc,

@@ -2924,7 +2926,7 @@ static void guc_context_replay(struct intel_context *ce)
tasklet_hi_schedule(&sched_engine->tasklet);
  }
  
-static void guc_handle_context_reset(struct intel_guc *guc,

+static bool guc_handle_context_reset(struct intel_guc *guc,
 struct intel_context *ce)
  {
trace_intel_context_reset(ce);
@@ -2937,7 +2939,11 @@ static void guc_handle_context_reset(struct intel_guc 
*guc,
   !context_blocked(ce))) {
capture_error_state(guc, ce);
guc_context_replay(ce);
+
+   return false;
}
+
+   return true;
  }
  
  int intel_guc_context_reset_process_msg(struct intel_guc *guc,

@@ -2945,6 +2951,7 @@ int intel_guc_context_reset_process_msg(struct intel_guc 
*guc,
  {
struct intel_context *ce;
int desc_idx;
+   unsigned long flags;
  
  	if (unlikely(len != 1)) {

drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
@@ -2952,11 +2959,24 @@ int intel_guc_context_reset_process_msg(struct 
intel_guc *guc,
}
  
  	desc_idx = msg[0];

+
+   /*
+* The context lookup u

Re: [Intel-gfx] [PATCH 12/27] drm/i915/guc: Add multi-lrc context registration

2021-09-15 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are set up during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +-
  4 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0fafc178cf2c..6f567ebeb039 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -232,8 +232,20 @@ struct intel_context {
/** @parent: pointer to parent if child */
struct intel_context *parent;
  
+

+   /** @guc_wqi_head: head pointer in work queue */
+   u16 guc_wqi_head;
+   /** @guc_wqi_tail: tail pointer in work queue */
+   u16 guc_wqi_tail;
+
These should be in the 'guc_state' sub-struct? Would be good to keep all 
GuC specific content in one self-contained struct. Especially given the 
other child/parent fields are not going to be guc_ prefixed any more.




/** @guc_number_children: number of children if parent */
u8 guc_number_children;
+
+   /**
+* @parent_page: page in context used by parent for work queue,
Maybe 'page in context record'? Otherwise, exactly what 'context' is 
meant here? It isn't the 'struct intel_context'. The context record is 
saved as 'ce->state' / 'ce->lrc_reg_state', yes? Is it possible to link 
to either of those fields? Probably not given that they don't appear to 
have any kerneldoc description :(. Maybe add that in too :).



+* work queue descriptor
Later on, it is described as 'process descriptor and work queue'. It 
would be good to be consistent.



+*/
+   u8 parent_page;
};
  
  #ifdef CONFIG_DRM_I915_SELFTEST

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index bb4af4977920..0ddbad4e062a 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
  
+	if (intel_context_is_parent(ce)) {

+   ce->parent_page = context_size / PAGE_SIZE;
+   context_size += PAGE_SIZE;
+   }
+
obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
if (IS_ERR(obj))
obj = i915_gem_object_create_shmem(engine->i915, context_size);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0e600a3b8f1e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,7 +52,7 @@
  
  #define GUC_DOORBELL_INVALID		256
  
-#define GUC_WQ_SIZE			(PAGE_SIZE * 2)

+#define GUC_WQ_SIZE(PAGE_SIZE / 2)
Is this size actually dictated by the GuC API? Or is it just a driver 
level decision? If the latter, shouldn't this be below instead?


  
  /* Work queue item header definitions */

  #define WQ_STATUS_ACTIVE  1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 14b24298cdd7..dbcb9ab28a9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -340,6 +340,39 @@ static struct i915_priolist *to_priolist(struct rb_node 
*rb)
return rb_entry(rb, struct i915_priolist, node);
  }
  
+/*

+ * When using multi-lrc submission an extra page in the context state is
+ * reserved for the process descriptor and work queue.
+ *
+ * The layout of this page is below:
+ * 0   guc_process_desc
+ * ... unused
+ * PAGE_SIZE / 2   work queue start
+ * ... work queue
+ * PAGE_SIZE - 1   work queue end
+ */
+#define WQ_OFFSET  (PAGE_SIZE / 2)
Can this not be derived from GUC_WQ_SIZE given that the two are 
fundamentally linked? E.g. '#define WQ_OFFSET (PAGE_SIZE - 
GUC_WQ_SIZE)'? And maybe have a '#define WQ_TOTAL_SIZE PAGE_SIZE' and 
use that in all of WQ_OFFSET, GUC_WQ_SIZE and the allocation itself in 
intel_lrc.c?


Also, the process descriptor is actually an array of descriptors sized 
by the number of children? Or am I misunderstanding the code below? If 
so, shouldn't there be a 'COMPILE_BUG_ON((MAX_ENGINE_INSTANCE * 
sizeof(descriptor)) < (WQ_

Re: [Intel-gfx] [PATCH 13/27] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts

2021-09-15 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

In GuC parent-child contexts the parent context controls the scheduling,
ensure only the parent does the scheduling operations.

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 24 ++-
  1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dbcb9ab28a9a..00d54bb00bfb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -320,6 +320,12 @@ static void decr_context_committed_requests(struct 
intel_context *ce)
GEM_BUG_ON(ce->guc_state.number_committed_requests < 0);
  }
  
+static struct intel_context *

+request_to_scheduling_context(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
  static bool context_guc_id_invalid(struct intel_context *ce)
  {
return ce->guc_id.id == GUC_INVALID_LRC_ID;
@@ -1684,6 +1690,7 @@ static void __guc_context_sched_disable(struct intel_guc 
*guc,
  
  	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
  
+	GEM_BUG_ON(intel_context_is_child(ce));

trace_intel_context_sched_disable(ce);
  
  	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),

@@ -1898,6 +1905,8 @@ static void guc_context_sched_disable(struct 
intel_context *ce)
u16 guc_id;
bool enabled;
  
+	GEM_BUG_ON(intel_context_is_child(ce));

+
if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
!lrc_desc_registered(guc, ce->guc_id.id)) {
spin_lock_irqsave(&ce->guc_state.lock, flags);
@@ -2286,6 +2295,8 @@ static void guc_signal_context_fence(struct intel_context 
*ce)
  {
unsigned long flags;
  
+	GEM_BUG_ON(intel_context_is_child(ce));

+
spin_lock_irqsave(&ce->guc_state.lock, flags);
clr_context_wait_for_deregister_to_register(ce);
__guc_signal_context_fence(ce);
@@ -2315,7 +2326,7 @@ static void guc_context_init(struct intel_context *ce)
  
  static int guc_request_alloc(struct i915_request *rq)

  {
-   struct intel_context *ce = rq->context;
+   struct intel_context *ce = request_to_scheduling_context(rq);
struct intel_guc *guc = ce_to_guc(ce);
unsigned long flags;
int ret;
@@ -2358,11 +2369,12 @@ static int guc_request_alloc(struct i915_request *rq)
 * exhausted and return -EAGAIN to the user indicating that they can try
 * again in the future.
 *
-* There is no need for a lock here as the timeline mutex ensures at
-* most one context can be executing this code path at once. The
-* guc_id_ref is incremented once for every request in flight and
-* decremented on each retire. When it is zero, a lock around the
-* increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+* There is no need for a lock here as the timeline mutex (or
+* parallel_submit mutex in the case of multi-lrc) ensures at most one
+* context can be executing this code path at once. The guc_id_ref is
Isn't that now two? One uni-LRC holding the timeline mutex and one 
multi-LRC holding the parallel submit mutex?


John.


+* incremented once for every request in flight and decremented on each
+* retire. When it is zero, a lock around the increment (in pin_guc_id)
+* is needed to seal a race with unpin_guc_id.
 */
if (atomic_add_unless(&ce->guc_id.ref, 1, 0))
goto out;




Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: remove unneeded clflush calls

2021-09-15 Thread John Harrison

On 9/15/2021 12:24, Belgaumkar, Vinay wrote:

On 9/14/2021 12:51 PM, Lucas De Marchi wrote:

The clflush calls here aren't doing anything since we are not writing
something and flushing the cache lines to be visible to GuC. Here the
intention seems to be to make sure whatever GuC has written is visible
to the CPU before we read them. However a clflush from the CPU side is
the wrong instruction to use.
Is there a right instruction to use? Either we need to verify that no 
flush/invalidate is required or we need to add in a replacement that 
does the correct thing?


John.



 From code inspection on the other clflush() calls in i915/gt/uc/ these
are the only ones with this behavrior. The others are apparently making
sure what we write is visible to GuC.

Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c

index 65a3e7fdb2b2..2e996b77df80 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -108,7 +108,6 @@ static u32 slpc_get_state(struct intel_guc_slpc 
*slpc)

    GEM_BUG_ON(!slpc->vma);
  -    drm_clflush_virt_range(slpc->vaddr, sizeof(u32));
  data = slpc->vaddr;
    return data->header.global_state;
@@ -172,8 +171,6 @@ static int slpc_query_task_state(struct 
intel_guc_slpc *slpc)

  drm_err(&i915->drm, "Failed to query task state (%pe)\n",
  ERR_PTR(ret));
  -    drm_clflush_virt_range(slpc->vaddr, SLPC_PAGE_SIZE_BYTES);
-


LGTM.
Reviewed-by: Vinay Belgaumkar 


  return ret;
  }





Re: [Intel-gfx] [PATCH 14/27] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-09-15 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.

v2:
  (Daniel Vetter)
   - Explicitly state why we assign consecutive guc_ids

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 107 +-
  2 files changed, 86 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 023953e77553..3f95b1b4f15c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -61,9 +61,13 @@ struct intel_guc {
 */
spinlock_t lock;
/**
-* @guc_ids: used to allocate new guc_ids
+* @guc_ids: used to allocate new guc_ids, single-lrc
 */
struct ida guc_ids;
+   /**
+* @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
+*/
+   unsigned long *guc_ids_bitmap;
/** @num_guc_ids: number of guc_ids that can be used */
u32 num_guc_ids;
/** @max_guc_ids: max number of guc_ids that can be used */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 00d54bb00bfb..e9dfd43d29a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -125,6 +125,18 @@ guc_create_virtual(struct intel_engine_cs **siblings, 
unsigned int count);
  
  #define GUC_REQUEST_SIZE 64 /* bytes */
  
+/*

+ * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
+ * per the GuC submission interface. A different allocation algorithm is used
+ * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to
The 'hence' clause seems to be attached to the wrong reason. The id 
space is partitioned because of the contiguous vs random requirements of 
multi vs single LRC, not because a different allocator is used in one 
partition vs the other.



+ * partition the guc_id space. We believe the number of multi-lrc contexts in
+ * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
+ * multi-lrc.
+ */
+#define NUMBER_MULTI_LRC_GUC_ID(guc) \
+   ((guc)->submission_state.num_guc_ids / 16 > 32 ? \
+(guc)->submission_state.num_guc_ids / 16 : 32)
+
  /*
   * Below is a set of functions which control the GuC scheduling state which
   * require a lock.
@@ -1176,6 +1188,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
intel_gt_pm_unpark_work_init(&guc->submission_state.destroyed_worker,
 destroyed_worker_func);
+   guc->submission_state.guc_ids_bitmap =
+   bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
+   if (!guc->submission_state.guc_ids_bitmap)
+   return -ENOMEM;
  
  	return 0;

  }
@@ -1188,6 +1204,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
guc_lrc_desc_pool_destroy(guc);
guc_flush_destroyed_contexts(guc);
i915_sched_engine_put(guc->sched_engine);
+   bitmap_free(guc->submission_state.guc_ids_bitmap);
  }
  
  static void queue_request(struct i915_sched_engine *sched_engine,

@@ -1239,18 +1256,43 @@ static void guc_submit_request(struct i915_request *rq)
spin_unlock_irqrestore(&sched_engine->lock, flags);
  }
  
-static int new_guc_id(struct intel_guc *guc)

+static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
  {
-   return ida_simple_get(&guc->submission_state.guc_ids, 0,
- guc->submission_state.num_guc_ids, GFP_KERNEL |
- __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+   int ret;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
+
+   if (intel_context_is_parent(ce))
+   ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
+ NUMBER_MULTI_LRC_GUC_ID(guc),
+ order_base_2(ce->guc_number_children + 1));
+   else
+   ret = ida_simple_get(&guc->submission_state.guc_ids,
+NUMBER_MULTI_LRC_GUC_ID(guc),
+guc->submission_state.num_guc_ids,
+

Re: [Intel-gfx] [PATCH 12/27] drm/i915/guc: Add multi-lrc context registration

2021-09-15 Thread John Harrison

On 9/15/2021 12:31, Matthew Brost wrote:

On Wed, Sep 15, 2021 at 12:21:35PM -0700, John Harrison wrote:

On 8/20/2021 15:44, Matthew Brost wrote:

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are set up during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
   drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +-
   4 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0fafc178cf2c..6f567ebeb039 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -232,8 +232,20 @@ struct intel_context {
/** @parent: pointer to parent if child */
struct intel_context *parent;
+
+   /** @guc_wqi_head: head pointer in work queue */
+   u16 guc_wqi_head;
+   /** @guc_wqi_tail: tail pointer in work queue */
+   u16 guc_wqi_tail;
+

These should be in the 'guc_state' sub-struct? Would be good to keep all GuC
specific content in one self-contained struct. Especially given the other
child/parent fields are not going to be guc_ prefixed any more.


Right now I have everything in guc_state protected by guc_state.lock,
these fields are not protected by this lock. IMO it is better to use a
different sub-structure for the parallel fields (even if anonymous).
Hmm, I still think it is bad to be scattering back-end specific fields 
amongst regular fields. The GuC patches include a whole bunch of 
complaints about execlist back-end specific stuff leaking through to the 
higher levels; we really shouldn't be guilty of doing the same with GuC 
if at all possible. At the very least, the GuC specific fields should be 
grouped together at the end of the struct rather than inter-mingled.
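To be concrete, the grouping being asked for looks something like the toy sketch below (hypothetical names, not the actual i915 struct):

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint8_t u8;

/* Toy stand-in for struct intel_context: the suggestion is simply to
 * collect the GuC parallel-submission fields into one sub-struct kept
 * at the end of the struct, instead of scattering them among the core
 * fields. All names here are hypothetical.
 */
struct toy_intel_context {
	/* ... core fields shared by all back-ends ... */
	struct toy_intel_context *parent;

	/* GuC parallel state: not guc_state.lock protected, so grouped
	 * in its own sub-struct rather than folded into guc_state.
	 */
	struct {
		u16 wqi_head;		/* head pointer in work queue */
		u16 wqi_tail;		/* tail pointer in work queue */
		u8 number_children;	/* only valid on a parent */
		u8 parent_page;		/* context-record page holding the WQ */
	} parallel;
};

/* Bytes currently occupied in the work queue (no wrap handling). */
static inline u16 toy_wqi_used(const struct toy_intel_context *ce)
{
	return ce->parallel.wqi_tail - ce->parallel.wqi_head;
}
```

Keeping the fields in one sub-struct also makes it obvious at a glance which lock, if any, covers them.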





/** @guc_number_children: number of children if parent */
u8 guc_number_children;
+
+   /**
+* @parent_page: page in context used by parent for work queue,

Maybe 'page in context record'? Otherwise, exactly what 'context' is meant
here? It isn't the 'struct intel_context'. The context record is saved as
'ce->state' / 'ce->lrc_reg_state', yes? Is it possible to link to either of

It is the page in ce->state / page minus LRC reg offset in
ce->lrc_reg_state. Will update the commit to make that clear.


those fields? Probably not given that they don't appear to have any kerneldoc
description :(. Maybe add that in too :).


+* work queue descriptor

Later on, it is described as 'process descriptor and work queue'. It would
be good to be consistent.


Yep. Will fix.


+*/
+   u8 parent_page;
};
   #ifdef CONFIG_DRM_I915_SELFTEST
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index bb4af4977920..0ddbad4e062a 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
+   if (intel_context_is_parent(ce)) {
+   ce->parent_page = context_size / PAGE_SIZE;
+   context_size += PAGE_SIZE;
+   }
+
obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
if (IS_ERR(obj))
obj = i915_gem_object_create_shmem(engine->i915, context_size);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0e600a3b8f1e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,7 +52,7 @@
   #define GUC_DOORBELL_INVALID 256
-#define GUC_WQ_SIZE	(PAGE_SIZE * 2)
+#define GUC_WQ_SIZE	(PAGE_SIZE / 2)

Is this size actually dictated by the GuC API? Or is it just a driver level
decision? If the latter, shouldn't this be below instead?


Driver level decision. What exactly do you mean by below?
The next chunk of the patch - where WQ_OFFSET is defined and the whole 
caboodle is described.


  

   /* Work queue item header definitions */
   #define WQ_STATUS_ACTIVE 1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 14b24298cdd7..dbcb9ab28a9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -340,6 +340,39 @@ stati

Re: [Intel-gfx] [PATCH 4/5] drm/i915/guc: Enable GuC submission by default on DG1

2021-09-16 Thread John Harrison

On 9/16/2021 09:28, Matthew Brost wrote:

Enable GuC submission by default on DG1

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 86c318516e14..2fef3b0bbe95 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -35,7 +35,7 @@ static void uc_expand_default_options(struct intel_uc *uc)
}
  
  	/* Intermediate platforms are HuC authentication only */

-   if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+   if (IS_ALDERLAKE_S(i915)) {
i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
return;
}




Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/1] tests/i915/query: Query, parse and validate the hwconfig table

2021-09-16 Thread John Harrison

On 9/16/2021 01:59, Petri Latvala wrote:

On Wed, Sep 15, 2021 at 02:55:58PM -0700, john.c.harri...@intel.com wrote:

From: Rodrigo Vivi 

Newer platforms have an embedded table giving details about that
platform's hardware configuration. This table can be retrieved from
the KMD via the existing query API. So add a test for it as both an
example of how to fetch the table and to validate the contents as much
as is possible.

Signed-off-by: Rodrigo Vivi 
Signed-off-by: John Harrison 
Cc: Slawomir Milczarek 
Reviewed-by: Matthew Brost 
---
  include/drm-uapi/i915_drm.h |   1 +
  lib/intel_hwconfig_types.h  | 106 +++
  tests/i915/i915_query.c | 168 
  3 files changed, 275 insertions(+)
  create mode 100644 lib/intel_hwconfig_types.h

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index b9632bb2c..ae0c8dfad 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -2451,6 +2451,7 @@ struct drm_i915_query_item {
  #define DRM_I915_QUERY_ENGINE_INFO2
  #define DRM_I915_QUERY_PERF_CONFIG  3
  #define DRM_I915_QUERY_MEMORY_REGIONS   4
+#define DRM_I915_QUERY_HWCONFIG_TABLE   5
  /* Must be kept compact -- no holes and well documented */

Please update i915_drm.h with a copy from the kernel and state in the
commit message which kernel commit sha it's from. If this change is
not in the kernel yet, add this token to lib/i915/i915_drm_local.h
instead.


Neither side is merged yet. My understanding is that all sides need to 
be posted in parallel for CI to work. Once green and reviewed, the 
kernel side gets merged first. Then the IGT/UMD patches get updated with 
the official kernel headers, reposted and then merged.


John.



Re: [Intel-gfx] [PATCH 15/27] drm/i915/guc: Implement multi-lrc submission

2021-09-20 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  21 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  24 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 312 +++---
  drivers/gpu/drm/i915/i915_request.h   |   8 +
  6 files changed, 317 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index fbfcae727d7f..879aef662b2e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -748,3 +748,24 @@ void intel_guc_load_status(struct intel_guc *guc, struct 
drm_printer *p)
}
}
  }
+
+void intel_guc_write_barrier(struct intel_guc *guc)
+{
+   struct intel_gt *gt = guc_to_gt(guc);
+
+   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+   GEM_BUG_ON(guc->send_regs.fw_domains);
Granted, this patch is just moving code from one file to another, not 
changing it. However, I think it would be worth adding a blank line in 
here. Otherwise the 'this register' comment below can be confusingly 
read as referring to the send_regs.fw_domain entry above.


And maybe add a comment why it is a bug for the send_regs value to be 
set? I'm not seeing any obvious connection between it and the rest of 
this code.



+   /*
+* This register is used by the i915 and GuC for MMIO based
+* communication. Once we are in this code CTBs are the only
+* method the i915 uses to communicate with the GuC so it is
+* safe to write to this register (a value of 0 is NOP for MMIO
+* communication). If we ever start mixing CTBs and MMIOs a new
+* register will have to be chosen.
+*/
+   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+   } else {
+   /* wmb() sufficient for a barrier if in smem */
+   wmb();
+   }
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3f95b1b4f15c..0ead2406d03c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -37,6 +37,12 @@ struct intel_guc {
/* Global engine used to submit requests to GuC */
struct i915_sched_engine *sched_engine;
struct i915_request *stalled_request;
+   enum {
+   STALL_NONE,
+   STALL_REGISTER_CONTEXT,
+   STALL_MOVE_LRC_TAIL,
+   STALL_ADD_REQUEST,
+   } submission_stall_reason;
  
  	/* intel_guc_recv interrupt related state */

spinlock_t irq_lock;
@@ -332,4 +338,6 @@ void intel_guc_submission_cancel_requests(struct intel_guc 
*guc);
  
  void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
  
+void intel_guc_write_barrier(struct intel_guc *guc);

+
  #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 20c710a74498..10d1878d2826 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -377,28 +377,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
return ++ct->requests.last_fence;
  }
  
-static void write_barrier(struct intel_guc_ct *ct)

-{
-   struct intel_guc *guc = ct_to_guc(ct);
-   struct intel_gt *gt = guc_to_gt(guc);
-
-   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
-   GEM_BUG_ON(guc->send_regs.fw_domains);
-   /*
-* This register is used by the i915 and GuC for MMIO based
-* communication. Once we are in this code CTBs are the only
-* method the i915 uses to communicate with the GuC so it is
-* safe to write to this register (a value of 0 is NOP for MMIO
-* communication). If we ever start mixing CTBs and MMIOs a new
-* register will have to be chosen.
-*/
-   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
-   } else {
-   /* wmb() sufficient for a barrier if in smem */
-   wmb();
-   }
-}
-
  static int ct_write(struct intel_guc_ct *ct,
const u32 *action,
u32 len /* in dwords */,
@@ -468,7 +446,7 @@ static int ct_write(struct intel_guc_ct *ct,
 * make sure H2G buffer update and LRC tail update (if this triggering a
  

Re: [Intel-gfx] [PATCH 16/27] drm/i915/guc: Insert submit fences between requests in parent-child relationship

2021-09-20 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

The GuC must receive requests in the order submitted for contexts in a
parent-child relationship to function correctly. To ensure this, insert
a submit fence between the current request and last request submitted
for requests / contexts in a parent child relationship. This is
conceptually similar to a single timeline.

Signed-off-by: Matthew Brost 
Cc: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_context.h   |   5 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   5 +-
  drivers/gpu/drm/i915/i915_request.c   | 120 ++
  4 files changed, 109 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index c2985822ab74..9dcc1b14697b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -75,6 +75,11 @@ intel_context_to_parent(struct intel_context *ce)
  }
  }
  
+static inline bool intel_context_is_parallel(struct intel_context *ce)

+{
+   return intel_context_is_child(ce) || intel_context_is_parent(ce);
+}
+
  void intel_context_bind_parent_child(struct intel_context *parent,
 struct intel_context *child);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h

index 6f567ebeb039..a63329520c35 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -246,6 +246,13 @@ struct intel_context {
 * work queue descriptor
 */
u8 parent_page;
+
+   /**
+* @last_rq: last request submitted on a parallel context, used
+* to insert submit fences between request in the parallel

request -> requests

With that fixed:
Reviewed-by: John Harrison 



+* context.
+*/
+   struct i915_request *last_rq;
};
  
  #ifdef CONFIG_DRM_I915_SELFTEST

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b107ad095248..f0b60fecf253 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -672,8 +672,7 @@ static int rq_prio(const struct i915_request *rq)
  
  static bool is_multi_lrc_rq(struct i915_request *rq)

  {
-   return intel_context_is_child(rq->context) ||
-   intel_context_is_parent(rq->context);
+   return intel_context_is_parallel(rq->context);
  }
  
  static bool can_merge_rq(struct i915_request *rq,

@@ -2843,6 +2842,8 @@ static void guc_parent_context_unpin(struct intel_context 
*ce)
GEM_BUG_ON(!intel_context_is_parent(ce));
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
  
+	if (ce->last_rq)

+   i915_request_put(ce->last_rq);
unpin_guc_id(guc, ce);
lrc_unpin(ce);
  }
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index ce446716d092..2e51c8999088 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1546,36 +1546,62 @@ i915_request_await_object(struct i915_request *to,
return ret;
  }
  
+static inline bool is_parallel_rq(struct i915_request *rq)

+{
+   return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
  static struct i915_request *
-__i915_request_add_to_timeline(struct i915_request *rq)
+__i915_request_ensure_parallel_ordering(struct i915_request *rq,
+   struct intel_timeline *timeline)
  {
-   struct intel_timeline *timeline = i915_request_timeline(rq);
struct i915_request *prev;
  
-	/*

-* Dependency tracking and request ordering along the timeline
-* is special cased so that we can eliminate redundant ordering
-* operations while building the request (we know that the timeline
-* itself is ordered, and here we guarantee it).
-*
-* As we know we will need to emit tracking along the timeline,
-* we embed the hooks into our request struct -- at the cost of
-* having to have specialised no-allocation interfaces (which will
-* be beneficial elsewhere).
-*
-* A second benefit to open-coding i915_request_await_request is
-* that we can apply a slight variant of the rules specialised
-* for timelines that jump between engines (such as virtual engines).
-* If we consider the case of virtual engine, we must emit a dma-fence
-* to prevent scheduling of the second request until the first is
-* complete (to maximise our greedy late load balancing) an

Re: [Intel-gfx] [PATCH 17/27] drm/i915/guc: Implement multi-lrc reset

2021-09-20 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Update context and full GPU reset to work with multi-lrc. The idea is
parent context tracks all the active requests inflight for itself and
its' children. The parent context owns the reset replaying / canceling

its' -> its


requests as needed.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   | 11 ++--
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 63 +--
  2 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 00d1aee6d199..5615be32879c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -528,20 +528,21 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
  
  struct i915_request *intel_context_find_active_request(struct intel_context *ce)

  {
+   struct intel_context *parent = intel_context_to_parent(ce);
struct i915_request *rq, *active = NULL;
unsigned long flags;
  
  	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));

Should this not check the parent as well/instead?

And to be clear, this can be called on regular contexts (where ce == 
parent) and on both the parent or child contexts of multi-LRC contexts 
(where ce may or may not match parent)?



  
-	spin_lock_irqsave(&ce->guc_state.lock, flags);

-   list_for_each_entry_reverse(rq, &ce->guc_state.requests,
+   spin_lock_irqsave(&parent->guc_state.lock, flags);
+   list_for_each_entry_reverse(rq, &parent->guc_state.requests,
sched.link) {
-   if (i915_request_completed(rq))
+   if (i915_request_completed(rq) && rq->context == ce)

'rq->context == ce' means:

1. single-LRC context, rq is owned by ce
2. multi-LRC context, ce is child, rq really belongs to ce but is being
   tracked by parent
3. multi-LRC context, ce is parent, rq really is owned by ce

So when 'rq->context != ce', it means that the request is owned by a 
different child to 'ce' but within the same multi-LRC group. So we want 
to ignore that request and keep searching until we find one that is 
really owned by the target ce?



break;
  
-		active = rq;

+   active = (rq->context == ce) ? rq : active;
Would be clearer to say 'if (rq->context != ce) continue;' and leave 'active = 
rq;' alone?


And again, the intention is to ignore requests that are owned by other 
members of the same multi-LRC group?


Would be good to add some documentation to this function to explain the 
above (assuming my description is correct?).
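For illustration, a toy model of the suggested control flow (plain arrays instead of the kernel's list machinery; all names hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the suggested flow. Requests are walked newest-first,
 * mirroring list_for_each_entry_reverse() over
 * parent->guc_state.requests; requests owned by sibling contexts in
 * the same multi-LRC group are skipped with 'continue'.
 */
struct toy_context { int id; };
struct toy_request {
	struct toy_context *context;	/* owning context */
	bool completed;
};

/* Return the oldest not-yet-completed request owned by @ce, i.e. the
 * one presumed to be executing, ignoring requests tracked by the
 * parent on behalf of other children.
 */
static struct toy_request *
toy_find_active_request(struct toy_request *reqs, size_t nreq,
			struct toy_context *ce)
{
	struct toy_request *active = NULL;
	size_t i;

	for (i = 0; i < nreq; i++) {		/* index 0 == newest */
		struct toy_request *rq = &reqs[i];

		if (rq->context != ce)
			continue;		/* sibling's request */
		if (rq->completed)
			break;			/* older ones are done too */
		active = rq;
	}
	return active;
}
```

The early 'continue' makes the ownership filter explicit instead of folding it into both the break condition and the assignment.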



}
-   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+   spin_unlock_irqrestore(&parent->guc_state.lock, flags);
  
  	return active;

  }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f0b60fecf253..e34e0ea9136a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -670,6 +670,11 @@ static int rq_prio(const struct i915_request *rq)
return rq->sched.attr.priority;
  }
  
+static inline bool is_multi_lrc(struct intel_context *ce)

+{
+   return intel_context_is_parallel(ce);
+}
+
  static bool is_multi_lrc_rq(struct i915_request *rq)
  {
return intel_context_is_parallel(rq->context);
@@ -1179,10 +1184,13 @@ __unwind_incomplete_requests(struct intel_context *ce)
  
  static void __guc_reset_context(struct intel_context *ce, bool stalled)

  {
+   bool local_stalled;
struct i915_request *rq;
unsigned long flags;
u32 head;
+   int i, number_children = ce->guc_number_children;
If this is a child context, does it not need to pull the child count 
from the parent? Likewise the list/link pointers below? Or does each 
child context have a full list of its siblings + parent?



bool skip = false;
+   struct intel_context *parent = ce;
  
  	intel_context_get(ce);
  
@@ -1209,25 +1217,34 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)

if (unlikely(skip))
goto out_put;
  
-	rq = intel_context_find_active_request(ce);

-   if (!rq) {
-   head = ce->ring->tail;
-   stalled = false;
-   goto out_replay;
-   }
+   for (i = 0; i < number_children + 1; ++i) {
+   if (!intel_context_is_pinned(ce))
+   goto next_context;
+
+   local_stalled = false;
+   rq = intel_context_find_active_request(ce);
+   if (!rq) {
+   head = ce->ring->tail;
+   goto out_replay;
+   }
  
-	if (!i915_request_started(rq))

-   stalled = false;
+   GEM_BUG_ON(i915_active_is_idle(&ce->active));
+   head = intel_ring_wrap(ce->ring, rq->head);
  
-	GEM_BUG

Re: [Intel-gfx] [PATCH 18/27] drm/i915/guc: Update debugfs for GuC multi-lrc

2021-09-20 Thread John Harrison




On 8/20/2021 15:44, Matthew Brost wrote:

Display the workqueue status in debugfs for GuC contexts that are in
parent-child relationship.

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++-
  1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e34e0ea9136a..07eee9a399c8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3673,6 +3673,26 @@ static void guc_log_context_priority(struct drm_printer 
*p,
drm_printf(p, "\n");
  }
  
+

+static inline void guc_log_context(struct drm_printer *p,
+  struct intel_context *ce)
+{
+   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
+   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+  ce->ring->head,
+  ce->lrc_reg_state[CTX_RING_HEAD]);
+   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+  ce->ring->tail,
+  ce->lrc_reg_state[CTX_RING_TAIL]);
+   drm_printf(p, "\t\tContext Pin Count: %u\n",
+  atomic_read(&ce->pin_count));
+   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+  atomic_read(&ce->guc_id.ref));
+   drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
+  ce->guc_state.sched_state);
+}
+
  void intel_guc_submission_print_context_info(struct intel_guc *guc,
 struct drm_printer *p)
  {
@@ -3682,22 +3702,25 @@ void intel_guc_submission_print_context_info(struct 
intel_guc *guc,
  
  	xa_lock_irqsave(&guc->context_lookup, flags);

xa_for_each(&guc->context_lookup, index, ce) {
-   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
-   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
-   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
-  ce->ring->head,
-  ce->lrc_reg_state[CTX_RING_HEAD]);
-   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
-  ce->ring->tail,
-  ce->lrc_reg_state[CTX_RING_TAIL]);
-   drm_printf(p, "\t\tContext Pin Count: %u\n",
-  atomic_read(&ce->pin_count));
-   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
-  atomic_read(&ce->guc_id.ref));
-   drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
-  ce->guc_state.sched_state);
+   GEM_BUG_ON(intel_context_is_child(ce));
  
+		guc_log_context(p, ce);

guc_log_context_priority(p, ce);
+
+   if (intel_context_is_parent(ce)) {
+   struct guc_process_desc *desc = __get_process_desc(ce);
+   struct intel_context *child;
+
+   drm_printf(p, "\t\tWQI Head: %u\n",
+  READ_ONCE(desc->head));
+   drm_printf(p, "\t\tWQI Tail: %u\n",
+  READ_ONCE(desc->tail));
+   drm_printf(p, "\t\tWQI Status: %u\n\n",
+  READ_ONCE(desc->wq_status));
+
+   for_each_child(ce, child)
+   guc_log_context(p, child);
There should be some indication that this is a child context and/or how 
many children there are. Otherwise how does the reader differentiate 
between the list of child contexts and the next parent/single context?
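For example (format entirely hypothetical), a header line like this before the child dump would make the boundaries clear:

```c
#include <stdio.h>
#include <string.h>

/* Sketch: emit an explicit marker and child count before dumping the
 * children, so a reader can tell where the child list ends and the
 * next top-level context begins. Format is illustrative only.
 */
static int toy_parent_header(char *buf, size_t len,
			     unsigned int guc_id,
			     unsigned int number_children)
{
	return snprintf(buf, len,
			"GuC lrc descriptor %u: parent of %u children\n",
			guc_id, number_children);
}
```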


John.


+   }
}
xa_unlock_irqrestore(&guc->context_lookup, flags);
  }




Re: [Intel-gfx] [PATCH 19/27] drm/i915: Fix bug in user proto-context creation that leaked contexts

2021-09-20 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Set number of engines before attempting to create contexts so the
function free_engines can clean up properly.

Fixes: d4433c7600f7 ("drm/i915/gem: Use the proto-context to handle create 
parameters (v5)")
Signed-off-by: Matthew Brost 
Cc: 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dbaeb924a437..bcaaf514876b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -944,6 +944,7 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
unsigned int n;
  
  	e = alloc_engines(num_engines);
This can return NULL when out of memory. There needs to be an early exit 
check before dereferencing a NULL pointer. Not sure if that is a worse 
bug than leaking memory! Either way, it would be good to fix that 
too.
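i.e. a toy model of the fix being asked for (names hypothetical, calloc standing in for alloc_engines):

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of the fix: check the allocation result before touching
 * e->num_engines, so an out-of-memory failure returns an error instead
 * of dereferencing NULL. Names are hypothetical.
 */
struct toy_engines { unsigned int num_engines; };

/* Stand-in for alloc_engines(); 'fail' simulates out-of-memory. */
static struct toy_engines *toy_alloc_engines(int fail)
{
	return fail ? NULL : calloc(1, sizeof(struct toy_engines));
}

static int toy_user_engines(unsigned int num_engines, int oom,
			    struct toy_engines **out)
{
	struct toy_engines *e = toy_alloc_engines(oom);

	if (!e)				/* early exit: no NULL dereference */
		return -ENOMEM;

	/* Set the count up front so error unwinding can free correctly. */
	e->num_engines = num_engines;
	*out = e;
	return 0;
}
```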


John.


+   e->num_engines = num_engines;
for (n = 0; n < num_engines; n++) {
struct intel_context *ce;
int ret;
@@ -977,7 +978,6 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
goto free_engines;
}
}
-   e->num_engines = num_engines;
  
  	return e;
  




Re: [Intel-gfx] [PATCH 20/27] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-09-20 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: link to come

Is this link still not available?

Also, see 'kernel test robot' emails saying that sparse is complaining 
about something I don't understand but presumably needs to be fixed.





v2:
  (Daniel Vetter)
   - Add IGT link and placeholder for media UMD link

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 220 +-
  .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
  drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
  .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
  include/uapi/drm/i915_drm.h   | 128 ++
  9 files changed, 485 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index bcaaf514876b..de0fd145fb47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -522,9 +522,149 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
  }
  
+static int

+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;

This needs a FIXME comment to say that exec list will be added later.


+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAULT;
+   goto out_err;
+   }
+
+   siblings[n] =
+   intel_engine_lookup_user(i915, ci.engine_class,
+ci.engine_instance);
+   if (!siblings[n]) {
+   drm_dbg(&i915->drm,
+   "Invalid sibling[%d]: { class:%d, inst:%d 
}\n",
+   n, ci.engine_class, ci.engine_instance);
+   er

Re: [Intel-gfx] [PATCH 21/27] drm/i915/doc: Update parallel submit doc to point to i915_drm.h

2021-09-20 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Update parallel submit doc to point to i915_drm.h

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 



---
  Documentation/gpu/rfc/i915_parallel_execbuf.h | 122 --
  Documentation/gpu/rfc/i915_scheduler.rst  |   4 +-
  2 files changed, 2 insertions(+), 124 deletions(-)
  delete mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
deleted file mode 100644
index 8cbe2c4e0172..
--- a/Documentation/gpu/rfc/i915_parallel_execbuf.h
+++ /dev/null
@@ -1,122 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2021 Intel Corporation
- */
-
-#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
-
-/**
- * struct drm_i915_context_engines_parallel_submit - Configure engine for
- * parallel submission.
- *
- * Setup a slot in the context engine map to allow multiple BBs to be submitted
- * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the 
GPU
- * in parallel. Multiple hardware contexts are created internally in the i915
- * run these BBs. Once a slot is configured for N BBs only N BBs can be
- * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
- * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
- * many BBs there are based on the slot's configuration. The N BBs are the last
- * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
- *
- * The default placement behavior is to create implicit bonds between each
- * context if each context maps to more than 1 physical engine (e.g. context is
- * a virtual engine). Also we only allow contexts of same engine class and 
these
- * contexts must be in logically contiguous order. Examples of the placement
- * behavior described below. Lastly, the default is to not allow BBs to
- * preempted mid BB rather insert coordinated preemption on all hardware
- * contexts between each set of BBs. Flags may be added in the future to change
- * both of these default behaviors.
- *
- * Returns -EINVAL if hardware context placement configuration is invalid or if
- * the placement configuration isn't supported on the platform / submission
- * interface.
- * Returns -ENODEV if extension isn't supported on the platform / submission
- * interface.
- *
- * .. code-block:: none
- *
- * Example 1 pseudo code:
- * CS[X] = generic engine of same class, logical instance X
- * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- * set_engines(INVALID)
- * set_parallel(engine_index=0, width=2, num_siblings=1,
- *  engines=CS[0],CS[1])
- *
- * Results in the following valid placement:
- * CS[0], CS[1]
- *
- * Example 2 pseudo code:
- * CS[X] = generic engine of same class, logical instance X
- * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- * set_engines(INVALID)
- * set_parallel(engine_index=0, width=2, num_siblings=2,
- *  engines=CS[0],CS[2],CS[1],CS[3])
- *
- * Results in the following valid placements:
- * CS[0], CS[1]
- * CS[2], CS[3]
- *
- * This can also be thought of as 2 virtual engines described by 2-D array
- * in the engines the field with bonds placed between each index of the
- * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
- * CS[3].
- * VE[0] = CS[0], CS[2]
- * VE[1] = CS[1], CS[3]
- *
- * Example 3 pseudo code:
- * CS[X] = generic engine of same class, logical instance X
- * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- * set_engines(INVALID)
- * set_parallel(engine_index=0, width=2, num_siblings=2,
- *  engines=CS[0],CS[1],CS[1],CS[3])
- *
- * Results in the following valid and invalid placements:
- * CS[0], CS[1]
- * CS[1], CS[3] - Not logical contiguous, return -EINVAL
- */
-struct drm_i915_context_engines_parallel_submit {
-   /**
-* @base: base user extension.
-*/
-   struct i915_user_extension base;
-
-   /**
-* @engine_index: slot for parallel engine
-*/
-   __u16 engine_index;
-
-   /**
-* @width: number of contexts per parallel engine
-*/
-   __u16 width;
-
-   /**
-* @num_siblings: number of siblings per context
-*/
-   __u16 num_siblings;
-
-   /**
-* @mbz16: reserved for future use; must be zero
-*/
-   __u16 mbz16;
-
-   /**
-* @flags: all undefined flags must be zero, currently not defined flags
-*/
-   __u64 flags;
-
-   /**
-* @mbz64: reserved for future use; must be zero
-*/
-   __u64 mbz64[3];
-
-   /**
-* @engines: 2-d array of engine instances to configure parallel engine
-*
-

Re: [Intel-gfx] [PATCH 15/27] drm/i915/guc: Implement multi-lrc submission

2021-09-22 Thread John Harrison

On 9/22/2021 09:25, Matthew Brost wrote:

On Mon, Sep 20, 2021 at 02:48:52PM -0700, John Harrison wrote:

On 8/20/2021 15:44, Matthew Brost wrote:

Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc.c|  21 ++
   drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  24 +-
   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   6 +-
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 312 +++---
   drivers/gpu/drm/i915/i915_request.h   |   8 +
   6 files changed, 317 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index fbfcae727d7f..879aef662b2e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -748,3 +748,24 @@ void intel_guc_load_status(struct intel_guc *guc, struct 
drm_printer *p)
}
}
   }
+
+void intel_guc_write_barrier(struct intel_guc *guc)
+{
+   struct intel_gt *gt = guc_to_gt(guc);
+
+   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+   GEM_BUG_ON(guc->send_regs.fw_domains);

Granted, this patch is just moving code from one file to another not
changing it. However, I think it would be worth adding a blank line in here.
Otherwise the 'this register' comment below can be confusingly read as
referring to the send_regs.fw_domain entry above.

And maybe add a comment why it is a bug for the send_regs value to be set?
I'm not seeing any obvious connection between it and the reset of this code.


Can add a blank line. I think the GEM_BUG_ON relates to being able to
use intel_uncore_write_fw vs intel_uncore_write. Can add comment.


+   /*
+* This register is used by the i915 and GuC for MMIO based
+* communication. Once we are in this code CTBs are the only
+* method the i915 uses to communicate with the GuC so it is
+* safe to write to this register (a value of 0 is NOP for MMIO
+* communication). If we ever start mixing CTBs and MMIOs a new
+* register will have to be chosen.
+*/
+   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+   } else {
+   /* wmb() sufficient for a barrier if in smem */
+   wmb();
+   }
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3f95b1b4f15c..0ead2406d03c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -37,6 +37,12 @@ struct intel_guc {
/* Global engine used to submit requests to GuC */
struct i915_sched_engine *sched_engine;
struct i915_request *stalled_request;
+   enum {
+   STALL_NONE,
+   STALL_REGISTER_CONTEXT,
+   STALL_MOVE_LRC_TAIL,
+   STALL_ADD_REQUEST,
+   } submission_stall_reason;
/* intel_guc_recv interrupt related state */
spinlock_t irq_lock;
@@ -332,4 +338,6 @@ void intel_guc_submission_cancel_requests(struct intel_guc 
*guc);
   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
+void intel_guc_write_barrier(struct intel_guc *guc);
+
   #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 20c710a74498..10d1878d2826 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -377,28 +377,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
return ++ct->requests.last_fence;
   }
-static void write_barrier(struct intel_guc_ct *ct)
-{
-   struct intel_guc *guc = ct_to_guc(ct);
-   struct intel_gt *gt = guc_to_gt(guc);
-
-   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
-   GEM_BUG_ON(guc->send_regs.fw_domains);
-   /*
-* This register is used by the i915 and GuC for MMIO based
-* communication. Once we are in this code CTBs are the only
-* method the i915 uses to communicate with the GuC so it is
-* safe to write to this register (a value of 0 is NOP for MMIO
-* communication). If we ever start mixing CTBs and MMIOs a new
-* register will have to be chosen.
-*/
-   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
-   } else {
-   /* wmb() sufficient for a barrier if in smem */
-   wmb();
- 

Re: [Intel-gfx] [PATCH 22/27] drm/i915/guc: Add basic GuC multi-lrc selftest

2021-09-28 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

Add very basic (single submission) multi-lrc selftest.

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   1 +
  .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 180 ++
  .../drm/i915/selftests/i915_live_selftests.h  |   1 +
  3 files changed, 182 insertions(+)
  create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2554d0eb4afd..91330525330d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3924,4 +3924,5 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct 
intel_engine_cs *ve)
  
  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)

  #include "selftest_guc.c"
+#include "selftest_guc_multi_lrc.c"
  #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c 
b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
new file mode 100644
index ..dacfc5dfadd6
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "selftests/igt_spinner.h"
+#include "selftests/igt_reset.h"
+#include "selftests/intel_scheduler_helpers.h"
+#include "gt/intel_engine_heartbeat.h"
+#include "gem/selftests/mock_context.h"
+
+static void logical_sort(struct intel_engine_cs **engines, int num_engines)
+{
+   struct intel_engine_cs *sorted[MAX_ENGINE_INSTANCE + 1];
+   int i, j;
+
+   for (i = 0; i < num_engines; ++i)
+   for (j = 0; j < MAX_ENGINE_INSTANCE + 1; ++j) {
+   if (engines[j]->logical_mask & BIT(i)) {
+   sorted[i] = engines[j];
+   break;
+   }
+   }
+
+   memcpy(*engines, *sorted,
+  sizeof(struct intel_engine_cs *) * num_engines);
+}
+
+static struct intel_context *
+multi_lrc_create_parent(struct intel_gt *gt, u8 class,
+   unsigned long flags)
+{
+   struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+   struct intel_engine_cs *engine;
+   enum intel_engine_id id;
+   int i = 0;
+
+   for_each_engine(engine, gt, id) {
+   if (engine->class != class)
+   continue;
+
+   siblings[i++] = engine;
+   }
+
+   if (i <= 1)
+   return ERR_PTR(0);
+
+   logical_sort(siblings, i);
+
+   return intel_engine_create_parallel(siblings, 1, i);
+}
+
+static void multi_lrc_context_unpin(struct intel_context *ce)
+{
+   struct intel_context *child;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   for_each_child(ce, child)
+   intel_context_unpin(child);
+   intel_context_unpin(ce);
+}
+
+static void multi_lrc_context_put(struct intel_context *ce)
+{
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   /*
+* Only the parent gets the creation ref put in the uAPI, the parent
+* itself is responsible for creation ref put on the children.
+*/
+   intel_context_put(ce);
+}
+
+static struct i915_request *
+multi_lrc_nop_request(struct intel_context *ce)
+{
+   struct intel_context *child;
+   struct i915_request *rq, *child_rq;
+   int i = 0;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   rq = intel_context_create_request(ce);
+   if (IS_ERR(rq))
+   return rq;
+
+   i915_request_get(rq);
+   i915_request_add(rq);
+
+   for_each_child(ce, child) {
+   child_rq = intel_context_create_request(child);
+   if (IS_ERR(child_rq))
+   goto child_error;
+
+   if (++i == ce->guc_number_children)
+   set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+   &child_rq->fence.flags);
+   i915_request_add(child_rq);
+   }
+
+   return rq;
+
+child_error:
+   i915_request_put(rq);
+
+   return ERR_PTR(-ENOMEM);
+}
+
+static int __intel_guc_multi_lrc_basic(struct intel_gt *gt, unsigned int class)
+{
+   struct intel_context *parent;
+   struct i915_request *rq;
+   int ret;
+
+   parent = multi_lrc_create_parent(gt, class, 0);
+   if (IS_ERR(parent)) {
+   pr_err("Failed creating contexts: %ld", PTR_ERR(parent));
+   return PTR_ERR(parent);
+   } else if (!parent) {
+   pr_debug("Not enough engines in class: %d",
+VIDEO_DECODE_CLASS);

Should be 'class'.

With that fixed:
Reviewed-by: John Harrison 


+   return 0;
+   }
+
+   rq = multi_lrc_nop_request(parent);
+   if (IS_ER

Re: [Intel-gfx] [PATCH 23/27] drm/i915/guc: Implement no mid batch preemption for multi-lrc

2021-09-28 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between to parent and child is needed. This is implemented via custom
emit_bb_start & emit_fini_breadcrumb functions and enabled via by

via by -> by


default if a context is configured by set parallel extension.
I tend to agree with Tvrtko that this should probably be an opt in 
change. Is there a flags word passed in when creating the context?


Also, it's not just a change in pre-emption behaviour but a change in 
synchronisation too, right? Previously, if you had a whole bunch of back 
to back submissions then one child could run ahead of another and/or the 
parent. After this change, there is a forced regroup at the end of each 
batch. So while one could end sooner/later than the others, they can't 
ever get an entire batch (or more) ahead or behind. Or was that 
synchronisation already in there through other means anyway?




Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 283 +-
  4 files changed, 287 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5615be32879c..2de62649e275 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -561,7 +561,7 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
GEM_BUG_ON(intel_context_is_child(child));
GEM_BUG_ON(intel_context_is_parent(child));
  
-	parent->guc_number_children++;

+   child->guc_child_index = parent->guc_number_children++;
list_add_tail(&child->guc_child_link,
  &parent->guc_child_list);
child->parent = parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 713d85b0b364..727f91e7f7c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -246,6 +246,9 @@ struct intel_context {
/** @guc_number_children: number of children if parent */
u8 guc_number_children;
  
+		/** @guc_child_index: index into guc_child_list if child */

+   u8 guc_child_index;
+
/**
 * @parent_page: page in context used by parent for work queue,
 * work queue descriptor
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 6cd26dc060d1..9f61cfa5566a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -188,7 +188,7 @@ struct guc_process_desc {
u32 wq_status;
u32 engine_presence;
u32 priority;
-   u32 reserved[30];
+   u32 reserved[36];
What is this extra space for? All the extra storage is grabbed from 
after the end of this structure, isn't it?



  } __packed;
  
  #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 91330525330d..1a18f99bf12a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
  #include "gt/intel_context.h"
  #include "gt/intel_engine_pm.h"
  #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_gpu_commands.h"
  #include "gt/intel_gt.h"
  #include "gt/intel_gt_irq.h"
  #include "gt/intel_gt_pm.h"
@@ -366,10 +367,14 @@ static struct i915_priolist *to_priolist(struct rb_node 
*rb)
  
  /*

   * When using multi-lrc submission an extra page in the context state is
- * reserved for the process descriptor and work queue.
+ * reserved for the process descriptor, work queue, and preempt BB boundary
+ * handshake between the parent + children contexts.
   *
   * The layout of this page is below:
   * 0  guc_process_desc
+ * + sizeof(struct guc_process_desc)   child go
+ * + CACHELINE_BYTES   child join ...
+ * + CACHELINE_BYTES ...
Would be better written as '[num_children]' instead of '...' to make it 
clear it is a per child array.


Also, maybe create a struct for this to get rid of the magic '+1's and 
'BYTES / sizeof' constructs in the functions below.



   * ...unused
   * PAGE_SIZE / 2  work queue start
   * ...work queue
@@ -1785,6 +1790,30 @@ static int deregister_context(struct intel_context *ce, 
u32 guc_id, bool loop)
return __guc_action_deregister_context(guc, guc_id, loop);
  }
  
+static in

Re: [Intel-gfx] [PATCH 23/27] drm/i915/guc: Implement no mid batch preemption for multi-lrc

2021-09-28 Thread John Harrison

On 9/28/2021 15:33, Matthew Brost wrote:

On Tue, Sep 28, 2021 at 03:20:42PM -0700, John Harrison wrote:

On 8/20/2021 15:44, Matthew Brost wrote:

For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between the parent and child is needed. This is implemented via custom
emit_bb_start & emit_fini_breadcrumb functions and enabled via by

via by -> by


default if a context is configured by set parallel extension.

I tend to agree with Tvrtko that this should probably be an opt in change.
Is there a flags word passed in when creating the context?


I don't disagree but the uAPI in this series is where we landed. It has
been acked all by the relevant parties in the RFC, ported to our
internal tree, and the media UMD has been updated / posted. Concerns
with the uAPI should've been raised in the RFC phase, not now. I really
don't feel like changing this uAPI another time.
The counter argument is that once a UAPI has been merged, it cannot be 
changed. Ever. So it is worth taking the trouble to get it right first 
time.


The proposal isn't a major re-write of the interface. It is simply a 
request to set an extra flag when creating the context.






Also, it's not just a change in pre-emption behaviour but a change in
synchronisation too, right? Previously, if you had a whole bunch of back to
back submissions then one child could run ahead of another and/or the
parent. After this change, there is a forced regroup at the end of each
batch. So while one could end sooner/later than the others, they can't ever
get an entire batch (or more) ahead or behind. Or was that synchronisation
already in there through other means anyway?


Yes, each parent / child syncs at the end of each batch - this is the only
way safely insert preemption points. Without this the GuC could attempt
a preemption and hang the batches.
To be clear, I'm not saying that this is wrong. I'm just saying that 
this appears to be new behaviour with this patch but it is not 
explicitly called out in the description of the patch.






Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context.c   |   2 +-
   drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 283 +-
   4 files changed, 287 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5615be32879c..2de62649e275 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -561,7 +561,7 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
GEM_BUG_ON(intel_context_is_child(child));
GEM_BUG_ON(intel_context_is_parent(child));
-   parent->guc_number_children++;
+   child->guc_child_index = parent->guc_number_children++;
list_add_tail(&child->guc_child_link,
  &parent->guc_child_list);
child->parent = parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 713d85b0b364..727f91e7f7c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -246,6 +246,9 @@ struct intel_context {
/** @guc_number_children: number of children if parent */
u8 guc_number_children;
+   /** @guc_child_index: index into guc_child_list if child */
+   u8 guc_child_index;
+
/**
 * @parent_page: page in context used by parent for work queue,
 * work queue descriptor
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 6cd26dc060d1..9f61cfa5566a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -188,7 +188,7 @@ struct guc_process_desc {
u32 wq_status;
u32 engine_presence;
u32 priority;
-   u32 reserved[30];
+   u32 reserved[36];

What is this extra space for? All the extra storage is grabbed from after
the end of this structure, isn't it?


This is the size of process descriptor in the GuC spec. Even though this
is unused space we really don't want the child go / join memory using
anything within the process descriptor.
Okay. So it's more that the code was previously broken and we just 
hadn't hit a problem because of it? Again, worth adding a comment in the 
description to call it out as a bug fix.





   } __packed;
   #define CONTEXT_REGISTRATION_FLAG_KMDBIT(0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 91330525330d..1a18f99bf12a 100644
--- a/drivers

Re: [Intel-gfx] [PATCH 25/27] drm/i915/guc: Handle errors in multi-lrc requests

2021-09-29 Thread John Harrison

On 8/20/2021 15:44, Matthew Brost wrote:

If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issues arrises because with multi-lrc

arrises -> arises


breadcrumbs there is a handshake between the parent and children to make
forwad progress. If all the requests are not present this handshake

forwad -> forward


doesn't work. To work around this, if multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++-
  1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2ef38557b0f0..61e737fd1eee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3546,8 +3546,8 @@ static int 
emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
  }
  
  static u32 *

-emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
-u32 *cs)
+__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+  u32 *cs)
  {
struct intel_context *ce = rq->context;
u8 i;
@@ -3575,6 +3575,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  get_children_go_addr(ce),
  0);
  
+	return cs;

+}
+
+/*
+ * If this is true, a submission of multi-lrc requests had an error and the
+ * requests need to be skipped. The front end (execbuf IOCTL) should've called
+ * i915_request_skip which squashes the BB but we still need to emit the fini
+ * breadcrumbs seqno write. At this point we don't know how many of the
+ * requests in the multi-lrc submission were generated so we can't do the
+ * handshake between the parent and children (e.g. if 4 requests should be
+ * generated but 2nd hit an error only 1 would be seen by the GuC backend).
+ * Simply skip the handshake, but still emit the breadcrumb seqno, if an error
+ * has occurred on any of the requests in submission / relationship.
+ */
+static inline bool skip_handshake(struct i915_request *rq)
+{
+   return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;
Why -6? There are 12 words about to be written. Indeed the value of 
emit_..._dw is '12 + 4*num_children'. This should only be skipping over 
the 4*children, right? As it stands, it will skip all but the last six 
words, then write an extra twelve words and thus overflow the 
reservation by six. Unless I am totally confused?


I assume there is some reason why the amount of data written must 
exactly match the space reserved? It's a while since I've looked at the 
ring buffer code!


Seems like it would be clearer to not split the semaphore writes out but 
have them right next to the skip code that is meant to replicate them 
but with no-ops.



+   } else {
+   cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
+   }
+
/* Emit fini breadcrumb */
cs = gen8_emit_ggtt_write(cs,
  rq->fence.seqno,
@@ -3591,7 +3626,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  }
  
  static u32 *

-emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 
*cs)
+__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+ u32 *cs)
  {
struct intel_context *ce = rq->context;
  
@@ -3617,6 +3653,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs

*cs++ = get_children_go_addr(ce->parent);
*cs++ = 0;
  
+	return cs;

+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+   u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_child(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+   } else {
+   cs = __emit_fini_breadcrumb_child_no_preempt_m

Re: [Intel-gfx] [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

With GuC virtual engines the physical engine which a request executes
and completes on isn't known to the i915. Therefore we can't attach a
request to a physical engines breadcrumbs. To work around this we create
a single breadcrumbs per engine class when using GuC submission and
direct all physical engine interrupts to this breadcrumbs.

v2:
  (John H)
   - Rework header file structure so intel_engine_mask_t can be in
 intel_engine_types.h

Signed-off-by: Matthew Brost 
CC: John Harrison 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 16 -
  .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
  drivers/gpu/drm/i915/gt/intel_engine.h|  3 +
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++-
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 +-
  .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
  drivers/gpu/drm/i915/gt/mock_engine.c |  4 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +--
  9 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..2007dc6f6b99 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
  #include "intel_gt_pm.h"
  #include "intel_gt_requests.h"
  
-static bool irq_enable(struct intel_engine_cs *engine)

+static bool irq_enable(struct intel_breadcrumbs *b)
  {
-   if (!engine->irq_enable)
-   return false;
-
-   /* Caller disables interrupts */
-   spin_lock(&engine->gt->irq_lock);
-   engine->irq_enable(engine);
-   spin_unlock(&engine->gt->irq_lock);
-
-   return true;
+   return intel_engine_irq_enable(b->irq_engine);
  }
  
-static void irq_disable(struct intel_engine_cs *engine)

+static void irq_disable(struct intel_breadcrumbs *b)
  {
-   if (!engine->irq_disable)
-   return;
-
-   /* Caller disables interrupts */
-   spin_lock(&engine->gt->irq_lock);
-   engine->irq_disable(engine);
-   spin_unlock(&engine->gt->irq_lock);
+   intel_engine_irq_disable(b->irq_engine);
  }
  
  static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)

@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct 
intel_breadcrumbs *b)
WRITE_ONCE(b->irq_armed, true);
  
  	/* Requests may have completed before we could enable the interrupt. */

-   if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+   if (!b->irq_enabled++ && b->irq_enable(b))
irq_work_queue(&b->irq_work);
  }
  
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)

  {
GEM_BUG_ON(!b->irq_enabled);
if (!--b->irq_enabled)
-   irq_disable(b->irq_engine);
+   b->irq_disable(b);
  
  	WRITE_ONCE(b->irq_armed, false);

intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
if (!b)
return NULL;
  
-	b->irq_engine = irq_engine;

+   kref_init(&b->ref);
  
  	spin_lock_init(&b->signalers_lock);

INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs 
*irq_engine)
spin_lock_init(&b->irq_lock);
init_irq_work(&b->irq_work, signal_irq_work);
  
+	b->irq_engine = irq_engine;

+   b->irq_enable = irq_enable;
+   b->irq_disable = irq_disable;
+
return b;
  }
  
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)

spin_lock_irqsave(&b->irq_lock, flags);
  
  	if (b->irq_enabled)

-   irq_enable(b->irq_engine);
+   b->irq_enable(b);
else
-   irq_disable(b->irq_engine);
+   b->irq_disable(b);
  
  	spin_unlock_irqrestore(&b->irq_lock, flags);

  }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
}
  }
  
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)

+void intel_breadcrumbs_free(struct kref *kref)
  {
+   struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
irq_work_sync(&b->irq_work);
GEM_BUG_ON(!list_empty(&b->signalers));
GEM_BUG_ON(b->irq_armed);
+
kfree(b);
  }
  
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h

index 3ce5ce270b04..be0d4f379a85 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -9,7 +9,7 @@
  #include 

Re: [Intel-gfx] [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC

2021-07-20 Thread John Harrison

On 7/19/2021 18:53, Matthew Brost wrote:

On Mon, Jul 19, 2021 at 06:03:05PM -0700, John Harrison wrote:

On 7/16/2021 13:16, Matthew Brost wrote:

When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.

v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negatie timeout trick, move a
refactor of guc_context_unpin to earlier path (John H)

Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
   drivers/gpu/drm/i915/gt/intel_gt.c| 19 +
   drivers/gpu/drm/i915/gt/intel_gt.h|  2 +
   drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
   drivers/gpu/drm/i915/gt/uc/intel_guc.h|  4 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  1 +
   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  4 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +--
   drivers/gpu/drm/i915/gt/uc/intel_uc.h |  5 ++
   drivers/gpu/drm/i915/i915_gem_evict.c |  1 +
   .../gpu/drm/i915/selftests/igt_live_test.c|  2 +-
   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
   13 files changed, 129 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index a90f796e85c0..6fffd4d377c2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
goto insert;
/* Attempt to reap some mmap space from dead objects */
-   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+   err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+  NULL);
if (err)
goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
GEM_BUG_ON(intel_gt_pm_is_awake(gt));
   }
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+   long remaining_timeout;
+
+   /* If the device is asleep, we have no requests outstanding */
+   if (!intel_gt_pm_is_awake(gt))
+   return 0;
+
+   while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+  &remaining_timeout)) 
> 0) {
+   cond_resched();
+   if (signal_pending(current))
+   return -EINTR;
+   }
+
+   return timeout ? timeout : intel_uc_wait_for_idle(>->uc,
+ remaining_timeout);
+}
+
   int intel_gt_init(struct intel_gt *gt)
   {
int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
   void intel_gt_driver_late_release(struct intel_gt *gt);
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
   void intel_gt_check_and_clear_faults(struct intel_gt *gt);
   void intel_gt_clear_error_registers(struct intel_gt *gt,
intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..edb881d75630 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs 
*engine)
GEM_BUG_ON(engine->retire);
   }
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+ long *remaining_timeout)
   {
struct intel_gt_timelines *timelines = >->timelines;
struct intel_timeline *tl, *tn;
@@ -195,22 +196,10 @@ out_active:   spin_lock(&timelines->lock);
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
-   return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-   /* If the device is asleep, we have no requests outstanding */
-   if (!intel_gt_pm_is_awake(gt))
-   return 0;
-
-   while ((t

Re: [Intel-gfx] [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

This help the backends clean up when the schedule engine object gets
help -> helps. Although, I would say it's more like 'this is required to 
allow backend specific cleanup'. It doesn't just make life a bit easier, 
it allows us to not leak stuff and/or dereference null pointers!


Either way...
Reviewed-by: John Harrison 


destroyed.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/i915_scheduler.c   | 3 ++-
  drivers/gpu/drm/i915/i915_scheduler.h   | 4 +---
  drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +
  3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 3a58a9130309..4fceda96deed 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
rcu_read_unlock();
  }
  
-void i915_sched_engine_free(struct kref *kref)

+static void default_destroy(struct kref *kref)
  {
struct i915_sched_engine *sched_engine =
container_of(kref, typeof(*sched_engine), ref);
@@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
  
  	sched_engine->queue = RB_ROOT_CACHED;

sched_engine->queue_priority_hint = INT_MIN;
+   sched_engine->destroy = default_destroy;
  
  	INIT_LIST_HEAD(&sched_engine->requests);

INIT_LIST_HEAD(&sched_engine->hold);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 650ab8e0db9f..3c9504e9f409 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
  struct i915_sched_engine *
  i915_sched_engine_create(unsigned int subclass);
  
-void i915_sched_engine_free(struct kref *kref);

-
  static inline struct i915_sched_engine *
  i915_sched_engine_get(struct i915_sched_engine *sched_engine)
  {
@@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
  static inline void
  i915_sched_engine_put(struct i915_sched_engine *sched_engine)
  {
-   kref_put(&sched_engine->ref, i915_sched_engine_free);
+   kref_put(&sched_engine->ref, sched_engine->destroy);
  }
  
  static inline bool

diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h 
b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 5935c3152bdc..00384e2c5273 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -163,6 +163,11 @@ struct i915_sched_engine {
 */
void *private_data;
  
+	/**

+* @destroy: destroy schedule engine / cleanup in backend
+*/
+   void(*destroy)(struct kref *kref);
+
/**
 * @kick_backend: kick backend after a request's priority has changed
 */


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
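The destroy-vfunc pattern reviewed above can be illustrated outside the kernel. This is a minimal userspace sketch, not the i915 implementation: the refcount is a plain integer standing in for `struct kref`, and `destroyed_flag` is a hypothetical test hook that does not exist in the patch. The point it demonstrates is that the object carries its own destroy callback, so a backend can override cleanup while `sched_engine_put()` stays generic.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the patch's i915_sched_engine, heavily simplified. */
struct sched_engine {
	int ref;			/* stands in for struct kref */
	void (*destroy)(struct sched_engine *se);
	int *destroyed_flag;		/* hypothetical test hook */
};

/* Mirrors default_destroy() in the patch: plain free of the object. */
static void default_destroy(struct sched_engine *se)
{
	if (se->destroyed_flag)
		*se->destroyed_flag = 1;
	free(se);
}

struct sched_engine *sched_engine_create(int *flag)
{
	struct sched_engine *se = calloc(1, sizeof(*se));

	se->ref = 1;
	se->destroy = default_destroy;	/* a backend may override this */
	se->destroyed_flag = flag;
	return se;
}

/* Mirrors kref_put(&se->ref, se->destroy): release via the vfunc. */
void sched_engine_put(struct sched_engine *se)
{
	if (--se->ref == 0)
		se->destroy(se);
}
```

A GuC-style backend would simply assign its own function to `->destroy` after `sched_engine_create()`, which is exactly what the vfunc makes possible.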


Re: [Intel-gfx] [PATCH 25/51] drm/i915: Move active request tracking to a vfunc

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Move active request tracking to a backend vfunc rather than assuming all
backends want to do this in the maner. In the case execlists /

maner -> manner.
In the case *of* execlists

With those fixed...
Reviewed-by: John Harrison 



ring submission the tracking is on the physical engine while with GuC
submission it is on the context.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |  3 ++
  drivers/gpu/drm/i915/gt/intel_context_types.h |  7 
  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +++
  .../drm/i915/gt/intel_execlists_submission.c  | 40 ++
  .../gpu/drm/i915/gt/intel_ring_submission.c   | 22 ++
  drivers/gpu/drm/i915/gt/mock_engine.c | 30 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++
  drivers/gpu/drm/i915/i915_request.c   | 41 ++-
  drivers/gpu/drm/i915/i915_request.h   |  2 +
  9 files changed, 147 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 251ff7eea22d..bfb05d8697d1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
spin_lock_init(&ce->guc_state.lock);
INIT_LIST_HEAD(&ce->guc_state.fences);
  
+	spin_lock_init(&ce->guc_active.lock);

+   INIT_LIST_HEAD(&ce->guc_active.requests);
+
ce->guc_id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id_link);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h

index 542c98418771..035108c10b2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -162,6 +162,13 @@ struct intel_context {
struct list_head fences;
} guc_state;
  
+	struct {

+   /** lock: protects everything in guc_active */
+   spinlock_t lock;
+   /** requests: active requests on this context */
+   struct list_head requests;
+   } guc_active;
+
/* GuC scheduling state flags that do not require a lock. */
atomic_t guc_sched_state_no_lock;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h

index 03a81e8d87f4..950fc73ed6af 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -420,6 +420,12 @@ struct intel_engine_cs {
  
  	void		(*release)(struct intel_engine_cs *engine);
  
+	/*

+* Add / remove request from engine active tracking
+*/
+   void(*add_active_request)(struct i915_request *rq);
+   void(*remove_active_request)(struct i915_request *rq);
+
struct intel_engine_execlists execlists;
  
  	/*

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index abe48421fd7a..f9b5f54a5abe 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs 
*engine)
cancel_timer(&engine->execlists.preempt);
  }
  
+static void add_to_engine(struct i915_request *rq)

+{
+   lockdep_assert_held(&rq->engine->sched_engine->lock);
+   list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+   struct intel_engine_cs *engine, *locked;
+
+   /*
+* Virtual engines complicate acquiring the engine timeline lock,
+* as their rq->engine pointer is not stable until under that
+* engine lock. The simple ploy we use is to take the lock then
+* check that the rq still belongs to the newly locked engine.
+*/
+   locked = READ_ONCE(rq->engine);
+   spin_lock_irq(&locked->sched_engine->lock);
+   while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+   spin_unlock(&locked->sched_engine->lock);
+   spin_lock(&engine->sched_engine->lock);
+   locked = engine;
+   }
+   list_del_init(&rq->sched.link);
+
+   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+   /* Prevent further __await_execution() registering a cb, then flush */
+   set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+   spin_unlock_irq(&locked->sched_engine->lock);
+
+   i915_request_notify_execute_cb_imm(rq);
+}
+
  static bool can_preempt(struct intel_engine_cs *e

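The comment in `remove_from_engine()` above describes a lock-chasing loop: a request's engine pointer is only stable while that engine's lock is held, so the code locks, re-reads, and retries until pointer and lock agree. A userspace sketch of that loop, using pthread mutexes and GCC atomic builtins in place of the i915 spinlocks and `READ_ONCE()` (the struct names here are simplified stand-ins, not the real types):

```c
#include <assert.h>
#include <pthread.h>

struct engine {
	pthread_mutex_t lock;
};

struct request {
	struct engine *engine;	/* may be re-pointed by a migration */
};

/* Lock the engine the request currently belongs to; returns it locked. */
struct engine *lock_request_engine(struct request *rq)
{
	struct engine *locked, *engine;

	locked = __atomic_load_n(&rq->engine, __ATOMIC_RELAXED);
	pthread_mutex_lock(&locked->lock);
	while (locked != (engine = __atomic_load_n(&rq->engine,
						   __ATOMIC_RELAXED))) {
		/* rq migrated while we slept on the lock: chase it */
		pthread_mutex_unlock(&locked->lock);
		pthread_mutex_lock(&engine->lock);
		locked = engine;
	}
	return locked;
}
```

Once the loop exits, the caller holds the lock of the engine the request is actually on, so list manipulation under that lock is safe.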
Re: [Intel-gfx] [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface

2021-07-20 Thread John Harrison

On 7/16/2021 13:16, Matthew Brost wrote:

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.

v2:
  (Michal)
   - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
  (John H)
   - Split into a series of smaller patches
While the split happened, it doesn't look like any of the other comments 
were addressed. Repeated below for clarity. Also, Tvrtko has a bunch of 
outstanding comments too.




Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   6 +-
  drivers/gpu/drm/i915/gt/intel_reset.c |  18 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  13 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |  39 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |   3 +
  7 files changed, 515 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
if (intel_gt_is_wedged(gt))
intel_gt_unset_wedged(gt);
  
-	intel_uc_sanitize(&gt->uc);

-
for_each_engine(engine, gt, id)
if (engine->reset.prepare)
engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
__intel_engine_reset(engine, false);
}
  
+	intel_uc_reset(&gt->uc, false);

+
for_each_engine(engine, gt, id)
if (engine->reset.finish)
engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
goto err_wedged;
}
  
+	intel_uc_reset_finish(&gt->uc);

+
	intel_rps_enable(&gt->rps);
	intel_llc_enable(&gt->llc);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c

index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, 
intel_engine_mask_t stalled_mask)
__intel_engine_reset(engine, stalled_mask & engine->mask);
local_bh_enable();
  
+	intel_uc_reset(&gt->uc, true);

+
intel_ggtt_restore_fences(gt->ggtt);
  
  	return err;

@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, 
intel_engine_mask_t awake)
if (awake & engine->mask)
intel_engine_pm_put(engine);
}
+
+   intel_uc_reset_finish(&gt->uc);
  }
  
  static void nop_submit_request(struct i915_request *request)

@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
for_each_engine(engine, gt, id)
if (engine->reset.cancel)
engine->reset.cancel(engine);
+   intel_uc_cancel_requests(&gt->uc);
local_bh_enable();
  
  	reset_finish(gt, awake);

@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs 
*engine, const char *msg)
ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
  
+	if (intel_engine_uses_guc(engine))

+   return -ENODEV;
+
if (!intel_engine_pm_get_if_awake(engine))
return 0;
  
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)

   "Resetting %s for %s\n", engine->name, msg);

atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
  
-	if (intel_engine_uses_guc(engine))

-   ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-   else
-   ret = intel_gt_reset_engine(engine);
+   ret = intel_gt_reset_engine(engine);
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
-   ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+   ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", 
engine->name, ret);
goto out;
}
  
@@ -1341,7 +1346,8 @@ void in

Re: [Intel-gfx] [PATCH 30/51] drm/i915/guc: Handle context reset notification

2021-07-20 Thread John Harrison

On 7/16/2021 13:17, Matthew Brost wrote:

GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

v2:
  (John Harrison)
   - Move msg[0] lookup after length check

Cc: Matthew Brost 
Cc: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  2 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  3 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++
  drivers/gpu/drm/i915/i915_trace.h | 10 ++
  4 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b3cfc52fe0bc..f23a3a618550 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc 
*guc,
  const u32 *msg, u32 len);
  int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 const u32 *msg, u32 len);
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+   const u32 *msg, u32 len);
  
  void intel_guc_submission_reset_prepare(struct intel_guc *guc);

  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 503a78517610..c4f9b44b9f86 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -981,6 +981,9 @@ static int ct_process_request(struct intel_guc_ct *ct, 
struct ct_incoming_msg *r
case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
ret = intel_guc_sched_done_process_msg(guc, payload, len);
break;
+   case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+   ret = intel_guc_context_reset_process_msg(guc, payload, len);
+   break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fdb17279095c..feaf1ca61eaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2196,6 +2196,42 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
*guc,
return 0;
  }
  
+static void guc_context_replay(struct intel_context *ce)

+{
+   struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+   __guc_reset_context(ce, true);
+   tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+struct intel_context *ce)
+{
+   trace_intel_context_reset(ce);
+   guc_context_replay(ce);
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+   const u32 *msg, u32 len)
+{
+   struct intel_context *ce;
+   int desc_idx;
+
+   if (unlikely(len != 1)) {
+   drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);

I think we decided that these should be drm_err rather than drm_dbg?

With that updated:
Reviewed-by: John Harrison 


+   return -EPROTO;
+   }
+
+   desc_idx = msg[0];
+   ce = g2h_context_lookup(guc, desc_idx);
+   if (unlikely(!ce))
+   return -EPROTO;
+
+   guc_handle_context_reset(guc, ce);
+
+   return 0;
+}
+
  void intel_guc_submission_print_info(struct intel_guc *guc,
 struct drm_printer *p)
  {
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 97c2e83984ed..c095c4d39456 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
  __entry->guc_sched_state_no_lock)
  );
  
+DEFINE_EVENT(intel_context, intel_context_reset,

+TP_PROTO(struct intel_context *ce),
+TP_ARGS(ce)
+);
+
  DEFINE_EVENT(intel_context, intel_context_register,
 TP_PROTO(struct intel_context *ce),
 TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
  {
  }
  
+static inline void

+trace_intel_context_reset(struct intel_context *ce)
+{
+}
+
  static inline void
  trace_intel_context_register(struct intel_context *ce)
  {


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
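The v2 fix discussed above ("move msg[0] lookup after length check") reflects a general rule for G2H handlers: validate the payload length before dereferencing any payload word. A minimal userspace sketch of that handler shape, with an illustrative array standing in for the real `guc->context_lookup` xarray and a hypothetical error constant in place of `-EPROTO`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MY_EPROTO 71		/* stand-in for the kernel's EPROTO */
#define MAX_CTX 8

/* Hypothetical lookup table; the real code uses an xarray. */
static void *ctx_table[MAX_CTX];

int context_reset_process_msg(const uint32_t *msg, uint32_t len)
{
	uint32_t desc_idx;

	if (len != 1)			/* length check BEFORE touching msg[0] */
		return -MY_EPROTO;

	desc_idx = msg[0];
	if (desc_idx >= MAX_CTX || !ctx_table[desc_idx])
		return -MY_EPROTO;	/* unknown context: protocol error */

	/* ... reset/replay the guilty context here ... */
	return 0;
}
```

Reordering matters because a malformed zero-length message would otherwise read past the end of the payload before the handler ever noticed the bad length.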


Re: [Intel-gfx] [PATCH 42/51] drm/i915/guc: Implement banned contexts for GuC submission

2021-07-20 Thread John Harrison

On 7/16/2021 13:17, Matthew Brost wrote:

When using GuC submission, if a context gets banned disable scheduling
and mark all inflight requests as complete.

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_context.h   |  13 ++
  drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
  drivers/gpu/drm/i915/gt/intel_reset.c |  32 +---
  .../gpu/drm/i915/gt/intel_ring_submission.c   |  20 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 151 --
  drivers/gpu/drm/i915/i915_trace.h |  10 ++
  8 files changed, 195 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 28c62f7ccfc7..d87a4c6da5bc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1084,7 +1084,7 @@ static void kill_engines(struct i915_gem_engines 
*engines, bool ban)
for_each_gem_engine(ce, engines, it) {
struct intel_engine_cs *engine;
  
-		if (ban && intel_context_set_banned(ce))

+   if (ban && intel_context_ban(ce, NULL))
continue;
  
  		/*

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index 2ed9bf5f91a5..814d9277096a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
  #include "intel_engine_types.h"
  #include "intel_ring_types.h"
  #include "intel_timeline_types.h"
+#include "i915_trace.h"
  
  #define CE_TRACE(ce, fmt, ...) do {	\

const struct intel_context *ce__ = (ce);\
@@ -243,6 +244,18 @@ static inline bool intel_context_set_banned(struct 
intel_context *ce)
return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
  }
  
+static inline bool intel_context_ban(struct intel_context *ce,

+struct i915_request *rq)
+{
+   bool ret = intel_context_set_banned(ce);
+
+   trace_intel_context_ban(ce);
+   if (ce->ops->ban)
+   ce->ops->ban(ce, rq);
+
+   return ret;
+}
+
  static inline bool
  intel_context_force_single_submission(const struct intel_context *ce)
  {
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 035108c10b2c..57c19ee3e313 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -35,6 +35,8 @@ struct intel_context_ops {
  
  	int (*alloc)(struct intel_context *ce);
  
+	void (*ban)(struct intel_context *ce, struct i915_request *rq);

+
int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, 
void **vaddr);
int (*pin)(struct intel_context *ce, void *vaddr);
void (*unpin)(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index f3cdbf4ba5c8..3ed694cab5af 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,7 +22,6 @@
  #include "intel_reset.h"
  
  #include "uc/intel_guc.h"

-#include "uc/intel_guc_submission.h"
  
  #define RESET_MAX_RETRIES 3
  
@@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)

intel_uncore_rmw_fw(uncore, reg, clr, 0);
  }
  
-static void skip_context(struct i915_request *rq)

-{
-   struct intel_context *hung_ctx = rq->context;
-
-   list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
-   if (!i915_request_is_active(rq))
-   return;
-
-   if (rq->context == hung_ctx) {
-   i915_request_set_error_once(rq, -EIO);
-   __i915_request_skip(rq);
-   }
-   }
-}
-
  static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
  {
struct drm_i915_file_private *file_priv = ctx->file_priv;
@@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
bool banned;
int i;
  
-	if (intel_context_is_closed(rq->context)) {

-   intel_context_set_banned(rq->context);
+   if (intel_context_is_closed(rq->context))
return true;
-   }
  
  	rcu_read_lock();

ctx = rcu_dereference(rq->context->gem_context);
@@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
banned = !i915_gem_context_is_recoverable(ctx);
if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
banned = true;
-   if (banned) {
+   if (banned)
  

Re: [Intel-gfx] [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests

2021-07-20 Thread John Harrison

On 7/16/2021 13:17, Matthew Brost wrote:

Requests may take slightly longer with GuC submission, let's increase
the timeouts in live_requests.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index bd5c96a77ba3..d67710d10615 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
i915_request_add(rq);
  
  		err = 0;

-   if (i915_request_wait(rq, 0, HZ / 5) < 0)
+   if (i915_request_wait(rq, 0, HZ) < 0)
err = -ETIME;
i915_request_put(rq);
if (err)
@@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
}
igt_spinner_end(&spin);
  
-	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)

+   if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
err = -EIO;
i915_request_put(rq);
  


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 06/50] drm/i915/xehp: Extra media engines - Part 1 (engine definitions)

2021-07-20 Thread John Harrison

On 7/20/2021 16:03, Lucas De Marchi wrote:

On Tue, Jul 13, 2021 at 08:14:56PM -0700, Matt Roper wrote:

From: John Harrison 

Xe_HP can have a lot of extra media engines. This patch adds the basic
definitions for them.

v2:
- Re-order intel_gt_info and intel_device_info slightly to avoid
  unnecessary padding now that we've increased the size of
  intel_engine_mask_t.  (Tvrtko)

Cc: Tvrtko Ursulin 
Signed-off-by: John Harrison 
Signed-off-by: Tomas Winkler 
Signed-off-by: Matt Roper 
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c |  7 ++-
drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 50 
drivers/gpu/drm/i915/gt/intel_engine_types.h | 14 --
drivers/gpu/drm/i915/gt/intel_gt_types.h |  5 +-
drivers/gpu/drm/i915/i915_reg.h  |  6 +++
drivers/gpu/drm/i915/intel_device_info.h |  3 +-
6 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c

index 87b06572fd2e..35edc55720f4 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -279,7 +279,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, 
u32 mode)

if (mode & EMIT_INVALIDATE)
    aux_inv = rq->engine->mask & ~BIT(BCS0);
if (aux_inv)
-    cmd += 2 * hweight8(aux_inv) + 2;
+    cmd += 2 * hweight32(aux_inv) + 2;

cs = intel_ring_begin(rq, cmd);
if (IS_ERR(cs))
@@ -313,9 +313,8 @@ int gen12_emit_flush_xcs(struct i915_request *rq, 
u32 mode)

    struct intel_engine_cs *engine;
    unsigned int tmp;

-    *cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv));
-    for_each_engine_masked(engine, rq->engine->gt,
-   aux_inv, tmp) {
+    *cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
+    for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
    *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
    *cs++ = AUX_INV;
    }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index 3f8013612a08..6c2cb1400c8c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -104,6 +104,38 @@ static const struct engine_info intel_engines[] = {
    { .graphics_ver = 11, .base = GEN11_BSD4_RING_BASE }
    },
},
+    [VCS4] = {
+    .hw_id = 0, /* not used in GEN12+, see MI_SEMAPHORE_SIGNAL */


I may be misreading this, but hw_id is only used by
RING_FAULT_REG() which is not actually used since
gen8... they are using GEN8_RING_FAULT_REG().

I'm having a hard time understanding what this comment "see
MI_SEMAPHORE_SIGNAL" actually means.
I vaguely recall something about being told the hw_id field was used in 
semaphore messages from one engine to another. I.e. if engine X is 
blocked on a semaphore that is signalled by engine Y then the MI_ 
instruction executed on Y to do the signal needs to specify X as the 
target. Whereas, on newer hardware this requirement was no longer 
applicable because MI_SEMAPHORE_SIGNAL uses memory mailboxes instead of 
directed engine messages. Maybe that information was wrong or maybe that 
code has since been removed or reworked?






I'd just remove all these `.hw_id = 0, ...` together with the comment
since it will be zero-initiliazed.
Yeah, the reason for explicitly setting it to zero was to avoid 
confusion over whether it had just been forgotten or not. I.e. to say 
'we know semaphores used to use this field but honest guv, we didn't 
forget to add it, it's just that newer hardware doesn't need it'.


John.




Lucas De Marchi



+    .class = VIDEO_DECODE_CLASS,
+    .instance = 4,
+    .mmio_bases = {
+    { .graphics_ver = 11, .base = XEHP_BSD5_RING_BASE }
+    },
+    },
+    [VCS5] = {
+    .hw_id = 0, /* not used in GEN12+, see MI_SEMAPHORE_SIGNAL */
+    .class = VIDEO_DECODE_CLASS,
+    .instance = 5,
+    .mmio_bases = {
+    { .graphics_ver = 12, .base = XEHP_BSD6_RING_BASE }
+    },
+    },
+    [VCS6] = {
+    .hw_id = 0, /* not used in GEN12+, see MI_SEMAPHORE_SIGNAL */
+    .class = VIDEO_DECODE_CLASS,
+    .instance = 6,
+    .mmio_bases = {
+    { .graphics_ver = 12, .base = XEHP_BSD7_RING_BASE }
+    },
+    },
+    [VCS7] = {
+    .hw_id = 0, /* not used in GEN12+, see MI_SEMAPHORE_SIGNAL */
+    .class = VIDEO_DECODE_CLASS,
+    .instance = 7,
+    .mmio_bases = {
+    { .graphics_ver = 12, .base = XEHP_BSD8_RING_BASE }
+    },
+    },
[VECS0] = {
    .hw_id = VECS0_HW,
    .class = VIDEO_ENHANCEMENT_CLASS,
@@ -121,6 +153,22 @@ static const struct engine_info intel_engines[] = {
    { .graphics_ver = 11, .base = GEN11_VEBOX2_RING_BASE }
    },
},
+    [VECS2] = {
+    .hw_id = 0, /* not used in GEN12+, see MI_SEMAP

Re: [Intel-gfx] [PATCH 04/18] drm/i915/guc: Implement GuC submission tasklet

2021-07-20 Thread John Harrison

On 7/20/2021 15:39, Matthew Brost wrote:

Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet is used for the submission path.

Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.

In this patch the field, guc_id, has been added to intel_context and is
not assigned. Patches later in the series will assign this value.

v2:
  (John Harrison)
   - Clean up some comments
v3:
  (John Harrison)
   - More comment cleanups

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +-
  3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 90026c177105..6d99631d19b9 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -137,6 +137,15 @@ struct intel_context {
struct intel_sseu sseu;
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */

+
+   /* GuC scheduling state flags that do not require a lock. */
+   atomic_t guc_sched_state_no_lock;
+
+   /*
+* GuC LRC descriptor ID - Not assigned in this patch but future patches
+* in the series will.
+*/
+   u16 guc_id;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 35783558d261..8c7b92f699f1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
struct intel_guc_log log;
struct intel_guc_ct ct;
  
+	/* Global engine used to submit requests to GuC */

+   struct i915_sched_engine *sched_engine;
+   struct i915_request *stalled_request;
+
/* intel_guc_recv interrupt related state */
spinlock_t irq_lock;
unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ca0717166a27 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@
  
  #define GUC_REQUEST_SIZE 64 /* bytes */
  
+/*

+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to changing at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLEDBIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+   return (atomic_read(&ce->guc_sched_state_no_lock) &
+   SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+   atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+   atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+  &ce->guc_sched_state_no_lock);
+}
+
  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
  {
return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct 
intel_guc *guc, u32 id,
xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
  }
  
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)

+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
  {
-   /* Leaving stub as this function will be used in future patches */
-}
+   int err;
+   struct intel_context *ce = rq->context;
+   u32 action[3];
+   int len = 0;
+   bool enabled = context_enabled(ce);
  
-/*

- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma)
-{
-   if (i915_vma_is_map_and_fenceable(vma))
-   intel_uncore_posting_read_fw(vma->vm->gt->uncore,
-GUC_STATUS);

Re: [Intel-gfx] [PATCH 06/18] drm/i915/guc: Implement GuC context operations for new inteface

2021-07-20 Thread John Harrison

On 7/20/2021 15:39, Matthew Brost wrote:

Implement the GuC context operations, which include the GuC specific
alloc, pin, unpin, and destroy operations.

v2:
  (Daniel Vetter)
   - Use msleep_interruptible rather than cond_resched in busy loop
  (Michal)
   - Remove C++ style comment
v3:
  (Matthew Brost)
   - Drop GUC_ID_START
  (John Harrison)
   - Fix a bunch of typos
   - Use drm_err rather than drm_dbg for G2H errors
  (Daniele)
   - Fix ;; typo
   - Clean up sched state functions
   - Add lockdep for guc_id functions
   - Don't call __release_guc_id when guc_id is invalid
   - Use MISSING_CASE
   - Add comment in guc_context_pin
   - Use shorter path to rpm
  (Daniele / CI)
   - Don't call release_guc_id on an invalid guc_id in destroy

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |   5 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
  drivers/gpu/drm/i915/gt/intel_lrc_reg.h   |   1 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  40 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   4 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 667 --
  drivers/gpu/drm/i915/i915_reg.h   |   1 +
  drivers/gpu/drm/i915/i915_request.c   |   1 +
  8 files changed, 686 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bd63813c8a80..32fd6647154b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
  
  	mutex_init(&ce->pin_mutex);
  
+	spin_lock_init(&ce->guc_state.lock);
+
+   ce->guc_id = GUC_INVALID_LRC_ID;
+   INIT_LIST_HEAD(&ce->guc_id_link);
+
i915_active_init(&ce->active,
 __intel_context_active, __intel_context_retire, 0);
  }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6d99631d19b9..606c480aec26 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -96,6 +96,7 @@ struct intel_context {
   #define CONTEXT_BANNED			6
   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
   #define CONTEXT_NOPREEMPT			8
+#define CONTEXT_LRCA_DIRTY			9
  
  	struct {

u64 timeout_us;
@@ -138,14 +139,29 @@ struct intel_context {
  
  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
  
+	struct {
+		/** lock: protects everything in guc_state */
+		spinlock_t lock;
+		/**
+		 * sched_state: scheduling state of this context using GuC
+		 * submission
+		 */
+		u8 sched_state;
+	} guc_state;
+
/* GuC scheduling state flags that do not require a lock. */
atomic_t guc_sched_state_no_lock;
  
+	/* GuC LRC descriptor ID */
+	u16 guc_id;
+
+	/* GuC LRC descriptor reference count */
+	atomic_t guc_id_ref;
+
/*
-	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
-	 * in the series will.
+	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
-	u16 guc_id;
+	struct list_head guc_id_link;
  };
  
  #endif /* __INTEL_CONTEXT_TYPES__ */

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
  #define GEN11_CSB_WRITE_PTR_MASK  (GEN11_CSB_PTR_MASK << 0)
  
  #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */

-#define MAX_GUC_CONTEXT_HW_ID  (1 << 20) /* exclusive */
  #define GEN11_MAX_CONTEXT_HW_ID   (1 << 11) /* exclusive */
  /* in Gen12 ID 0x7FF is reserved to indicate idle */
  #define GEN12_MAX_CONTEXT_HW_ID   (GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8c7b92f699f1..30773cd699f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -7,6 +7,7 @@
  #define _INTEL_GUC_H_
  
  #include 

+#include 
  
  #include "intel_uncore.h"

  #include "intel_guc_fw.h"
@@ -44,6 +45,14 @@ struct intel_guc {
void (*disable)(struct intel_guc *guc);
} interrupts;
  
+	/*
+	 * contexts_lock protects the pool of free guc ids and a linked list of
+	 * guc ids available to be stolen
+	 */
+	spinlock_t contexts_lock;
+	struct ida guc_ids;
+	struct list_head guc_id_list;
+
bool submission_selected;
  
  	struct i915_vma *ads_vma;

@@ -101,6 +11
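The contexts_lock, guc_ids, and guc_id_list fields added above implement an ID pool with stealing: IDs come from an ida-style allocator, and when the pool is exhausted an ID can be stolen from a context that is unpinned but whose guc_id is still registered with the GuC. A much-simplified, single-threaded userspace sketch of that scheme (all names, pool size, and error codes here are illustrative; the kernel version takes contexts_lock around these operations):

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_GUC_IDS 4
#define GUC_INVALID_LRC_ID 0xffffu

/* Minimal model of a context holding a guc_id. */
struct ctx_model {
	unsigned int guc_id;
	struct ctx_model *next_stealable; /* models guc_id_link */
};

static bool id_used[NUM_GUC_IDS];         /* models the ida */
static struct ctx_model *stealable_head;  /* models guc_id_list */

static int alloc_guc_id(struct ctx_model *ce)
{
	/* Fast path: grab a free id from the pool. */
	for (unsigned int i = 0; i < NUM_GUC_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			ce->guc_id = i;
			return 0;
		}
	}
	/* Pool exhausted: steal from an unpinned context, if any. */
	if (stealable_head) {
		struct ctx_model *victim = stealable_head;

		stealable_head = victim->next_stealable;
		ce->guc_id = victim->guc_id;
		victim->guc_id = GUC_INVALID_LRC_ID;
		return 0;
	}
	return -1; /* kernel would return an errno such as -EAGAIN */
}

static void unpin_guc_id(struct ctx_model *ce)
{
	/* Keep the id registered, but make it stealable. */
	ce->next_stealable = stealable_head;
	stealable_head = ce;
}
```

The point of the list is that an unpinned context keeps its GuC registration (avoiding a re-register on the next pin) right up until some other context actually needs the ID.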

Re: [Intel-gfx] [PATCH 08/33] drm/i915/guc: Reset implementation for new GuC interface

2021-07-26 Thread John Harrison

On 7/22/2021 16:54, Matthew Brost wrote:

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.
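The escalation described above can be sketched in a few lines: with GuC submission the per-engine reset path refuses with -ENODEV, and the caller treats any engine-reset failure as a cue to fall back to a full-chip reset. A userspace model of that control flow (enum and function names are illustrative, not the kernel's):

```c
#include <errno.h>
#include <stdbool.h>

enum reset_kind { RESET_ENGINE, RESET_FULL_GPU };

/* Models __intel_engine_reset_bh: refuse per-engine reset under GuC. */
static int engine_reset(bool engine_uses_guc)
{
	if (engine_uses_guc)
		return -ENODEV; /* individual engine reset not possible */
	return 0;
}

/* Models the caller's fallback: escalate to a full GPU reset on failure. */
static enum reset_kind handle_hang(bool engine_uses_guc)
{
	if (engine_reset(engine_uses_guc) == 0)
		return RESET_ENGINE;
	return RESET_FULL_GPU;
}
```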

v2:
  (Michal)
   - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
  (John H)
   - Split into a series of smaller patches
v4:
  (John H)
   - Fix typo
   - Add braces around if statements in reset code

Cc: John Harrison 
Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   6 +-
  drivers/gpu/drm/i915/gt/intel_reset.c |  18 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  13 -
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 561 ++
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |  39 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.h |   3 +
  7 files changed, 516 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index d86825437516..cd7b96005d29 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -170,8 +170,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
if (intel_gt_is_wedged(gt))
intel_gt_unset_wedged(gt);
  
-	intel_uc_sanitize(&gt->uc);
-
for_each_engine(engine, gt, id)
if (engine->reset.prepare)
engine->reset.prepare(engine);
@@ -187,6 +185,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
__intel_engine_reset(engine, false);
}
  
+	intel_uc_reset(&gt->uc, false);
+
for_each_engine(engine, gt, id)
if (engine->reset.finish)
engine->reset.finish(engine);
@@ -239,6 +239,8 @@ int intel_gt_resume(struct intel_gt *gt)
goto err_wedged;
}
  
+	intel_uc_reset_finish(&gt->uc);
+
intel_rps_enable(>->rps);
intel_llc_enable(>->llc);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
__intel_engine_reset(engine, stalled_mask & engine->mask);
local_bh_enable();
  
+	intel_uc_reset(&gt->uc, true);
+
intel_ggtt_restore_fences(gt->ggtt);
  
  	return err;

@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
if (awake & engine->mask)
intel_engine_pm_put(engine);
}
+
+	intel_uc_reset_finish(&gt->uc);
  }
  
  static void nop_submit_request(struct i915_request *request)

@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
for_each_engine(engine, gt, id)
if (engine->reset.cancel)
engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
local_bh_enable();
  
  	reset_finish(gt, awake);

@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
  
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
if (!intel_engine_pm_get_if_awake(engine))
return 0;
  
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)

   "Resetting %s for %s\n", engine->name, msg);

atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
  
-	if (intel_engine_uses_guc(engine))
-   ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-   else
-   ret = intel_gt_reset_engine(engine);
+   ret = intel_gt_reset_engine(engine);
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
-   ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
goto out;
}
  
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,

 * Try engine reset when
