Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
On 7/16/2021 13:16, Matthew Brost wrote: Implement GuC submission tasklet for new interface. The new GuC interface uses H2G to submit contexts to the GuC. Since H2G use a single channel, a single tasklet submits is used for the submission path. This still needs fixing - 'a single tasklet submits is used' is not valid English. It also seems that the idea of splitting all the deletes of old code into a separate patch didn't happen. It really does obfuscate things significantly having completely unrelated deletes and adds interspersed :(. John. Also the per engine interrupt handler has been updated to disable the rescheduling of the physical engine tasklet, when using GuC scheduling, as the physical engine tasklet is no longer used. In this patch the field, guc_id, has been added to intel_context and is not assigned. Patches later in the series will assign this value. v2: (John Harrison) - Clean up some comments Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context_types.h | 9 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +- 3 files changed, 127 insertions(+), 117 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 90026c177105..6d99631d19b9 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -137,6 +137,15 @@ struct intel_context { struct intel_sseu sseu; u8 wa_bb_page; /* if set, page num reserved for context workarounds */ + + /* GuC scheduling state flags that do not require a lock. */ + atomic_t guc_sched_state_no_lock; + + /* +* GuC LRC descriptor ID - Not assigned in this patch but future patches +* in the series will. 
+*/ + u16 guc_id; }; #endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 35783558d261..8c7b92f699f1 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -30,6 +30,10 @@ struct intel_guc { struct intel_guc_log log; struct intel_guc_ct ct; + /* Global engine used to submit requests to GuC */ + struct i915_sched_engine *sched_engine; + struct i915_request *stalled_request; + /* intel_guc_recv interrupt related state */ spinlock_t irq_lock; unsigned int msg_enabled_mask; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 23a94a896a0b..ca0717166a27 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -60,6 +60,31 @@ #define GUC_REQUEST_SIZE 64 /* bytes */ +/* + * Below is a set of functions which control the GuC scheduling state which do + * not require a lock as all state transitions are mutually exclusive. i.e. It + * is not possible for the context pinning code and submission, for the same + * context, to be executing simultaneously. We still need an atomic as it is + * possible for some of the bits to changing at the same time though. 
+ */ +#define SCHED_STATE_NO_LOCK_ENABLED BIT(0) +static inline bool context_enabled(struct intel_context *ce) +{ + return (atomic_read(&ce->guc_sched_state_no_lock) & + SCHED_STATE_NO_LOCK_ENABLED); +} + +static inline void set_context_enabled(struct intel_context *ce) +{ + atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock); +} + +static inline void clr_context_enabled(struct intel_context *ce) +{ + atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED, + &ce->guc_sched_state_no_lock); +} + static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return rb_entry(rb, struct i915_priolist, node); @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); } -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) { - /* Leaving stub as this function will be used in future patches */ -} + int err; + struct intel_context *ce = rq->context; + u32 action[3]; + int len = 0; + bool enabled = context_enabled(ce); -/* - * When we're doing submissions using regular execlists backend, writing to - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages - * pinned in mappable aperture portion of GGTT are visible to command streamer. - * Writes done by GuC on our behalf are not guaranteeing such ordering, - * therefore, to ensure the flush, we're issuing a POSTING RE
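For readers following along outside the kernel tree, the lock-free flag helpers in the hunk above can be approximated in userspace with C11 atomics. This is only a sketch under assumptions: struct ctx_sketch and the *_ctx_enabled names are hypothetical stand-ins, and atomic_fetch_or / atomic_fetch_and from <stdatomic.h> replace the kernel's atomic_or / atomic_and.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag bit; mirrors BIT(0) in the patch. */
#define SCHED_STATE_NO_LOCK_ENABLED (1u << 0)

/* Hypothetical userspace stand-in for the relevant field of intel_context. */
struct ctx_sketch {
	atomic_uint guc_sched_state_no_lock;
};

/* Read the flag without taking any lock. */
static inline bool ctx_enabled(struct ctx_sketch *ce)
{
	return atomic_load(&ce->guc_sched_state_no_lock) &
	       SCHED_STATE_NO_LOCK_ENABLED;
}

/* Set only this bit; other bits may be changed concurrently by other code. */
static inline void set_ctx_enabled(struct ctx_sketch *ce)
{
	atomic_fetch_or(&ce->guc_sched_state_no_lock,
			SCHED_STATE_NO_LOCK_ENABLED);
}

/* Clear only this bit, leaving all other bits untouched. */
static inline void clr_ctx_enabled(struct ctx_sketch *ce)
{
	atomic_fetch_and(&ce->guc_sched_state_no_lock,
			 ~SCHED_STATE_NO_LOCK_ENABLED);
}
```

The read-modify-write operations are what make it safe for different bits in the same word to change at the same time, which is the reason the patch comment gives for still needing an atomic.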
Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
On 7/16/2021 13:16, Matthew Brost wrote: Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy. v2: (Daniel Vetter) - Use msleep_interruptible rather than cond_resched in busy loop (Michal) - Remove C++ style comment Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 5 + drivers/gpu/drm/i915/gt/intel_context_types.h | 22 +- drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 1 - drivers/gpu/drm/i915/gt/uc/intel_guc.h| 40 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 -- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/i915_request.c | 1 + 8 files changed, 685 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index bd63813c8a80..32fd6647154b 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) mutex_init(&ce->pin_mutex); + spin_lock_init(&ce->guc_state.lock); + + ce->guc_id = GUC_INVALID_LRC_ID; + INIT_LIST_HEAD(&ce->guc_id_link); + i915_active_init(&ce->active, __intel_context_active, __intel_context_retire, 0); } diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 6d99631d19b9..606c480aec26 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -96,6 +96,7 @@ struct intel_context { #define CONTEXT_BANNED6 #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 +#define CONTEXT_LRCA_DIRTY 9 struct { u64 timeout_us; @@ -138,14 +139,29 @@ struct intel_context { u8 wa_bb_page; /* if set, page num reserved for context workarounds */ + struct { + /** lock: protects everything in guc_state */ + spinlock_t lock; + /** +* sched_state: scheduling state of this context 
using GuC +* submission +*/ + u8 sched_state; + } guc_state; + /* GuC scheduling state flags that do not require a lock. */ atomic_t guc_sched_state_no_lock; + /* GuC LRC descriptor ID */ + u16 guc_id; + + /* GuC LRC descriptor reference count */ + atomic_t guc_id_ref; + /* -* GuC LRC descriptor ID - Not assigned in this patch but future patches -* in the series will. +* GuC ID link - in list when unpinned but guc_id still valid in GuC */ - u16 guc_id; + struct list_head guc_id_link; }; #endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h index 41e5350a7a05..49d4857ad9b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h @@ -87,7 +87,6 @@ #define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0) #define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */ -#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */ #define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */ /* in Gen12 ID 0x7FF is reserved to indicate idle */ #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 8c7b92f699f1..30773cd699f5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -7,6 +7,7 @@ #define _INTEL_GUC_H_ #include +#include #include "intel_uncore.h" #include "intel_guc_fw.h" @@ -44,6 +45,14 @@ struct intel_guc { void (*disable)(struct intel_guc *guc); } interrupts; + /* +* contexts_lock protects the pool of free guc ids and a linked list of +* guc ids available to be stolen +*/ + spinlock_t contexts_lock; + struct ida guc_ids; + struct list_head guc_id_list; + bool submission_selected; struct i915_vma *ads_vma; @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, response_buf, response_buf_size, 0); } +static inline int intel_guc_send_busy_loop(struct intel_guc* guc, + 
const u32 *action, + u32 len, + bool loop) +{ + int err; + unsigned int sleep_period_ms = 1; + bool not_atomic = !in_atomic() &a
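The quoted patch is cut off inside intel_guc_send_busy_loop, so the following is not the patch's code. It is a rough userspace sketch of the general idea named in the changelog (retry a non-blocking send while it reports busy, sleeping between attempts rather than spinning). The callback types, the toy channel, the doubling of the sleep period and the 64 ms cap are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callbacks: a non-blocking send and an interruptible sleep. */
typedef int (*send_fn)(void *chan);
typedef int (*sleep_fn)(void *chan, unsigned int ms); /* nonzero = interrupted */

/* Retry send() while it returns -EBUSY, backing off between attempts. */
static int send_busy_loop(void *chan, send_fn send, sleep_fn sleep_ms, bool loop)
{
	unsigned int sleep_period_ms = 1;
	int err;

	for (;;) {
		err = send(chan);
		if (err != -EBUSY || !loop)
			return err;
		if (sleep_ms(chan, sleep_period_ms))
			return -EINTR;
		/* Assumed policy: double the sleep, capped at 64 ms. */
		if (sleep_period_ms < 64)
			sleep_period_ms <<= 1;
	}
}

/* Toy channel for demonstration: reports busy a fixed number of times. */
struct toy_chan {
	int busy_left;
	unsigned int slept_ms;
};

static int toy_send(void *p)
{
	struct toy_chan *c = p;

	if (c->busy_left > 0) {
		c->busy_left--;
		return -EBUSY;
	}
	return 0;
}

/* Records the requested sleep instead of actually sleeping. */
static int toy_sleep(void *p, unsigned int ms)
{
	struct toy_chan *c = p;

	c->slept_ms += ms;
	return 0;
}
```

With loop set to false the first -EBUSY is returned straight to the caller, which can then decide for itself whether to retry later.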
Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
On 7/19/2021 15:55, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 04:01:56PM -0700, John Harrison wrote:
>> On 7/16/2021 13:16, Matthew Brost wrote:
>>> Implement GuC submission tasklet for new interface. The new GuC
>>> interface uses H2G to submit contexts to the GuC. Since H2G use a
>>> single channel, a single tasklet submits is used for the submission
>>> path.
>> This still needs fixing - 'a single tasklet submits is used' is not
>> valid English.
>
> Will fix.
>
>> It also seems that the idea of splitting all the deletes of old code
>> into a separate patch didn't happen. It really does obfuscate things
>> significantly having completely unrelated deletes and adds
>> interspersed :(.
>
> I don't recall promising to do that.
>
> Matt

"No promises but perhaps I'll do this in the next rev."

Well, this is the next rev. So I am expressing my disappointment that it didn't happen. Reviewability of patches is important.

John.
Re: [PATCH 13/51] drm/i915/guc: Disable semaphores when using GuC scheduling
On 7/16/2021 13:16, Matthew Brost wrote:
> Semaphores are an optimization and not required for basic GuC submission
> to work properly. Disable until we have time to do the implementation to
> enable semaphores and tune them for performance. Also long direction is
> just to delete semaphores from the i915 so another reason to not enable
> these for GuC submission.
>
> This patch fixes an existing bug where I915_ENGINE_HAS_SEMAPHORES was
> not honored correctly.

Bugs plural.

Otherwise:
Reviewed-by: John Harrison

> v2: Reword commit message
> v3:
>  (John H)
>   - Add text to commit indicating this also fixing an existing bug
>
> Cc: John Harrison
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 7d6f52d8a801..64659802d4df 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -799,7 +799,8 @@ static int intel_context_set_gem(struct intel_context *ce,
>  	}
>
>  	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> -	    intel_engine_has_timeslices(ce->engine))
> +	    intel_engine_has_timeslices(ce->engine) &&
> +	    intel_engine_has_semaphores(ce->engine))
>  		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
>
>  	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
> @@ -1778,7 +1779,8 @@ static void __apply_priority(struct intel_context *ce, void *arg)
>  	if (!intel_engine_has_timeslices(ce->engine))
>  		return;
>
> -	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
> +	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> +	    intel_engine_has_semaphores(ce->engine))
>  		intel_context_set_use_semaphores(ce);
>  	else
>  		intel_context_clear_use_semaphores(ce);
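As a quick way to see what both hunks in this patch converge on, the condition can be written as a single predicate. This is a standalone sketch, not i915 code: struct engine_caps, PRIORITY_NORMAL_SKETCH and should_use_semaphores are hypothetical names standing in for the engine capability checks and I915_PRIORITY_NORMAL.

```c
#include <stdbool.h>

/* Hypothetical stand-in for I915_PRIORITY_NORMAL. */
#define PRIORITY_NORMAL_SKETCH 0

/* Hypothetical summary of the two engine capabilities the patch tests. */
struct engine_caps {
	bool has_timeslices;
	bool has_semaphores;
};

/*
 * Semaphores are used only when the context priority is at least normal
 * and the engine supports both timeslicing and semaphores.
 */
static bool should_use_semaphores(int priority, const struct engine_caps *e)
{
	return priority >= PRIORITY_NORMAL_SKETCH &&
	       e->has_timeslices && e->has_semaphores;
}
```

Before the patch, the has_semaphores term was missing from both call sites, which is how the flag could end up set on an engine that does not advertise semaphore support.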
Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
On 7/16/2021 13:16, Matthew Brost wrote: When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero. v2: rtimeout -> remaining_timeout v3: Drop unnecessary includes, guc_submission_busy_loop -> guc_submission_send_busy_loop, drop negatie timeout trick, move a refactor of guc_context_unpin to earlier path (John H) Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +- drivers/gpu/drm/i915/gt/intel_gt.c| 19 + drivers/gpu/drm/i915/gt/intel_gt.h| 2 + drivers/gpu/drm/i915/gt/intel_gt_requests.c | 21 ++--- drivers/gpu/drm/i915/gt/intel_gt_requests.h | 7 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 4 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +-- drivers/gpu/drm/i915/gt/uc/intel_uc.h | 5 ++ drivers/gpu/drm/i915/i915_gem_evict.c | 1 + .../gpu/drm/i915/selftests/igt_live_test.c| 2 +- .../gpu/drm/i915/selftests/mock_gem_device.c | 3 +- 13 files changed, 129 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index a90f796e85c0..6fffd4d377c2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, goto insert; /* Attempt to reap some mmap space from dead objects */ - err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT); + err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT, + NULL); if (err) goto err; diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index e714e21c0a4d..acfdd53b2678 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -585,6 +585,25 
@@ static void __intel_gt_disable(struct intel_gt *gt) GEM_BUG_ON(intel_gt_pm_is_awake(gt)); } +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) +{ + long remaining_timeout; + + /* If the device is asleep, we have no requests outstanding */ + if (!intel_gt_pm_is_awake(gt)) + return 0; + + while ((timeout = intel_gt_retire_requests_timeout(gt, timeout, + &remaining_timeout)) > 0) { + cond_resched(); + if (signal_pending(current)) + return -EINTR; + } + + return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc, + remaining_timeout); +} + int intel_gt_init(struct intel_gt *gt) { int err; diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index e7aabe0cc5bf..74e771871a9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt); void intel_gt_driver_late_release(struct intel_gt *gt); +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout); + void intel_gt_check_and_clear_faults(struct intel_gt *gt); void intel_gt_clear_error_registers(struct intel_gt *gt, intel_engine_mask_t engine_mask); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 647eca9d867a..edb881d75630 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine) GEM_BUG_ON(engine->retire); } -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout) +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout, + long *remaining_timeout) { struct intel_gt_timelines *timelines = &gt->timelines; struct intel_timeline *tl, *tn; @@ -195,22 +196,10 @@ out_active: spin_lock(&timelines->lock); if (flush_submission(gt, timeout)) /* Wait, there's more! */ active_count++; - return active_count ? 
timeout : 0; -} - -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) -{ - /* If the device is asleep, we have no requests outstanding */ - if (!intel_gt_pm_is_awake(gt)) - return 0; - - while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) { - cond_resched(); - if (
Re: [PATCH 16/51] drm/i915/guc: Update GuC debugfs to support new GuC
On 7/16/2021 13:16, Matthew Brost wrote: Update GuC debugfs to support the new GuC structures. v2: (John Harrison) - Remove intel_lrc_reg.h include from i915_debugfs.c (Michal) - Rename GuC debugfs functions Signed-off-by: John Harrison Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 + .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 23 +++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 +++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 5 ++ 5 files changed, 107 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index f1cbed6b9f0a..503a78517610 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1171,3 +1171,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct) ct_try_receive_message(ct); } + +void intel_guc_ct_print_info(struct intel_guc_ct *ct, +struct drm_printer *p) +{ + drm_printf(p, "CT %s\n", enableddisabled(ct->enabled)); + + if (!ct->enabled) + return; + + drm_printf(p, "H2G Space: %u\n", + atomic_read(&ct->ctbs.send.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.send.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.send.desc->tail); + drm_printf(p, "G2H Space: %u\n", + atomic_read(&ct->ctbs.recv.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.recv.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.recv.desc->tail); +} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 4b30a562ae63..7b34026d264a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@ struct i915_vma; struct intel_guc; +struct drm_printer; /** * DOC: Command Transport (CT). 
@@ -112,4 +113,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct); +void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p); + #endif /* _INTEL_GUC_CT_H_ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..7a454c91a736 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h" static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p); - /* Add more as required ... */ + if (!intel_guc_submission_is_used(guc)) + return 0; + + intel_guc_ct_print_info(&guc->ct, &p); + intel_guc_submission_print_info(guc, &p); return 0; } DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info); +static int guc_registered_contexts_show(struct seq_file *m, void *data) +{ + struct intel_guc *guc = m->private; + struct drm_printer p = drm_seq_file_printer(m); + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + intel_guc_submission_print_context_info(guc, &p); + + return 0; +} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL }, + { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, }; if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 088d11e2e497..a2af7e17dcc2 100644 --- 
a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1602,3 +1602,58 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; } + +void intel_guc_submission_print_info(struct intel_guc *guc, +struct drm_printer *p) +{ + struct i915_sched_engine *sched_engine = guc->sched_engine; + struct rb_node *
Re: [PATCH 17/51] drm/i915/guc: Add several request trace points
On 7/16/2021 13:16, Matthew Brost wrote: Add trace points for request dependencies and GuC submit. Extended existing request trace points to include submit fence value,, guc_id, Still has misplaced commas. Also, Tvrtko has a bunch of comments/questions on the previous version that need to be addressed. John. and ring tail value. v2: Fix white space alignment in i915_request_add trace point Cc: John Harrison Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 43 +-- 3 files changed, 45 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a2af7e17dcc2..480fb2184ecf 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; } + trace_i915_request_guc_submit(last); } guc->stalled_request = NULL; @@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq; + else + trace_i915_request_guc_submit(rq); return ret; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 2b2b63cba06c..01aa3d1ee2b1 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to, return err; } + trace_i915_request_dep_to(to); + trace_i915_request_dep_from(from); + /* Couple the dependency tree for PI on this exposed to->fence */ if (to->engine->sched_engine->schedule) { err = i915_sched_node_add_dependency(&to->sched, diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 6778ad2a14a4..ea41d069bf7d 100644 --- 
a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request, TP_STRUCT__entry( __field(u32, dev) __field(u64, ctx) +__field(u32, guc_id) __field(u16, class) __field(u16, instance) __field(u32, seqno) +__field(u32, tail) ), TP_fast_assign( __entry->dev = rq->engine->i915->drm.primary->index; __entry->class = rq->engine->uabi_class; __entry->instance = rq->engine->uabi_instance; + __entry->guc_id = rq->context->guc_id; __entry->ctx = rq->fence.context; __entry->seqno = rq->fence.seqno; + __entry->tail = rq->tail; ), - TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u", + TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u", __entry->dev, __entry->class, __entry->instance, - __entry->ctx, __entry->seqno) + __entry->guc_id, __entry->ctx, __entry->seqno, + __entry->tail) ); DEFINE_EVENT(i915_request, i915_request_add, - TP_PROTO(struct i915_request *rq), - TP_ARGS(rq) +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) ); #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +DEFINE_EVENT(i915_request, i915_request_dep_to, +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) +); + +DEFINE_EVENT(i915_request, i915_request_dep_from, +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) +); + +DEFINE_EVENT(i915_request, i915_request_guc_submit, +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) +); + DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq) @@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out, #else #if !defined(TRACE_HEADER_MULTI_READ) +static inline void +trace_i915_request_dep_to(struct i915_request *rq) +{ +} + +static inline void +trace_i915_request_dep_from(struct i915_request *rq) +{ +} + +static inline void +trace_i915_request_guc_submit(struct i915_request *rq) +{ +} + static inline void trace_i915_request_submit(struct i915_request *rq) {
Re: [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines
On 7/16/2021 13:16, Matthew Brost wrote: From: John Harrison The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted to their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heart beat code thinks the physical engines are idle due to the serial number not incrementing. This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions. Signed-off-by: John Harrison Signed-off-by: Matthew Brost Still needs to pull in Tvrtko's updated subject and description. John. 
--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/intel_execlists_submission.c | 6 ++ drivers/gpu/drm/i915/gt/intel_ring_submission.c | 6 ++ drivers/gpu/drm/i915/gt/mock_engine.c| 6 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c| 16 drivers/gpu/drm/i915/i915_request.c | 4 +++- 6 files changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 1cb9c3b70b29..8ad304b2f2e4 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -388,6 +388,8 @@ struct intel_engine_cs { void(*park)(struct intel_engine_cs *engine); void(*unpark)(struct intel_engine_cs *engine); + void (*bump_serial)(struct intel_engine_cs *engine); + void(*set_default_submission)(struct intel_engine_cs *engine); const struct intel_context_ops *cops; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 28492cdce706..920707e22eb0 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); } +static void execlist_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + static void logical_ring_default_vfuncs(struct intel_engine_cs *engine) { @@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; + engine->bump_serial = execlist_bump_serial; engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind; diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 5c4d204d07cc..61469c631057 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine) } } +static void ring_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + static void setup_common(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915; @@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine) engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc; + engine->bump_serial = ring_bump_serial; /* * Using a global execution timeline; the previous final breadcrumb is diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 68970398e4ef..9203c766db80 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine) intel_engine_fini_retire(engine); } +static void mock_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, const char *name, int id) @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, engine->base.cops = &mock_context_ops; engine->base.request_alloc = mock_request_alloc; + engine->base.bump_serial = mock_bump_serial; engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.submit_request = mock_submit_request; d
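The GuC side of this patch is cut off above, so what follows is only a sketch of the behaviour the commit message describes: for a virtual engine, bump the serial of every physical engine in its mask. The names engine_sketch, MAX_ENGINES_SKETCH and bump_serial_for_mask are hypothetical, and the plain bit loop is an assumption rather than the patch's actual iteration helper.

```c
#include <stdint.h>

/* Hypothetical upper bound on physical engines for this sketch. */
#define MAX_ENGINES_SKETCH 8

/* Hypothetical stand-in for the serial counter kept per physical engine. */
struct engine_sketch {
	unsigned int serial;
};

/*
 * Bump the serial of every physical engine whose bit is set in mask.
 * A physical engine has a single bit set; a virtual engine has one bit
 * per sibling it may run on.
 */
static void bump_serial_for_mask(struct engine_sketch *engines, uint32_t mask)
{
	for (unsigned int i = 0; i < MAX_ENGINES_SKETCH; i++) {
		if (mask & (1u << i))
			engines[i].serial++;
	}
}
```

As the commit message says, this over-counts: only one sibling actually runs the request, but every sibling in the mask is treated as having made progress.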
Re: [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
On 7/16/2021 13:16, Matthew Brost wrote: With GuC virtual engines the physical engine which a request executes and completes on isn't known to the i915. Therefore we can't attach a request to a physical engines breadcrumbs. To work around this we create a single breadcrumbs per engine class when using GuC submission and direct all physical engine interrupts to this breadcrumbs. v2: (John H) - Rework header file structure so intel_engine_mask_t can be in intel_engine_types.h Signed-off-by: Matthew Brost CC: John Harrison Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 41 +--- drivers/gpu/drm/i915/gt/intel_breadcrumbs.h | 16 - .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 7 ++ drivers/gpu/drm/i915/gt/intel_engine.h| 3 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 +- .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 4 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +-- 9 files changed, 133 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 38cc42783dfb..2007dc6f6b99 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -15,28 +15,14 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h" -static bool irq_enable(struct intel_engine_cs *engine) +static bool irq_enable(struct intel_breadcrumbs *b) { - if (!engine->irq_enable) - return false; - - /* Caller disables interrupts */ - spin_lock(&engine->gt->irq_lock); - engine->irq_enable(engine); - spin_unlock(&engine->gt->irq_lock); - - return true; + return intel_engine_irq_enable(b->irq_engine); } -static void irq_disable(struct intel_engine_cs *engine) +static void irq_disable(struct intel_breadcrumbs *b) { - if (!engine->irq_disable) - return; - - /* Caller disables interrupts */ - spin_lock(&engine->gt->irq_lock); - 
engine->irq_disable(engine); - spin_unlock(&engine->gt->irq_lock); + intel_engine_irq_disable(b->irq_engine); } static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) WRITE_ONCE(b->irq_armed, true); /* Requests may have completed before we could enable the interrupt. */ - if (!b->irq_enabled++ && irq_enable(b->irq_engine)) + if (!b->irq_enabled++ && b->irq_enable(b)) irq_work_queue(&b->irq_work); } @@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) { GEM_BUG_ON(!b->irq_enabled); if (!--b->irq_enabled) - irq_disable(b->irq_engine); + b->irq_disable(b); WRITE_ONCE(b->irq_armed, false); intel_gt_pm_put_async(b->irq_engine->gt); @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) if (!b) return NULL; - b->irq_engine = irq_engine; + kref_init(&b->ref); spin_lock_init(&b->signalers_lock); INIT_LIST_HEAD(&b->signalers); @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) spin_lock_init(&b->irq_lock); init_irq_work(&b->irq_work, signal_irq_work); + b->irq_engine = irq_engine; + b->irq_enable = irq_enable; + b->irq_disable = irq_disable; + return b; } @@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b) spin_lock_irqsave(&b->irq_lock, flags); if (b->irq_enabled) - irq_enable(b->irq_engine); + b->irq_enable(b); else - irq_disable(b->irq_engine); + b->irq_disable(b); spin_unlock_irqrestore(&b->irq_lock, flags); } @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b) } } -void intel_breadcrumbs_free(struct intel_breadcrumbs *b) +void intel_breadcrumbs_free(struct kref *kref) { + struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref); + irq_work_sync(&b->irq_work); GEM_BUG_ON(!list_empty(&b->signalers)); GEM_BUG_ON(b->irq_armed); + kfree(b); } diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h index 3ce5ce270b04..be0d4f379a85 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h @@ -9,7 +9,7 @@ #include
Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
On 7/19/2021 18:53, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 06:03:05PM -0700, John Harrison wrote:
>> On 7/16/2021 13:16, Matthew Brost wrote:
>>> When running the GuC the GPU can't be considered idle if the GuC still
>>> has contexts pinned. As such, a call has been added in
>>> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
>>> the number of unpinned contexts to go to zero.
>>>
>>> v2: rtimeout -> remaining_timeout
>>> v3: Drop unnecessary includes, guc_submission_busy_loop ->
>>> guc_submission_send_busy_loop, drop negatie timeout trick, move a
>>> refactor of guc_context_unpin to earlier path (John H)
>>>
>>> Cc: John Harrison
>>> Signed-off-by: Matthew Brost
>>> ---
>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +-
>>>  drivers/gpu/drm/i915/gt/intel_gt.c | 19 +
>>>  drivers/gpu/drm/i915/gt/intel_gt.h | 2 +
>>>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 21 ++---
>>>  drivers/gpu/drm/i915/gt/intel_gt_requests.h | 7 +-
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 4 +
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 1 +
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 +
>>>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +--
>>>  drivers/gpu/drm/i915/gt/uc/intel_uc.h | 5 ++
>>>  drivers/gpu/drm/i915/i915_gem_evict.c | 1 +
>>>  .../gpu/drm/i915/selftests/igt_live_test.c | 2 +-
>>>  .../gpu/drm/i915/selftests/mock_gem_device.c | 3 +-
>>>  13 files changed, 129 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index a90f796e85c0..6fffd4d377c2 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>  		goto insert;
>>>  
>>>  	/* Attempt to reap some mmap space from dead objects */
>>> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
>>> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
>>> +					       NULL);
>>>  	if (err)
>>>  		goto err;
>>>  
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> index e714e21c0a4d..acfdd53b2678 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
>>>  	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>>>  }
>>>  
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> +{
>>> +	long remaining_timeout;
>>> +
>>> +	/* If the device is asleep, we have no requests outstanding */
>>> +	if (!intel_gt_pm_is_awake(gt))
>>> +		return 0;
>>> +
>>> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
>>> +							   &remaining_timeout)) > 0) {
>>> +		cond_resched();
>>> +		if (signal_pending(current))
>>> +			return -EINTR;
>>> +	}
>>> +
>>> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
>>> +							  remaining_timeout);
>>> +}
>>> +
>>>  int intel_gt_init(struct intel_gt *gt)
>>>  {
>>>  	int err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> index e7aabe0cc5bf..74e771871a9b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>>>  
>>>  void intel_gt_driver_late_release(struct intel_gt *gt);
>>>  
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> +
>>>  void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>>>  void intel_gt_clear_error_registers(struct intel_gt *gt,
>>>  				    intel_engine_mask_t engine_mask);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index 647eca9d867a..edb881d75630 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>>>  	GEM_BUG_ON(engine->retire);
>>>  }
>>>  
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *remaining_timeout)
>>>  {
>>>  	struct intel_gt_timelines *timelines = &gt->timelines;
>>>  	struct intel_timeline *tl, *tn;
>>> @@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
>>>  	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>>>  		active_count++;
>>>  
>>> -	return active_count ? timeout : 0;
>>> -}
>>> -
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> -{
>>> -	/* If the device is asleep, we have no requests outstanding */
>>> -	if (!intel_gt_pm_is_awake(gt))
>>> -		return 0;
>>> -
>>> -	while ((t
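As an aside for readers of this thread: the commit message describes retiring requests first and then spending whatever timeout is left waiting for the GuC to go idle. A minimal userspace sketch of that shared-budget idea follows. Everything in it (struct fake_gt, wait_stage(), the tick counts) is a hypothetical model, not i915 code, and it simplifies the real return convention, where a positive return value means work is still outstanding.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: each stage needs a fixed number of ticks to drain. */
struct fake_gt {
	long request_ticks;	/* ticks until all requests have retired */
	long guc_ticks;		/* ticks until the GuC has nothing pinned */
};

/*
 * Spend up to 'budget' ticks draining '*work'. The unused part of the
 * budget is reported through 'remaining' (may be NULL). Returns 0 once the
 * work is drained, -ETIMEDOUT if the budget ran out first.
 */
static long wait_stage(long *work, long budget, long *remaining)
{
	long used = *work < budget ? *work : budget;

	*work -= used;
	if (remaining)
		*remaining = budget - used;

	return *work ? -ETIMEDOUT : 0;
}

/* Two-stage wait: both stages draw from one overall timeout. */
static long wait_for_idle(struct fake_gt *gt, long timeout)
{
	long remaining;
	long err;

	err = wait_stage(&gt->request_ticks, timeout, &remaining);
	if (err)
		return err;

	/* Only the leftover budget is available for the second stage. */
	return wait_stage(&gt->guc_ticks, remaining, NULL);
}
```

With a budget of 10 ticks, a request stage needing 3 leaves 7 for the second stage, so a second stage needing 8 times out while one needing 4 succeeds.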
Re: [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc
On 7/16/2021 13:16, Matthew Brost wrote:
> This help the backends clean up when the schedule engine object gets

help -> helps. Although, I would say it's more like 'this is required to
allow backend specific cleanup'. It doesn't just make life a bit easier, it
allows us to not leak stuff and/or dereference null pointers!

Either way...
Reviewed-by: John Harrison

> destroyed.
>
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/i915_scheduler.c | 3 ++-
>  drivers/gpu/drm/i915/i915_scheduler.h | 4 +---
>  drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +
>  3 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 3a58a9130309..4fceda96deed 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
>  	rcu_read_unlock();
>  }
>  
> -void i915_sched_engine_free(struct kref *kref)
> +static void default_destroy(struct kref *kref)
>  {
>  	struct i915_sched_engine *sched_engine =
>  		container_of(kref, typeof(*sched_engine), ref);
> @@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
>  
>  	sched_engine->queue = RB_ROOT_CACHED;
>  	sched_engine->queue_priority_hint = INT_MIN;
> +	sched_engine->destroy = default_destroy;
>  
>  	INIT_LIST_HEAD(&sched_engine->requests);
>  	INIT_LIST_HEAD(&sched_engine->hold);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 650ab8e0db9f..3c9504e9f409 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
>  struct i915_sched_engine *
>  i915_sched_engine_create(unsigned int subclass);
>  
> -void i915_sched_engine_free(struct kref *kref);
> -
>  static inline struct i915_sched_engine *
>  i915_sched_engine_get(struct i915_sched_engine *sched_engine)
>  {
> @@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
>  static inline void
>  i915_sched_engine_put(struct i915_sched_engine *sched_engine)
>  {
> -	kref_put(&sched_engine->ref, i915_sched_engine_free);
> +	kref_put(&sched_engine->ref, sched_engine->destroy);
>  }
>  
>  static inline bool
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index 5935c3152bdc..00384e2c5273 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -163,6 +163,11 @@ struct i915_sched_engine {
>  	 */
>  	void *private_data;
>  
> +	/**
> +	 * @destroy: destroy schedule engine / cleanup in backend
> +	 */
> +	void	(*destroy)(struct kref *kref);
> +
>  	/**
>  	 * @kick_backend: kick backend after a request's priority has changed
>  	 */
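For anyone skimming the thread, the pattern this patch introduces (a reference-counted object whose final put calls a per-object destroy hook instead of one fixed free function) can be sketched in userspace C as below. This is an illustration under stated assumptions only: struct ref is a hypothetical, non-atomic stand-in for struct kref, and backend_destroy() is an invented example of a backend with extra state to release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kref: a plain, non-atomic counter. */
struct ref {
	int count;
};

struct sched_engine {
	struct ref ref;			/* first member, see casts below */
	void (*destroy)(struct ref *ref);	/* backend-overridable hook */
	void *private_data;
};

static int backend_cleanups;	/* counts backend-specific teardowns */

/* Default hook: the object owns nothing else, so just free it. */
static void default_destroy(struct ref *ref)
{
	/* 'ref' is the first member, so this cast recovers the container. */
	free((struct sched_engine *)ref);
}

/* Invented backend hook: release backend state before freeing. */
static void backend_destroy(struct ref *ref)
{
	struct sched_engine *se = (struct sched_engine *)ref;

	free(se->private_data);
	backend_cleanups++;
	free(se);
}

static struct sched_engine *sched_engine_create(void)
{
	struct sched_engine *se = calloc(1, sizeof(*se));

	if (!se)
		return NULL;

	se->ref.count = 1;
	se->destroy = default_destroy;	/* backends may replace this */
	return se;
}

/* Drop one reference; returns 1 if this put destroyed the object. */
static int sched_engine_put(struct sched_engine *se)
{
	if (--se->ref.count)
		return 0;

	se->destroy(&se->ref);
	return 1;
}
```

The design point matches the review comment above: without the hook, the last put can only run the generic free, so anything hanging off private_data would leak.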
Re: [PATCH 25/51] drm/i915: Move active request tracking to a vfunc
On 7/16/2021 13:16, Matthew Brost wrote:
> Move active request tracking to a backend vfunc rather than assuming all
> backends want to do this in the maner. In the case execlists /

maner -> manner.
In the case *of* execlists

With those fixed...
Reviewed-by: John Harrison

> ring submission the tracking is on the physical engine while with GuC
> submission it is on the context.
>
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c | 3 ++
>  drivers/gpu/drm/i915/gt/intel_context_types.h | 7
>  drivers/gpu/drm/i915/gt/intel_engine_types.h | 6 +++
>  .../drm/i915/gt/intel_execlists_submission.c | 40 ++
>  .../gpu/drm/i915/gt/intel_ring_submission.c | 22 ++
>  drivers/gpu/drm/i915/gt/mock_engine.c | 30 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++
>  drivers/gpu/drm/i915/i915_request.c | 41 ++-
>  drivers/gpu/drm/i915/i915_request.h | 2 +
>  9 files changed, 147 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 251ff7eea22d..bfb05d8697d1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>  	spin_lock_init(&ce->guc_state.lock);
>  	INIT_LIST_HEAD(&ce->guc_state.fences);
>  
> +	spin_lock_init(&ce->guc_active.lock);
> +	INIT_LIST_HEAD(&ce->guc_active.requests);
> +
>  	ce->guc_id = GUC_INVALID_LRC_ID;
>  	INIT_LIST_HEAD(&ce->guc_id_link);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 542c98418771..035108c10b2c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -162,6 +162,13 @@ struct intel_context {
>  		struct list_head fences;
>  	} guc_state;
>  
> +	struct {
> +		/** lock: protects everything in guc_active */
> +		spinlock_t lock;
> +		/** requests: active requests on this context */
> +		struct list_head requests;
> +	} guc_active;
> +
>  	/* GuC scheduling state flags that do not require a lock. */
>  	atomic_t guc_sched_state_no_lock;
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 03a81e8d87f4..950fc73ed6af 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -420,6 +420,12 @@ struct intel_engine_cs {
>  
>  	void		(*release)(struct intel_engine_cs *engine);
>  
> +	/*
> +	 * Add / remove request from engine active tracking
> +	 */
> +	void		(*add_active_request)(struct i915_request *rq);
> +	void		(*remove_active_request)(struct i915_request *rq);
> +
>  	struct intel_engine_execlists execlists;
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index abe48421fd7a..f9b5f54a5abe 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs *engine)
>  	cancel_timer(&engine->execlists.preempt);
>  }
>  
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}
> +	list_del_init(&rq->sched.link);
> +
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>  static bool can_preempt(struct intel_engine_cs *e
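The comment in remove_from_engine() describes a lock-then-recheck loop: read the owner pointer, lock that owner, and retry if the request moved in the meantime. A rough userspace sketch of the same idea follows, assuming C11 atomics and a toy spinlock. The types and helpers (struct engine, struct request, engine_lock(), lock_request_engine()) are hypothetical stand-ins, and interrupt disabling, which the kernel version also handles, is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct engine {
	atomic_flag lock;	/* toy spinlock */
	int nr_requests;	/* protected by lock */
};

struct request {
	/* Owner may be changed by another thread until its lock is held. */
	_Atomic(struct engine *) engine;
	int queued;
};

static void engine_lock(struct engine *e)
{
	while (atomic_flag_test_and_set_explicit(&e->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void engine_unlock(struct engine *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Lock whichever engine currently owns rq and return it, still locked. */
static struct engine *lock_request_engine(struct request *rq)
{
	struct engine *locked = atomic_load(&rq->engine);
	struct engine *cur;

	engine_lock(locked);
	while ((cur = atomic_load(&rq->engine)) != locked) {
		/* rq moved before we got the lock: chase the new owner. */
		engine_unlock(locked);
		engine_lock(cur);
		locked = cur;
	}

	return locked;
}

static void remove_request(struct request *rq)
{
	struct engine *e = lock_request_engine(rq);

	if (rq->queued) {
		rq->queued = 0;
		e->nr_requests--;
	}

	engine_unlock(e);
}
```

The recheck is what makes the unlocked first read safe: a stale pointer only costs one extra unlock/lock round trip.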
Re: [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
On 7/16/2021 13:16, Matthew Brost wrote:
> Reset implementation for new GuC interface. This is the legacy reset
> implementation which is called when the i915 owns the engine hang check.
> Future patches will offload the engine hang check to GuC but we will
> continue to maintain this legacy path as a fallback and this code path
> is also required if the GuC dies.
>
> With the new GuC interface it is not possible to reset individual
> engines - it is only possible to reset the GPU entirely. This patch
> forces an entire chip reset if any engine hangs.
>
> v2:
>  (Michal)
>   - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
> v3:
>  (John H)
>   - Split into a series of smaller patches

While the split happened, it doesn't look like any of the other comments
were addressed. Repeated below for clarity. Also, Tvrtko has a bunch of
outstanding comments too.

> Cc: John Harrison
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 6 +-
>  drivers/gpu/drm/i915/gt/intel_reset.c | 18 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c | 13 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 8 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 39 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 +
>  7 files changed, 515 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index aef3084e8b16..463a6ae605a0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>  	if (intel_gt_is_wedged(gt))
>  		intel_gt_unset_wedged(gt);
>  
> -	intel_uc_sanitize(&gt->uc);
> -
>  	for_each_engine(engine, gt, id)
>  		if (engine->reset.prepare)
>  			engine->reset.prepare(engine);
> @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>  			__intel_engine_reset(engine, false);
>  	}
>  
> +	intel_uc_reset(&gt->uc, false);
> +
>  	for_each_engine(engine, gt, id)
>  		if (engine->reset.finish)
>  			engine->reset.finish(engine);
> @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
>  		goto err_wedged;
>  	}
>  
> +	intel_uc_reset_finish(&gt->uc);
> +
>  	intel_rps_enable(&gt->rps);
>  	intel_llc_enable(&gt->llc);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 72251638d4ea..2987282dff6d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
>  		__intel_engine_reset(engine, stalled_mask & engine->mask);
>  	local_bh_enable();
>  
> +	intel_uc_reset(&gt->uc, true);
> +
>  	intel_ggtt_restore_fences(gt->ggtt);
>  
>  	return err;
> @@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
>  		if (awake & engine->mask)
>  			intel_engine_pm_put(engine);
>  	}
> +
> +	intel_uc_reset_finish(&gt->uc);
>  }
>  
>  static void nop_submit_request(struct i915_request *request)
> @@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
>  	for_each_engine(engine, gt, id)
>  		if (engine->reset.cancel)
>  			engine->reset.cancel(engine);
> +	intel_uc_cancel_requests(&gt->uc);
>  	local_bh_enable();
>  
>  	reset_finish(gt, awake);
> @@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>  	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
>  	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
>  
> +	if (intel_engine_uses_guc(engine))
> +		return -ENODEV;
> +
>  	if (!intel_engine_pm_get_if_awake(engine))
>  		return 0;
>  
> @@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>  			   "Resetting %s for %s\n", engine->name, msg);
>  	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
>  
> -	if (intel_engine_uses_guc(engine))
> -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> -	else
> -		ret = intel_gt_reset_engine(engine);
> +	ret = intel_gt_reset_engine(engine);
>  	if (ret) {
>  		/* If we fail here, we expect to fallback to a global reset */
> -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
>  		goto out;
>  	}
>  
> @@ -1341,7 +1346,8 @@ void in
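To make the control flow in the commit message concrete: a per-engine reset is refused with -ENODEV when the engine is driven by the GuC, and the caller then falls back to a full GPU reset. The sketch below models only that decision. The names (struct fake_gpu, struct fake_engine, handle_hang()) are hypothetical and are not the driver's real call chain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct fake_gpu {
	int full_resets;	/* number of whole-GPU resets performed */
};

struct fake_engine {
	struct fake_gpu *gpu;
	bool uses_guc;		/* true when submission goes via the GuC */
	int engine_resets;	/* number of per-engine resets performed */
};

/* Per-engine reset: not available when the GuC owns the engine. */
static int engine_reset(struct fake_engine *e)
{
	if (e->uses_guc)
		return -ENODEV;

	e->engine_resets++;
	return 0;
}

/* Hang handling: try the cheap reset, fall back to the global one. */
static void handle_hang(struct fake_engine *e)
{
	if (engine_reset(e))
		e->gpu->full_resets++;
}
```

In this model an execlists-style engine only ever bumps engine_resets, while a GuC engine always escalates, which is the behaviour the patch description calls forcing an entire chip reset.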
Re: [PATCH 30/51] drm/i915/guc: Handle context reset notification
On 7/16/2021 13:17, Matthew Brost wrote:
> GuC will issue a reset on detecting an engine hang and will notify
> the driver via a G2H message. The driver will service the notification
> by resetting the guilty context to a simple state or banning it
> completely.
>
> v2:
>  (John Harrison)
>   - Move msg[0] lookup after length check
>
> Cc: Matthew Brost
> Cc: John Harrison
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++
>  drivers/gpu/drm/i915/i915_trace.h | 10 ++
>  4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index b3cfc52fe0bc..f23a3a618550 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>  					  const u32 *msg, u32 len);
>  int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>  				     const u32 *msg, u32 len);
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> +					const u32 *msg, u32 len);
>  
>  void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 503a78517610..c4f9b44b9f86 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -981,6 +981,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
>  		ret = intel_guc_sched_done_process_msg(guc, payload, len);
>  		break;
> +	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
> +		ret = intel_guc_context_reset_process_msg(guc, payload, len);
> +		break;
>  	default:
>  		ret = -EOPNOTSUPP;
>  		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index fdb17279095c..feaf1ca61eaa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2196,6 +2196,42 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>  	return 0;
>  }
>  
> +static void guc_context_replay(struct intel_context *ce)
> +{
> +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> +
> +	__guc_reset_context(ce, true);
> +	tasklet_hi_schedule(&sched_engine->tasklet);
> +}
> +
> +static void guc_handle_context_reset(struct intel_guc *guc,
> +				     struct intel_context *ce)
> +{
> +	trace_intel_context_reset(ce);
> +	guc_context_replay(ce);
> +}
> +
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> +					const u32 *msg, u32 len)
> +{
> +	struct intel_context *ce;
> +	int desc_idx;
> +
> +	if (unlikely(len != 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);

I think we decided that these should be drm_err rather than drm_dbg?

With that updated:
Reviewed-by: John Harrison

> +		return -EPROTO;
> +	}
> +
> +	desc_idx = msg[0];
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	guc_handle_context_reset(guc, ce);
> +
> +	return 0;
> +}
> +
>  void intel_guc_submission_print_info(struct intel_guc *guc,
>  				     struct drm_printer *p)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 97c2e83984ed..c095c4d39456 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
>  		      __entry->guc_sched_state_no_lock)
>  );
>  
> +DEFINE_EVENT(intel_context, intel_context_reset,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
> +);
> +
>  DEFINE_EVENT(intel_context, intel_context_register,
>  	     TP_PROTO(struct intel_context *ce),
>  	     TP_ARGS(ce)
> @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
>  {
>  }
>  
> +static inline void
> +trace_intel_context_reset(struct intel_context *ce)
> +{
> +}
> +
>  static inline void
>  trace_intel_context_register(struct intel_context *ce)
>  {
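The v2 note ("Move msg[0] lookup after length check") is about ordering: the handler must validate the payload length before it reads any payload word. A small userspace sketch of that ordering is shown below. The context table, MAX_CONTEXTS and context_lookup() are hypothetical stand-ins for the driver's lookup, and the sketch only counts resets rather than replaying anything.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONTEXTS 4	/* hypothetical table size */

struct context {
	int resets;	/* how many reset notifications were handled */
};

static struct context *context_table[MAX_CONTEXTS];

/* Hypothetical lookup: NULL for out-of-range or unregistered indices. */
static struct context *context_lookup(uint32_t idx)
{
	return idx < MAX_CONTEXTS ? context_table[idx] : NULL;
}

static int context_reset_process_msg(const uint32_t *msg, uint32_t len)
{
	struct context *ce;

	/* Check the length first; msg[0] must not be read before this. */
	if (len != 1)
		return -EPROTO;

	ce = context_lookup(msg[0]);
	if (!ce)
		return -EPROTO;

	ce->resets++;
	return 0;
}
```

Because the length check comes first, a zero-length message never causes a read of msg at all, which is why passing a NULL payload with len 0 is rejected cleanly in this model.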
Re: [PATCH 42/51] drm/i915/guc: Implement banned contexts for GuC submission
On 7/16/2021 13:17, Matthew Brost wrote: When using GuC submission, if a context gets banned disable scheduling and mark all inflight requests as complete. Cc: John Harrison Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_context.h | 13 ++ drivers/gpu/drm/i915/gt/intel_context_types.h | 2 + drivers/gpu/drm/i915/gt/intel_reset.c | 32 +--- .../gpu/drm/i915/gt/intel_ring_submission.c | 20 +++ drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 151 -- drivers/gpu/drm/i915/i915_trace.h | 10 ++ 8 files changed, 195 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 28c62f7ccfc7..d87a4c6da5bc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1084,7 +1084,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban) for_each_gem_engine(ce, engines, it) { struct intel_engine_cs *engine; - if (ban && intel_context_set_banned(ce)) + if (ban && intel_context_ban(ce, NULL)) continue; /* diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 2ed9bf5f91a5..814d9277096a 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -16,6 +16,7 @@ #include "intel_engine_types.h" #include "intel_ring_types.h" #include "intel_timeline_types.h" +#include "i915_trace.h" #define CE_TRACE(ce, fmt, ...) 
do { \ const struct intel_context *ce__ = (ce);\ @@ -243,6 +244,18 @@ static inline bool intel_context_set_banned(struct intel_context *ce) return test_and_set_bit(CONTEXT_BANNED, &ce->flags); } +static inline bool intel_context_ban(struct intel_context *ce, +struct i915_request *rq) +{ + bool ret = intel_context_set_banned(ce); + + trace_intel_context_ban(ce); + if (ce->ops->ban) + ce->ops->ban(ce, rq); + + return ret; +} + static inline bool intel_context_force_single_submission(const struct intel_context *ce) { diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 035108c10b2c..57c19ee3e313 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -35,6 +35,8 @@ struct intel_context_ops { int (*alloc)(struct intel_context *ce); + void (*ban)(struct intel_context *ce, struct i915_request *rq); + int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr); int (*pin)(struct intel_context *ce, void *vaddr); void (*unpin)(struct intel_context *ce); diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index f3cdbf4ba5c8..3ed694cab5af 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -22,7 +22,6 @@ #include "intel_reset.h" #include "uc/intel_guc.h" -#include "uc/intel_guc_submission.h" #define RESET_MAX_RETRIES 3 @@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr) intel_uncore_rmw_fw(uncore, reg, clr, 0); } -static void skip_context(struct i915_request *rq) -{ - struct intel_context *hung_ctx = rq->context; - - list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) { - if (!i915_request_is_active(rq)) - return; - - if (rq->context == hung_ctx) { - i915_request_set_error_once(rq, -EIO); - __i915_request_skip(rq); - } - } -} - static void client_mark_guilty(struct i915_gem_context *ctx, 
bool banned) { struct drm_i915_file_private *file_priv = ctx->file_priv; @@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq) bool banned; int i; - if (intel_context_is_closed(rq->context)) { - intel_context_set_banned(rq->context); + if (intel_context_is_closed(rq->context)) return true; - } rcu_read_lock(); ctx = rcu_dereference(rq->context->gem_context); @@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq) banned = !i915_gem_context_is_recoverable(ctx); if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES)) banned = true; - if (banned) { + if (banned)
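The intel_context_ban() helper quoted above combines two things: an atomic test-and-set of the banned flag, whose previous value is returned to the caller, and an optional backend hook that is invoked when present. A userspace sketch of that shape, assuming C11 atomics, is given below. The struct layout, the mask value and the counting hook are hypothetical, and the request argument of the real hook is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CONTEXT_BANNED (1u << 6)	/* hypothetical flag mask */

struct context;

struct context_ops {
	void (*ban)(struct context *ce);	/* optional, may be NULL */
};

struct context {
	atomic_uint flags;
	const struct context_ops *ops;
	int ban_calls;	/* bumped by the example hook below */
};

/* Set the flag atomically and report whether it was already set. */
static bool context_set_banned(struct context *ce)
{
	return atomic_fetch_or(&ce->flags, CONTEXT_BANNED) & CONTEXT_BANNED;
}

/* Ban the context and let the backend react if it provides a hook. */
static bool context_ban(struct context *ce)
{
	bool was_banned = context_set_banned(ce);

	if (ce->ops->ban)
		ce->ops->ban(ce);

	return was_banned;
}

/* Example hook: a backend that just counts how often it was told. */
static void counting_ban(struct context *ce)
{
	ce->ban_calls++;
}
```

As in the quoted helper, the hook runs on every call, not only on the first transition, so a backend hook has to tolerate being invoked for an already banned context.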
Re: [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests
On 7/16/2021 13:17, Matthew Brost wrote:
> Requests may take slightly longer with GuC submission, let's increase
> the timeouts in live_requests.
>
> Signed-off-by: Matthew Brost

Reviewed-by: John Harrison

> ---
>  drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index bd5c96a77ba3..d67710d10615 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
>  		i915_request_add(rq);
>  
>  		err = 0;
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +		if (i915_request_wait(rq, 0, HZ) < 0)
>  			err = -ETIME;
>  		i915_request_put(rq);
>  		if (err)
> @@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
>  	}
>  	igt_spinner_end(&spin);
>  
> -	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)
> +	if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
>  		err = -EIO;
>  	i915_request_put(rq);
>  
Re: [PATCH 04/18] drm/i915/guc: Implement GuC submission tasklet
On 7/20/2021 15:39, Matthew Brost wrote: Implement GuC submission tasklet for new interface. The new GuC interface uses H2G to submit contexts to the GuC. Since H2G use a single channel, a single tasklet is used for the submission path. Also the per engine interrupt handler has been updated to disable the rescheduling of the physical engine tasklet, when using GuC scheduling, as the physical engine tasklet is no longer used. In this patch the field, guc_id, has been added to intel_context and is not assigned. Patches later in the series will assign this value. v2: (John Harrison) - Clean up some comments v3: (John Harrison) - More comment cleanups Cc: John Harrison Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_context_types.h | 9 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +- 3 files changed, 127 insertions(+), 117 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 90026c177105..6d99631d19b9 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -137,6 +137,15 @@ struct intel_context { struct intel_sseu sseu; u8 wa_bb_page; /* if set, page num reserved for context workarounds */ + + /* GuC scheduling state flags that do not require a lock. */ + atomic_t guc_sched_state_no_lock; + + /* +* GuC LRC descriptor ID - Not assigned in this patch but future patches +* in the series will. 
+*/ + u16 guc_id; }; #endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 35783558d261..8c7b92f699f1 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -30,6 +30,10 @@ struct intel_guc { struct intel_guc_log log; struct intel_guc_ct ct; + /* Global engine used to submit requests to GuC */ + struct i915_sched_engine *sched_engine; + struct i915_request *stalled_request; + /* intel_guc_recv interrupt related state */ spinlock_t irq_lock; unsigned int msg_enabled_mask; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 23a94a896a0b..ca0717166a27 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -60,6 +60,31 @@ #define GUC_REQUEST_SIZE 64 /* bytes */ +/* + * Below is a set of functions which control the GuC scheduling state which do + * not require a lock as all state transitions are mutually exclusive. i.e. It + * is not possible for the context pinning code and submission, for the same + * context, to be executing simultaneously. We still need an atomic as it is + * possible for some of the bits to changing at the same time though. 
+ */ +#define SCHED_STATE_NO_LOCK_ENABLEDBIT(0) +static inline bool context_enabled(struct intel_context *ce) +{ + return (atomic_read(&ce->guc_sched_state_no_lock) & + SCHED_STATE_NO_LOCK_ENABLED); +} + +static inline void set_context_enabled(struct intel_context *ce) +{ + atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock); +} + +static inline void clr_context_enabled(struct intel_context *ce) +{ + atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED, + &ce->guc_sched_state_no_lock); +} + static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return rb_entry(rb, struct i915_priolist, node); @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); } -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) { - /* Leaving stub as this function will be used in future patches */ -} + int err; + struct intel_context *ce = rq->context; + u32 action[3]; + int len = 0; + bool enabled = context_enabled(ce); -/* - * When we're doing submissions using regular execlists backend, writing to - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages - * pinned in mappable aperture portion of GGTT are visible to command streamer. - * Writes done by GuC on our behalf are not guaranteeing such ordering, - * therefore, to ensure the flush, we're issuing a POSTING READ. - */ -static void flush_ggtt_writes(struct i915_vma *vma) -{ - if (i915_vma_is_map_and_fenceable(vma)) - intel_uncore_posting_read_fw(vma->vm->gt->uncore, -GUC_STATUS);
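The comment in the patch explains why the scheduling state lives in an atomic even though no lock is taken: different bits of the same word may be changed at the same time. A userspace analogue using C11 atomics is sketched below. SCHED_STATE_OTHER is a hypothetical second flag added purely to show that updating one bit leaves the other untouched; it is not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SCHED_STATE_ENABLED	(1u << 0)
#define SCHED_STATE_OTHER	(1u << 1)	/* hypothetical second flag */

struct context {
	atomic_uint sched_state;	/* lock-free scheduling state bits */
};

static bool context_enabled(struct context *ce)
{
	return atomic_load(&ce->sched_state) & SCHED_STATE_ENABLED;
}

/* Atomic read-modify-write: sets one bit without disturbing the rest. */
static void set_context_enabled(struct context *ce)
{
	atomic_fetch_or(&ce->sched_state, SCHED_STATE_ENABLED);
}

/* Atomic read-modify-write: clears one bit without disturbing the rest. */
static void clr_context_enabled(struct context *ce)
{
	atomic_fetch_and(&ce->sched_state, ~SCHED_STATE_ENABLED);
}
```

A plain load, modify, store sequence would lose a concurrent update to a neighbouring bit; the fetch-or and fetch-and forms avoid that, which is the same reason the patch uses atomic_or() and atomic_and().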
Re: [PATCH 06/18] drm/i915/guc: Implement GuC context operations for new inteface
On 7/20/2021 15:39, Matthew Brost wrote: Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy. v2: (Daniel Vetter) - Use msleep_interruptible rather than cond_resched in busy loop (Michal) - Remove C++ style comment v3: (Matthew Brost) - Drop GUC_ID_START (John Harrison) - Fix a bunch of typos - Use drm_err rather than drm_dbg for G2H errors (Daniele) - Fix ;; typo - Clean up sched state functions - Add lockdep for guc_id functions - Don't call __release_guc_id when guc_id is invalid - Use MISSING_CASE - Add comment in guc_context_pin - Use shorter path to rpm (Daniele / CI) - Don't call release_guc_id on an invalid guc_id in destroy Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 5 + drivers/gpu/drm/i915/gt/intel_context_types.h | 22 +- drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 1 - drivers/gpu/drm/i915/gt/uc/intel_guc.h| 40 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 667 -- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/i915_request.c | 1 + 8 files changed, 686 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index bd63813c8a80..32fd6647154b 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) mutex_init(&ce->pin_mutex); + spin_lock_init(&ce->guc_state.lock); + + ce->guc_id = GUC_INVALID_LRC_ID; + INIT_LIST_HEAD(&ce->guc_id_link); + i915_active_init(&ce->active, __intel_context_active, __intel_context_retire, 0); } diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 6d99631d19b9..606c480aec26 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -96,6 +96,7 
@@ struct intel_context { #define CONTEXT_BANNED6 #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 +#define CONTEXT_LRCA_DIRTY 9 struct { u64 timeout_us; @@ -138,14 +139,29 @@ struct intel_context { u8 wa_bb_page; /* if set, page num reserved for context workarounds */ + struct { + /** lock: protects everything in guc_state */ + spinlock_t lock; + /** +* sched_state: scheduling state of this context using GuC +* submission +*/ + u8 sched_state; + } guc_state; + /* GuC scheduling state flags that do not require a lock. */ atomic_t guc_sched_state_no_lock; + /* GuC LRC descriptor ID */ + u16 guc_id; + + /* GuC LRC descriptor reference count */ + atomic_t guc_id_ref; + /* -* GuC LRC descriptor ID - Not assigned in this patch but future patches -* in the series will. +* GuC ID link - in list when unpinned but guc_id still valid in GuC */ - u16 guc_id; + struct list_head guc_id_link; }; #endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h index 41e5350a7a05..49d4857ad9b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h @@ -87,7 +87,6 @@ #define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0) #define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */ -#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */ #define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */ /* in Gen12 ID 0x7FF is reserved to indicate idle */ #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 8c7b92f699f1..30773cd699f5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -7,6 +7,7 @@ #define _INTEL_GUC_H_ #include +#include #include "intel_uncore.h" #include "intel_guc_fw.h" @@ -44,6 +45,14 @@ struct intel_guc { void (*disable)(struct intel_guc *guc); } interrupts; + /* +* contexts_lock protects the 
pool of free guc ids and a linked list of +* guc ids available to be stolen +*/ + spinlock_t contexts_lock; + struct ida guc_ids; + struct list_head guc_id_list; + bool submission_selected; struct i915_vma *ads_vma; @@ -101,6 +11
Re: [Intel-gfx] [PATCH 08/33] drm/i915/guc: Reset implementation for new GuC interface
On 7/22/2021 16:54, Matthew Brost wrote: Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC but we will continue to maintain this legacy path as a fallback and this code path is also required if the GuC dies. With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs. v2: (Michal) - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check) v3: (John H) - Split into a series of smaller patches v4: (John H) - Fix typo - Add braces around if statements in reset code Cc: John Harrison Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 6 +- drivers/gpu/drm/i915/gt/intel_reset.c | 18 +- drivers/gpu/drm/i915/gt/uc/intel_guc.c| 13 - drivers/gpu/drm/i915/gt/uc/intel_guc.h| 8 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 561 ++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 39 +- drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 + 7 files changed, 516 insertions(+), 132 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index d86825437516..cd7b96005d29 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -170,8 +170,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force) if (intel_gt_is_wedged(gt)) intel_gt_unset_wedged(gt); - intel_uc_sanitize(&gt->uc); - for_each_engine(engine, gt, id) if (engine->reset.prepare) engine->reset.prepare(engine); @@ -187,6 +185,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force) __intel_engine_reset(engine, false); } + intel_uc_reset(&gt->uc, false); + for_each_engine(engine, gt, id) if (engine->reset.finish) engine->reset.finish(engine); @@ -239,6 +239,8 @@ int intel_gt_resume(struct intel_gt *gt) goto err_wedged; } + 
intel_uc_reset_finish(&gt->uc); + intel_rps_enable(&gt->rps); intel_llc_enable(&gt->llc); diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 72251638d4ea..2987282dff6d 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask) __intel_engine_reset(engine, stalled_mask & engine->mask); local_bh_enable(); + intel_uc_reset(&gt->uc, true); + intel_ggtt_restore_fences(gt->ggtt); return err; @@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake) if (awake & engine->mask) intel_engine_pm_put(engine); } + + intel_uc_reset_finish(&gt->uc); } static void nop_submit_request(struct i915_request *request) @@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt) for_each_engine(engine, gt, id) if (engine->reset.cancel) engine->reset.cancel(engine); + intel_uc_cancel_requests(&gt->uc); local_bh_enable(); reset_finish(gt, awake); @@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags); GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags)); + if (intel_engine_uses_guc(engine)) + return -ENODEV; + if (!intel_engine_pm_get_if_awake(engine)) return 0; @@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) "Resetting %s for %s\n", engine->name, msg); atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]); - if (intel_engine_uses_guc(engine)) - ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine); - else - ret = intel_gt_reset_engine(engine); + ret = intel_gt_reset_engine(engine); if (ret) { /* If we fail here, we expect to fallback to a global reset */ - ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret); + ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret); 
goto out; } @@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt, * Try engine reset when
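The commit message above says that individual engine resets are not possible with the new GuC interface, and the __intel_engine_reset_bh hunk returns -ENODEV for GuC engines so that the caller falls back to a global reset. A minimal userspace sketch of that control flow follows; every name prefixed with sketch_ is hypothetical and is not an i915 function or structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct sketch_engine {
	bool uses_guc;
};

enum sketch_reset_kind {
	SKETCH_RESET_ENGINE, /* only the hung engine was reset */
	SKETCH_RESET_FULL,   /* fell back to a whole-GPU reset */
};

/* Per-engine reset: refused outright when the engine is GuC managed. */
static int sketch_engine_reset(const struct sketch_engine *engine)
{
	if (engine->uses_guc)
		return -ENODEV;
	return 0; /* assume the per-engine reset itself succeeds */
}

/* Hang handling: try the engine reset first, otherwise reset everything. */
static enum sketch_reset_kind
sketch_handle_hang(const struct sketch_engine *engine)
{
	if (sketch_engine_reset(engine) == 0)
		return SKETCH_RESET_ENGINE;
	return SKETCH_RESET_FULL;
}
```

With this shape the GuC case needs no special handling in the caller: any non-zero return from the engine reset already leads to the full reset path.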
Re: [PATCH 0/2] Add support for querying hw info that UMDs need
On 7/27/2021 02:49, Daniel Vetter wrote: On Mon, Jul 26, 2021 at 07:21:43PM -0700, john.c.harri...@intel.com wrote: From: John Harrison Various UMDs require hardware configuration information about the current platform. A bunch of static information is available in a fixed table that can be retrieved from the GuC. Test-with: 20210727002812.43469-2-john.c.harri...@intel.com UMD: https://github.com/intel/compute-runtime/pull/432/files Signed-off-by: John Harrison Can you pls submit this with all the usual suspect from the umd side (so also media-driver and mesa) cced? Do you have a list of names that you would like included? Also do the mesa/media-driver patches exist somewhere? Afaiui this isn't very useful without those bits in place too. I don't know about mesa but the media team have the support in place in their internal tree and (as per compute) are waiting for us to push the kernel side. This also comes under the headings of both new platforms and platforms which are POR for GuC submission. So I believe a lot of the UMD side changes for the config table are wrapped up in their support for the new platforms/GuC as a whole and thus not yet ready for upstream. John. -Daniel John Harrison (1): drm/i915/guc: Add fetch of hwconfig table Rodrigo Vivi (1): drm/i915/uapi: Add query for hwconfig table drivers/gpu/drm/i915/Makefile | 1 + .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 4 + drivers/gpu/drm/i915/gt/uc/intel_guc.c| 3 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 + .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c | 156 ++ .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h | 19 +++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 6 + drivers/gpu/drm/i915/i915_query.c | 23 +++ include/uapi/drm/i915_drm.h | 1 + 10 files changed, 215 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h -- 2.25.1
Re: [PATCH 29/33] drm/i915/selftest: Increase some timeouts in live_requests
On 7/26/2021 17:23, Matthew Brost wrote:
> Requests may take slightly longer with GuC submission, let's increase
> the timeouts in live_requests.
>
> Signed-off-by: Matthew Brost
Was already reviewed in previous series. Repeating here for patchwork:
Reviewed-by: John Harrison

> ---
>  drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index bd5c96a77ba3..d67710d10615 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
>  		i915_request_add(rq);
>
>  		err = 0;
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +		if (i915_request_wait(rq, 0, HZ) < 0)
>  			err = -ETIME;
>  		i915_request_put(rq);
>  		if (err)
> @@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
>  	}
>  	igt_spinner_end(&spin);
>
> -	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)
> +	if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
>  		err = -EIO;
>  	i915_request_put(rq);
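For scale, the timeouts in the patch above are expressed in jiffies, where HZ jiffies make up one second, so HZ / 5 is roughly 200 ms and HZ is one second. The sketch below works that arithmetic through for a few tick rates; the helper names are hypothetical and are not kernel APIs, and the tick rates used in the usage note are only examples.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel API: convert a jiffies count to
 * milliseconds for a given tick rate (hz ticks per second).
 */
static unsigned int sketch_jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
	return (unsigned int)((unsigned long long)jiffies * 1000u / hz);
}

/* Timeout before the patch: HZ / 5 jiffies. */
static unsigned int sketch_old_timeout_ms(unsigned int hz)
{
	return sketch_jiffies_to_ms(hz / 5, hz);
}

/* Timeout after the patch: HZ jiffies. */
static unsigned int sketch_new_timeout_ms(unsigned int hz)
{
	return sketch_jiffies_to_ms(hz, hz);
}
```

For tick rates of 100, 250 and 1000 the old value works out to 200 ms and the new one to 1000 ms, so the patch gives the selftests five times longer to see the request complete.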
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
On 7/28/2021 17:34, Matthew Brost wrote: If an engine associated with a context does not have a heartbeat, ban it immediately. This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. It's worse than this. If the engine in question is an individual physical engine then sending a pulse (with sufficiently high priority) will pre-empt the engine and kick the context off. However, the GuC scheduler does not have hacks in it to check the state of the heartbeat or whether a context is actually a zombie or not. Thus, the context will get resubmitted to the hardware after the pulse completes and effectively nothing will have happened. I would assume that the DRM scheduler which we are meant to be switching to for execlist as well as GuC submission is also unlikely to have hacks for zombie contexts and tests for whether the i915 specific heartbeat has been disabled since the context became a zombie. So when that switch happens, this test will also fail in execlist mode as well as GuC mode. The choices I see here are to simply remove persistence completely (it is a basically a bug that became UAPI because it wasn't caught soon enough!) or to implement it in a way that does not require hacks in the back end scheduler. Apparently, the DRM scheduler is expected to allow zombie contexts to persist until the DRM file handle is closed. So presumably we will have to go with option two. That means flagging a context as being a zombie when it is closed but still active. The driver would then add it to a zombie list owned by the DRM client object. When that client object is closed, i915 would go through the list and genuinely kill all the contexts. No back end scheduler hacks required and no intimate knowledge of the i915 heartbeat mechanism required either. John. 
This patch also updates intel_engine_has_heartbeat to be a vfunc as we now need to call this function on execlists virtual engines too. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 ++ drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++- .../drm/i915/gt/intel_execlists_submission.c | 14 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 6 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 -- 6 files changed, 26 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 9c3672bac0e2..b8e01c5ba9e5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1090,8 +1090,9 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban) */ for_each_gem_engine(ce, engines, it) { struct intel_engine_cs *engine; + bool local_ban = ban || !intel_engine_has_heartbeat(ce->engine); - if (ban && intel_context_ban(ce, NULL)) + if (local_ban && intel_context_ban(ce, NULL)) continue; /* @@ -1104,7 +1105,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban) engine = active_engine(ce); /* First attempt to gracefully cancel the context */ - if (engine && !__cancel_engine(engine) && ban) + if (engine && !__cancel_engine(engine) && local_ban) /* * If we are unable to send a preemptive pulse to bump * the context from the GPU, we have to resort to a full diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index e54351a170e2..65f2eb2a78e4 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -55,6 +55,8 @@ struct intel_context_ops { void (*reset)(struct intel_context *ce); void (*destroy)(struct kref *kref); + bool (*has_heartbeat)(const struct intel_engine_cs *engine); + /* virtual engine/context interface */ struct intel_context 
*(*create_virtual)(struct intel_engine_cs **engine, unsigned int count); diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index c2a5640ae055..1b11a808acc4 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -283,28 +283,11 @@ struct intel_context * intel_engine_create_virtual(struct intel_engine_cs **siblings, unsigned int count); -static inline bool -intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine) -{ - /* -* For non-GuC submission we expect the back-end to look at the -* heartbeat status of the actual physical engine that the work -
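The front-end alternative proposed in the reply above (flag a closed-but-active context as a zombie, keep it on a list owned by the DRM client, and kill everything on that list when the client closes) can be sketched in plain C as follows. This is only an illustration of the idea under stated assumptions: all sketch_ names are hypothetical, none of them exist in i915, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not i915 structures. */
struct sketch_context {
	bool closed;  /* userspace has closed its handle */
	bool active;  /* still has work in flight */
	bool banned;  /* explicitly killed by the front end */
	struct sketch_context *next_zombie;
};

struct sketch_client {
	struct sketch_context *zombies; /* closed but still active contexts */
};

/* Context close: a context that is still active is tracked as a zombie. */
static void sketch_context_close(struct sketch_client *client,
				 struct sketch_context *ctx)
{
	ctx->closed = true;
	if (!ctx->active)
		return; /* nothing running, nothing to track */

	ctx->next_zombie = client->zombies;
	client->zombies = ctx;
}

/*
 * Client (file handle) close: walk the list and ban every zombie
 * explicitly. Returns the number of contexts killed.
 */
static int sketch_client_close(struct sketch_client *client)
{
	int killed = 0;

	while (client->zombies) {
		struct sketch_context *ctx = client->zombies;

		client->zombies = ctx->next_zombie;
		ctx->next_zombie = NULL;
		ctx->banned = true;
		killed++;
	}
	return killed;
}
```

Nothing in this sketch consults a heartbeat or asks the scheduler back end to make the decision, which is the property the reply argues for.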
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
On 7/30/2021 02:49, Tvrtko Ursulin wrote: On 30/07/2021 01:13, John Harrison wrote: On 7/28/2021 17:34, Matthew Brost wrote: If an engine associated with a context does not have a heartbeat, ban it immediately. This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a running normal priority context? Why does it matter then whether or not heartbeats are enabled - when heartbeat just ends up sending the same engine pulse (eventually, with raising priority)? The point is that the pulse is pointless. See the rest of my comments below, specifically "the context will get resubmitted to the hardware after the pulse completes". To re-iterate... Yes, it preempts the context. Yes, it does so whether heartbeats are enabled or not. But so what? Who cares? You have preempted a context. It is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. The backend scheduler will just resubmit it to the hardware as soon as the pulse completes. The only reason this works at all is because of the horrid hack in the execlist scheduler's back end implementation (in __execlists_schedule_in): if (unlikely(intel_context_is_closed(ce) && !intel_engine_has_heartbeat(engine))) intel_context_set_banned(ce); The actual back end scheduler is saying "Is this a zombie context? Is the heartbeat disabled? Then ban it". No other scheduler backend is going to have knowledge of zombie context status or of the heartbeat status. Nor are they going to call back into the higher levels of the i915 driver to trigger a ban operation. Certainly a hardware implemented scheduler is not going to be looking at private i915 driver information to decide whether to submit a context or whether to tell the OS to kill it off instead. 
For persistence to work with a hardware scheduler (or a non-Intel specific scheduler such as the DRM one), the handling of zombie contexts, banning, etc. *must* be done entirely in the front end. It cannot rely on any backend hacks. That means you can't rely on any fancy behaviour of pulses. If you want to ban a context then you must explicitly ban that context. If you want to ban it at some later point then you need to track it at the top level as a zombie and then explicitly ban that zombie at whatever later point. It's worse than this. If the engine in question is an individual physical engine then sending a pulse (with sufficiently high priority) will pre-empt the engine and kick the context off. However, the GuC Why it is different for physical vs virtual, aren't both just schedulable contexts with different engine masks for what GuC is concerned? Oh, is it a matter of needing to send pulses to all engines which comprise a virtual one? It isn't different. It is totally broken for both. It is potentially more broken for virtual engines because of the question of which engine to pulse. But as stated above, the pulse is pointless anyway so the which engine question doesn't even matter. John. scheduler does not have hacks in it to check the state of the heartbeat or whether a context is actually a zombie or not. Thus, the context will get resubmitted to the hardware after the pulse completes and effectively nothing will have happened. I would assume that the DRM scheduler which we are meant to be switching to for execlist as well as GuC submission is also unlikely to have hacks for zombie contexts and tests for whether the i915 specific heartbeat has been disabled since the context became a zombie. So when that switch happens, this test will also fail in execlist mode as well as GuC mode. The choices I see here are to simply remove persistence completely (it is a basically a bug that became UAPI because it wasn't caught soon enough!) 
or to implement it in a way that does not require hacks in the back end scheduler. Apparently, the DRM scheduler is expected to allow zombie contexts to persist until the DRM file handle is closed. So presumably we will have to go with option two. That means flagging a context as being a zombie when it is closed but still active. The driver would then add it to a zombie list owned by the DRM client object. When that client object is closed, i915 would go through the list and genuinely kill all the contexts. No back end scheduler hacks required and no intimate knowledge of the i915 heartbeat mechanism required either. John. This patch also updates intel_engine_has_heartbeat to be a vfunc as we now need to call this function on execlists virtual engines too. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 ++ drivers/gpu/dr
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
On 8/2/2021 02:40, Tvrtko Ursulin wrote: On 30/07/2021 19:13, John Harrison wrote: On 7/30/2021 02:49, Tvrtko Ursulin wrote: On 30/07/2021 01:13, John Harrison wrote: On 7/28/2021 17:34, Matthew Brost wrote: If an engine associated with a context does not have a heartbeat, ban it immediately. This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a running normal priority context? Why does it matter then whether or not heartbeats are enabled - when heartbeat just ends up sending the same engine pulse (eventually, with raising priority)? The point is that the pulse is pointless. See the rest of my comments below, specifically "the context will get resubmitted to the hardware after the pulse completes". To re-iterate... Yes, it preempts the context. Yes, it does so whether heartbeats are enabled or not. But so what? Who cares? You have preempted a context. It is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. It is valid yes, and it even may be the current ABI so another question is whether it is okay to change that. The backend scheduler will just resubmit it to the hardware as soon as the pulse completes. The only reason this works at all is because of the horrid hack in the execlist scheduler's back end implementation (in __execlists_schedule_in): if (unlikely(intel_context_is_closed(ce) && !intel_engine_has_heartbeat(engine))) intel_context_set_banned(ce); Right, is the above code then needed with this patch - when ban is immediately applied on the higher level? The actual back end scheduler is saying "Is this a zombie context? Is the heartbeat disabled? Then ban it". No other scheduler backend is going to have knowledge of zombie context status or of the heartbeat status. Nor are they going to call back into the higher levels of the i915 driver to trigger a ban operation. 
Certainly a hardware implemented scheduler is not going to be looking at private i915 driver information to decide whether to submit a context or whether to tell the OS to kill it off instead. For persistence to work with a hardware scheduler (or a non-Intel specific scheduler such as the DRM one), the handling of zombie contexts, banning, etc. *must* be done entirely in the front end. It cannot rely on any backend hacks. That means you can't rely on any fancy behaviour of pulses. If you want to ban a context then you must explicitly ban that context. If you want to ban it at some later point then you need to track it at the top level as a zombie and then explicitly ban that zombie at whatever later point. I am still trying to understand it all. If I go by the commit message: """ This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. """ That did not explain things for me. Sentence does not appear to make sense. Now, it seems "kick off the hardware" is meant as revoke and not just preempt. Which is fine, perhaps just needs to be written more explicitly. But the part of checking for heartbeat after idle pulse does not compute for me. It is the heartbeat which emits idle pulses, not idle pulse emitting heartbeats. I am in agreement that the commit message is confusing and does not explain either the problem or the solution. But anyway, I can buy the handling at the front end story completely. It makes sense. We just need to agree that a) it is okay to change the ABI and b) remove the backend check from execlists if it is not needed any longer. And if ABI change is okay then commit message needs to talk about it loudly and clearly. I don't think we have a choice. The current ABI is not and cannot ever be compatible with any scheduler external to i915. 
It cannot be implemented with a hardware scheduler such as the GuC and it cannot be implemented with an external software scheduler such as the DRM one. My view is that any implementation involving knowledge of the heartbeat is fundamentally broken. According to Daniel Vetter, the DRM ABI on this subject is that an actively executing context should persist until the DRM file handle is closed. That seems like a much more plausible and simple ABI than one that says 'if the heartbeat is running then a context will persist forever, if the heartbeat is not running then it will be killed immediately, if the heart was running but then stops running then the context will be killed on the next context switch, ...'. And if I understand it correctly, the current ABI allows a badly written user app to cause a denial of service by leaving contexts permanently running an infinit
Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak
On 7/30/2021 12:53, Matthew Brost wrote:
> A small race exists between intel_gt_retire_requests_timeout and
> intel_timeline_exit which could result in the syncmap not getting
> free'd. Rather than work to hard to seal this race, simply cleanup the
free'd -> freed

> syncmap on fini.
>
> unreferenced object 0x88813bc53b18 (size 96):
>   comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
>   hex dump (first 32 bytes):
>     01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00
>     00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00
>   backtrace:
>     [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
>     [<042f6959>] __sync_set+0x1bb/0x240 [i915]
>     [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
>     [<56a48219>] i915_request_await_object+0x222/0x360 [i915]
>     [<aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
>     [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
>     [<fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
>     [<e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
>     [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0
>     [<76c362a4>] do_syscall_64+0x33/0x80
>     [<eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Matthew Brost
> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
> Cc:
> ---
>  drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index c4a126c8caef..1257f4f11e66 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
>  	i915_vma_put(timeline->hwsp_ggtt);
>  	i915_active_fini(&timeline->active);
> +
> +	/*
> +	 * A small race exists between intel_gt_retire_requests_timeout and
> +	 * intel_timeline_exit which could result in the syncmap not getting
> +	 * free'd. Rather than work to hard to seal this race, simply cleanup
> +	 * the syncmap on fini.
What is the race? I'm going round in circles just trying to work out how
intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
in the first place.

Also, free'd -> freed.

John.

> +	 */
> +	i915_syncmap_free(&timeline->sync);
> +
>  	kfree(timeline);
>  }
Re: [Intel-gfx] [PATCH] drm/i915: Disable bonding on gen12+ platforms
On 7/28/2021 12:21, Matthew Brost wrote:
> Disable bonding on gen12+ platforms aside from ones already supported by
> the i915 - TGL, RKL, and ADL-S.
>
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 05c3ee191710..9c3672bac0e2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -446,6 +446,13 @@ set_proto_ctx_engines_bond(struct i915_user_extension __user *base, void *data)
>  	u16 idx, num_bonds;
>  	int err, n;
>
> +	if (GRAPHICS_VER(i915) >= 12 && !IS_TIGERLAKE(i915) &&
> +	    !IS_ROCKETLAKE(i915) && !IS_ALDERLAKE_S(i915)) {
> +		drm_dbg(&i915->drm,
> +			"Bonding on gen12+ aside from TGL, RKL, and ADL_S not allowed\n");
I would have said not supported rather than not allowed. Either way:
Reviewed-by: John Harrison

> +		return -ENODEV;
> +	}
> +
>  	if (get_user(idx, &ext->virtual_index))
>  		return -EFAULT;
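The platform gate added by the patch above can be read as a small predicate: bonding stays available before gen12, and on gen12+ only for TGL, RKL and ADL-S. A minimal standalone sketch of that logic follows, assuming a simplified device description; the sketch_ types and names are hypothetical and only mirror the condition in the hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical platform identifiers; not the i915 platform enum. */
enum sketch_platform {
	SKETCH_TGL,
	SKETCH_RKL,
	SKETCH_ADL_S,
	SKETCH_OTHER,
};

struct sketch_device {
	int graphics_ver;
	enum sketch_platform platform;
};

/* Mirrors the condition in the hunk, inverted into a "supported" test. */
static bool sketch_bonding_supported(const struct sketch_device *dev)
{
	if (dev->graphics_ver < 12)
		return true;

	return dev->platform == SKETCH_TGL ||
	       dev->platform == SKETCH_RKL ||
	       dev->platform == SKETCH_ADL_S;
}

/* Entry point of the extension: reject early on unsupported platforms. */
static int sketch_set_engines_bond(const struct sketch_device *dev)
{
	if (!sketch_bonding_supported(dev))
		return -ENODEV;
	return 0; /* the real handler would go on to parse the extension */
}
```

Returning before any user data is read matches the placement of the check in the patch, ahead of the first get_user() call.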
Re: [Intel-gfx] [PATCH 2/4] drm/i915/guc: put all guc objects in lmem when available
On 8/2/2021 22:11, Matthew Brost wrote: From: Daniele Ceraolo Spurio The firmware binary has to be loaded from lmem and the recommendation is to put all other objects in there as well. Note that we don't fall back to system memory if the allocation in lmem fails because all objects are allocated during driver load and if we have issues with lmem at that point something is seriously wrong with the system, so no point in trying to handle it. Cc: Matthew Auld Cc: Abdiel Janulgue Cc: Michal Wajdeczko Cc: Vinay Belgaumkar Cc: Radoslaw Szwichtenberg Signed-off-by: Daniele Ceraolo Spurio Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 26 drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 4 ++ drivers/gpu/drm/i915/gt/uc/intel_guc.c| 9 ++- drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 11 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c| 14 - drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 75 +-- 6 files changed, 127 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index eb345305dc52..034226c5d4d0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -103,6 +103,32 @@ __i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915, size, page_size, flags); } +struct drm_i915_gem_object * +i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915, + const void *data, size_t size) +{ + struct drm_i915_gem_object *obj; + void *map; + + obj = i915_gem_object_create_lmem(i915, + round_up(size, PAGE_SIZE), + I915_BO_ALLOC_CONTIGUOUS); + if (IS_ERR(obj)) + return obj; + + map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); + if (IS_ERR(map)) { + i915_gem_object_put(obj); + return map; + } + + memcpy(map, data, size); + + i915_gem_object_unpin_map(obj); + + return obj; +} + struct drm_i915_gem_object * i915_gem_object_create_lmem(struct drm_i915_private *i915, resource_size_t size, diff --git 
a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index 4ee81fc66302..1b88ea13435c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -23,6 +23,10 @@ bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); +struct drm_i915_gem_object * +i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915, + const void *data, size_t size); + struct drm_i915_gem_object * __i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915, resource_size_t size, diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 979128e28372..55160d3e401a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -3,6 +3,7 @@ * Copyright © 2014-2019 Intel Corporation */ +#include "gem/i915_gem_lmem.h" #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm_irq.h" @@ -630,7 +631,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) u64 flags; int ret; - obj = i915_gem_object_create_shmem(gt->i915, size); + if (HAS_LMEM(gt->i915)) + obj = i915_gem_object_create_lmem(gt->i915, size, + I915_BO_ALLOC_CPU_CLEAR | + I915_BO_ALLOC_CONTIGUOUS); + else + obj = i915_gem_object_create_shmem(gt->i915, size); + if (IS_ERR(obj)) return ERR_CAST(obj); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c index 76fe766ad1bc..962be0c12208 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c @@ -41,7 +41,7 @@ static void guc_prepare_xfer(struct intel_uncore *uncore) } /* Copy RSA signature from the fw image to HW for verification */ -static void guc_xfer_rsa(struct intel_uc_fw *guc_fw, +static int guc_xfer_rsa(struct intel_uc_fw *guc_fw, struct intel_uncore *uncore) { u32 rsa[UOS_RSA_SCRATCH_COUNT]; @@ -49,10 +49,13 @@ 
static void guc_xfer_rsa(struct intel_uc_fw *guc_fw, int i; copied = intel_uc_fw_copy_rsa(guc_fw, rsa, sizeof(rsa)); - GEM_BUG_ON(copied < sizeof(rsa)); + if (copied < sizeof(rsa)) + return -ENOMEM; for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++) intel_uncore_write(u
Re: [Intel-gfx] [PATCH 3/4] drm/i915/guc: Add DG1 GuC / HuC firmware defs
On 8/2/2021 22:11, Matthew Brost wrote:
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> index f8cb00ffb506..a685d563df72 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> @@ -51,6 +51,7 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
>  #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
>  	fw_def(ALDERLAKE_P, 0, guc_def(adlp, 62, 0, 3), huc_def(tgl, 7, 9, 3)) \
>  	fw_def(ALDERLAKE_S, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \
> +	fw_def(DG1, 0, guc_def(dg1, 62, 0, 0), huc_def(dg1, 7, 9, 3)) \
>  	fw_def(ROCKETLAKE, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \
>  	fw_def(TIGERLAKE, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \
>  	fw_def(JASPERLAKE, 0, guc_def(ehl, 62, 0, 0), huc_def(ehl, 9, 0, 0)) \
Reviewed-by: John Harrison
Re: [Intel-gfx] [PATCH 4/4] drm/i915/guc: Enable GuC submission by default on DG1
On 8/2/2021 22:11, Matthew Brost wrote:
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index da57d18d9f6b..fc2fc8d111d8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -35,7 +35,7 @@ static void uc_expand_default_options(struct intel_uc *uc)
>  	}
>
>  	/* Intermediate platforms are HuC authentication only */
> -	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
> +	if (IS_ALDERLAKE_S(i915)) {
>  		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
>  		return;
>  	}
Reviewed-by: John Harrison
Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak
On 8/6/2021 11:29, Matthew Brost wrote:
> On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote:
>> On 7/30/2021 12:53, Matthew Brost wrote:
>>> A small race exists between intel_gt_retire_requests_timeout and
>>> intel_timeline_exit which could result in the syncmap not getting
>>> free'd. Rather than work to hard to seal this race, simply cleanup the
>> free'd -> freed
> Sure.
>>> syncmap on fini.
>>>
>>> unreferenced object 0x88813bc53b18 (size 96):
>>>   comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
>>>   hex dump (first 32 bytes):
>>>     01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00
>>>   backtrace:
>>>     [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
>>>     [<042f6959>] __sync_set+0x1bb/0x240 [i915]
>>>     [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
>>>     [<56a48219>] i915_request_await_object+0x222/0x360 [i915]
>>>     [<aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
>>>     [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
>>>     [<fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
>>>     [<e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
>>>     [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0
>>>     [<76c362a4>] do_syscall_64+0x33/0x80
>>>     [<eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> Signed-off-by: Matthew Brost
>>> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
>>> Cc:
>>> ---
>>>  drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> index c4a126c8caef..1257f4f11e66 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
>>>  	i915_vma_put(timeline->hwsp_ggtt);
>>>  	i915_active_fini(&timeline->active);
>>> +
>>> +	/*
>>> +	 * A small race exists between intel_gt_retire_requests_timeout and
>>> +	 * intel_timeline_exit which could result in the syncmap not getting
>>> +	 * free'd. Rather than work to hard to seal this race, simply cleanup
>>> +	 * the syncmap on fini.
>> What is the race? I'm going round in circles just trying to work out how
>> intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
>> in the first place.
> intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2
> intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1
> intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0
> i915_syncmap_free is never called, memory leak
>
> Matt
Okay. Think I follow it now. Seems like the syncmap free should have been
in timeline_fini instead of timeline_exit in the first place?

Reviewed-by: John Harrison

>> Also, free'd -> freed.
>>
>> John.
>>> +	 */
>>> +	i915_syncmap_free(&timeline->sync);
>>> +
>>>  	kfree(timeline);
>>>  }
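The four-step interleaving Matt lists can be replayed in a small user-space model. This is only a sketch under assumptions: `toy_timeline`, `add_unless` and the shape of `toy_timeline_exit` are stand-ins written for illustration with C11 atomics, not the i915 code, and the exit path is modelled only as far as the thread describes it (an early return when the count is not 1).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Stand-in for the kernel's atomic_add_unless(): add 'a' to '*v' unless
 * '*v' equals 'u'. Returns true if the add happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure 'cur' is reloaded and the condition is rechecked. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Hypothetical, heavily reduced timeline: a count and a "syncmap exists" flag. */
struct toy_timeline {
	atomic_int active_count;
	bool syncmap_allocated;
};

/* Assumed shape of the exit path: the fast path drops a count and returns. */
static void toy_timeline_exit(struct toy_timeline *tl)
{
	assert(atomic_load(&tl->active_count) > 0);

	if (add_unless(&tl->active_count, -1, 1))
		return;		/* not the last user, syncmap left alone */

	atomic_fetch_sub(&tl->active_count, 1);
	tl->syncmap_allocated = false;	/* models i915_syncmap_free() */
}

/* Replays the interleaving from the thread; returns true if the map leaked. */
static bool toy_replay_race(void)
{
	struct toy_timeline tl = { .syncmap_allocated = true };

	atomic_init(&tl.active_count, 1);
	atomic_fetch_add(&tl.active_count, 1);	/* retire path: 1 -> 2 */
	toy_timeline_exit(&tl);			/* early return: 2 -> 1 */
	atomic_fetch_sub(&tl.active_count, 1);	/* retire path: 1 -> 0 */

	return atomic_load(&tl.active_count) == 0 && tl.syncmap_allocated;
}

/* Same exit call without the concurrent retire; returns true if freed. */
static bool toy_replay_no_race(void)
{
	struct toy_timeline tl = { .syncmap_allocated = true };

	atomic_init(&tl.active_count, 1);
	toy_timeline_exit(&tl);			/* slow path: 1 -> 0, frees */

	return atomic_load(&tl.active_count) == 0 && !tl.syncmap_allocated;
}
```

In this model the count reaches zero on the racy path without the free ever running, which is why moving the free to the final teardown (as the patch does in intel_timeline_fini) closes the leak regardless of ordering.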
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
On 8/6/2021 12:46, Daniel Vetter wrote: Seen this fly by and figured I dropped a few thoughts in here. At the likely cost of looking a bit out of whack :-) On Fri, Aug 6, 2021 at 8:01 PM John Harrison wrote: On 8/2/2021 02:40, Tvrtko Ursulin wrote: On 30/07/2021 19:13, John Harrison wrote: On 7/30/2021 02:49, Tvrtko Ursulin wrote: On 30/07/2021 01:13, John Harrison wrote: On 7/28/2021 17:34, Matthew Brost wrote: If an engine associated with a context does not have a heartbeat, ban it immediately. This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a running normal priority context? Why does it matter then whether or not heartbeats are enabled - when heartbeat just ends up sending the same engine pulse (eventually, with raising priority)? The point is that the pulse is pointless. See the rest of my comments below, specifically "the context will get resubmitted to the hardware after the pulse completes". To re-iterate... Yes, it preempts the context. Yes, it does so whether heartbeats are enabled or not. But so what? Who cares? You have preempted a context. It is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. It is valid yes, and it even may be the current ABI so another question is whether it is okay to change that. The backend scheduler will just resubmit it to the hardware as soon as the pulse completes. The only reason this works at all is because of the horrid hack in the execlist scheduler's back end implementation (in __execlists_schedule_in): if (unlikely(intel_context_is_closed(ce) && !intel_engine_has_heartbeat(engine))) intel_context_set_banned(ce); Right, is the above code then needed with this patch - when ban is immediately applied on the higher level? The actual back end scheduler is saying "Is this a zombie context? Is the heartbeat disabled? Then ban it". 
No other scheduler backend is going to have knowledge of zombie context status or of the heartbeat status. Nor are they going to call back into the higher levels of the i915 driver to trigger a ban operation. Certainly a hardware implemented scheduler is not going to be looking at private i915 driver information to decide whether to submit a context or whether to tell the OS to kill it off instead. For persistence to work with a hardware scheduler (or a non-Intel specific scheduler such as the DRM one), the handling of zombie contexts, banning, etc. *must* be done entirely in the front end. It cannot rely on any backend hacks. That means you can't rely on any fancy behaviour of pulses. If you want to ban a context then you must explicitly ban that context. If you want to ban it at some later point then you need to track it at the top level as a zombie and then explicitly ban that zombie at whatever later point. I am still trying to understand it all. If I go by the commit message: """ This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. """ That did not explain things for me. Sentence does not appear to make sense. Now, it seems "kick off the hardware" is meant as revoke and not just preempt. Which is fine, perhaps just needs to be written more explicitly. But the part of checking for heartbeat after idle pulse does not compute for me. It is the heartbeat which emits idle pulses, not idle pulse emitting heartbeats. I am in agreement that the commit message is confusing and does not explain either the problem or the solution. But anyway, I can buy the handling at the front end story completely. It makes sense. We just need to agree that a) it is okay to change the ABI and b) remove the backend check from execlists if it is not needed any longer. And if ABI change is okay then commit message needs to talk about it loudly and clearly. I don't think we have a choice. 
The current ABI is not and cannot ever be compatible with any scheduler external to i915. It cannot be implemented with a hardware scheduler such as the GuC and it cannot be implemented with an external software scheduler such as the DRM one. So generally on linux we implement helper libraries, which means massive flexibility everywhere. https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html So it shouldn't be an insurmountable problem to make this happen even with drm/scheduler, we can patch it up. Whether that's justified is another question. Helper libraries won't work with a hardware scheduler. My view is that any implementation involving knowledge of the heartbeat is fundamentally broken. According to Daniel Vetter, the DRM ABI on this subject is that an actively executing cont
Re: [Intel-gfx] [PATCH 0/1] Fix gem_ctx_persistence failures with GuC submission
On 8/9/2021 23:38, Daniel Vetter wrote:
> On Wed, Jul 28, 2021 at 05:33:59PM -0700, Matthew Brost wrote:
>> Should fix below failures with GuC submission for the following tests:
>> gem_exec_balancer --r noheartbeat
>> gem_ctx_persistence --r heartbeat-close
>>
>> Not going to fix:
>> gem_ctx_persistence --r heartbeat-many
>> gem_ctx_persistence --r heartbeat-stop
> After looking at that big thread and being very confused: Are we fixing an
> actual use-case here, or is this another case of blindly following igts
> tests just because they exist?
My understanding is that this is established behaviour and therefore must
be maintained because the UAPI (whether documented or not) is inviolate.
Therefore IGTs have been written to validate this past behaviour and now
we must conform to the IGTs in order to keep the existing behaviour
unchanged.

Whether anybody actually makes use of this behaviour or not is another
matter entirely. I am certainly not aware of any vital use case. Others
might have more recollection. I do know that we tell the UMD teams to
explicitly disable persistence on every context they create.

> I'm leaning towards that we should stall on this, and first document what
> exactly is the actual intention behind all this, and then fix up the tests
I'm not sure there ever was an 'intention'. The rumour I heard way back
when was that persistence was a bug on earlier platforms (or possibly we
didn't have hardware support for doing engine resets?). But once the bug
was realised (or the hardware support was added), it was too late to
change the default behaviour because existing kernel behaviour must never
change on pain of painful things. Thus the persistence flag was added so
that people could opt out of the broken, leaky behaviour and have their
contexts clean up properly.

Feel free to document what you believe should be the behaviour from a
software architect point of view. Any documentation I produce is basically
going to be created by reverse engineering the existing code. That is the
only 'spec' that I am aware of and as I keep saying, I personally think it
is a totally broken concept that should just be removed.

> to match (if needed). And only then fix up GuC to match whatever we
> actually want to do.
I also still maintain there is no 'fix up the GuC'. This is not behaviour
we should be adding to a hardware scheduler. It is behaviour that should
be implemented at the front end not the back end. If we absolutely need to
do this then we need to do it solely at the context management level not
at the back end submission level. And the solution should work by default
on any submission back end.

John.

> -Daniel
>> As the above tests change the heartbeat value to 0 (off) after the
>> context is closed and we have no way to detect that with GuC submission
>> unless we keep a list of closed but running contexts which seems like
>> overkill for a non-real world use case. We likely should just skip these
>> tests with GuC submission.
>>
>> Signed-off-by: Matthew Brost
>>
>> Matthew Brost (1):
>>   drm/i915: Check if engine has heartbeat when closing a context
>>
>>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++--
>>  drivers/gpu/drm/i915/gt/intel_context_types.h | 2 ++
>>  drivers/gpu/drm/i915/gt/intel_engine.h | 21 ++-
>>  .../drm/i915/gt/intel_execlists_submission.c | 14 +
>>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 6 +-
>>  .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 --
>>  6 files changed, 26 insertions(+), 24 deletions(-)
>>
>> --
>> 2.28.0
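To make the "front end, not back end" argument concrete, here is a minimal sketch of a close path that decides about the ban by itself, without asking the submission back end anything. Everything in it is hypothetical: `toy_engine`, `toy_context` and `toy_context_close` are illustration-only names, and treating a non-persistent context as immediately banned is an assumption drawn from the discussion rather than from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the i915 structures. */
struct toy_engine {
	bool has_heartbeat;
};

struct toy_context {
	const struct toy_engine *engine;
	bool persistent;	/* assumed opt-out flag, as discussed above */
	bool closed;
	bool banned;
};

/*
 * Front-end close: the decision is taken here, at context management
 * level, so it does not depend on which scheduler back end is in use.
 */
static void toy_context_close(struct toy_context *ce)
{
	assert(ce->engine);

	ce->closed = true;

	/*
	 * No heartbeat means nothing will ever come along later to clean
	 * up a zombie, so ban it right away (the rule in the patch title).
	 * The persistence check is an assumption added for illustration.
	 */
	if (!ce->persistent || !ce->engine->has_heartbeat)
		ce->banned = true;
}
```

The point of the sketch is only the placement of the check: nothing below the close call needs to know about zombies or heartbeats.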
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
On 8/9/2021 23:36, Daniel Vetter wrote: On Mon, Aug 09, 2021 at 04:12:52PM -0700, John Harrison wrote: On 8/6/2021 12:46, Daniel Vetter wrote: Seen this fly by and figured I dropped a few thoughts in here. At the likely cost of looking a bit out of whack :-) On Fri, Aug 6, 2021 at 8:01 PM John Harrison wrote: On 8/2/2021 02:40, Tvrtko Ursulin wrote: On 30/07/2021 19:13, John Harrison wrote: On 7/30/2021 02:49, Tvrtko Ursulin wrote: On 30/07/2021 01:13, John Harrison wrote: On 7/28/2021 17:34, Matthew Brost wrote: If an engine associated with a context does not have a heartbeat, ban it immediately. This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a running normal priority context? Why does it matter then whether or not heartbeats are enabled - when heartbeat just ends up sending the same engine pulse (eventually, with raising priority)? The point is that the pulse is pointless. See the rest of my comments below, specifically "the context will get resubmitted to the hardware after the pulse completes". To re-iterate... Yes, it preempts the context. Yes, it does so whether heartbeats are enabled or not. But so what? Who cares? You have preempted a context. It is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. It is valid yes, and it even may be the current ABI so another question is whether it is okay to change that. The backend scheduler will just resubmit it to the hardware as soon as the pulse completes. The only reason this works at all is because of the horrid hack in the execlist scheduler's back end implementation (in __execlists_schedule_in): if (unlikely(intel_context_is_closed(ce) && !intel_engine_has_heartbeat(engine))) intel_context_set_banned(ce); Right, is the above code then needed with this patch - when ban is immediately applied on the higher level? 
The actual back end scheduler is saying "Is this a zombie context? Is the heartbeat disabled? Then ban it". No other scheduler backend is going to have knowledge of zombie context status or of the heartbeat status. Nor are they going to call back into the higher levels of the i915 driver to trigger a ban operation. Certainly a hardware implemented scheduler is not going to be looking at private i915 driver information to decide whether to submit a context or whether to tell the OS to kill it off instead. For persistence to work with a hardware scheduler (or a non-Intel specific scheduler such as the DRM one), the handling of zombie contexts, banning, etc. *must* be done entirely in the front end. It cannot rely on any backend hacks. That means you can't rely on any fancy behaviour of pulses. If you want to ban a context then you must explicitly ban that context. If you want to ban it at some later point then you need to track it at the top level as a zombie and then explicitly ban that zombie at whatever later point. I am still trying to understand it all. If I go by the commit message: """ This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. """ That did not explain things for me. Sentence does not appear to make sense. Now, it seems "kick off the hardware" is meant as revoke and not just preempt. Which is fine, perhaps just needs to be written more explicitly. But the part of checking for heartbeat after idle pulse does not compute for me. It is the heartbeat which emits idle pulses, not idle pulse emitting heartbeats. I am in agreement that the commit message is confusing and does not explain either the problem or the solution. But anyway, I can buy the handling at the front end story completely. It makes sense. We just need to agree that a) it is okay to change the ABI and b) remove the backend check from execlists if it is not needed any longer. 
And if ABI change is okay then commit message needs to talk about it loudly and clearly. I don't think we have a choice. The current ABI is not and cannot ever be compatible with any scheduler external to i915. It cannot be implemented with a hardware scheduler such as the GuC and it cannot be implemented with an external software scheduler such as the DRM one. So generally on linux we implement helper libraries, which means massive flexibility everywhere. https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html So it shouldn't be an insurmountable problem to make this happen even with drm/scheduler, we can patch it up. Whether that's justified is another question. Helper libraries won't work with a hardware scheduler. Hm I guess I misunderstood then what exactly the hold-up is. This entire discussi
Re: [PATCH 1/1] drm/i915/selftests: Increase timeout in i915_gem_contexts selftests
On 7/26/2021 20:17, Matthew Brost wrote:
> Like in the case of several other selftests, generating lots of requests
> in a loop takes a bit longer with GuC submission. Increase a timeout in
> i915_gem_contexts selftest to take this into account.
>
> Signed-off-by: Matthew Brost
Reviewed-by: John Harrison

> ---
>  drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index 8eb5050f8cb3..4d2758718d21 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -94,7 +94,7 @@ static int live_nop_switch(void *arg)
>  			rq = i915_request_get(this);
>  			i915_request_add(this);
>  		}
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> +		if (i915_request_wait(rq, 0, HZ) < 0) {
>  			pr_err("Failed to populated %d contexts\n", nctx);
>  			intel_gt_set_wedged(&i915->gt);
>  			i915_request_put(rq);
Re: [Intel-gfx] [PATCH 11/27] drm/i915/guc: Copy whole golden context, set engine state size of subset
On 8/25/2021 20:23, Matthew Brost wrote:
> When the GuC does a media reset, it copies a golden context state back
> into the corrupted context's state. The address of the golden context
> and the size of the engine state restore are passed in via the GuC ADS.
> The i915 had a bug where it passed in the whole size of the golden
> context, not the size of the engine state to restore resulting in a
> memory corruption.
>
> Also copy the entire golden context on init rather than just the engine
> state that is restored.
>
> Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS")
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 28 +-
>  1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 6926919bcac6..df2734bfe078 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -358,6 +358,11 @@ static int guc_prep_golden_context(struct intel_guc *guc,
>  	u8 engine_class, guc_class;
>  	struct guc_gt_system_info *info, local_info;
> +	/* Skip execlist and PPGTT registers + HWSP */
> +	const u32 lr_hw_context_size = 80 * sizeof(u32);
> +	const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
> +		lr_hw_context_size;
> +
>  	/*
>  	 * Reserve the memory for the golden contexts and point GuC at it but
>  	 * leave it empty for now. The context data will be filled in later
> @@ -396,7 +401,18 @@ static int guc_prep_golden_context(struct intel_guc *guc,
>  		if (!blob)
>  			continue;
> -		blob->ads.eng_state_size[guc_class] = real_size;
> +		/*
> +		 * This interface is slightly confusing. We need to pass the
> +		 * base address of the golden context and the engine state size
> +		 * which is not the size of the whole golden context, it is a
> +		 * subset that the GuC uses when doing a watchdog reset. The
> +		 * engine state size must match the size of the golden context
> +		 * minus the first part of the golden context that the GuC does
> +		 * not retore during reset. Currently no real way to verify this
> +		 * other than reading the GuC spec / code and ensuring the
> +		 * 'skip_size' below matches the value used in the GuC code.
> +		 */
> +		blob->ads.eng_state_size[guc_class] = real_size - skip_size;
>  		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
>  		addr_ggtt += alloc_size;
>  	}
> @@ -437,8 +453,8 @@ static void guc_init_golden_context(struct intel_guc *guc)
>  	u8 *ptr;
>  	/* Skip execlist and PPGTT registers + HWSP */
> -	const u32 lr_hw_context_size = 80 * sizeof(u32);
> -	const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
> +	__maybe_unused const u32 lr_hw_context_size = 80 * sizeof(u32);
> +	__maybe_unused const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
>  		lr_hw_context_size;
Not sure why the 'maybe unused'? The values are not only used in BUG_ONs
or such that could vanish.

More importantly, you now have two sets of definitions for these magic
numbers. That seems like a very bad idea. They should be moved into a
helper function rather than repeated.

John.

>  	if (!intel_uc_uses_guc_submission(&gt->uc))
> @@ -476,12 +492,12 @@ static void guc_init_golden_context(struct intel_guc *guc)
>  			continue;
>  		}
> -		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
> +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] !=
> +			   real_size - skip_size);
>  		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
>  		addr_ggtt += alloc_size;
> -		shmem_read(engine->default_state, skip_size, ptr + skip_size,
> -			   real_size - skip_size);
> +		shmem_read(engine->default_state, 0, ptr, real_size);
>  		ptr += alloc_size;
>  	}
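The helper-function suggestion in the review could look roughly like the sketch below. It is a user-space illustration only: the `toy_` names are hypothetical, and the values used for the PPHWSP size (1 page) and the page size (4096 bytes) are assumptions picked so the arithmetic can be checked, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder values standing in for LRC_PPHWSP_SZ and PAGE_SIZE. */
#define TOY_LRC_PPHWSP_SZ	1u
#define TOY_PAGE_SIZE		4096u

/*
 * Single definition of the part of the golden context that is skipped:
 * the HWSP page(s) plus the execlist and PPGTT registers (80 dwords in
 * the quoted patch).
 */
static uint32_t toy_golden_context_skip_size(void)
{
	const uint32_t lr_hw_context_size = 80 * sizeof(uint32_t);

	return TOY_LRC_PPHWSP_SZ * TOY_PAGE_SIZE + lr_hw_context_size;
}

/* Engine state size handed to the firmware: whole context minus the skip. */
static uint32_t toy_eng_state_size(uint32_t real_size)
{
	const uint32_t skip_size = toy_golden_context_skip_size();

	assert(real_size >= skip_size);
	return real_size - skip_size;
}
```

With one helper, both the prep and the init paths would derive `skip_size` from the same place, which removes the duplicated magic numbers the review objects to.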
Re: [PATCH 08/47] drm/i915/guc: Add new GuC interface defines and structures
On 6/24/2021 00:04, Matthew Brost wrote:
> Add new GuC interface defines and structures while maintaining old ones
> in parallel.
>
> Cc: John Harrison
> Signed-off-by: Matthew Brost
I think there was some difference of opinion over whether these additions
should be squashed in to the specific patches that first use them.
However, on the grounds that such is basically a patch-only style comment
and doesn't change the final product plus, we need to get this stuff
merged efficiently and not spend forever rebasing and refactoring...

Reviewed-by: John Harrison

> ---
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 14 +++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 41 +++
>  2 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> index 2d6198e63ebe..57e18babdf4b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> @@ -124,10 +124,24 @@ enum intel_guc_action {
>  	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
>  	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
>  	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
> +	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
> +	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
> +	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
> +	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
> +	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
> +	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
> +	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
> +	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
> +	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
> +	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
> +	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
>  	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
>  	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
> +	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
> +	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
>  	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
>  	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
> +	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
>  	INTEL_GUC_ACTION_LIMIT
>  };
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 617ec601648d..28245a217a39 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -17,6 +17,9 @@
>  #include "abi/guc_communication_ctb_abi.h"
>  #include "abi/guc_messages_abi.h"
> +#define GUC_CONTEXT_DISABLE		0
> +#define GUC_CONTEXT_ENABLE		1
> +
>  #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
>  #define GUC_CLIENT_PRIORITY_HIGH	1
>  #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
> @@ -26,6 +29,9 @@
>  #define GUC_MAX_STAGE_DESCRIPTORS	1024
>  #define GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
> +#define GUC_MAX_LRC_DESCRIPTORS	65535
> +#define GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
> +
>  #define GUC_RENDER_ENGINE		0
>  #define GUC_VIDEO_ENGINE		1
>  #define GUC_BLITTER_ENGINE		2
> @@ -237,6 +243,41 @@ struct guc_stage_desc {
>  	u64 desc_private;
>  } __packed;
> +#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
> +
> +#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US	100
> +#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US	50
> +
> +/* Preempt to idle on quantum expiry */
> +#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE	BIT(0)
> +
> +/*
> + * GuC Context registration descriptor.
> + * FIXME: This is only required to exist during context registration.
> + * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
> + * is not required.
> + */
> +struct guc_lrc_desc {
> +	u32 hw_context_desc;
> +	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
> +	u32 slpm_freq_hint;
> +	u32 engine_submit_mask;		/* In logical space */
> +	u8 engine_class;
> +	u8 reserved0[3];
> +	u32 priority;
> +	u32 process_desc;
> +	u32 wq_addr;
> +	u32 wq_size;
> +	u32 context_flags;		/* CONTEXT_REGISTRATION_* */
> +	/* Time for one workload to execute. (in micro seconds) */
> +	u32 execution_quantum;
> +	/* Time to wait for a preemption request to complete before issuing a
> +	 * reset. (in micro seconds).
> +	 */
> +	u32 preemption_timeout;
> +	u32 policy_flags;		/* CONTEXT_POLICY_* */
> +	u32 reserved1[19];
> +} __packed;
> +
>  #define GUC_POWER_UNSPECIFIED	0
>  #define GUC_POWER_D0		1
>  #define GUC_POWER_D1		2
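Because `guc_lrc_desc` is a packed structure shared with firmware, its layout is part of the interface. The sketch below mirrors the quoted fields in user space and pins the layout with compile-time checks. The `toy_` names are hypothetical, `__attribute__((packed))` assumes GCC or Clang, and the 128-byte total is simply what the quoted fields add up to; the patch itself does not state a size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the quoted struct; field names follow the patch. */
struct toy_guc_lrc_desc {
	uint32_t hw_context_desc;
	uint32_t slpm_perf_mode_hint;
	uint32_t slpm_freq_hint;
	uint32_t engine_submit_mask;
	uint8_t engine_class;
	uint8_t reserved0[3];
	uint32_t priority;
	uint32_t process_desc;
	uint32_t wq_addr;
	uint32_t wq_size;
	uint32_t context_flags;
	uint32_t execution_quantum;	/* microseconds */
	uint32_t preemption_timeout;	/* microseconds */
	uint32_t policy_flags;
	uint32_t reserved1[19];
} __attribute__((packed));

/* 13 dwords of fields + 4 bytes of class/padding + 19 reserved dwords. */
static_assert(sizeof(struct toy_guc_lrc_desc) == 128,
	      "descriptor layout no longer adds up to 128 bytes");
static_assert(offsetof(struct toy_guc_lrc_desc, priority) == 20,
	      "engine_class + reserved0 must fill exactly one dword");

/* Runtime accessor so callers (and tests) can query the size. */
static size_t toy_lrc_desc_size(void)
{
	return sizeof(struct toy_guc_lrc_desc);
}
```

A check of this kind fails the build if a field is added or resized without adjusting the reserved tail.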
Re: [PATCH 10/47] drm/i915/guc: Add lrc descriptor context lookup array
On 6/25/2021 10:26, Matthew Brost wrote:
> On Fri, Jun 25, 2021 at 03:17:51PM +0200, Michal Wajdeczko wrote:
>> On 24.06.2021 09:04, Matthew Brost wrote:
>>> Add lrc descriptor context lookup array which can resolve the
>>> intel_context from the lrc descriptor index. In addition to lookup, it
>>> can determine in the lrc descriptor context is currently registered with
>>> the GuC by checking if an entry for a descriptor index is present.
>>> Future patches in the series will make use of this array.
>> s/lrc/LRC I guess?
> lrc and LRC are used interchangeably throughout the current code base.
It is an abbreviation so LRC is technically the correct version for a
comment. The fact that other existing comments are incorrect is not a
valid reason to perpetuate a mistake :). Might as well fix it if you are
going to repost the patch anyway for any other reason, but I would not
call it a blocking issue.

Also, 'can determine in the' should be 'can determine if the'. Again, not
exactly a blocking issue but should be fixed.

>>> Cc: John Harrison
>>> Signed-off-by: Matthew Brost
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 5 +++
>>>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
>>>  2 files changed, 35 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index b28fa54214f2..2313d9fc087b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -6,6 +6,8 @@
>>>  #ifndef _INTEL_GUC_H_
>>>  #define _INTEL_GUC_H_
>>> +#include "linux/xarray.h"
>> #include <linux/xarray.h>
> Yep.
>>> +
>>>  #include "intel_uncore.h"
>>>  #include "intel_guc_fw.h"
>>>  #include "intel_guc_fwif.h"
>>> @@ -46,6 +48,9 @@ struct intel_guc {
>>>  	struct i915_vma *lrc_desc_pool;
>>>  	void *lrc_desc_pool_vaddr;
>>> +	/* guc_id to intel_context lookup */
>>> +	struct xarray context_lookup;
>>> +
>>>  	/* Control params for fw initialization */
>>>  	u32 params[GUC_CTL_MAX_DWORDS];
>> btw, IIRC there was idea to move most struct definitions to
>> intel_guc_types.h, is this still a plan ?
> I don't ever recall discussing this but we can certainly do this. For what
> it is worth we do introduce intel_guc_submission_types.h a bit later. I'll
> make a note about intel_guc_types.h though.
>
> Matt
Yeah, my only recollection was about the submission types header. Are
there sufficient non-submission fields in the GuC structure to warrant a
general GuC types header?

With the commit message tweaks and #include fix mentioned above, it looks
good to me.

Reviewed-by: John Harrison

>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index a366890fb840..23a94a896a0b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>>>  	return rb_entry(rb, struct i915_priolist, node);
>>>  }
>>> -/* Future patches will use this function */
>>> -__attribute__ ((unused))
>>>  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>>>  {
>>>  	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
>>> @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>>>  	return &base[index];
>>>  }
>>> +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
>>> +{
>>> +	struct intel_context *ce = xa_load(&guc->context_lookup, id);
>>> +
>>> +	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
>>> +
>>> +	return ce;
>>> +}
>>> +
>>>  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>>>  {
>>>  	u32 size;
>>> @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>>>  	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>>>  }
>>> +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
>>> +{
>>> +	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
>>> +
>>> +	memset(desc, 0, sizeof(*desc));
>>> +	xa_erase_irq(&guc->context_lookup, id);
>>> +}
>>> +
>>> +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
>>> +{
>>> +	return __get_context(guc, id);
>>> +}
>>> +
>>> +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>>> +					   struct intel_context *ce)
>>> +{
>>> +	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>>> +}
>>> +
>>>  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>  {
>>>  	/* Leaving stub as this function will be used in future patches */
>>> @@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
>>>  	 */
>>>  	GEM_BUG_ON(!guc->lrc_desc_pool);
>>> +	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>>> +
>>>  	return 0;
>>>  }
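The idea in this patch, that "registered" simply means "the lookup has a non-NULL entry for this id", can be shown with a plain pointer array in user space. This is a sketch, not the driver code: the kernel version uses an xarray with IRQ-safe store and erase, whereas here `toy_guc`, `toy_context` and the tiny `TOY_MAX_IDS` bound are hypothetical stand-ins with no locking at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small bound; the patch uses GUC_MAX_LRC_DESCRIPTORS. */
#define TOY_MAX_IDS 16u

struct toy_context {
	int dummy;
};

struct toy_guc {
	/* Stand-in for the xarray: id -> context, NULL when not registered. */
	struct toy_context *context_lookup[TOY_MAX_IDS];
};

static struct toy_context *toy_get_context(struct toy_guc *guc, uint32_t id)
{
	/* Checked before the access here, since an array has no safe miss. */
	assert(id < TOY_MAX_IDS);
	return guc->context_lookup[id];
}

/* Registration state is implied by presence of an entry. */
static bool toy_lrc_desc_registered(struct toy_guc *guc, uint32_t id)
{
	return toy_get_context(guc, id) != NULL;
}

static void toy_set_lrc_desc_registered(struct toy_guc *guc, uint32_t id,
					struct toy_context *ce)
{
	assert(id < TOY_MAX_IDS);
	guc->context_lookup[id] = ce;
}

static void toy_reset_lrc_desc(struct toy_guc *guc, uint32_t id)
{
	assert(id < TOY_MAX_IDS);
	guc->context_lookup[id] = NULL;
}
```

A sparse structure such as the xarray makes more sense than this array once the id space is 65535 entries wide and mostly empty, which is presumably why the patch chose it.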
Re: [PATCH 11/47] drm/i915/guc: Implement GuC submission tasklet
On 6/24/2021 00:04, Matthew Brost wrote:
> Implement GuC submission tasklet for new interface. The new GuC
> interface uses H2G to submit contexts to the GuC. Since H2G use a single
> channel, a single tasklet submits is used for the submission path.
Re-word? 'a single tasklet submits is used...' doesn't make sense.

> Also the per engine interrupt handler has been updated to disable the
> rescheduling of the physical engine tasklet, when using GuC scheduling,
> as the physical engine tasklet is no longer used.
>
> In this patch the field, guc_id, has been added to intel_context and is
> not assigned. Patches later in the series will assign this value.
>
> Cc: John Harrison
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/intel_context_types.h | 9 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 4 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +-
>  3 files changed, 127 insertions(+), 117 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index ed8c447a7346..bb6fef7eae52 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -136,6 +136,15 @@ struct intel_context {
>  	struct intel_sseu sseu;
>  	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> +
> +	/* GuC scheduling state that does not require a lock. */
Maybe 'GuC scheduling state flags that do not require a lock'? Otherwise
it just looks like a counter or something.

> +	atomic_t guc_sched_state_no_lock;
> +
> +	/*
> +	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
Not a blocker but s/lrc/LRC/ would keep Michal happy ;). Although
presumably this comment is at least being amended by later patches in the
series.

> +	 * in the series will.
> +	 */
> +	u16 guc_id;
>  };
>  #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 2313d9fc087b..9ba8219475b2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -30,6 +30,10 @@ struct intel_guc {
>  	struct intel_guc_log log;
>  	struct intel_guc_ct ct;
> +	/* Global engine used to submit requests to GuC */
> +	struct i915_sched_engine *sched_engine;
> +	struct i915_request *stalled_request;
> +
>  	/* intel_guc_recv interrupt related state */
>  	spinlock_t irq_lock;
>  	unsigned int msg_enabled_mask;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 23a94a896a0b..ee933efbf0ff 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -60,6 +60,31 @@
>  #define GUC_REQUEST_SIZE 64 /* bytes */
> +/*
> + * Below is a set of functions which control the GuC scheduling state which do
> + * not require a lock as all state transitions are mutually exclusive. i.e. It
> + * is not possible for the context pinning code and submission, for the same
> + * context, to be executing simultaneously. We still need an atomic as it is
> + * possible for some of the bits to changing at the same time though.
> + */
> +#define SCHED_STATE_NO_LOCK_ENABLED	BIT(0)
> +static inline bool context_enabled(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_ENABLED);
> +}
> +
> +static inline void set_context_enabled(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_enabled(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>  static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>  {
>  	return rb_entry(rb, struct i915_priolist, node);
> @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>  	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>  }
> -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>  {
> -	/* Leaving stub as this function will be used in future patches */
> -}
> +	int err;
> +	struct intel_context *ce = rq->context;
> +	u32 action[3];
> +	int len = 0;
> +	bool enabled = context_enabled(ce);
> -/*
> - * When we're doing submissions using regular execlists backend, writing to
> - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> - * therefore, to ensure the flush, we're issuing a POSTING READ.
> -
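The comment in the patch explains why the flag word is atomic even though no lock is taken: different bits of the same word may be changed at the same time. A user-space sketch of the same pattern with C11 atomics follows; the `toy_` names are hypothetical and `atomic_fetch_or`/`atomic_fetch_and` stand in for the kernel's `atomic_or`/`atomic_and`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_SCHED_STATE_NO_LOCK_ENABLED	(1u << 0)

/* Hypothetical reduced context holding only the lockless flag word. */
struct toy_context {
	atomic_uint sched_state_no_lock;
};

static bool toy_context_enabled(struct toy_context *ce)
{
	return atomic_load(&ce->sched_state_no_lock) &
	       TOY_SCHED_STATE_NO_LOCK_ENABLED;
}

static void toy_set_context_enabled(struct toy_context *ce)
{
	/* Atomic read-modify-write: other bits in the word are preserved. */
	atomic_fetch_or(&ce->sched_state_no_lock,
			TOY_SCHED_STATE_NO_LOCK_ENABLED);
}

static void toy_clr_context_enabled(struct toy_context *ce)
{
	atomic_fetch_and(&ce->sched_state_no_lock,
			 ~TOY_SCHED_STATE_NO_LOCK_ENABLED);
}

/*
 * Start with an unrelated bit (bit 1) already set, toggle the enabled
 * bit, and return the final word to show the unrelated bit survived.
 */
static unsigned int toy_flags_demo(void)
{
	struct toy_context ce;

	atomic_init(&ce.sched_state_no_lock, 1u << 1);
	toy_set_context_enabled(&ce);
	assert(toy_context_enabled(&ce));
	toy_clr_context_enabled(&ce);
	assert(!toy_context_enabled(&ce));

	return atomic_load(&ce.sched_state_no_lock);
}
```

A plain load, modify, store sequence on the same word could lose a concurrent update to a neighbouring bit, which is the case the atomic operations guard against.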
Re: [PATCH 12/47] drm/i915/guc: Add bypass tasklet submission path to GuC
On 6/24/2021 00:04, Matthew Brost wrote:
Add bypass tasklet submission path to GuC. The tasklet is only used if
H2G channel has backpressure.

Signed-off-by: Matthew Brost

Reviewed-by: John Harrison

---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ee933efbf0ff..38aff83ee9fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }

+static inline void guc_set_lrc_tail(struct i915_request *rq)
+{
+	rq->context->lrc_reg_state[CTX_RING_TAIL] =
+		intel_ring_set_tail(rq->ring, rq->tail);
+}
+
 static inline int rq_prio(const struct i915_request *rq)
 {
 	return rq->sched.attr.priority;
@@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	}
 done:
 	if (submit) {
-		last->context->lrc_reg_state[CTX_RING_TAIL] =
-			intel_ring_set_tail(last->ring, last->tail);
+		guc_set_lrc_tail(last);
 resubmit:
 		/*
 		 * We only check for -EBUSY here even though it is possible for
@@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }

+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
 static void guc_submit_request(struct i915_request *rq)
 {
 	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
 	unsigned long flags;

 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);

-	queue_request(sched_engine, rq, rq_prio(rq));
-
-	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
-	GEM_BUG_ON(list_empty(&rq->sched.link));
-
-	tasklet_hi_schedule(&sched_engine->tasklet);
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		tasklet_hi_schedule(&sched_engine->tasklet);

 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
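The bypass decision in guc_submit_request() can be modelled outside the kernel. The sketch below is a simplified userspace model, not the driver code: struct submitter, channel_send() and the counters are hypothetical stand-ins for the scheduler queue, the H2G channel and the tasklet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the i915 types. */
struct submitter {
	int channel_room;	/* messages the channel can still accept */
	int queued;		/* requests waiting for the tasklet */
	const void *stalled;	/* request that hit backpressure, if any */
	bool tasklet_scheduled;
};

/* Try to write one request to the channel; -EBUSY models backpressure. */
static int channel_send(struct submitter *s)
{
	if (s->channel_room == 0)
		return -EBUSY;
	s->channel_room--;
	return 0;
}

/* Same shape as the new guc_submit_request(): bypass only when idle. */
static void submit_request(struct submitter *s, const void *rq)
{
	if (s->stalled || s->queued > 0) {
		/* Work is already pending: preserve ordering, just queue. */
		s->queued++;
	} else if (channel_send(s) == -EBUSY) {
		/* Direct submit failed: remember it, let the tasklet retry. */
		s->stalled = rq;
		s->tasklet_scheduled = true;
	}
}
```

The point of the ordering check is that a request may only skip the queue when nothing older is waiting; otherwise a later request could overtake an earlier one.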
Re: [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio

Unblock GuC submission on Gen11+ platforms.

Signed-off-by: Michal Wajdeczko
Signed-off-by: Daniele Ceraolo Spurio
Signed-off-by: Matthew Brost
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +-
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index fae01dc8e1b9..77981788204f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -54,6 +54,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	struct list_head guc_id_list;

+	bool submission_supported;
 	bool submission_selected;

 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a427336ce916..405339202280 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }

+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)

 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index a2a3fad72be1..be767eb6ff71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,

 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }

 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}

-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");

This comment does not seem accurate, given that DG1 is barely out, and
ADL is not out yet. How about: "Disabling GuC on untested platforms"?

Just because something is not in the shops yet does not mean it is new.
Technology is always obsolete by the time it goes on sale.

And the issue is not a lack of testing, it is a question of whether we
are allowed to change the default on something that has already started
being used by customers or not (including pre-release beta customers).
I.e. it is basically a political decision not an engineering decision.

+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;

This seems to be in contradiction with the GuC submission plan which
states: "Not enabled by default on any current platforms but can be
enabled via modparam enable_guc".

All current platforms have already been explicitly tested for above.
This is setting the default on newer platforms - ADL-P and later. For
which the official expectation is to have GuC enabled.

When you rework the patch, could you please add a warning when the user
force-enables the GuC Command Submission?

There already is one. If you set the module parameter then the kernel is
tainted. That means 'here be dragons' - you have done something
officially not supported to your kernel so all bets are off, if it blows
up it is your own problem.

Something like: "WARNING: The user force-enabled the experimental GuC
command submission backend using i915.enable_guc. Please disable it if experie
Re: [PATCH 4/7] drm/i915/guc: Add non blocking CTB send function
On 7/1/2021 10:15, Matthew Brost wrote: Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function. The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer. The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function. v2: (Michal) - Use define for H2G room calculations - Move INTEL_GUC_SEND_NB define (Daniel Vetter) - Use msleep_interruptible rather than cond_resched v3: (Michal) - Move includes to following patch - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- .../gt/uc/abi/guc_communication_ctb_abi.h | 3 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 11 ++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 +-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 +- 4 files changed, 91 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h index e933ca02d0eb..99e1fad5ca20 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h @@ -79,7 +79,8 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64); * +---+---+--+ */ -#define GUC_CTB_MSG_MIN_LEN 1u +#define GUC_CTB_HDR_LEN1u +#define GUC_CTB_MSG_MIN_LENGUC_CTB_HDR_LEN #define GUC_CTB_MSG_MAX_LEN 256u #define GUC_CTB_MSG_0_FENCE (0x << 16) #define GUC_CTB_MSG_0_FORMAT (0xf << 12) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..72e4653222e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,14 
@@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) { - return intel_guc_ct_send(&guc->ct, action, len, NULL, 0); + return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0); +} + +static +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len) +{ + return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, +INTEL_GUC_CT_SEND_NB); } static inline int @@ -82,7 +89,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len, -response_buf, response_buf_size); +response_buf, response_buf_size, 0); } static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 43e03aa2dde8..fb825cc1d090 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,8 @@ * Copyright © 2016-2019 Intel Corporation */ +#include + #include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +375,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */, - u32 fence) + u32 fence, u32 flags) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -409,7 +411,7 @@ static int ct_write(struct intel_guc_ct *ct, used = tail - head; /* make sure there is a space including extra dw for the fence */ - if (unlikely(used + len + 1 >= size)) + if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) I thought the plan was to update the comment? Given that the '+1' is now 'HDR_LEN' it would be good to update the comment to say 'header' instead of 'fence'. 
return -ENOSPC; /* @@ -421,9 +423,13 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence); - hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | - FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION | -
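For reference, the room check that the review comment above refers to can be written as a standalone function. This is a userspace sketch; HDR_LEN and ctb_check_room() are illustrative names, with HDR_LEN playing the role of GUC_CTB_HDR_LEN. All lengths are in dwords.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* One header dword per message, standing in for GUC_CTB_HDR_LEN. */
#define HDR_LEN 1u

/* Returns 0 if a message with len payload dwords fits, -ENOSPC otherwise. */
static int ctb_check_room(uint32_t head, uint32_t tail, uint32_t size,
			  uint32_t len)
{
	uint32_t used;

	/* tail == head means empty, so the ring may never fill completely. */
	if (tail < head)
		used = (size - head) + tail;
	else
		used = tail - head;

	/* Make sure there is space for the payload plus the header dword. */
	if (used + len + HDR_LEN >= size)
		return -ENOSPC;

	return 0;
}
```

With a 16-dword ring that is empty, a 14-dword payload fits (14 + 1 < 16) while a 15-dword payload does not, since the extra dword is the message header rather than anything specific to the fence.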
Re: [PATCH 5/7] drm/i915/guc: Add stall timer to non blocking CTB send function
On 7/1/2021 10:15, Matthew Brost wrote:
Implement a stall timer which fails H2G CTBs once a period of time
with no forward progress is reached to prevent deadlock.

v2:
 (Michal)
  - Improve error message in ct_deadlock()
  - Set broken when ct_deadlock() returns true
  - Return -EPIPE on ct_deadlock()
v3:
 (Michal)
  - Add ms to stall timer comment
 (Matthew)
  - Move broken check to intel_guc_ct_send()

Signed-off-by: John Harrison
Signed-off-by: Daniele Ceraolo Spurio
Signed-off-by: Matthew Brost

Looks plausible to me.

Reviewed-by: John Harrison

---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 62 ---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 ++
 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index fb825cc1d090..a9cb7b608520 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -4,6 +4,9 @@
  */

 #include
+#include
+#include
+#include

 #include "i915_drv.h"
 #include "intel_guc_ct.h"
@@ -316,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
 		goto err_deregister;

 	ct->enabled = true;
+	ct->stall_time = KTIME_MAX;

 	return 0;

@@ -388,9 +392,6 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 *cmds = ctb->cmds;
 	unsigned int i;

-	if (unlikely(ctb->broken))
-		return -EPIPE;
-
 	if (unlikely(desc->status))
 		goto corrupted;

@@ -506,6 +507,25 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }

+#define GUC_CTB_TIMEOUT_MS	1500
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	long timeout = GUC_CTB_TIMEOUT_MS;
+	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+	if (unlikely(ret)) {
+		struct guc_ct_buffer_desc *send = ct->ctbs.send.desc;
+		struct guc_ct_buffer_desc *recv = ct->ctbs.send.desc;
+
+		CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
+			 ktime_ms_delta(ktime_get(), ct->stall_time),
+			 send->status, recv->status);
+		ct->ctbs.send.broken = true;
+	}
+
+	return ret;
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -517,6 +537,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 	return space >= len_dw;
 }

+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EPIPE;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
+
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -529,11 +569,9 @@ static int ct_send_nb(struct intel_guc_ct *ct,

 	spin_lock_irqsave(&ctb->lock, spin_flags);

-	ret = h2g_has_room(ctb, len + GUC_CTB_HDR_LEN);
-	if (unlikely(!ret)) {
-		ret = -EBUSY;
+	ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN);
+	if (unlikely(ret))
 		goto out;
-	}

 	fence = ct_get_next_fence(ct);
 	ret = ct_write(ct, action, len, fence, flags);
@@ -576,8 +614,13 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
 	if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);

+		if (unlikely(ct_deadlocked(ct)))
+			return -EPIPE;
+
 		if (msleep_interruptible(sleep_period_ms))
 			return -EINTR;
 		sleep_period_ms = sleep_period_ms << 1;
@@ -585,6 +628,8 @@ static int ct_send(struct intel_guc_ct *ct,
 		goto retry;
 	}

+	ct->stall_time = KTIME_MAX;
+
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
 	request.status = 0;
@@ -647,6 +692,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}

+	if (unlikely(ct->ctbs.send.broken))
+		return -EPIPE;
+
 	if (flags & INTEL_GUC_CT_SEND_NB)
 		ret
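The stall-timer logic in has_room_nb() and ct_deadlocked() reduces to a small state machine. The following is a userspace sketch under stated assumptions: struct chan and check_room() are hypothetical names, the caller passes the current time in milliseconds instead of reading ktime_get(), and STALL_NONE stands in for KTIME_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define STALL_NONE	INT64_MAX	/* stand-in for KTIME_MAX: not stalled */
#define TIMEOUT_MS	1500		/* same value as GUC_CTB_TIMEOUT_MS */

struct chan {
	int64_t stall_start_ms;	/* when the current stall began, or STALL_NONE */
	bool broken;		/* set once the channel is declared dead */
};

/*
 * Returns 0 when there is room, -EBUSY while stalled for less than the
 * timeout, and -EPIPE once the stall has exceeded the timeout.
 */
static int check_room(struct chan *ch, bool has_room, int64_t now_ms)
{
	if (has_room) {
		/* Forward progress resets the timer. */
		ch->stall_start_ms = STALL_NONE;
		return 0;
	}

	/* The first failure after progress starts the timer. */
	if (ch->stall_start_ms == STALL_NONE)
		ch->stall_start_ms = now_ms;

	if (now_ms - ch->stall_start_ms > TIMEOUT_MS) {
		ch->broken = true;
		return -EPIPE;
	}

	return -EBUSY;
}
```

Callers can treat -EBUSY as "retry later" and -EPIPE as "give up", which matches the split between the two return codes in the patch.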
Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
On 7/1/2021 10:15, Matthew Brost wrote: CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally. v2: (Michel) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a9cb7b608520..5b8b4ff609e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); } @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; u32 hxg; u32 *cmds = ctb->cmds; @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted; - if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", +desc->tail, ctb->tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if 
(unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", -head, tail, size); +desc->head, desc->tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* -* tail == head condition indicates empty. GuC FW does not support -* using up the entire buffer to get tail == head meaning full. -*/ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the fence */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif /* * dw0: CT header (including fence) @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN; return 0; @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. 
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { - struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->head); + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + u32 head; u32 space; - space = CIRC_SPACE(desc->tail, head, ctb->size); + if (ctb->space >= len_dw) + return true; + + head = READ_ONCE(ctb->desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n", +ctb->desc->head, ctb->desc->tail, ctb->size); + ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW; + ctb->broken = true; + return false; + } + + s
Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
On 7/6/2021 12:12, Michal Wajdeczko wrote: On 06.07.2021 21:00, John Harrison wrote: On 7/1/2021 10:15, Matthew Brost wrote: CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally. v2: (Michel) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a9cb7b608520..5b8b4ff609e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); } @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; u32 hxg; u32 *cmds = ctb->cmds; @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted; - if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", + desc->tail, 
ctb->tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + desc->head, desc->tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* - * tail == head condition indicates empty. GuC FW does not support - * using up the entire buffer to get tail == head meaning full. - */ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the fence */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif /* * dw0: CT header (including fence) @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN; return 0; @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. 
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { - struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->head); + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + u32 head; u32 space; - space = CIRC_SPACE(desc->tail, head, ctb->size); + if (ctb->space >= len_dw) + return true; + + head = READ_ONCE(ctb->desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n", + ctb->desc->head, ctb->desc->tail, ctb->size); + ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW; + ctb->broken = true; + return false; + } + + space = CIRC_SPACE(ctb->tail, head, ctb->size); + ctb->space = space; return space >= len_dw; } static int
Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
On 7/6/2021 12:33, Michal Wajdeczko wrote: On 06.07.2021 21:19, John Harrison wrote: On 7/6/2021 12:12, Michal Wajdeczko wrote: On 06.07.2021 21:00, John Harrison wrote: On 7/1/2021 10:15, Matthew Brost wrote: CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally. v2: (Michel) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a9cb7b608520..5b8b4ff609e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); } @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; u32 hxg; u32 *cmds = ctb->cmds; @@ -395,25 +397,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted; - if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != 
READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", + desc->tail, ctb->tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + desc->head, desc->tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* - * tail == head condition indicates empty. GuC FW does not support - * using up the entire buffer to get tail == head meaning full. - */ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the fence */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif /* * dw0: CT header (including fence) @@ -454,7 +453,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN; return 0; @@ -470,7 +471,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. 
@@ -526,24 +527,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { - struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->head); + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + u32 head; u32 space; - space = CIRC_SPACE(desc->tail, head, ctb->size); + if (ctb->space >= len_dw) + return true; + + head = READ_ONCE(ctb->desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n", + ctb->desc->head, ctb->desc->tail, ctb->size); + ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW; + ctb->broken = true; + return false; + } + + sp
Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
On 7/6/2021 15:20, Matthew Brost wrote: CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally. v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); } @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; u32 hxg; u32 type; @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted; - if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", +desc->tail, ctb->tail); + desc->status |= 
GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely((desc->tail | desc->head) >= size)) { Same arguments below about head apply to tail here. Also, there is no #else check on ctb->head? CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", -head, tail, size); +desc->head, desc->tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* -* tail == head condition indicates empty. GuC FW does not support -* using up the entire buffer to get tail == head meaning full. -*/ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the header */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif /* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN; return 0; @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. 
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { - struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->head); + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + u32 head; u32 space; - space = CIRC_SPACE(desc->tail, head, ctb->size); + if (ctb->space >= len_dw) + return true; + + head = READ_ONCE(ctb->desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n", +ctb->desc->head, ctb->desc->tail, ctb->size); +
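The shadow-copy idea in this patch (cache tail and free space locally, and only read the shared descriptor when the cached space is insufficient) can be sketched in userspace as below. This is an illustration, not the driver code: struct ring, struct shared_desc and the desc_reads counter are hypothetical, and ring_space() uses the same arithmetic as the kernel's CIRC_SPACE(), which requires a power-of-two size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Free slots in a power-of-two ring. One slot always stays unused so that
 * producer == consumer can only mean "empty".
 */
static uint32_t ring_space(uint32_t producer, uint32_t consumer, uint32_t size)
{
	return (consumer - (producer + 1)) & (size - 1);
}

/* Stand-in for the descriptor that the other side also reads and writes. */
struct shared_desc {
	volatile uint32_t head;	/* consumer index, advanced by the other side */
	volatile uint32_t tail;	/* producer index, advanced by us */
};

struct ring {
	struct shared_desc *desc;
	uint32_t size;		/* in dwords, power of two */
	uint32_t tail;		/* local shadow of desc->tail */
	uint32_t space;		/* cached free space; may be stale (too small) */
	unsigned int desc_reads;	/* hypothetical counter, illustration only */
};

static bool ring_has_room(struct ring *r, uint32_t len)
{
	/* Fast path: the cached value is enough, no shared-memory read. */
	if (r->space >= len)
		return true;

	/* Slow path: refresh the cache from the shared head. */
	r->desc_reads++;
	r->space = ring_space(r->tail, r->desc->head, r->size);
	return r->space >= len;
}

static void ring_commit(struct ring *r, uint32_t len)
{
	assert(r->space >= len);
	r->tail = (r->tail + len) & (r->size - 1);
	r->desc->tail = r->tail;	/* publish the new tail to the consumer */
	r->space -= len;
}
```

The cached space can only underestimate the real free space, because the consumer only ever frees slots, so trusting it on the fast path is safe and a stale value merely forces one extra descriptor read.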
Re: [Intel-gfx] [PATCH 47/47] drm/i915/guc: Unblock GuC submission on Gen11+
On 7/3/2021 01:21, Martin Peres wrote: On 02/07/2021 18:07, Michal Wajdeczko wrote: On 02.07.2021 10:09, Martin Peres wrote: On 02/07/2021 10:29, Pekka Paalanen wrote: On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter wrote: On Thu, Jul 1, 2021 at 8:27 PM Martin Peres wrote: On 01/07/2021 11:14, Pekka Paalanen wrote: On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison wrote: On 6/30/2021 01:22, Martin Peres wrote: On 24/06/2021 10:05, Matthew Brost wrote: From: Daniele Ceraolo Spurio Unblock GuC submission on Gen11+ platforms. Signed-off-by: Michal Wajdeczko Signed-off-by: Daniele Ceraolo Spurio Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 3 +-- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +- 4 files changed, 19 insertions(+), 7 deletions(-) ... diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 7a69c3c027e9..61be0aa81492 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc) return; } - /* Default: enable HuC authentication only */ - i915->params.enable_guc = ENABLE_GUC_LOAD_HUC; + /* Intermediate platforms are HuC authentication only */ + if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) { + drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n"); This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about: "Disabling GuC on untested platforms"? Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale. That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all. End users like me definitely do not share your interpretation of "old". Yep, old and new is relative. 
In the end, what matters is the validation effort, which is why I was proposing "untested platforms". Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*. It's drm_dbg. Users don't read this stuff, at least not users with no clue what the driver does and stuff like that. If I had a problem, I would read it, and I have no clue what anything of that is. Exactly. I don't see how replacing 'old' for 'untested' helps anybody to understand anything. Untested just implies we can't be bothered to test stuff before publishing it. And as previously stated, this is purely a political decision not a technical one. Sure, change the message to be 'Disabling GuC submission but enabling HuC loading via GuC on platform XXX' if that makes it clearer what is going on. Or just drop the message completely. It's simply explaining what the default option is for the current platform which you can also get by reading the code. However, I disagree that 'untested' is the correct message. Quite a lot of testing has been happening on TGL+ with GuC submission enabled. This level of defense for what is clearly a bad *debug* message (at the very least, the grammar) makes no sense at all! I don't want to hear arguments like "Not my patch" from a developer literally sending the patch to the ML and who added his SoB to the patch, playing with words, or minimizing the problem of having such a message. Agree that 'not my patch' is never a good excuse, but equally we can't blame original patch author as patch was updated few times since then. I never wanted to blame the author here, I was only speaking about the handling of feedback on the patch. 
Maybe to avoid confusions and simplify reviews, we could split this patch into two smaller: first one that really unblocks GuC submission on all Gen11+ (see __guc_submission_supported) and second one that updates defaults for Gen12+ (see uc_expand_default_options), as original patch (from ~2019) evolved more than what title/commit message says. Both work for me, as long as it is a collaborative effort. I'm not seeing how splitting the patch up fixes the complaints about the debug message. And to be clear, no-one is actually arguing for a code change as such? The issue is just about the text of the debug message? Or did I miss something somewhere? John. Cheers, Martin Then we can fix all messaging and make sure it's clear and understood. Thanks, Michal All of the above are just clear signals for the community to get off your playground, which is frankly unacceptable. Your email address does not matter. In the spirit of collaboration, your response should have been "Good catch, ho
Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
On 7/7/2021 10:50, Matthew Brost wrote: On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote: On 7/6/2021 15:20, Matthew Brost wrote: CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally. v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); } @@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; u32 hxg; u32 type; @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted; - if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + 
CT_ERROR(ct, "Tail was modified %u != %u\n", +desc->tail, ctb->tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely((desc->tail | desc->head) >= size)) { Same arguments below about head apply to tail here. Also, there is no #else Yes, desc->tail can be removed from this check. Same for head below. Can you fix this when merging? check on ctb->head? ctb->head variable isn't used in this path nor is ctb->tail in the other. In the other path desc->tail is checked as it is read while desc->head isn't needed to be read here. The other path can also likely be reworked to pull the tail check outside of the if / else define block. CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", -head, tail, size); +desc->head, desc->tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* -* tail == head condition indicates empty. GuC FW does not support -* using up the entire buffer to get tail == head meaning full. -*/ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the header */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif /* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN; return 0; @@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. 
@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { - struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->he
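For readers following the shadow-copy discussion in this patch, the idea can be sketched outside the driver. This is only an illustration under stated assumptions, not the i915 code: `struct ring`, `ring_reset`, `ring_has_room` and `ring_write` are hypothetical names, `shared_head` / `shared_tail` stand in for the descriptor fields that live in memory shared with the firmware, `slow_reads` exists purely to make the saved reads visible, and `circ_space` follows the same convention as the kernel's CIRC_SPACE macro for power-of-two sizes. Only the index bookkeeping is modelled; no payload is copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Free slots in a power-of-two ring. One slot is always left empty so that
 * head == tail means "empty", never "full".
 */
static uint32_t circ_space(uint32_t tail, uint32_t head, uint32_t size)
{
	return (head - tail - 1) & (size - 1);
}

/* Hypothetical stand-in for a send buffer plus its shared descriptor. */
struct ring {
	uint32_t size;                 /* in dwords, power of two */
	uint32_t tail;                 /* local shadow of the producer index */
	uint32_t space;                /* cached free space, may be stale (too low) */
	volatile uint32_t shared_head; /* consumer index in "expensive" memory */
	volatile uint32_t shared_tail; /* producer index published to the consumer */
	unsigned int slow_reads;       /* how often shared_head had to be read */
};

static void ring_reset(struct ring *r, uint32_t size)
{
	assert(size && (size & (size - 1)) == 0);

	r->size = size;
	r->tail = 0;
	r->shared_head = 0;
	r->shared_tail = 0;
	r->space = circ_space(0, 0, size);
	r->slow_reads = 0;
}

/* Fast path trusts the cached space; the shared head is read only on a miss. */
static bool ring_has_room(struct ring *r, uint32_t len)
{
	uint32_t head;

	if (r->space >= len)
		return true;

	head = r->shared_head;
	r->slow_reads++;
	if (head >= r->size)
		return false; /* descriptor looks corrupted */

	r->space = circ_space(r->tail, head, r->size);
	return r->space >= len;
}

/* Reserve len dwords: update the shadow first, then publish the new tail. */
static bool ring_write(struct ring *r, uint32_t len)
{
	if (!ring_has_room(r, len))
		return false;

	r->tail = (r->tail + len) & (r->size - 1);
	r->shared_tail = r->tail;
	r->space -= len;
	return true;
}
```

The cached `space` can only underestimate the real free space, because the consumer only ever moves the head forward. That is why it is safe to skip the read of the shared head whenever the cached value is already large enough.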
Re: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads
On 7/7/2021 11:56, Matthew Brost wrote: Ok, I sent it but it looks like patchwork didn't like it. Anyway, we should be able to review that patch. Matt Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it needs to be in reply to 0/7 or 6/7? John.
Re: [PATCH 14/47] drm/i915/guc: Insert fence on context when deregistering
On 6/24/2021 00:04, Matthew Brost wrote: Sometime during context pinning a context with the same guc_id is registered with the GuC. In this a case deregister must be before the context can be registered. A fence is inserted on all requests while the deregister is in flight. Once the G2H is received indicating the deregistration is complete the context is registered and the fence is released. Cc: John Harrison Signed-off-by: Matthew Brost Sometime -> Sometime*s*; must be before -> must be done before. With the above text fixed up: Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/gt/intel_context_types.h | 5 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++- drivers/gpu/drm/i915/i915_request.h | 8 +++ 4 files changed, 63 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 2b68af16222c..f750c826e19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -384,6 +384,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) mutex_init(&ce->pin_mutex); spin_lock_init(&ce->guc_state.lock); + INIT_LIST_HEAD(&ce->guc_state.fences); ce->guc_id = GUC_INVALID_LRC_ID; INIT_LIST_HEAD(&ce->guc_id_link); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index ce7c69b34cd1..beafe55a9101 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -146,6 +146,11 @@ struct intel_context { * submission */ u8 sched_state; + /* +* fences: maintains of list of requests that have a submit +* fence related to GuC submission +*/ + struct list_head fences; } guc_state; /* GuC scheduling state that does not require a lock. 
*/ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d39579ac2faa..49e5d460d54b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -924,6 +924,30 @@ static const struct intel_context_ops guc_context_ops = { .destroy = guc_context_destroy, }; +static void __guc_signal_context_fence(struct intel_context *ce) +{ + struct i915_request *rq; + + lockdep_assert_held(&ce->guc_state.lock); + + list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) + i915_sw_fence_complete(&rq->submit); + + INIT_LIST_HEAD(&ce->guc_state.fences); +} + +static void guc_signal_context_fence(struct intel_context *ce) +{ + unsigned long flags; + + GEM_BUG_ON(!context_wait_for_deregister_to_register(ce)); + + spin_lock_irqsave(&ce->guc_state.lock, flags); + clr_context_wait_for_deregister_to_register(ce); + __guc_signal_context_fence(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); +} + static bool context_needs_register(struct intel_context *ce, bool new_guc_id) { return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) || @@ -934,6 +958,7 @@ static int guc_request_alloc(struct i915_request *rq) { struct intel_context *ce = rq->context; struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; int ret; GEM_BUG_ON(!intel_context_is_pinned(rq->context)); @@ -978,7 +1003,7 @@ static int guc_request_alloc(struct i915_request *rq) * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id. 
*/ if (atomic_add_unless(&ce->guc_id_ref, 1, 0)) - return 0; + goto out; ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */ if (unlikely(ret < 0)) @@ -994,6 +1019,28 @@ static int guc_request_alloc(struct i915_request *rq) clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags); +out: + /* +* We block all requests on this context if a G2H is pending for a +* context deregistration as the GuC will fail a context registration +* while this G2H is pending. Once a G2H returns, the fence is released +* that is blocking these requests (see guc_signal_context_fence). +* +* We can safely check the below field outside of the lock as it isn't +* possible for this field to transition from being clear to set but +* converse is possible, hence the need for the check within the lock. +*/ + if (likely(!context_wait_for_deregister_to_register(ce))) + return 0; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + if (context_wait_f
Re: [PATCH 15/47] drm/i915/guc: Defer context unpin until scheduling is disabled
On 6/24/2021 00:04, Matthew Brost wrote: With GuC scheduling, it isn't safe to unpin a context while scheduling is enabled for that context as the GuC may touch some of the pinned state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is done, a call back is added to intel_context_unpin when pin count == 1 to disable scheduling for that context. When the response CTB is received it is safe to do the final unpin. Future patches may add a heuristic / delay to schedule the disable call back to avoid thrashing on schedule enable / disable. Cc: John Harrison Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_context.c | 4 +- drivers/gpu/drm/i915/gt/intel_context.h | 27 +++- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 145 +- 6 files changed, 179 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index f750c826e19d..1499b8aace2a 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce) return err; } -void intel_context_unpin(struct intel_context *ce) +void __intel_context_do_unpin(struct intel_context *ce, int sub) { - if (!atomic_dec_and_test(&ce->pin_count)) + if (!atomic_sub_and_test(sub, &ce->pin_count)) return; CE_TRACE(ce, "unpin\n"); diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index f83a73a2b39f..8a7199afbe61 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -113,7 +113,32 @@ static inline void __intel_context_pin(struct intel_context *ce) atomic_inc(&ce->pin_count); } -void intel_context_unpin(struct intel_context *ce); +void __intel_context_do_unpin(struct intel_context *ce, int 
sub); + +static inline void intel_context_sched_disable_unpin(struct intel_context *ce) +{ + __intel_context_do_unpin(ce, 2); +} + +static inline void intel_context_unpin(struct intel_context *ce) +{ + if (!ce->ops->sched_disable) { + __intel_context_do_unpin(ce, 1); + } else { + /* +* Move ownership of this pin to the scheduling disable which is +* an async operation. When that operation completes the above +* intel_context_sched_disable_unpin is called potentially +* unpinning the context. +*/ + while (!atomic_add_unless(&ce->pin_count, -1, 1)) { + if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) { + ce->ops->sched_disable(ce); + break; + } + } + } +} void intel_context_enter_engine(struct intel_context *ce); void intel_context_exit_engine(struct intel_context *ce); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index beafe55a9101..e7af6a2368f8 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -43,6 +43,8 @@ struct intel_context_ops { void (*enter)(struct intel_context *ce); void (*exit)(struct intel_context *ce); + void (*sched_disable)(struct intel_context *ce); + void (*reset)(struct intel_context *ce); void (*destroy)(struct kref *kref); }; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index d44316dc914b..b43ec56986b5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -236,6 +236,8 @@ int intel_guc_reset_engine(struct intel_guc *guc, int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_sched_done_process_msg(struct intel_guc *guc, +const u32 *msg, u32 len); void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 42a7daef2ff6..7491f041859e 100644 --- 
a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -905,6 +905,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r ret = intel_guc_deregister_done_process_msg(guc, payload, len); break; + case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
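The pin-count hand-off in intel_context_unpin above is easy to misread, so here is a user-space sketch of the same counting scheme using C11 atomics. It is an illustration under assumptions, not the driver code: `struct ctx`, `add_unless`, `ctx_unpin`, `ctx_disable_done` and the two counters are hypothetical, and the asynchronous schedule-disable is modelled as a plain counter bump rather than a message to the firmware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a context with a pin count. */
struct ctx {
	atomic_int pin_count;
	int disables_requested; /* times the async disable was kicked off */
	int final_unpins;       /* times the count actually reached zero */
};

/* Add a to *v unless *v currently equals u; returns true if the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Drop sub references; the caller that takes the count to zero cleans up. */
static void do_unpin(struct ctx *c, int sub)
{
	if (atomic_fetch_sub(&c->pin_count, sub) == sub)
		c->final_unpins++;
}

/*
 * Normal unpin: decrement unless we hold the last pin. The last pin is not
 * dropped; it is bumped 1 -> 2 and ownership moves to the async disable.
 */
static void ctx_unpin(struct ctx *c)
{
	while (!add_unless(&c->pin_count, -1, 1)) {
		int expected = 1;

		if (atomic_compare_exchange_strong(&c->pin_count, &expected, 2)) {
			c->disables_requested++;
			break;
		}
		/* Somebody pinned again in between; retry the decrement. */
	}
}

/* Completion of the async disable: release both references at once. */
static void ctx_disable_done(struct ctx *c)
{
	do_unpin(c, 2);
}
```

If another pin arrives while the disable is outstanding, the count goes 2 -> 3 and the completion only brings it back to 1, so the context stays pinned, which appears to be the property the comment in the patch describes as "potentially unpinning the context".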
Re: [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin
On 6/24/2021 00:04, Matthew Brost wrote: Disable engine barriers for unpinning with GuC. This feature isn't needed with the GuC as it disables context scheduling before unpinning which guarantees the HW will not reference the context. Hence it is not necessary to defer unpinning until a kernel context request completes on each engine in the context engine mask. Cc: John Harrison Signed-off-by: Matthew Brost Signed-off-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/intel_context.c| 2 +- drivers/gpu/drm/i915/gt/intel_context.h| 1 + drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 1499b8aace2a..7f97753ab164 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce) __i915_active_acquire(&ce->active); - if (intel_context_is_barrier(ce)) + if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine)) return 0; Would be better to have a scheduler flag to say whether barriers are required or not. That would prevent polluting front end code with back end details. John. /* Preallocate tracking nodes */ diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 8a7199afbe61..a592a9605dc8 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -16,6 +16,7 @@ #include "intel_engine_types.h" #include "intel_ring_types.h" #include "intel_timeline_types.h" +#include "uc/intel_guc_submission.h" #define CE_TRACE(ce, fmt, ...) 
do { \ const struct intel_context *ce__ = (ce);\ diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 26685b927169..fa7b99a671dd 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine) * This test makes sure that the context is kept alive until a * subsequent idle-barrier (emitted when the engine wakeref hits 0 * with no more outstanding requests). +* +* In GuC submission mode we don't use idle barriers and we instead +* get a message from the GuC to signal that it is safe to unpin the +* context from memory. */ + if (intel_engine_uses_guc(engine)) + return 0; if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n", @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine) * on the context image remotely (intel_context_prepare_remote_request), * which inserts foreign fences into intel_context.active, does not * clobber the idle-barrier. +* +* In GuC submission mode we don't use idle barriers. */ + if (intel_engine_uses_guc(engine)) + return 0; if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n",
Re: [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable
On 6/24/2021 00:04, Matthew Brost wrote: Extend the deregistration context fence to fence whne a GuC context has scheduling disable pending. Cc: John Harrison Signed-off-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++ 1 file changed, 30 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0386ccd5a481..0a6ccdf32316 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce) goto unpin; spin_lock_irqsave(&ce->guc_state.lock, flags); + + /* +* We have to check if the context has been pinned again as another pin +* operation is allowed to pass this function. Checking the pin count +* here synchronizes this function with guc_request_alloc ensuring a +* request doesn't slip through the 'context_pending_disable' fence. +*/ The pin count is an atomic so doesn't need the spinlock. Also the above comment 'checking the pin count here synchronizes ...' seems wrong. Isn't the point that acquiring the spinlock is what synchronises with guc_request_alloc? So the comment should be before the spinlock acquire and should mention using the spinlock for this purpose? John. + if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return; + } guc_id = prep_context_pending_disable(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); with_intel_runtime_pm(runtime_pm, wakeref) @@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq) out: /* * We block all requests on this context if a G2H is pending for a -* context deregistration as the GuC will fail a context registration -* while this G2H is pending. Once a G2H returns, the fence is released -* that is blocking these requests (see guc_signal_context_fence). 
+* schedule disable or context deregistration as the GuC will fail a +* schedule enable or context registration if either G2H is pending +* respectfully. Once a G2H returns, the fence is released that is +* blocking these requests (see guc_signal_context_fence). * -* We can safely check the below field outside of the lock as it isn't -* possible for this field to transition from being clear to set but +* We can safely check the below fields outside of the lock as it isn't +* possible for these fields to transition from being clear to set but * converse is possible, hence the need for the check within the lock. */ - if (likely(!context_wait_for_deregister_to_register(ce))) + if (likely(!context_wait_for_deregister_to_register(ce) && + !context_pending_disable(ce))) return 0; spin_lock_irqsave(&ce->guc_state.lock, flags); - if (context_wait_for_deregister_to_register(ce)) { + if (context_wait_for_deregister_to_register(ce) || + context_pending_disable(ce)) { i915_sw_fence_await(&rq->submit); list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); @@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { + /* +* Unpin must be done before __guc_signal_context_fence, +* otherwise a race exists between the requests getting +* submitted + retired before this unpin completes resulting in +* the pin_count going to zero and the context still being +* enabled. +*/ intel_context_sched_disable_unpin(ce); spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_pending_disable(ce); + __guc_signal_context_fence(ce); spin_unlock_irqrestore(&ce->guc_state.lock, flags); }
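The "check outside the lock, then re-check inside it" pattern that the guc_request_alloc comment describes can be sketched in isolation. This is a hedged illustration, not the driver code: `struct gate` and its helpers are hypothetical, a tiny `atomic_flag` spinlock stands in for `ce->guc_state.lock`, and integer ids stand in for requests. It assumes, as the patch comment does, that the flag cannot go from clear to set while a submit is racing with it, only from set to clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WAITERS 8

/* Hypothetical stand-in for the per-context fence state. */
struct gate {
	atomic_flag lock;         /* minimal spinlock */
	atomic_bool pending;      /* a disable/deregister reply is outstanding */
	int waiters[MAX_WAITERS]; /* ids of requests parked behind the gate */
	int nr_waiters;
};

static void gate_lock(struct gate *g)
{
	while (atomic_flag_test_and_set(&g->lock))
		; /* spin */
}

static void gate_unlock(struct gate *g)
{
	atomic_flag_clear(&g->lock);
}

static void gate_init(struct gate *g)
{
	atomic_flag_clear(&g->lock);
	atomic_init(&g->pending, false);
	g->nr_waiters = 0;
}

static void gate_set_pending(struct gate *g)
{
	gate_lock(g);
	atomic_store(&g->pending, true);
	gate_unlock(g);
}

/* Returns true if the request may proceed now, false if it was parked. */
static bool gate_submit(struct gate *g, int id)
{
	bool parked = false;

	/* Unlocked fast path: a clear flag cannot become set behind our back. */
	if (!atomic_load(&g->pending))
		return true;

	/* The flag may have been cleared meanwhile, so re-check under the lock. */
	gate_lock(g);
	if (atomic_load(&g->pending)) {
		assert(g->nr_waiters < MAX_WAITERS);
		g->waiters[g->nr_waiters++] = id;
		parked = true;
	}
	gate_unlock(g);

	return !parked;
}

/* Reply received: clear the flag and release everything parked behind it. */
static int gate_reply(struct gate *g)
{
	int released;

	gate_lock(g);
	atomic_store(&g->pending, false);
	released = g->nr_waiters;
	g->nr_waiters = 0;
	gate_unlock(g);

	return released;
}
```

Clearing the flag and draining the list under the same lock is what keeps a request from being parked after the release has already run.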
Re: [PATCH 18/47] drm/i915: Disable preempt busywait when using GuC scheduling
On 6/24/2021 00:04, Matthew Brost wrote: Disable preempt busywait when using GuC scheduling. This isn't need as the GuC control preemption when scheduling. need -> needed; control -> controls. With the above fixed: Reviewed-by: John Harrison Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 87b06572fd2e..f7aae502ec3d 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT; *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - if (intel_engine_has_semaphores(rq->engine)) + if (intel_engine_has_semaphores(rq->engine) && + !intel_uc_uses_guc_submission(&rq->engine->gt->uc)) cs = emit_preempt_busywait(rq, cs); rq->tail = intel_ring_offset(rq, cs); @@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT; *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - if (intel_engine_has_semaphores(rq->engine)) + if (intel_engine_has_semaphores(rq->engine) && + !intel_uc_uses_guc_submission(&rq->engine->gt->uc)) cs = gen12_emit_preempt_busywait(rq, cs); rq->tail = intel_ring_offset(rq, cs);
Re: [PATCH 20/47] drm/i915/guc: Disable semaphores when using GuC scheduling
On 6/24/2021 00:04, Matthew Brost wrote: Semaphores are an optimization and not required for basic GuC submission to work properly. Disable until we have time to do the implementation to enable semaphores and tune them for performance. Also, the long-term direction is to delete semaphores from i915 altogether, which is another reason not to enable them for GuC submission. v2: Reword commit message Cc: John Harrison Signed-off-by: Matthew Brost I think the commit description does not really match the patch content. The description is valid but the 'disable' is done by simply not setting the enable flag (done in the execlist back end and presumably not done in the GuC back end). However, what the patch is actually doing seems to be fixing bugs with the 'are semaphores enabled' mechanism, i.e. correcting pieces of code that used semaphores without checking if they are enabled. And presumably this would be broken if someone tried to disable semaphores in execlist mode for any reason? So I think keeping the existing comment text is fine but something should be added to explain the actual changes. John.
--- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 7720b8c22c81..5c07e6abf16a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce, ce->timeline = intel_timeline_get(ctx->timeline); if (ctx->sched.priority >= I915_PRIORITY_NORMAL && - intel_engine_has_timeslices(ce->engine)) + intel_engine_has_timeslices(ce->engine) && + intel_engine_has_semaphores(ce->engine)) __set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags); intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us); @@ -1938,7 +1939,8 @@ static int __apply_priority(struct intel_context *ce, void *arg) if (!intel_engine_has_timeslices(ce->engine)) return 0; - if (ctx->sched.priority >= I915_PRIORITY_NORMAL) + if (ctx->sched.priority >= I915_PRIORITY_NORMAL && + intel_engine_has_semaphores(ce->engine)) intel_context_set_use_semaphores(ce); else intel_context_clear_use_semaphores(ce);
Re: [PATCH 22/47] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
On 6/24/2021 00:04, Matthew Brost wrote: When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero. v2: rtimeout -> remaining_timeout Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +- drivers/gpu/drm/i915/gt/intel_gt.c| 19 drivers/gpu/drm/i915/gt/intel_gt.h| 2 + drivers/gpu/drm/i915/gt/intel_gt_requests.c | 22 ++--- drivers/gpu/drm/i915/gt/intel_gt_requests.h | 9 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 4 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++- drivers/gpu/drm/i915/gt/uc/intel_uc.h | 5 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + drivers/gpu/drm/i915/i915_gem_evict.c | 1 + .../gpu/drm/i915/selftests/igt_live_test.c| 2 +- .../gpu/drm/i915/selftests/mock_gem_device.c | 3 +- 14 files changed, 137 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 2fd155742bd2..335b955d5b4b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, goto insert; /* Attempt to reap some mmap space from dead objects */ - err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT); + err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT, + NULL); if (err) goto err; diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index e714e21c0a4d..acfdd53b2678 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt) GEM_BUG_ON(intel_gt_pm_is_awake(gt)); } +int intel_gt_wait_for_idle(struct intel_gt 
*gt, long timeout) +{ + long remaining_timeout; + + /* If the device is asleep, we have no requests outstanding */ + if (!intel_gt_pm_is_awake(gt)) + return 0; + + while ((timeout = intel_gt_retire_requests_timeout(gt, timeout, + &remaining_timeout)) > 0) { + cond_resched(); + if (signal_pending(current)) + return -EINTR; + } + + return timeout ? timeout : intel_uc_wait_for_idle(>->uc, + remaining_timeout); +} + int intel_gt_init(struct intel_gt *gt) { int err; diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index e7aabe0cc5bf..74e771871a9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt); void intel_gt_driver_late_release(struct intel_gt *gt); +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout); + void intel_gt_check_and_clear_faults(struct intel_gt *gt); void intel_gt_clear_error_registers(struct intel_gt *gt, intel_engine_mask_t engine_mask); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 647eca9d867a..39f5e824dac5 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -13,6 +13,7 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h" #include "intel_timeline.h" +#include "uc/intel_uc.h" Why is this needed? static bool retire_requests(struct intel_timeline *tl) { @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine) GEM_BUG_ON(engine->retire); } -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout) +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout, + long *remaining_timeout) { struct intel_gt_timelines *timelines = >->timelines; struct intel_timeline *tl, *tn; @@ -195,22 +197,10 @@ out_active: spin_lock(&timelines->lock); if (flush_submission(gt, timeout)) /* Wait, there's more! */ active_count++; - return active_count ? 
timeout : 0; -} - -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) -{ - /* If the device is asleep, we have no requests outstanding */ - if (!intel_gt_pm_is_awake(gt)) - return 0; - - whil
Re: [PATCH 17/47] drm/i915/guc: Extend deregistration fence to schedule disable
On 7/9/2021 20:36, Matthew Brost wrote: On Fri, Jul 09, 2021 at 03:59:11PM -0700, John Harrison wrote: On 6/24/2021 00:04, Matthew Brost wrote: Extend the deregistration context fence to fence when a GuC context has scheduling disable pending. Cc: John Harrison Signed-off-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++ 1 file changed, 30 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0386ccd5a481..0a6ccdf32316 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce) goto unpin; spin_lock_irqsave(&ce->guc_state.lock, flags); + + /* +* We have to check if the context has been pinned again as another pin +* operation is allowed to pass this function. Checking the pin count +* here synchronizes this function with guc_request_alloc ensuring a +* request doesn't slip through the 'context_pending_disable' fence. +*/

The pin count is an atomic so doesn't need the spinlock. Also the above comment 'checking the pin count here synchronizes ...' seems wrong. Isn't the point that acquiring the spinlock is what synchronises with guc_request_alloc? So the comment should be before the spinlock acquire and should mention using the spinlock for this purpose? John.

How about? /* * We have to check if the context has been pinned again as another pin * operation is allowed to pass this function. Checking the pin count, * within ce->guc_state.lock, synchronizes this function with * guc_request_alloc ensuring a request doesn't slip through the * 'context_pending_disable' fence. Checking within the spin lock (can't * sleep) ensures another process doesn't pin this context and generate * a request before we set the 'context_pending_disable' flag here. */ Matt

Sounds good. With that added in: Reviewed-by: John Harrison

+ if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return; + } guc_id = prep_context_pending_disable(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); with_intel_runtime_pm(runtime_pm, wakeref) @@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq) out: /* * We block all requests on this context if a G2H is pending for a -* context deregistration as the GuC will fail a context registration -* while this G2H is pending. Once a G2H returns, the fence is released -* that is blocking these requests (see guc_signal_context_fence). +* schedule disable or context deregistration as the GuC will fail a +* schedule enable or context registration if either G2H is pending +* respectfully. Once a G2H returns, the fence is released that is +* blocking these requests (see guc_signal_context_fence). * -* We can safely check the below field outside of the lock as it isn't -* possible for this field to transition from being clear to set but +* We can safely check the below fields outside of the lock as it isn't +* possible for these fields to transition from being clear to set but * converse is possible, hence the need for the check within the lock. 
*/ - if (likely(!context_wait_for_deregister_to_register(ce))) + if (likely(!context_wait_for_deregister_to_register(ce) && + !context_pending_disable(ce))) return 0; spin_lock_irqsave(&ce->guc_state.lock, flags); - if (context_wait_for_deregister_to_register(ce)) { + if (context_wait_for_deregister_to_register(ce) || + context_pending_disable(ce)) { i915_sw_fence_await(&rq->submit); list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); @@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { + /* +* Unpin must be done before __guc_signal_context_fence, +* otherwise a race exists between the requests getting +* submitted + retired before this unpin completes resulting in +* the pin_count going to zero and the context still being +* enabled. +*/ intel_context_sched_disable_unpin(ce); spin_lock_irqsave(&ce->guc_state
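For anyone following the locking argument above, here is a small userspace sketch of the two pieces being discussed: the atomic_add_unless() check done under the lock in the disable path, and the unlocked fast check followed by a locked re-check in the request path. Everything in it is an assumption made for illustration: struct ctx, ctx_lock(), add_unless(), sched_disable(), request_alloc() and sched_done() are hypothetical stand-ins written with C11 atomics, not i915 code, and the constants -2 and 2 are simply copied from the patch without modelling what the pin count values mean in the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_context; only what the sketch needs. */
struct ctx {
	atomic_flag lock;            /* plays the role of ce->guc_state.lock */
	atomic_int pin_count;
	atomic_bool pending_disable; /* plays the role of context_pending_disable() */
	int fenced;                  /* requests parked behind the fence */
};

static void ctx_init(struct ctx *c, int pins)
{
	atomic_flag_clear(&c->lock);
	atomic_init(&c->pin_count, pins);
	atomic_init(&c->pending_disable, false);
	c->fenced = 0;
}

static void ctx_lock(struct ctx *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		; /* spin */
}

static void ctx_unlock(struct ctx *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Userspace imitation of atomic_add_unless(): add 'a' unless *v == u. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure the CAS reloads 'old', so the loop re-checks it. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Returns true if a schedule disable was started. */
static bool sched_disable(struct ctx *c)
{
	bool started = false;

	ctx_lock(c);
	/* The add succeeds only if the count is not 2, i.e. somebody re-pinned. */
	if (!add_unless(&c->pin_count, -2, 2)) {
		atomic_store(&c->pending_disable, true);
		started = true;
	}
	ctx_unlock(c);
	return started;
}

/* Returns true if the request had to be parked behind the fence. */
static bool request_alloc(struct ctx *c)
{
	bool parked;

	/* Unlocked fast path, as in the patch: a clear flag is trusted. */
	if (!atomic_load(&c->pending_disable))
		return false;

	/* The flag may have been cleared meanwhile, so re-check under the lock. */
	ctx_lock(c);
	parked = atomic_load(&c->pending_disable);
	if (parked)
		c->fenced++;
	ctx_unlock(c);
	return parked;
}

/* Models the G2H coming back: clear the flag and release parked requests. */
static void sched_done(struct ctx *c)
{
	ctx_lock(c);
	atomic_store(&c->pending_disable, false);
	c->fenced = 0;
	ctx_unlock(c);
}
```

The sketch is single-threaded when exercised, so it shows the control flow only; it says nothing about whether the real ordering is race free.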
Re: [PATCH 16/47] drm/i915/guc: Disable engine barriers with GuC during unpin
On 7/9/2021 20:00, Matthew Brost wrote: On Fri, Jul 09, 2021 at 03:53:29PM -0700, John Harrison wrote: On 6/24/2021 00:04, Matthew Brost wrote: Disable engine barriers for unpinning with GuC. This feature isn't needed with the GuC as it disables context scheduling before unpinning which guarantees the HW will not reference the context. Hence it is not necessary to defer unpinning until a kernel context request completes on each engine in the context engine mask. Cc: John Harrison Signed-off-by: Matthew Brost Signed-off-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/intel_context.c| 2 +- drivers/gpu/drm/i915/gt/intel_context.h| 1 + drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 1499b8aace2a..7f97753ab164 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce) __i915_active_acquire(&ce->active); - if (intel_context_is_barrier(ce)) + if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine)) return 0; Would be better to have a scheduler flag to say whether barriers are required or not. That would prevent polluting front end code with back end details. I guess an engine flag is slightly better but I still don't love that as we have to test if the context is a barrier (kernel context) and then call a function that is basically backend specific after. IMO we really need to push all of this to a vfunc. If you really want me to make this an engine flag I can, but in the end it just seems like that will trash the code (adding an engine flag just to remove it). I think this is just a clean up we write down, and figure out a bit later as nothing is functionally wrong + quite clear that it is something that should be cleaned up. Matt Flag, vfunc, whatever. 
I just mean that it would be better to abstract it out in some manner. Maybe a flag/vfunc on the ce object? That way it would swallow the 'ignore kernel contexts' test as well. But yes, probably best to add it to the todo list and move on as it is not going to be a two minute quick fix. I've added a comment to the Jira, so... Reviewed-by: John Harrison John. /* Preallocate tracking nodes */ diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 8a7199afbe61..a592a9605dc8 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -16,6 +16,7 @@ #include "intel_engine_types.h" #include "intel_ring_types.h" #include "intel_timeline_types.h" +#include "uc/intel_guc_submission.h" #define CE_TRACE(ce, fmt, ...) do { \ const struct intel_context *ce__ = (ce);\ diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 26685b927169..fa7b99a671dd 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine) * This test makes sure that the context is kept alive until a * subsequent idle-barrier (emitted when the engine wakeref hits 0 * with no more outstanding requests). +* +* In GuC submission mode we don't use idle barriers and we instead +* get a message from the GuC to signal that it is safe to unpin the +* context from memory. */ + if (intel_engine_uses_guc(engine)) + return 0; if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n", @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine) * on the context image remotely (intel_context_prepare_remote_request), * which inserts foreign fences into intel_context.active, does not * clobber the idle-barrier. +* +* In GuC submission mode we don't use idle barriers. 
*/ + if (intel_engine_uses_guc(engine)) + return 0; if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n",
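To make the 'flag/vfunc on the ce object' idea from the discussion above a bit more concrete, here is one possible shape for it as a standalone sketch. This is not what the patch does and not an agreed design: struct context_ops, skips_barriers() and the two backend callbacks are hypothetical names invented for the example, and the barrier preallocation is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct context;

/* Hypothetical ops table; not i915's struct intel_context_ops. */
struct context_ops {
	/* NULL means the backend always wants idle barriers. */
	bool (*skips_barriers)(const struct context *ce);
};

struct context {
	const struct context_ops *ops;
	bool is_kernel_context;
	int barriers_preallocated;
};

/* Execlists-like backend: only the kernel (barrier) context skips. */
static bool execlists_skips_barriers(const struct context *ce)
{
	return ce->is_kernel_context;
}

/* GuC-like backend: scheduling is disabled before unpin, so always skip. */
static bool guc_skips_barriers(const struct context *ce)
{
	(void)ce;
	return true;
}

static const struct context_ops execlists_ops = {
	.skips_barriers = execlists_skips_barriers,
};

static const struct context_ops guc_ops = {
	.skips_barriers = guc_skips_barriers,
};

/* Front-end code asks the context and never names a backend. */
static int context_active_acquire(struct context *ce)
{
	if (ce->ops->skips_barriers && ce->ops->skips_barriers(ce))
		return 0;

	ce->barriers_preallocated++; /* stands in for preallocating tracking nodes */
	return 0;
}
```

With this shape the 'ignore kernel contexts' test moves into the execlists callback, which is the point John raises; whether a vfunc or a plain flag is the better fit is left open, as in the thread.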
Re: [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC
On 6/24/2021 00:04, Matthew Brost wrote: Update GuC debugfs to support the new GuC structures. Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 23 +++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 4 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + 6 files changed, 104 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index e0f92e28350c..4ed074df88e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct) ct_try_receive_message(ct); } + +void intel_guc_log_ct_info(struct intel_guc_ct *ct, + struct drm_printer *p) +{ + if (!ct->enabled) { + drm_puts(p, "CT disabled\n"); + return; + } + + drm_printf(p, "H2G Space: %u\n", + atomic_read(&ct->ctbs.send.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.send.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.send.desc->tail); + drm_printf(p, "G2H Space: %u\n", + atomic_read(&ct->ctbs.recv.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.recv.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.recv.desc->tail); +} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index ab1b79ab960b..f62eb06b32fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@ struct i915_vma; struct intel_guc; +struct drm_printer; /** * DOC: Command Transport (CT). 
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct); +void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p); + #endif /* _INTEL_GUC_CT_H_ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..62b9ce0fafaa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h" static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p); - /* Add more as required ... */ + if (!intel_guc_submission_is_used(guc)) + return 0; + + intel_guc_log_ct_info(&guc->ct, &p); + intel_guc_log_submission_info(guc, &p); return 0; } DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info); +static int guc_registered_contexts_show(struct seq_file *m, void *data) +{ + struct intel_guc *guc = m->private; + struct drm_printer p = drm_seq_file_printer(m); + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + intel_guc_log_context_info(guc, &p); + + return 0; +} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL }, + { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, }; if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1a28283a9ae..89b3c7e5d15b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; } + +void intel_guc_log_submission_info(struct intel_guc *guc, + struct drm_printer *p) +{ + struct i915_sched_engine *sched_engine = guc->sched_engine; + struct rb_node *rb; + unsigned long flags; + + drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n", +
Re: [PATCH 24/47] drm/i915/guc: Add several request trace points
On 6/24/2021 00:04, Matthew Brost wrote: Add trace points for request dependencies and GuC submit. Extended existing request trace points to include submit fence value,, guc_id,

Excessive punctuation. Or maybe should say 'fence value, tail, guc_id'? With that fixed: Reviewed-by: John Harrison

and ring tail value. Cc: John Harrison Signed-off-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 39 ++- 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89b3c7e5d15b..c2327eebc09c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; } + trace_i915_request_guc_submit(last); } guc->stalled_request = NULL; @@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq; + else + trace_i915_request_guc_submit(rq); return ret; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d92c9f25c9f4..7f7aa096e873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to, return err; } + trace_i915_request_dep_to(to); + trace_i915_request_dep_from(from); + /* Couple the dependency tree for PI on this exposed to->fence */ if (to->engine->sched_engine->schedule) { err = i915_sched_node_add_dependency(&to->sched, diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 6778ad2a14a4..b02d04b6c8f6 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -794,22 +794,27 @@ 
DECLARE_EVENT_CLASS(i915_request, TP_STRUCT__entry( __field(u32, dev) __field(u64, ctx) +__field(u32, guc_id) __field(u16, class) __field(u16, instance) __field(u32, seqno) +__field(u32, tail) ), TP_fast_assign( __entry->dev = rq->engine->i915->drm.primary->index; __entry->class = rq->engine->uabi_class; __entry->instance = rq->engine->uabi_instance; + __entry->guc_id = rq->context->guc_id; __entry->ctx = rq->fence.context; __entry->seqno = rq->fence.seqno; + __entry->tail = rq->tail; ), - TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u", + TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u", __entry->dev, __entry->class, __entry->instance, - __entry->ctx, __entry->seqno) + __entry->guc_id, __entry->ctx, __entry->seqno, + __entry->tail) ); DEFINE_EVENT(i915_request, i915_request_add, @@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add, ); #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +DEFINE_EVENT(i915_request, i915_request_dep_to, +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) +); + +DEFINE_EVENT(i915_request, i915_request_dep_from, +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) +); + +DEFINE_EVENT(i915_request, i915_request_guc_submit, +TP_PROTO(struct i915_request *rq), +TP_ARGS(rq) +); + DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq) @@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out, #else #if !defined(TRACE_HEADER_MULTI_READ) +static inline void +trace_i915_request_dep_to(struct i915_request *rq) +{ +} + +static inline void +trace_i915_request_dep_from(struct i915_request *rq) +{ +} + +static inline void +trace_i915_request_guc_submit(struct i915_request *rq) +{ +} + static inline void trace_i915_request_submit(struct i915_request *rq) {
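As a quick illustration of what the extended i915_request event class prints, the snippet below formats one entry in userspace using the same format string as the new TP_printk. It is only a sketch: struct trace_entry and format_request_event() are hypothetical names, the field widths are approximated with plain C types, and any values fed to it are made up rather than taken from a real trace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the fields in TP_STRUCT__entry after this patch. */
struct trace_entry {
	unsigned int dev;
	unsigned long long ctx;
	unsigned int guc_id;       /* new in this patch */
	unsigned short engine_class;
	unsigned short instance;
	unsigned int seqno;
	unsigned int tail;         /* new in this patch */
};

/* Formats one entry with the format string from the patched TP_printk. */
static int format_request_event(char *buf, size_t len,
				const struct trace_entry *e)
{
	return snprintf(buf, len,
			"dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
			e->dev, (unsigned int)e->engine_class,
			(unsigned int)e->instance, e->guc_id, e->ctx,
			e->seqno, e->tail);
}
```

Having guc_id next to ctx and seqno in every request event is what lets a request in the i915 trace be matched against the context id the GuC sees.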
Re: [PATCH 25/47] drm/i915: Add intel_context tracing
On 6/24/2021 00:04, Matthew Brost wrote: Add intel_context tracing. These trace points are particularly helpful when debugging the GuC firmware and can be enabled via the CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option. Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 6 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++ drivers/gpu/drm/i915/i915_trace.h | 148 +- 3 files changed, 166 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7f97753ab164..b24a1b7a3f88 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -8,6 +8,7 @@ #include "i915_drv.h" #include "i915_globals.h" +#include "i915_trace.h" #include "intel_context.h" #include "intel_engine.h" @@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu) { struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); + trace_intel_context_free(ce); kmem_cache_free(global.slab_ce, ce); } @@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine) return ERR_PTR(-ENOMEM); intel_context_init(ce, engine); + trace_intel_context_create(ce); return ce; } @@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce, GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! 
*/ + trace_intel_context_do_pin(ce); + err_unlock: mutex_unlock(&ce->pin_mutex); err_post_unpin: @@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub) */ intel_context_get(ce); intel_context_active_release(ce); + trace_intel_context_do_unpin(ce); intel_context_put(ce); } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c2327eebc09c..d605af0d66e6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (!enabled && !err) { + trace_intel_context_sched_enable(ce); atomic_inc(&guc->outstanding_submission_g2h); set_context_enabled(ce); } else if (!enabled) { @@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce) u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + ce->guc_id * sizeof(struct guc_lrc_desc); + trace_intel_context_register(ce); + return __guc_action_register_context(guc, ce->guc_id, offset); } @@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id) { struct intel_guc *guc = ce_to_guc(ce); + trace_intel_context_deregister(ce); + return __guc_action_deregister_context(guc, guc_id); } @@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce) * GuC before registering this context. 
*/ if (context_registered) { + trace_intel_context_steal_guc_id(ce); set_context_wait_for_deregister_to_register(ce); intel_context_get(ce); @@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc, GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID); + trace_intel_context_sched_disable(ce); intel_context_get(ce); guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), @@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct intel_context *ce) lockdep_assert_held(&ce->guc_state.lock); + if (!list_empty(&ce->guc_state.fences)) + trace_intel_context_fence_release(ce); + list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) i915_sw_fence_complete(&rq->submit); @@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, if (unlikely(!ce)) return -EPROTO; + trace_intel_context_deregister_done(ce); + if (context_wait_for_deregister_to_register(ce)) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return -EPROTO; } + trace_intel_context_sched_done(ce); + if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index b02d04b6c8f6..97c2e83984ed 100644 --- a/d
Re: [PATCH 27/47] drm/i915: Track 'serial' counts for virtual engines
On 6/24/2021 00:04, Matthew Brost wrote: From: John Harrison

The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted to their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heartbeat code thinks the physical engines are idle due to the serial number not incrementing.

This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions.

Signed-off-by: John Harrison Signed-off-by: Matthew Brost

Need to pull in the updated subject line and commit description from Tvrtko in the RFC patch set review. John.
--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/intel_execlists_submission.c | 6 ++ drivers/gpu/drm/i915/gt/intel_ring_submission.c | 6 ++ drivers/gpu/drm/i915/gt/mock_engine.c| 6 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c| 16 drivers/gpu/drm/i915/i915_request.c | 4 +++- 6 files changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 5b91068ab277..1dc59e6c9a92 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -388,6 +388,8 @@ struct intel_engine_cs { void(*park)(struct intel_engine_cs *engine); void(*unpark)(struct intel_engine_cs *engine); + void (*bump_serial)(struct intel_engine_cs *engine); + void(*set_default_submission)(struct intel_engine_cs *engine); const struct intel_context_ops *cops; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index bd4ced794ff9..9cfb8800a0e6 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3203,6 +3203,11 @@ static void execlists_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); } +static void execlist_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + static void logical_ring_default_vfuncs(struct intel_engine_cs *engine) { @@ -3212,6 +3217,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; + engine->bump_serial = execlist_bump_serial; engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind; diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 5d42a12ef3d6..e1506b280df1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -1044,6 +1044,11 @@ static void setup_irq(struct intel_engine_cs *engine) } } +static void ring_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + static void setup_common(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915; @@ -1063,6 +1068,7 @@ static void setup_common(struct intel_engine_cs *engine) engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc; + engine->bump_serial = ring_bump_serial; /* * Using a global execution timeline; the previous final breadcrumb is diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 68970398e4ef..9203c766db80 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine) intel_engine_fini_retire(engine); } +static void mock_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, const char *name, int id) @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, engine->base.cops = &mock_context_ops; engine->base.request_alloc = mock_request_alloc; + engine->base.bump_serial = mock_bump_serial; engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.sub
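The GuC side of this patch is cut off in the quote above, so as a reading aid here is a minimal sketch of what 'decompose virtual engines into their physical constituents' could look like, based only on the commit message. All names are assumptions: struct engine, the physical[] table, physical_bump_serial() and virtual_bump_serial() are hypothetical and do not reproduce the actual GuC implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENGINES 8

/* Hypothetical, trimmed-down engine; not struct intel_engine_cs. */
struct engine {
	uint32_t mask;        /* one bit per physical engine it can run on */
	unsigned int serial;  /* what the heartbeat code would sample */
};

/* Table of physical engines, indexed by bit position in the mask. */
static struct engine physical[MAX_ENGINES];

/* Same body as the execlists/ring/mock callbacks in the patch. */
static void physical_bump_serial(struct engine *engine)
{
	engine->serial++;
}

/*
 * Virtual engine variant: i915 does not know which sibling the GuC will
 * pick, so bump the serial of every physical engine in the mask.
 */
static void virtual_bump_serial(const struct engine *ve)
{
	unsigned int id;

	for (id = 0; id < MAX_ENGINES; id++) {
		if (ve->mask & (UINT32_C(1) << id))
			physical_bump_serial(&physical[id]);
	}
}
```

As the commit message says, this over-counts: only one sibling actually runs the request, but every sibling in the mask looks busy to anything that samples the serial.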
Re: [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request
On 6/24/2021 00:04, Matthew Brost wrote: Hold a reference to the intel_context over life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed which can trigger bad pointer dref in fence ops (e.g.

Maybe quickly explain why this is different for GuC submission vs execlist? Presumably it is about only decomposing virtual engines to physical ones in execlist mode?

i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine but let's just be safe and hold the intel_context reference. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/i915_request.c | 54 - 1 file changed, 22 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index de9deb95b8b1..dec5a35c9aa2 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore); /* -* Keep one request on each engine for reserved use under mempressure -* -* We do not hold a reference to the engine here and so have to be -* very careful in what rq->engine we poke. The virtual engine is -* referenced via the rq->context and we released that ref during -* i915_request_retire(), ergo we must not dereference a virtual -* engine here. Not that we would want to, as the only consumer of -* the reserved engine->request_pool is the power management parking, -* which must-not-fail, and that is only run on the physical engines. 
-* -* Since the request must have been executed to be have completed, -* we know that it will have been processed by the HW and will -* not be unsubmitted again, so rq->engine and rq->execution_mask -* at this point is stable. rq->execution_mask will be a single -* bit if the last and _only_ engine it could execution on was a -* physical engine, if it's multiple bits then it started on and -* could still be on a virtual engine. Thus if the mask is not a -* power-of-two we assume that rq->engine may still be a virtual -* engine and so a dangling invalid pointer that we cannot dereference -* -* For example, consider the flow of a bonded request through a virtual -* engine. The request is created with a wide engine mask (all engines -* that we might execute on). On processing the bond, the request mask -* is reduced to one or more engines. If the request is subsequently -* bound to a single engine, it will then be constrained to only -* execute on that engine and never returned to the virtual engine -* after timeslicing away, see __unwind_incomplete_requests(). Thus we -* know that if the rq->execution_mask is a single bit, rq->engine -* can be a physical engine with the exact corresponding mask. +* Keep one request on each engine for reserved use under mempressure, +* do not use with virtual engines as this really is only needed for +* kernel contexts. */ - if (is_power_of_2(rq->execution_mask) && - !cmpxchg(&rq->engine->request_pool, NULL, rq)) + if (!intel_engine_is_virtual(rq->engine) && + !cmpxchg(&rq->engine->request_pool, NULL, rq)) { + intel_context_put(rq->context); return; + } + + intel_context_put(rq->context); The put is actually unconditional? So it could be moved before the if? John. kmem_cache_free(global.slab_requests, rq); } @@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) } } - rq->context = ce; + /* +* Hold a reference to the intel_context over life of an i915_request. 
+* Without this an i915_request can exist after the context has been +* destroyed (e.g. request retired, context closed, but user space holds +* a reference to the request from an out fence). In the case of GuC +* submission + virtual engine, the engine that the request references +* is also destroyed which can trigger bad pointer dref in fence ops +* (e.g. i915_fence_get_driver_name). We could likely change these +* functions to avoid touching the engine but let's just be safe and +* hold the intel_context reference. +*/ + rq->context = intel_context_get(ce); rq->engine = ce->engine; rq->ring = ce->ring; rq->execution_mask = ce->engine->mask; @@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_
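The lifetime rule the patch introduces can be shown with a toy refcount model: the request takes its own reference on the context at creation and drops it only when the fence is released, so a fence op that reaches through rq->context stays valid after the owner has closed the context. This is a sketch under loud assumptions: the structs and helpers below are hypothetical, the refcount is a plain int rather than a kref, and live_contexts exists only so the example can observe when the free happens.

```c
#include <assert.h>
#include <stdlib.h>

static int live_contexts; /* lets the example observe when a context is freed */

/* Hypothetical stand-ins for intel_context and i915_request. */
struct context {
	int refcount;        /* plain int for the sketch; the kernel uses a kref */
	const char *name;
};

struct request {
	struct context *context;
};

static struct context *context_create(const char *name)
{
	struct context *ce = malloc(sizeof(*ce));

	if (!ce)
		return NULL;
	ce->refcount = 1;
	ce->name = name;
	live_contexts++;
	return ce;
}

static struct context *context_get(struct context *ce)
{
	ce->refcount++;
	return ce;
}

static void context_put(struct context *ce)
{
	if (--ce->refcount == 0) {
		live_contexts--;
		free(ce);
	}
}

/* The request pins the context for its whole lifetime. */
static struct request *request_create(struct context *ce)
{
	struct request *rq = malloc(sizeof(*rq));

	if (!rq)
		return NULL;
	rq->context = context_get(ce);
	return rq;
}

/* A fence op that dereferences the context, like get_driver_name. */
static const char *fence_get_driver_name(const struct request *rq)
{
	return rq->context->name;
}

/* Final release of the request drops the reference taken at creation. */
static void fence_release(struct request *rq)
{
	context_put(rq->context);
	free(rq);
}
```

In this toy model the put happens on every release path, which is the observation behind John's question about moving it; whether it can go before the request_pool check in the real function depends on what rq->engine still points at after the put, and the sketch does not model that.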
Re: [PATCH 29/47] drm/i915/guc: Disable bonding extension with GuC submission
On 6/24/2021 00:04, Matthew Brost wrote: Update the bonding extension to return -ENODEV when using GuC submission as this extension fundamentally will not work with the GuC submission interface. Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 8a9293e0ca92..0429aa4172bf 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1674,6 +1674,11 @@ set_engines__bond(struct i915_user_extension __user *base, void *data) } virtual = set->engines->engines[idx]->engine; + if (intel_engine_uses_guc(virtual)) { + DRM_DEBUG("bonding extension not supported with GuC submission"); + return -ENODEV; + } + err = check_user_mbz(&ext->flags); if (err) return err;
Re: [PATCH 30/47] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
On 6/24/2021 00:04, Matthew Brost wrote: With GuC virtual engines the physical engine which a request executes and completes on isn't known to the i915. Therefore we can't attach a request to a physical engines breadcrumbs. To work around this we create a single breadcrumbs per engine class when using GuC submission and direct all physical engine interrupts to this breadcrumbs. Signed-off-by: Matthew Brost CC: John Harrison --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 41 +--- drivers/gpu/drm/i915/gt/intel_breadcrumbs.h | 14 +++- .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 7 ++ drivers/gpu/drm/i915/gt/intel_engine.h| 3 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 - .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 4 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +-- 9 files changed, 131 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 38cc42783dfb..2007dc6f6b99 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -15,28 +15,14 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h" -static bool irq_enable(struct intel_engine_cs *engine) +static bool irq_enable(struct intel_breadcrumbs *b) { - if (!engine->irq_enable) - return false; - - /* Caller disables interrupts */ - spin_lock(&engine->gt->irq_lock); - engine->irq_enable(engine); - spin_unlock(&engine->gt->irq_lock); - - return true; + return intel_engine_irq_enable(b->irq_engine); } -static void irq_disable(struct intel_engine_cs *engine) +static void irq_disable(struct intel_breadcrumbs *b) { - if (!engine->irq_disable) - return; - - /* Caller disables interrupts */ - spin_lock(&engine->gt->irq_lock); - engine->irq_disable(engine); - spin_unlock(&engine->gt->irq_lock); + intel_engine_irq_disable(b->irq_engine); } static void 
__intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) WRITE_ONCE(b->irq_armed, true); /* Requests may have completed before we could enable the interrupt. */ - if (!b->irq_enabled++ && irq_enable(b->irq_engine)) + if (!b->irq_enabled++ && b->irq_enable(b)) irq_work_queue(&b->irq_work); } @@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) { GEM_BUG_ON(!b->irq_enabled); if (!--b->irq_enabled) - irq_disable(b->irq_engine); + b->irq_disable(b); WRITE_ONCE(b->irq_armed, false); intel_gt_pm_put_async(b->irq_engine->gt); @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) if (!b) return NULL; - b->irq_engine = irq_engine; + kref_init(&b->ref); spin_lock_init(&b->signalers_lock); INIT_LIST_HEAD(&b->signalers); @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) spin_lock_init(&b->irq_lock); init_irq_work(&b->irq_work, signal_irq_work); + b->irq_engine = irq_engine; + b->irq_enable = irq_enable; + b->irq_disable = irq_disable; + return b; } @@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b) spin_lock_irqsave(&b->irq_lock, flags); if (b->irq_enabled) - irq_enable(b->irq_engine); + b->irq_enable(b); else - irq_disable(b->irq_engine); + b->irq_disable(b); spin_unlock_irqrestore(&b->irq_lock, flags); } @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b) } } -void intel_breadcrumbs_free(struct intel_breadcrumbs *b) +void intel_breadcrumbs_free(struct kref *kref) { + struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref); + irq_work_sync(&b->irq_work); GEM_BUG_ON(!list_empty(&b->signalers)); GEM_BUG_ON(b->irq_armed); + kfree(b); } diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h index 3ce5ce270b04..72105b74663d 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h +++ 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h @@ -17,7 +17,7 @@ struct intel_breadcrumbs; struct intel_breadcrumbs * intel_breadcrumbs_create(struct intel_engine_cs *irq_engine); -void intel_breadc
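The hunks above turn intel_breadcrumbs_free() into a kref release callback that recovers the object with container_of(), which fits the commit message's point that one breadcrumbs object now serves a whole engine class. The lifetime rule can be sketched in userspace roughly as follows. This is a minimal illustration, not the i915 code: every toy_* name is hypothetical, and the counter is a plain int rather than the kernel's atomic struct kref.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy replacement for struct kref: a plain, non-atomic counter. */
struct toy_kref {
	int count;
};

/* Toy analogue of a breadcrumbs object with an embedded refcount. */
struct toy_breadcrumbs {
	int irq_armed;
	struct toy_kref ref;
};

int toy_free_calls; /* how many times the release callback has run */

void toy_kref_init(struct toy_kref *k)
{
	k->count = 1;
}

void toy_kref_get(struct toy_kref *k)
{
	k->count++;
}

/* Drop one reference; run release() only when the last one goes away. */
int toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
	if (--k->count)
		return 0;
	release(k);
	return 1;
}

/* Release callback: gets the embedded kref, recovers the outer object. */
void toy_breadcrumbs_free(struct toy_kref *k)
{
	struct toy_breadcrumbs *b =
		toy_container_of(k, struct toy_breadcrumbs, ref);

	toy_free_calls++;
	free(b);
}

struct toy_breadcrumbs *toy_breadcrumbs_create(void)
{
	struct toy_breadcrumbs *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	toy_kref_init(&b->ref);
	return b;
}

/* Walk-through: two engines share one object, the last put frees it. */
int toy_demo_shared_breadcrumbs(void)
{
	struct toy_breadcrumbs *b = toy_breadcrumbs_create();
	int before = toy_free_calls;
	int after_first_put;

	if (!b)
		return -1;

	toy_kref_get(&b->ref);                       /* second engine shares b */
	toy_kref_put(&b->ref, toy_breadcrumbs_free); /* first engine lets go */
	after_first_put = toy_free_calls;            /* object must still exist */
	toy_kref_put(&b->ref, toy_breadcrumbs_free); /* last holder frees b */

	return after_first_put == before && toy_free_calls == before + 1;
}
```

toy_demo_shared_breadcrumbs() returns 1 when the object survives the first put and is freed exactly once by the second.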
Re: [PATCH 31/47] drm/i915/guc: Reset implementation for new GuC interface
On 6/24/2021 00:05, Matthew Brost wrote:
> Reset implementation for new GuC interface. This is the legacy reset
> implementation which is called when the i915 owns the engine hang check.
> Future patches will offload the engine hang check to GuC but we will
> continue to maintain this legacy path as a fallback and this code path
> is also required if the GuC dies.
>
> With the new GuC interface it is not possible to reset individual
> engines - it is only possible to reset the GPU entirely. This patch
> forces an entire chip reset if any engine hangs.
There seems to be quite a lot more code being changed in the patch than
is described above. Sure, it's all in order to support resets, but there
is a lot happening to request/context management, support for GuC
submission enable/disable, etc. It feels like this patch really should
be split into a couple of prep patches followed by the actual reset
support.

Plus, see a couple of minor comments below.

> Cc: John Harrison
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c | 3 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h | 7 +
>  drivers/gpu/drm/i915/gt/intel_engine_types.h | 6 +
>  .../drm/i915/gt/intel_execlists_submission.c | 40 ++
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 6 +-
>  drivers/gpu/drm/i915/gt/intel_reset.c | 18 +-
>  .../gpu/drm/i915/gt/intel_ring_submission.c | 22 +
>  drivers/gpu/drm/i915/gt/mock_engine.c | 31 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c | 13 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 8 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 39 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 +
>  drivers/gpu/drm/i915/i915_request.c | 41 +-
>  drivers/gpu/drm/i915/i915_request.h | 2 +
>  15 files changed, 649 insertions(+), 171 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index b24a1b7a3f88..2f01437056a8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>  	spin_lock_init(&ce->guc_state.lock);
>  	INIT_LIST_HEAD(&ce->guc_state.fences);
>
> +	spin_lock_init(&ce->guc_active.lock);
> +	INIT_LIST_HEAD(&ce->guc_active.requests);
> +
>  	ce->guc_id = GUC_INVALID_LRC_ID;
>  	INIT_LIST_HEAD(&ce->guc_id_link);
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 6945963a31ba..b63c8cf7823b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -165,6 +165,13 @@ struct intel_context {
>  		struct list_head fences;
>  	} guc_state;
>
> +	struct {
> +		/** lock: protects everything in guc_active */
> +		spinlock_t lock;
> +		/** requests: active requests on this context */
> +		struct list_head requests;
> +	} guc_active;
> +
>  	/* GuC scheduling state that does not require a lock. */
>  	atomic_t guc_sched_state_no_lock;
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index e7cb6a06db9d..f9d264c008e8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -426,6 +426,12 @@ struct intel_engine_cs {
>
>  	void (*release)(struct intel_engine_cs *engine);
>
> +	/*
> +	 * Add / remove request from engine active tracking
> +	 */
> +	void	(*add_active_request)(struct i915_request *rq);
> +	void	(*remove_active_request)(struct i915_request *rq);
> +
>  	struct intel_engine_execlists execlists;
>
>  	/*
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index c10ea6080752..c301a2d088b1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
>  	cancel_timer(&engine->execlists.preempt);
>  }
>
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +
Re: [PATCH 32/47] drm/i915: Reset GPU immediately if submission is disabled
On 6/24/2021 00:05, Matthew Brost wrote: If submission is disabled by the backend for any reason, reset the GPU immediately in the heartbeat code as the backend can't be reenabled until the GPU is reset. Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 63 +++ .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 4 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 9 +++ drivers/gpu/drm/i915/i915_scheduler.c | 6 ++ drivers/gpu/drm/i915/i915_scheduler.h | 6 ++ drivers/gpu/drm/i915/i915_scheduler_types.h | 5 ++ 6 files changed, 80 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index b6a305e6a974..a8495364d906 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq, { struct drm_printer p = drm_debug_printer("heartbeat"); - intel_engine_dump(engine, &p, - "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n", - engine->name, - rq->fence.context, - rq->fence.seqno, - rq->sched.attr.priority); + if (!rq) { + intel_engine_dump(engine, &p, + "%s heartbeat not ticking\n", + engine->name); + } else { + intel_engine_dump(engine, &p, + "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n", + engine->name, + rq->fence.context, + rq->fence.seqno, + rq->sched.attr.priority); + } +} + +static void +reset_engine(struct intel_engine_cs *engine, struct i915_request *rq) +{ + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) + show_heartbeat(rq, engine); + + intel_gt_handle_error(engine->gt, engine->mask, + I915_ERROR_CAPTURE, + "stopped heartbeat on %s", + engine->name); } static void heartbeat(struct work_struct *wrk) @@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk) if (intel_gt_is_wedged(engine->gt)) goto out; + if (i915_sched_engine_disabled(engine->sched_engine)) { + 
reset_engine(engine, engine->heartbeat.systole); + goto out; + } + if (engine->heartbeat.systole) { long delay = READ_ONCE(engine->props.heartbeat_interval_ms); @@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk) engine->sched_engine->schedule(rq, &attr); local_bh_enable(); } else { - if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) - show_heartbeat(rq, engine); - - intel_gt_handle_error(engine->gt, engine->mask, - I915_ERROR_CAPTURE, - "stopped heartbeat on %s", - engine->name); + reset_engine(engine, rq); } rq->emitted_jiffies = jiffies; @@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine) i915_request_put(fetch_and_zero(&engine->heartbeat.systole)); } +void intel_gt_unpark_heartbeats(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + if (intel_engine_pm_is_awake(engine)) + intel_engine_unpark_heartbeat(engine); + +} + +void intel_gt_park_heartbeats(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + intel_engine_park_heartbeat(engine); +} + void intel_engine_init_heartbeat(struct intel_engine_cs *engine) { INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h index a488ea3e84a3..5da6d809a87a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h @@ -7,6 +7,7 @@ #define INTEL_ENGINE_HEARTBEAT_H struct intel_engine_cs; +struct intel_gt; void intel_engine_init_heartbeat(struct intel_engine_cs *engine); @@ -16,6 +17,9 @@ int intel_e
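The ordering this patch adds to heartbeat() (bail out if the GT is wedged, otherwise reset at once if the backend has disabled submission, otherwise carry on with the normal pulse logic) and the NULL-request case added to show_heartbeat() can be modelled with a small sketch. It assumes nothing beyond what the hunks show: the toy_* names are hypothetical, and the real heartbeat does considerably more work on the normal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Possible outcomes of one simplified heartbeat tick. */
enum toy_heartbeat_action {
	TOY_HB_SKIP,	/* GT already wedged: nothing to do */
	TOY_HB_RESET,	/* submission disabled: reset immediately */
	TOY_HB_PULSE,	/* normal path: keep checking/sending pulses */
};

/* Hypothetical, trimmed-down view of a request for the debug message. */
struct toy_request {
	unsigned long long context;
	long long seqno;
	int prio;
};

/* Decide what a tick does; the check order mirrors the patched heartbeat(). */
enum toy_heartbeat_action
toy_heartbeat_tick(bool gt_wedged, bool submission_disabled)
{
	if (gt_wedged)
		return TOY_HB_SKIP;

	if (submission_disabled)
		return TOY_HB_RESET;

	return TOY_HB_PULSE;
}

/*
 * Build the debug line. A NULL request is legal here because the early
 * reset path may run before any heartbeat request exists.
 * Returns a pointer to a static buffer (not thread safe, sketch only).
 */
const char *toy_heartbeat_message(const char *name,
				  const struct toy_request *rq)
{
	static char buf[128];

	if (!rq)
		snprintf(buf, sizeof(buf), "%s heartbeat not ticking\n", name);
	else
		snprintf(buf, sizeof(buf),
			 "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
			 name, rq->context, rq->seqno, rq->prio);

	return buf;
}
```

The wedged check stays first in the sketch, as in the patch, so a wedged GT is never asked to reset again from this path.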
Re: [PATCH 33/47] drm/i915/guc: Add disable interrupts to guc sanitize
On 6/24/2021 00:05, Matthew Brost wrote:
> Add disable GuC interrupts to intel_guc_sanitize(). Part of this
> requires moving the guc_*_interrupt wrapper function into header file
> intel_guc.h.
>
> Signed-off-by: Matthew Brost
> Cc: Daniele Ceraolo Spurio
Reviewed-by: John Harrison

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 16
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 21 +++--
>  2 files changed, 19 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 40c9868762d7..85ef6767f13b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -217,9 +217,25 @@ static inline bool intel_guc_is_ready(struct intel_guc *guc)
>  	return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct);
>  }
>
> +static inline void intel_guc_reset_interrupts(struct intel_guc *guc)
> +{
> +	guc->interrupts.reset(guc);
> +}
> +
> +static inline void intel_guc_enable_interrupts(struct intel_guc *guc)
> +{
> +	guc->interrupts.enable(guc);
> +}
> +
> +static inline void intel_guc_disable_interrupts(struct intel_guc *guc)
> +{
> +	guc->interrupts.disable(guc);
> +}
> +
>  static inline int intel_guc_sanitize(struct intel_guc *guc)
>  {
>  	intel_uc_fw_sanitize(&guc->fw);
> +	intel_guc_disable_interrupts(guc);
>  	intel_guc_ct_sanitize(&guc->ct);
>  	guc->mmio_msg = 0;
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index f0b02200aa01..ab11fe731ee7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -207,21 +207,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc)
>  	spin_unlock_irq(&guc->irq_lock);
>  }
>
> -static void guc_reset_interrupts(struct intel_guc *guc)
> -{
> -	guc->interrupts.reset(guc);
> -}
> -
> -static void guc_enable_interrupts(struct intel_guc *guc)
> -{
> -	guc->interrupts.enable(guc);
> -}
> -
> -static void guc_disable_interrupts(struct intel_guc *guc)
> -{
> -	guc->interrupts.disable(guc);
> -}
> -
>  static int guc_enable_communication(struct intel_guc *guc)
>  {
>  	struct intel_gt *gt = guc_to_gt(guc);
> @@ -242,7 +227,7 @@ static int guc_enable_communication(struct intel_guc *guc)
>  	guc_get_mmio_msg(guc);
>  	guc_handle_mmio_msg(guc);
>
> -	guc_enable_interrupts(guc);
> +	intel_guc_enable_interrupts(guc);
>
>  	/* check for CT messages received before we enabled interrupts */
>  	spin_lock_irq(&gt->irq_lock);
> @@ -265,7 +250,7 @@ static void guc_disable_communication(struct intel_guc *guc)
>  	 */
>  	guc_clear_mmio_msg(guc);
>
> -	guc_disable_interrupts(guc);
> +	intel_guc_disable_interrupts(guc);
>
>  	intel_guc_ct_disable(&guc->ct);
>
> @@ -463,7 +448,7 @@ static int __uc_init_hw(struct intel_uc *uc)
>  	if (ret)
>  		goto err_out;
>
> -	guc_reset_interrupts(guc);
> +	intel_guc_reset_interrupts(guc);
>
>  	/* WaEnableuKernelHeaderValidFix:skl */
>  	/* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */
Re: [PATCH 23/47] drm/i915/guc: Update GuC debugfs to support new GuC
On 7/12/2021 13:59, Matthew Brost wrote: On Mon, Jul 12, 2021 at 11:05:59AM -0700, John Harrison wrote: On 6/24/2021 00:04, Matthew Brost wrote: Update GuC debugfs to support the new GuC structures. Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 23 +++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 4 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + 6 files changed, 104 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index e0f92e28350c..4ed074df88e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct) ct_try_receive_message(ct); } + +void intel_guc_log_ct_info(struct intel_guc_ct *ct, + struct drm_printer *p) +{ + if (!ct->enabled) { + drm_puts(p, "CT disabled\n"); + return; + } + + drm_printf(p, "H2G Space: %u\n", + atomic_read(&ct->ctbs.send.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.send.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.send.desc->tail); + drm_printf(p, "G2H Space: %u\n", + atomic_read(&ct->ctbs.recv.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.recv.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.recv.desc->tail); +} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index ab1b79ab960b..f62eb06b32fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@ struct i915_vma; struct intel_guc; +struct drm_printer; /** * DOC: Command Transport (CT). 
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct); +void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p); + #endif /* _INTEL_GUC_CT_H_ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..62b9ce0fafaa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h" static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p); - /* Add more as required ... */ + if (!intel_guc_submission_is_used(guc)) + return 0; + + intel_guc_log_ct_info(&guc->ct, &p); + intel_guc_log_submission_info(guc, &p); return 0; } DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info); +static int guc_registered_contexts_show(struct seq_file *m, void *data) +{ + struct intel_guc *guc = m->private; + struct drm_printer p = drm_seq_file_printer(m); + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + intel_guc_log_context_info(guc, &p); + + return 0; +} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL }, + { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, }; if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1a28283a9ae..89b3c7e5d15b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; } + +void intel_guc_log_submission_info(struct intel_guc *guc, + struct drm_printer *p) +{ + struct i915_sched_engine *sched_engine = guc->sched_engine; + struct rb_node *rb; + u
Re: [Intel-gfx] [PATCH 28/47] drm/i915: Hold reference to intel_context over life of i915_request
On 7/12/2021 14:36, Matthew Brost wrote:
> On Mon, Jul 12, 2021 at 08:05:30PM +, Matthew Brost wrote:
>> On Mon, Jul 12, 2021 at 11:23:14AM -0700, John Harrison wrote:
>>> On 6/24/2021 00:04, Matthew Brost wrote:
>>>> Hold a reference to the intel_context over the life of an
>>>> i915_request. Without this an i915_request can exist after the
>>>> context has been destroyed (e.g. request retired, context closed, but
>>>> user space holds a reference to the request from an out fence). In
>>>> the case of GuC submission + virtual engine, the engine that the
>>>> request references is also destroyed which can trigger bad pointer
>>>> deref in fence ops (e.g.
>>> Maybe quickly explain why this is different for GuC submission vs
>>> execlist? Presumably it is about only decomposing virtual engines to
>>> physical ones in execlist mode?
>>>
>> Yes, it is because in execlists we always end up pointing to a physical
>> engine in the end while in GuC mode we can be pointing to a dynamically
>> allocated virtual engine. I can update the comment.
>>
>>>> i915_fence_get_driver_name). We could likely change
>>>> i915_fence_get_driver_name to avoid touching the engine but let's
>>>> just be safe and hold the intel_context reference.
>>>>
>>>> Signed-off-by: Matthew Brost
>>>> ---
>>>>  drivers/gpu/drm/i915/i915_request.c | 54 -
>>>>  1 file changed, 22 insertions(+), 32 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>>> index de9deb95b8b1..dec5a35c9aa2 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence)
>>>>  	i915_sw_fence_fini(&rq->semaphore);
>>>>
>>>>  	/*
>>>> -	 * Keep one request on each engine for reserved use under mempressure
>>>> -	 *
>>>> -	 * We do not hold a reference to the engine here and so have to be
>>>> -	 * very careful in what rq->engine we poke. The virtual engine is
>>>> -	 * referenced via the rq->context and we released that ref during
>>>> -	 * i915_request_retire(), ergo we must not dereference a virtual
>>>> -	 * engine here. Not that we would want to, as the only consumer of
>>>> -	 * the reserved engine->request_pool is the power management parking,
>>>> -	 * which must-not-fail, and that is only run on the physical engines.
>>>> -	 *
>>>> -	 * Since the request must have been executed to be have completed,
>>>> -	 * we know that it will have been processed by the HW and will
>>>> -	 * not be unsubmitted again, so rq->engine and rq->execution_mask
>>>> -	 * at this point is stable. rq->execution_mask will be a single
>>>> -	 * bit if the last and _only_ engine it could execution on was a
>>>> -	 * physical engine, if it's multiple bits then it started on and
>>>> -	 * could still be on a virtual engine. Thus if the mask is not a
>>>> -	 * power-of-two we assume that rq->engine may still be a virtual
>>>> -	 * engine and so a dangling invalid pointer that we cannot dereference
>>>> -	 *
>>>> -	 * For example, consider the flow of a bonded request through a virtual
>>>> -	 * engine. The request is created with a wide engine mask (all engines
>>>> -	 * that we might execute on). On processing the bond, the request mask
>>>> -	 * is reduced to one or more engines. If the request is subsequently
>>>> -	 * bound to a single engine, it will then be constrained to only
>>>> -	 * execute on that engine and never returned to the virtual engine
>>>> -	 * after timeslicing away, see __unwind_incomplete_requests(). Thus we
>>>> -	 * know that if the rq->execution_mask is a single bit, rq->engine
>>>> -	 * can be a physical engine with the exact corresponding mask.
>>>> +	 * Keep one request on each engine for reserved use under mempressure,
>>>> +	 * do not use with virtual engines as this really is only needed for
>>>> +	 * kernel contexts.
>>>>  	 */
>>>> -	if (is_power_of_2(rq->execution_mask) &&
>>>> -	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
>>>> +	if (!intel_engine_is_virtual(rq->engine) &&
>>>> +	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
>>>> +		intel_context_put(rq->context);
>>>>  		return;
>>>> +	}
>>>> +
>>>> +	intel_context_put(rq->context);
>>> The put is actually unconditional? So it could be moved before the if?
>>>
>> Yep, I think so.
>>
> Wait nope. We reference rq->engine which could be a virtual engine and
> the intel_context_put could free that engine. So we need to do the put
> after we reference it.
>
> Matt
Doh! That's a pretty good reason.

Okay, with a tweaked description to explain about virtual engines being
different on GuC vs execlist...

Reviewed-by: John Harrison

>> Matt
>>
>>> John.
>>>
>>>>
>>>>  	kmem_cache_free(global.slab_requests, rq);
>>>>  }
>>>> @@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>>>>  		}
>>>>  	}
>>>>
>>>> -	rq->context = ce;
>>>> +	/*
>>>> +	 * Hold a reference to the intel_contex
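The conclusion reached in this exchange, that the context reference may only be dropped after the last read of rq->engine because the put can free a virtual engine, can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the toy_* types are hypothetical, the context is assumed to own its virtual engine outright, and the kernel's cmpxchg() on request_pool is replaced by a plain pointer test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_request;

struct toy_engine {
	bool is_virtual;
	struct toy_request *request_pool; /* one reserved request, physical only */
};

struct toy_context {
	int refcount;
	struct toy_engine *virtual_engine; /* owned by the context, may be NULL */
};

struct toy_request {
	struct toy_context *context;
	struct toy_engine *engine;
};

int toy_contexts_freed;

/* Dropping the last reference frees the context AND its virtual engine. */
void toy_context_put(struct toy_context *ce)
{
	if (--ce->refcount)
		return;
	free(ce->virtual_engine);
	free(ce);
	toy_contexts_freed++;
}

/* A request pins its context for its whole lifetime. */
struct toy_request *toy_request_create(struct toy_context *ce,
				       struct toy_engine *engine)
{
	struct toy_request *rq = calloc(1, sizeof(*rq));

	if (!rq)
		return NULL;
	ce->refcount++;
	rq->context = ce;
	rq->engine = engine;
	return rq;
}

/* Returns true if the request was parked in the engine's pool. */
bool toy_request_release(struct toy_request *rq)
{
	/* Every read of rq->engine happens before the put below. */
	bool pooled = !rq->engine->is_virtual && !rq->engine->request_pool;

	if (pooled)
		rq->engine->request_pool = rq;

	/* From here on rq->engine may dangle, so it is not touched again. */
	toy_context_put(rq->context);

	if (!pooled)
		free(rq);
	return pooled;
}

/* Context is closed first, the request (e.g. held by a fence) goes last. */
int toy_demo_virtual_engine(void)
{
	struct toy_context *ce = calloc(1, sizeof(*ce));
	struct toy_engine *ve = calloc(1, sizeof(*ve));
	struct toy_request *rq;
	int before = toy_contexts_freed;
	bool pooled;

	if (!ce || !ve) {
		free(ce);
		free(ve);
		return -1;
	}
	ve->is_virtual = true;
	ce->refcount = 1;
	ce->virtual_engine = ve;

	rq = toy_request_create(ce, ve);
	if (!rq) {
		toy_context_put(ce);
		return -1;
	}

	toy_context_put(ce);               /* user closes the context */
	pooled = toy_request_release(rq);  /* last ref: engine + context freed */

	return !pooled && toy_contexts_freed == before + 1;
}
```

Moving toy_context_put() above the pooled computation would read rq->engine after it may have been freed, which is the hazard described in the reply above.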
Re: [PATCH 25/47] drm/i915: Add intel_context tracing
On 7/12/2021 14:47, Matthew Brost wrote: On Mon, Jul 12, 2021 at 11:10:40AM -0700, John Harrison wrote: On 6/24/2021 00:04, Matthew Brost wrote: Add intel_context tracing. These trace points are particular helpful when debugging the GuC firmware and can be enabled via CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option. Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 6 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++ drivers/gpu/drm/i915/i915_trace.h | 148 +- 3 files changed, 166 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7f97753ab164..b24a1b7a3f88 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -8,6 +8,7 @@ #include "i915_drv.h" #include "i915_globals.h" +#include "i915_trace.h" #include "intel_context.h" #include "intel_engine.h" @@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu) { struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); + trace_intel_context_free(ce); kmem_cache_free(global.slab_ce, ce); } @@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine) return ERR_PTR(-ENOMEM); intel_context_init(ce, engine); + trace_intel_context_create(ce); return ce; } @@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce, GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! 
*/ + trace_intel_context_do_pin(ce); + err_unlock: mutex_unlock(&ce->pin_mutex); err_post_unpin: @@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub) */ intel_context_get(ce); intel_context_active_release(ce); + trace_intel_context_do_unpin(ce); intel_context_put(ce); } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c2327eebc09c..d605af0d66e6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (!enabled && !err) { + trace_intel_context_sched_enable(ce); atomic_inc(&guc->outstanding_submission_g2h); set_context_enabled(ce); } else if (!enabled) { @@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce) u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + ce->guc_id * sizeof(struct guc_lrc_desc); + trace_intel_context_register(ce); + return __guc_action_register_context(guc, ce->guc_id, offset); } @@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id) { struct intel_guc *guc = ce_to_guc(ce); + trace_intel_context_deregister(ce); + return __guc_action_deregister_context(guc, guc_id); } @@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce) * GuC before registering this context. 
*/ if (context_registered) { + trace_intel_context_steal_guc_id(ce); set_context_wait_for_deregister_to_register(ce); intel_context_get(ce); @@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc, GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID); + trace_intel_context_sched_disable(ce); intel_context_get(ce); guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), @@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct intel_context *ce) lockdep_assert_held(&ce->guc_state.lock); + if (!list_empty(&ce->guc_state.fences)) + trace_intel_context_fence_release(ce); + list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) i915_sw_fence_complete(&rq->submit); @@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, if (unlikely(!ce)) return -EPROTO; + trace_intel_context_deregister_done(ce); + if (context_wait_for_deregister_to_register(ce)) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return -EPROTO; } + trace_intel_context_sched_done(ce); + if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { diff --git a/drivers/gpu/
Re: [PATCH 34/47] drm/i915/guc: Suspend/resume implementation for new interface
On 6/24/2021 00:05, Matthew Brost wrote: The new GuC interface introduces an MMIO H2G command, INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This MMIO tears down any active contexts generating a context reset G2H CTB for each. Once that step completes the GuC tears down the CTB channels. It is safe to suspend once this MMIO H2G command completes and all G2H CTBs have been processed. In practice the i915 will likely never receive a G2H as suspend should only be called after the GPU is idle. Resume is implemented in the same manner as before - simply reload the GuC firmware and reinitialize everything (e.g. CTB channels, contexts, etc..). Cc: John Harrison Signed-off-by: Matthew Brost Signed-off-by: Michal Wajdeczko Reviewed-by: John Harrison --- .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.c| 64 --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++-- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 5 ++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 20 -- 5 files changed, 53 insertions(+), 51 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h index 57e18babdf4b..596cf4b818e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h @@ -142,6 +142,7 @@ enum intel_guc_action { INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506, INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600, + INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01, INTEL_GUC_ACTION_LIMIT }; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 9b09395b998f..68266cbffd1f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -524,51 +524,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset) */ int intel_guc_suspend(struct intel_guc *guc) { - struct 
intel_uncore *uncore = guc_to_gt(guc)->uncore; int ret; - u32 status; u32 action[] = { - INTEL_GUC_ACTION_ENTER_S_STATE, - GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */ + INTEL_GUC_ACTION_RESET_CLIENT, }; - /* -* If GuC communication is enabled but submission is not supported, -* we do not need to suspend the GuC. -*/ - if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc)) + if (!intel_guc_is_ready(guc)) return 0; - /* -* The ENTER_S_STATE action queues the save/restore operation in GuC FW -* and then returns, so waiting on the H2G is not enough to guarantee -* GuC is done. When all the processing is done, GuC writes -* INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll -* on that. Note that GuC does not ensure that the value in the register -* is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is -* in progress so we need to take care of that ourselves as well. -*/ - - intel_uncore_write(uncore, SOFT_SCRATCH(14), - INTEL_GUC_SLEEP_STATE_INVALID_MASK); - - ret = intel_guc_send(guc, action, ARRAY_SIZE(action)); - if (ret) - return ret; - - ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14), - INTEL_GUC_SLEEP_STATE_INVALID_MASK, - 0, 0, 10, &status); - if (ret) - return ret; - - if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) { - DRM_ERROR("GuC failed to change sleep state. " - "action=0x%x, err=%u\n", - action[0], status); - return -EIO; + if (intel_guc_submission_is_used(guc)) { + /* +* This H2G MMIO command tears down the GuC in two steps. First it will +* generate a G2H CTB for every active context indicating a reset. In +* practice the i915 shouldn't ever get a G2H as suspend should only be +* called when the GPU is idle. Next, it tears down the CTBs and this +* H2G MMIO command completes. +* +* Don't abort on a failure code from the GuC. Keep going and do the +* clean up in santize() and re-initialisation on resume and hopefully +* the error here won't be problematic. 
+*/ + ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0); + if (ret) + DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret);
Re: [PATCH 35/47] drm/i915/guc: Handle context reset notification
On 6/24/2021 00:05, Matthew Brost wrote: GuC will issue a reset on detecting an engine hang and will notify the driver via a G2H message. The driver will service the notification by resetting the guilty context to a simple state or banning it completely. Cc: Matthew Brost Cc: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++ drivers/gpu/drm/i915/i915_trace.h | 10 ++ 4 files changed, 50 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 85ef6767f13b..e94b0ef733da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_context_reset_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len); void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 4ed074df88e5..a2020373b8e8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -945,6 +945,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE: ret = intel_guc_sched_done_process_msg(guc, payload, len); break; + case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION: + ret = intel_guc_context_reset_process_msg(guc, payload, len); + break; default: ret = -EOPNOTSUPP; break; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 16b61fe71b07..9845c5bd9832 100644 --- 
a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; } +static void guc_context_replay(struct intel_context *ce) +{ + struct i915_sched_engine *sched_engine = ce->engine->sched_engine; + + __guc_reset_context(ce, true); + tasklet_hi_schedule(&sched_engine->tasklet); +} + +static void guc_handle_context_reset(struct intel_guc *guc, +struct intel_context *ce) +{ + trace_intel_context_reset(ce); + guc_context_replay(ce); +} + +int intel_guc_context_reset_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len) +{ + struct intel_context *ce; + int desc_idx = msg[0];

Should this dereference be done after checking the length? Or is it guaranteed that the length cannot be zero? John.

+ + if (unlikely(len != 1)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + ce = g2h_context_lookup(guc, desc_idx); + if (unlikely(!ce)) + return -EPROTO; + + guc_handle_context_reset(guc, ce); + + return 0; +} + void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 97c2e83984ed..c095c4d39456 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context, __entry->guc_sched_state_no_lock) ); +DEFINE_EVENT(intel_context, intel_context_reset, +TP_PROTO(struct intel_context *ce), +TP_ARGS(ce) +); + DEFINE_EVENT(intel_context, intel_context_register, TP_PROTO(struct intel_context *ce), TP_ARGS(ce) @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq) { } +static inline void +trace_intel_context_reset(struct intel_context *ce) +{ +} + static inline void trace_intel_context_register(struct intel_context *ce) {
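To illustrate the ordering question raised in the review, here is a minimal userspace sketch in which the payload is only read after its length has been validated. The function name parse_context_reset_msg() and its out-parameter are hypothetical stand-ins, not the driver's API, and the extra NULL check is an addition of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for intel_guc_context_reset_process_msg().
 * Returns 0 and stores the descriptor index on success, -EPROTO if the
 * payload does not have the expected shape.
 */
static int parse_context_reset_msg(const uint32_t *msg, uint32_t len,
				   uint32_t *desc_idx)
{
	/* The notification is expected to carry exactly one dword. */
	if (msg == NULL || len != 1)
		return -EPROTO;

	/* Safe: len == 1 guarantees that msg[0] lies inside the payload. */
	*desc_idx = msg[0];
	return 0;
}
```

With this ordering a zero-length message is rejected before any payload dword is touched, so the question of whether the length can ever be zero no longer matters.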
Re: [PATCH 36/47] drm/i915/guc: Handle engine reset failure notification
On 6/24/2021 00:05, Matthew Brost wrote: GuC will notify the driver, via G2H, if it fails to reset an engine. We recover by resorting to a full GPU reset. Signed-off-by: Matthew Brost Signed-off-by: Fernando Pacheco Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++ 3 files changed, 48 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index e94b0ef733da..99742625e6ff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -264,6 +264,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_context_reset_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_engine_failure_process_msg(struct intel_guc *guc, +const u32 *msg, u32 len); void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a2020373b8e8..dd6177c8d75c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -948,6 +948,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION: ret = intel_guc_context_reset_process_msg(guc, payload, len); break; + case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION: + ret = intel_guc_engine_failure_process_msg(guc, payload, len); + break; default: ret = -EOPNOTSUPP; break; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9845c5bd9832..c3223958dfe0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2227,6 +2227,49 @@ int 
intel_guc_context_reset_process_msg(struct intel_guc *guc, return 0; } +static struct intel_engine_cs * +guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance) +{ + struct intel_gt *gt = guc_to_gt(guc); + u8 engine_class = guc_class_to_engine_class(guc_class); + + /* Class index is checked in class converter */ + GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE); + + return gt->engine_class[engine_class][instance]; +} + +int intel_guc_engine_failure_process_msg(struct intel_guc *guc, +const u32 *msg, u32 len) +{ + struct intel_engine_cs *engine; + u8 guc_class, instance; + u32 reason; + + if (unlikely(len != 3)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + guc_class = msg[0]; + instance = msg[1]; + reason = msg[2]; + + engine = guc_lookup_engine(guc, guc_class, instance); + if (unlikely(!engine)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, + "Invalid engine %d:%d", guc_class, instance); + return -EPROTO; + } + + intel_gt_handle_error(guc_to_gt(guc), engine->mask, + I915_ERROR_CAPTURE, + "GuC failed to reset %s (reason=0x%08x)\n", + engine->name, reason); + + return 0; +} + void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) {
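The handler above validates the length, splits the three payload dwords into class, instance and reason, and then looks the engine up in a two-dimensional table. A rough userspace sketch of that flow follows. The table dimensions, struct engine and parse_engine_failure() are hypothetical, and unlike the patch (which uses GEM_BUG_ON for the instance) the sketch range-checks both indices and returns an error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CLASSES   4	/* hypothetical table dimensions */
#define NUM_INSTANCES 8

struct engine {
	const char *name;
};

/* Hypothetical stand-in for gt->engine_class[class][instance]. */
static struct engine *engine_table[NUM_CLASSES][NUM_INSTANCES];

static int parse_engine_failure(const uint32_t *msg, uint32_t len,
				struct engine **out, uint32_t *reason)
{
	/* Payload layout assumed from the patch: class, instance, reason. */
	if (msg == NULL || len != 3)
		return -EPROTO;

	/* Range-check the firmware-supplied indices before using them. */
	if (msg[0] >= NUM_CLASSES || msg[1] >= NUM_INSTANCES)
		return -EPROTO;

	*out = engine_table[msg[0]][msg[1]];
	if (*out == NULL)
		return -EPROTO;	/* no such engine on this device */

	*reason = msg[2];
	return 0;
}
```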
Re: [PATCH 37/47] drm/i915/guc: Enable the timer expired interrupt for GuC
On 6/24/2021 00:05, Matthew Brost wrote: The GuC can implement execution quanta, detect hung contexts and other such things but it requires the timer expired interrupt to do so. Signed-off-by: Matthew Brost CC: John Harrison Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_rps.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c index 06e9a8ed4e03..0c8e7f2b06f0 100644 --- a/drivers/gpu/drm/i915/gt/intel_rps.c +++ b/drivers/gpu/drm/i915/gt/intel_rps.c @@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps) if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11) rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC; + + /* GuC needs ARAT expired interrupt unmasked */ + if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc)) + rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK; } void intel_rps_sanitize(struct intel_rps *rps)
Re: [PATCH 41/47] drm/i915/guc: Capture error state on context reset
On 6/24/2021 00:05, Matthew Brost wrote: We receive notification of an engine reset from GuC at its completion. Meaning GuC has potentially cleared any HW state we may have been interested in capturing. GuC resumes scheduling on the engine post-reset, as the resets are meant to be transparent, further muddling our error state. There is ongoing work to define an API for a GuC debug state dump. The suggestion for now is to manually disable FW initiated resets in cases where debug state is needed. Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_context.c | 20 +++ drivers/gpu/drm/i915/gt/intel_context.h | 3 ++ drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 -- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +-- drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++--- 7 files changed, 91 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 2f01437056a8..3fe7794b2bfd 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce) return rq; } +struct i915_request *intel_context_find_active_request(struct intel_context *ce) +{ + struct i915_request *rq, *active = NULL; + unsigned long flags; + + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine)); + + spin_lock_irqsave(&ce->guc_active.lock, flags); + list_for_each_entry_reverse(rq, &ce->guc_active.requests, + sched.link) { + if (i915_request_completed(rq)) + break; + + active = rq; + } + spin_unlock_irqrestore(&ce->guc_active.lock, flags); + + return active; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_context.c" #endif diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index a592a9605dc8..3363b59c0c40 100644 --- 
a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -201,6 +201,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce, struct i915_request *intel_context_create_request(struct intel_context *ce); +struct i915_request * +intel_context_find_active_request(struct intel_context *ce); + static inline struct intel_ring *__intel_context_ring_size(u64 sz) { return u64_to_ptr(struct intel_ring, sz); diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index e9e0657f847a..6ea5643a3aaa 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now); struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine); +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine); u32 intel_engine_context_size(struct intel_gt *gt, u8 class); struct intel_context * @@ -328,4 +328,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) return engine->cops->get_sibling(engine, sibling); } +static inline void +intel_engine_set_hung_context(struct intel_engine_cs *engine, + struct intel_context *ce) +{ + engine->hung_ce = ce; +} + +static inline void +intel_engine_clear_hung_context(struct intel_engine_cs *engine) +{ + intel_engine_set_hung_context(engine, NULL); +} + +static inline struct intel_context * +intel_engine_get_hung_context(struct intel_engine_cs *engine) +{ + return engine->hung_ce; +} + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69245670b8b0..1d243b83b023 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1671,7 +1671,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n"); 
spin_lock_irqsave(&engine->sched_engine->lock, flags); - rq = intel_engine_find_active_request(engine); + rq = intel_engine_execlist_find_hung_request(engine); if (rq) { struct intel_timeline *tl = get_timeline(rq); @@ -1782,10 +1782,17 @@ static bool match_ring(struct i915_request *rq) } struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine) +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine) { struct i915_request *request, *active = NULL;
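The reverse walk in intel_context_find_active_request() can be pictured with a small userspace sketch: scan from the newest request backwards, stop at the first completed one, and remember the last incomplete request seen. struct request and the array representation are hypothetical simplifications, and the locking from the patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct request {
	int seqno;
	bool completed;
};

/*
 * reqs[] is ordered oldest first, standing in for the list that the
 * patch walks with list_for_each_entry_reverse(). Returns the oldest
 * request that has not completed yet, or NULL if all have completed.
 */
static const struct request *
find_active_request(const struct request *reqs, size_t count)
{
	const struct request *active = NULL;
	size_t i;

	for (i = count; i-- > 0; ) {
		if (reqs[i].completed)
			break;		/* everything older is done as well */
		active = &reqs[i];
	}

	return active;
}
```

For a queue whose first two requests have completed and whose last two have not, the walk visits the newest two and returns the older of them, which is the candidate hung request.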
Re: [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer
On 6/24/2021 00:04, Matthew Brost wrote: Ensure G2H response has space in the buffer before sending H2G CTB as the GuC can't handle any backpressure on the G2H interface. Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 13 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 +- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++-- 5 files changed, 87 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index b43ec56986b5..24e7a924134e 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) } #define INTEL_GUC_SEND_NB BIT(31) +#define INTEL_GUC_SEND_G2H_DW_SHIFT0 +#define INTEL_GUC_SEND_G2H_DW_MASK (0xff << INTEL_GUC_SEND_G2H_DW_SHIFT) +#define MAKE_SEND_FLAGS(len) \ + ({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \ + (FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);}) static -inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len) +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len, +u32 g2h_len_dw) { return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, -INTEL_GUC_SEND_NB); +MAKE_SEND_FLAGS(g2h_len_dw)); } static inline int @@ -113,6 +119,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, static inline int intel_guc_send_busy_loop(struct intel_guc* guc, const u32 *action, u32 len, + u32 g2h_len_dw, bool loop) { int err; @@ -121,7 +128,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc, might_sleep_if(loop && (!in_atomic() && !irqs_disabled())); retry: - err = intel_guc_send_nb(guc, action, len); + err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (unlikely(err == 
-EBUSY && loop)) { if (likely(!in_atomic() && !irqs_disabled())) cond_resched(); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 7491f041859e..a60970e85635 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct) #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K) #define CTB_H2G_BUFFER_SIZE (SZ_4K) #define CTB_G2H_BUFFER_SIZE (4 * CTB_H2G_BUFFER_SIZE) +#define G2H_ROOM_BUFFER_SIZE (PAGE_SIZE) Any particular reason why PAGE_SIZE instead of SZ_4K? I'm not seeing anything in the code that is actually related to page sizes. Seems like '(CTB_G2H_BUFFER_SIZE / 4)' would be a more correct way to express it. Unless I'm missing something about how it's used? John. struct ct_request { struct list_head link; @@ -129,23 +130,27 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { + u32 space; + ctb->broken = false; ctb->tail = 0; ctb->head = 0; - ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space; + atomic_set(&ctb->space, space); guc_ct_buffer_desc_init(ctb->desc); } static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb, struct guc_ct_buffer_desc *desc, - u32 *cmds, u32 size_in_bytes) + u32 *cmds, u32 size_in_bytes, u32 resv_space) { GEM_BUG_ON(size_in_bytes % 4); ctb->desc = desc; ctb->cmds = cmds; ctb->size = size_in_bytes / 4; + ctb->resv_space = resv_space / 4; guc_ct_buffer_reset(ctb); } @@ -226,6 +231,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) struct guc_ct_buffer_desc *desc; u32 blob_size; u32 cmds_size; + u32 resv_space; void *blob; u32 *cmds; int err; @@ -250,19 +256,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) desc = blob; cmds = blob + 2 * CTB_DESC_SIZE; cmds_size = 
CTB_H2G_BUFFER_SIZE; - CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "s
Re: [PATCH 21/47] drm/i915/guc: Ensure G2H response has space in buffer
On 7/14/2021 17:06, Matthew Brost wrote: On Tue, Jul 13, 2021 at 11:36:05AM -0700, John Harrison wrote: On 6/24/2021 00:04, Matthew Brost wrote: Ensure G2H response has space in the buffer before sending H2G CTB as the GuC can't handle any backpressure on the G2H interface. Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 13 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 +- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++-- 5 files changed, 87 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index b43ec56986b5..24e7a924134e 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) } #define INTEL_GUC_SEND_NBBIT(31) +#define INTEL_GUC_SEND_G2H_DW_SHIFT0 +#define INTEL_GUC_SEND_G2H_DW_MASK (0xff << INTEL_GUC_SEND_G2H_DW_SHIFT) +#define MAKE_SEND_FLAGS(len) \ + ({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \ + (FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);}) static -inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len) +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len, +u32 g2h_len_dw) { return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, -INTEL_GUC_SEND_NB); +MAKE_SEND_FLAGS(g2h_len_dw)); } static inline int @@ -113,6 +119,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, static inline int intel_guc_send_busy_loop(struct intel_guc* guc, const u32 *action, u32 len, + u32 g2h_len_dw, bool loop) { int err; @@ -121,7 +128,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc, might_sleep_if(loop && (!in_atomic() && !irqs_disabled())); retry: - err = 
intel_guc_send_nb(guc, action, len); + err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (unlikely(err == -EBUSY && loop)) { if (likely(!in_atomic() && !irqs_disabled())) cond_resched(); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 7491f041859e..a60970e85635 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct) #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K) #define CTB_H2G_BUFFER_SIZE (SZ_4K) #define CTB_G2H_BUFFER_SIZE (4 * CTB_H2G_BUFFER_SIZE) +#define G2H_ROOM_BUFFER_SIZE (PAGE_SIZE)

Any particular reason why PAGE_SIZE instead of SZ_4K? I'm not seeing anything in the code that is actually related to page sizes. Seems like '(CTB_G2H_BUFFER_SIZE / 4)' would be a more correct way to express it. Unless I'm missing something about how it's used?

Yes, CTB_G2H_BUFFER_SIZE / 4 is better. Matt

Okay. With that changed: Reviewed-by: John Harrison John.

struct ct_request { struct list_head link; @@ -129,23 +130,27 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { + u32 space; + ctb->broken = false; ctb->tail = 0; ctb->head = 0; - ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space; + atomic_set(&ctb->space, space); guc_ct_buffer_desc_init(ctb->desc); } static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb, struct guc_ct_buffer_desc *desc, - u32 *cmds, u32 size_in_bytes) + u32 *cmds, u32 size_in_bytes, u32 resv_space) { GEM_BUG_ON(size_in_bytes % 4); ctb->desc = desc; ctb->cmds = cmds; ctb->size = size_in_bytes / 4; + ctb->resv_space = resv_space / 4; guc_ct_buffer_reset(ctb); } @@ -226,6 +231,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) struct guc_ct_buffer_desc *desc; u32 blob_size; u32 cmds_size; + u32 resv_space; void *blob; u32 *cmds; int err;
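As a rough illustration of the two pieces visible in this patch, the sketch below packs the expected G2H reply length into the low byte of the send flags next to the non-blocking bit, and computes the initial buffer credit as the circular-buffer space minus the reserved room. All names here are hypothetical userspace stand-ins. The g2h_reserve() helper is an assumption about how such a credit might be consumed, since that part of the diff is not shown above, and returning -EINVAL for an oversized length replaces the GEM_BUG_ON in the patch.

```c
#include <errno.h>
#include <stdint.h>

#define SEND_NB          (1u << 31)	/* mirrors INTEL_GUC_SEND_NB */
#define SEND_G2H_DW_MASK 0xffu		/* mirrors INTEL_GUC_SEND_G2H_DW_MASK */

/* Pack the expected G2H reply length (in dwords) into the send flags. */
static int make_send_flags(uint32_t g2h_len_dw, uint32_t *flags)
{
	if (g2h_len_dw > SEND_G2H_DW_MASK)
		return -EINVAL;	/* does not fit in the 8-bit field */

	*flags = g2h_len_dw | SEND_NB;
	return 0;
}

/* Same arithmetic as CIRC_SPACE(); size must be a power of two. */
static uint32_t circ_space(uint32_t head, uint32_t tail, uint32_t size)
{
	return (tail - (head + 1)) & (size - 1);
}

/* Credit available right after a reset, with head == tail == 0. */
static uint32_t initial_space(uint32_t size_dw, uint32_t resv_dw)
{
	return circ_space(0, 0, size_dw) - resv_dw;
}

/* Assumed consumer: take credit for a reply or report back-pressure. */
static int g2h_reserve(uint32_t *space, uint32_t g2h_len_dw)
{
	if (*space < g2h_len_dw)
		return -EBUSY;

	*space -= g2h_len_dw;
	return 0;
}
```

With a 16K G2H buffer (4096 dwords) and a reservation of CTB_G2H_BUFFER_SIZE / 4 (1024 dwords), the initial credit in this sketch works out to 4095 - 1024 = 3071 dwords.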
Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count
On 6/16/2021 03:25, Daniel Vetter wrote: On Thu, Jun 10, 2021 at 10:46 PM wrote: From: John Harrison Various UMDs need to know the L3 bank count. So add a query API for it. Please link to both the igt test submission for this (there's not even a Test-with: on the cover letter) Is there a wiki page that describes all such tags? That is not one I was aware of and I can't find anything in the Kernel patch submission wiki or DRM maintainers wiki that mentions it. and the merge requests for the various UMD which uses new uapi. Is there a particular tag to use for this? John. Also as other mentioned, full uapi kerneldoc is needed too. Please fill in any gaps in the existing docs that relate to your addition directly (like we've e.g. done for the extension chaining when adding lmem support). Thanks, Daniel Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/i915_query.c | 22 ++ drivers/gpu/drm/i915/i915_reg.h| 1 + include/uapi/drm/i915_drm.h| 1 + 5 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 2161bf01ef8b..708bb3581d83 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info, intel_sseu_dump(&info->sseu, p); } + +int intel_gt_get_l3bank_count(struct intel_gt *gt) +{ + struct drm_i915_private *i915 = gt->i915; + intel_wakeref_t wakeref; + u32 fuse3; + + if (GRAPHICS_VER(i915) < 12) + return -ENODEV; + + with_intel_runtime_pm(gt->uncore->rpm, wakeref) + fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3); + + return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3)); +} diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 7ec395cace69..46aa1cf4cf30 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -77,6 +77,7 @@ 
static inline bool intel_gt_is_wedged(const struct intel_gt *gt) void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); +int intel_gt_get_l3bank_count(struct intel_gt *gt); void intel_gt_watchdog_work(struct work_struct *work); diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 96bd8fb3e895..0e92bb2d21b2 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -10,6 +10,7 @@ #include "i915_perf.h" #include "i915_query.h" #include +#include "gt/intel_gt.h" static int copy_query_item(void *query_hdr, size_t query_sz, u32 total_length, @@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private *i915, return hwconfig->size; } +static int query_l3banks(struct drm_i915_private *i915, +struct drm_i915_query_item *query_item) +{ + u32 banks; + + if (query_item->length == 0) + return sizeof(banks); + + if (query_item->length < sizeof(banks)) + return -EINVAL; + + banks = intel_gt_get_l3bank_count(&i915->gt); + + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr), +&banks, sizeof(banks))) + return -EFAULT; + + return sizeof(banks); +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { query_topology_info, @@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, query_perf_config, query_memregion_info, query_hwconfig_table, + query_l3banks, }; int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index eb13c601d680..e9ba88fe3db7 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #defineGEN10_MIRROR_FUSE3 _MMIO(0x9118) #define GEN10_L3BANK_PAIR_COUNT 4 #define GEN10_L3BANK_MASK 0x0F +#define GEN12_GT_L3_MODE_MASK 0xFF #define GEN8_EU_DISABLE0 
_MMIO(0x9134) #define GEN8_EU_DIS0_S0_MASK 0xff diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 87d369cae22a..20d18cca5066 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2234,6 +2234,7 @@ struct drm_i915_query_item { #define DRM_I915_QUERY_PERF_CONFIG 3 #define DRM_I915_QUERY_MEMORY_REGIONS 4 #def
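The core of intel_gt_get_l3bank_count() is a population count over the inverted fuse field. Reading the patch, set fuse bits appear to mark disabled banks, hence the inversion before counting. A minimal userspace sketch of that arithmetic, with popcount32() and l3_bank_count() as hypothetical names and the 0xFF mask taken from GEN12_GT_L3_MODE_MASK:

```c
#include <stdint.h>

#define L3_MODE_MASK 0xffu	/* value of GEN12_GT_L3_MODE_MASK in the patch */

/* Portable stand-in for the kernel's hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}

	return n;
}

/* Count the banks whose fuse bit is clear within the masked field. */
static unsigned int l3_bank_count(uint32_t fuse3)
{
	return popcount32(~fuse3 & L3_MODE_MASK);
}
```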
Re: [Intel-gfx] [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS
On 7/19/2021 10:24, Matthew Brost wrote: On Fri, Jul 16, 2021 at 01:17:14PM -0700, Matthew Brost wrote: From: John Harrison The media watchdog mechanism involves GuC doing a silent reset and continue of the hung context. This requires the i915 driver provide a golden context to GuC in the ADS. Signed-off-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_gt.c | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++--- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_uc.c | 5 + drivers/gpu/drm/i915/gt/uc/intel_uc.h | 1 + 7 files changed, 199 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index acfdd53b2678..ceeb517ba259 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt) if (err) goto err_gt; + intel_uc_init_late(&gt->uc); + err = i915_inject_probe_error(gt->i915, -EIO); if (err) goto err_gt; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 68266cbffd1f..979128e28372 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc) } } +void intel_guc_init_late(struct intel_guc *guc) +{ + intel_guc_ads_init_late(guc); +} + static u32 guc_ctl_debug_flags(struct intel_guc *guc) { u32 level = intel_guc_log_get_level(&guc->log); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index bc71635c70b9..dc18ac510ac8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -60,6 +60,7 @@ struct intel_guc { struct i915_vma *ads_vma; struct __guc_ads_blob *ads_blob; u32 ads_regset_size; + u32 ads_golden_ctxt_size; struct i915_vma 
*lrc_desc_pool; void *lrc_desc_pool_vaddr; @@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc, } void intel_guc_init_early(struct intel_guc *guc); +void intel_guc_init_late(struct intel_guc *guc); void intel_guc_init_send_regs(struct intel_guc *guc); void intel_guc_write_params(struct intel_guc *guc); int intel_guc_init(struct intel_guc *guc); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index 93b0ac35a508..241b3089b658 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -7,6 +7,7 @@ #include "gt/intel_gt.h" #include "gt/intel_lrc.h" +#include "gt/shmem_utils.h" #include "intel_guc_ads.h" #include "intel_guc_fwif.h" #include "intel_uc.h" @@ -33,6 +34,10 @@ * +---+ <== dynamic * | padding | * +---+ <== 4K aligned + * | golden contexts | + * +---+ + * | padding | + * +---+ <== 4K aligned * | private data | * +---+ * | padding | @@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc) return guc->ads_regset_size; } +static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc) +{ + return PAGE_ALIGN(guc->ads_golden_ctxt_size); +} + static u32 guc_ads_private_data_size(struct intel_guc *guc) { return PAGE_ALIGN(guc->fw.private_data_size); @@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc) return offsetof(struct __guc_ads_blob, regset); } -static u32 guc_ads_private_data_offset(struct intel_guc *guc) +static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc) { u32 offset; offset = guc_ads_regset_offset(guc) + guc_ads_regset_size(guc); + + return PAGE_ALIGN(offset); +} + +static u32 guc_ads_private_data_offset(struct intel_guc *guc) +{ + u32 offset; + + offset = guc_ads_golden_ctxt_offset(guc) + +guc_ads_golden_ctxt_size(guc); + return PAGE_ALIGN(offset); } @@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc, GEM_BUG_ON(temp_set.size); } -/* - * The first 80 
dwords of the register state context, containing the - * execlists and ppgtt registers. - */ -#define LR_HW
Re: [Intel-gfx] [PATCH v3 1/4] drm/i915/guc: Limit scheduling properties to avoid overflow
On 3/8/2022 01:43, Tvrtko Ursulin wrote: On 03/03/2022 22:37, john.c.harri...@intel.com wrote: From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. v2: Add helper functions for clamping (review feedback from Tvrtko). Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio (v1) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index b3a429a92c0d..8208164c25e7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2218,13 +2218,24 @@ static inline u32 get_children_join_value(struct intel_context *ce, static void guc_context_policy_init(struct intel_engine_cs *engine, struct guc_lrc_desc *desc) { + struct drm_device *drm = &engine->i915->drm; + desc->policy_flags = 0; if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION) desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE; /* NB: For both of these, zero means disabled. */ + if (overflows_type(engine->props.timeslice_duration_ms * 1000, + desc->execution_quantum)) + drm_warn_once(drm, "GuC interface cannot support %lums timeslice!\n", + engine->props.timeslice_duration_ms); desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; + + if (overflows_type(engine->props.preempt_timeout_ms * 1000, + desc->preemption_timeout)) + drm_warn_once(drm, "GuC interface cannot support %lums preemption timeout!\n", + engine->props.preempt_timeout_ms); desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000; } As previously explained, this is wrong. 
If the check must be present then it should be a BUG_ON as it is indicative of an internal driver failure. There is already a top level helper function for ensuring all range checks are done and the value is valid. If that is broken then that is a bug and should have been caught in pre-merge testing or code review. It is not possible for a bad value to get beyond that helper function. That is the whole point of the helper. We do not double bag every other value check in the driver. Once you have passed input validation, the values are assumed to be correct. Otherwise we would have every other line of code be a value check! And if somehow a bad value did make it through, simply printing a one-shot warning is pointless. You are still going to get undefined behaviour potentially leading to a totally broken system. E.g. your very big timeout has overflowed and become extremely small, thus no batch buffer can ever complete because they all get reset before they have even finished the context switch in. That is a fundamentally broken system. John. 
With that: Reviewed-by: Tvrtko Ursulin Regards, Tvrtko --- drivers/gpu/drm/i915/gt/intel_engine.h | 6 ++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 69 + drivers/gpu/drm/i915/gt/sysfs_engines.c | 25 +--- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +++ 4 files changed, 99 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 1c0ab05c3c40..d7044c4e526e 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -351,4 +351,10 @@ intel_engine_get_hung_context(struct intel_engine_cs *engine) return engine->hung_ce; } +u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs *engine, u64 value); + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 7447411a5b26..22e70e4e007c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -442,6 +442,26 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; } + /* Cap properties according to any system limits */ +#define CLAMP_PROP(field) \ + do { \ + u64 clamp = intel_clamp_##
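To make the overflow argument concrete, here is a small userspace sketch of a timeout in milliseconds being converted to 32-bit clock ticks, together with a clamp that keeps the conversion from wrapping. The helper names are hypothetical and are not the intel_clamp_*() functions from the patch. The 38400 ticks per millisecond (38.4 MHz) used in the note afterwards is an illustrative assumption, not a figure stated in this thread.

```c
#include <stdint.h>

/* Convert to ticks the way a 32-bit field would store it (wraps). */
static uint32_t ms_to_ticks_wrapping(uint64_t ms, uint32_t ticks_per_ms)
{
	return (uint32_t)(ms * ticks_per_ms);
}

/* Largest timeout, in ms, whose tick count still fits in 32 bits. */
static uint64_t max_ms_without_overflow(uint32_t ticks_per_ms)
{
	return UINT32_MAX / ticks_per_ms;
}

/* Clamp a user-supplied timeout at validation time. */
static uint64_t clamp_timeout_ms(uint64_t ms, uint32_t ticks_per_ms)
{
	uint64_t max = max_ms_without_overflow(ticks_per_ms);

	return ms > max ? max : ms;
}
```

Under the assumed 38.4 MHz tick rate the limit comes out at 111848 ms, in line with the roughly 110 seconds quoted in the commit message. A 120000 ms request would wrap to 313032704 ticks, which is only about 8.15 seconds and is the kind of silently shortened timeout the reply warns about.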
Re: [Intel-gfx] [PATCH v3 4/4] drm/i915: Improve long running OCL w/a for GuC submission
On 3/8/2022 01:41, Tvrtko Ursulin wrote: On 03/03/2022 22:37, john.c.harri...@intel.com wrote: From: John Harrison A workaround was added to the driver to allow OpenCL workloads to run 'forever' by disabling pre-emption on the RCS engine for Gen12. It is not totally unbound as the heartbeat will kick in eventually and cause a reset of the hung engine. However, this does not work well in GuC submission mode. In GuC mode, the pre-emption timeout is how GuC detects hung contexts and triggers a per engine reset. Thus, disabling the timeout means also losing all per engine reset ability. A full GT reset will still occur when the heartbeat finally expires, but that is a much more destructive and undesirable mechanism. The purpose of the workaround is actually to give OpenCL tasks longer to reach a pre-emption point after a pre-emption request has been issued. This is necessary because Gen12 does not support mid-thread pre-emption and OpenCL can have long running threads. So, rather than disabling the timeout completely, just set it to a 'long' value. v2: Review feedback from Tvrtko - must hard code the 'long' value instead of determining it algorithmically. So make it an extra CONFIG definition. Also, remove the execlist centric comment from the existing pre-emption timeout CONFIG option given that it applies to more than just execlists. 
Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio (v1) Acked-by: Michal Mrozek --- drivers/gpu/drm/i915/Kconfig.profile | 26 +++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 ++-- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile index 39328567c200..7cc38d25ee5c 100644 --- a/drivers/gpu/drm/i915/Kconfig.profile +++ b/drivers/gpu/drm/i915/Kconfig.profile @@ -57,10 +57,28 @@ config DRM_I915_PREEMPT_TIMEOUT default 640 # milliseconds help How long to wait (in milliseconds) for a preemption event to occur - when submitting a new context via execlists. If the current context - does not hit an arbitration point and yield to HW before the timer - expires, the HW will be reset to allow the more important context - to execute. + when submitting a new context. If the current context does not hit + an arbitration point and yield to HW before the timer expires, the + HW will be reset to allow the more important context to execute. + + This is adjustable via + /sys/class/drm/card?/engine/*/preempt_timeout_ms + + May be 0 to disable the timeout. + + The compiled in default may get overridden at driver probe time on + certain platforms and certain engines which will be reflected in the + sysfs control. + +config DRM_I915_PREEMPT_TIMEOUT_COMPUTE + int "Preempt timeout for compute engines (ms, jiffy granularity)" + default 7500 # milliseconds + help + How long to wait (in milliseconds) for a preemption event to occur + when submitting a new context to a compute capable engine. If the + current context does not hit an arbitration point and yield to HW + before the timer expires, the HW will be reset to allow the more + important context to execute. 
This is adjustable via /sys/class/drm/card?/engine/*/preempt_timeout_ms diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 4185c7338581..cc0954ad836a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -438,9 +438,14 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, engine->props.timeslice_duration_ms = CONFIG_DRM_I915_TIMESLICE_DURATION; - /* Override to uninterruptible for OpenCL workloads. */ + /* + * Mid-thread pre-emption is not available in Gen12. Unfortunately, + * some OpenCL workloads run quite long threads. That means they get + * reset due to not pre-empting in a timely manner. So, bump the + * pre-emption timeout value to be much higher for compute engines. + */ if (GRAPHICS_VER(i915) == 12 && (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) - engine->props.preempt_timeout_ms = 0; + engine->props.preempt_timeout_ms = CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE; I wouldn't go as far as adding a config option since as it is it only applies to Gen12 but Kconfig text says nothing about that. And I am not saying you should add a Gen12 specific config option, that would be weird. So IMO just drop it. You were the one arguing that the driver was illegally overriding the user's explicitly chosen settings, including the compile time config options. Just having a hardcoded magic number in the driver is the absolute worst kind of override there is. And tec
Re: [Intel-gfx] [PATCH v3 4/4] drm/i915: Improve long running OCL w/a for GuC submission
On 3/10/2022 01:27, Tvrtko Ursulin wrote: On 09/03/2022 21:16, John Harrison wrote: On 3/8/2022 01:41, Tvrtko Ursulin wrote: On 03/03/2022 22:37, john.c.harri...@intel.com wrote: From: John Harrison A workaround was added to the driver to allow OpenCL workloads to run 'forever' by disabling pre-emption on the RCS engine for Gen12. It is not totally unbound as the heartbeat will kick in eventually and cause a reset of the hung engine. However, this does not work well in GuC submission mode. In GuC mode, the pre-emption timeout is how GuC detects hung contexts and triggers a per engine reset. Thus, disabling the timeout means also losing all per engine reset ability. A full GT reset will still occur when the heartbeat finally expires, but that is a much more destructive and undesirable mechanism. The purpose of the workaround is actually to give OpenCL tasks longer to reach a pre-emption point after a pre-emption request has been issued. This is necessary because Gen12 does not support mid-thread pre-emption and OpenCL can have long running threads. So, rather than disabling the timeout completely, just set it to a 'long' value. v2: Review feedback from Tvrtko - must hard code the 'long' value instead of determining it algorithmically. So make it an extra CONFIG definition. Also, remove the execlist centric comment from the existing pre-emption timeout CONFIG option given that it applies to more than just execlists. 
Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio (v1) Acked-by: Michal Mrozek --- drivers/gpu/drm/i915/Kconfig.profile | 26 +++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 ++-- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile index 39328567c200..7cc38d25ee5c 100644 --- a/drivers/gpu/drm/i915/Kconfig.profile +++ b/drivers/gpu/drm/i915/Kconfig.profile @@ -57,10 +57,28 @@ config DRM_I915_PREEMPT_TIMEOUT default 640 # milliseconds help How long to wait (in milliseconds) for a preemption event to occur - when submitting a new context via execlists. If the current context - does not hit an arbitration point and yield to HW before the timer - expires, the HW will be reset to allow the more important context - to execute. + when submitting a new context. If the current context does not hit + an arbitration point and yield to HW before the timer expires, the + HW will be reset to allow the more important context to execute. + + This is adjustable via + /sys/class/drm/card?/engine/*/preempt_timeout_ms + + May be 0 to disable the timeout. + + The compiled in default may get overridden at driver probe time on + certain platforms and certain engines which will be reflected in the + sysfs control. + +config DRM_I915_PREEMPT_TIMEOUT_COMPUTE + int "Preempt timeout for compute engines (ms, jiffy granularity)" + default 7500 # milliseconds + help + How long to wait (in milliseconds) for a preemption event to occur + when submitting a new context to a compute capable engine. If the + current context does not hit an arbitration point and yield to HW + before the timer expires, the HW will be reset to allow the more + important context to execute. 
This is adjustable via /sys/class/drm/card?/engine/*/preempt_timeout_ms diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 4185c7338581..cc0954ad836a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -438,9 +438,14 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, engine->props.timeslice_duration_ms = CONFIG_DRM_I915_TIMESLICE_DURATION; - /* Override to uninterruptible for OpenCL workloads. */ + /* + * Mid-thread pre-emption is not available in Gen12. Unfortunately, + * some OpenCL workloads run quite long threads. That means they get + * reset due to not pre-empting in a timely manner. So, bump the + * pre-emption timeout value to be much higher for compute engines. + */ if (GRAPHICS_VER(i915) == 12 && (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) - engine->props.preempt_timeout_ms = 0; + engine->props.preempt_timeout_ms = CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE; I wouldn't go as far as adding a config option since as it is it only applies to Gen12 but Kconfig text says nothing about that. And I am not saying you should add a Gen12 specific config option, that would be weird. So IMO just drop it. You were the one arguing that the driver was illegally overriding the user's explicitly chosen settings, including the compile time config This is a bit out of contex
Re: [PATCH] drm/i915/guc: Use iosys_map interface to update lrc_desc
Sorry, only just seen this patch.

Please do not do this!

The entire lrc_desc_pool entity is being dropped as part of the update to GuC v70. That's why there was a recent patch set to significantly re-organise how/where it is used. That patch set explicitly said - this is all in preparation for removing the desc pool entirely.

Merging this change would just cause unnecessary churn and rebase conflicts with the v70 update patches that I am working on. Please wait until that lands and then see if there is anything left that you think still needs to be updated.

John.

On 3/8/2022 08:47, Balasubramani Vivekanandan wrote:
This patch is a continuation of the effort to move all pointers in i915, which at any point may be pointing to device memory or system memory, to the iosys_map interface. More details about the need for this change are explained in the patch series which initiated this task
https://patchwork.freedesktop.org/series/99711/

This patch converts all access to the lrc_desc through iosys_map interfaces.
Cc: Lucas De Marchi Cc: John Harrison Cc: Matthew Brost Cc: Umesh Nerlige Ramappa Signed-off-by: Balasubramani Vivekanandan --- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 68 --- 2 files changed, 43 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index e439e6c1ac8b..cbbc24dbaf0f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -168,7 +168,7 @@ struct intel_guc { /** @lrc_desc_pool: object allocated to hold the GuC LRC descriptor pool */ struct i915_vma *lrc_desc_pool; /** @lrc_desc_pool_vaddr: contents of the GuC LRC descriptor pool */ - void *lrc_desc_pool_vaddr; + struct iosys_map lrc_desc_pool_vaddr; /** * @context_lookup: used to resolve intel_context from guc_id, if a diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9ec03234d2c2..84b17ded886a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -467,13 +467,14 @@ static u32 *get_wq_pointer(struct guc_process_desc *desc, return &__get_parent_scratch(ce)->wq[ce->parallel.guc.wqi_tail / sizeof(u32)]; } -static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index) +static void __write_lrc_desc(struct intel_guc *guc, u32 index, +struct guc_lrc_desc *desc) { - struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr; + unsigned int size = sizeof(struct guc_lrc_desc); GEM_BUG_ON(index >= GUC_MAX_CONTEXT_ID); - return &base[index]; + iosys_map_memcpy_to(&guc->lrc_desc_pool_vaddr, index * size, desc, size); } static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id) @@ -489,20 +490,28 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc) { u32 size; int ret; + void *addr; size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * GUC_MAX_CONTEXT_ID); ret = 
intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool, -(void **)&guc->lrc_desc_pool_vaddr); +&addr); + if (ret) return ret; + if (i915_gem_object_is_lmem(guc->lrc_desc_pool->obj)) + iosys_map_set_vaddr_iomem(&guc->lrc_desc_pool_vaddr, + (void __iomem *)addr); + else + iosys_map_set_vaddr(&guc->lrc_desc_pool_vaddr, addr); + return 0; } static void guc_lrc_desc_pool_destroy(struct intel_guc *guc) { - guc->lrc_desc_pool_vaddr = NULL; + iosys_map_clear(&guc->lrc_desc_pool_vaddr); i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); } @@ -513,9 +522,11 @@ static inline bool guc_submission_initialized(struct intel_guc *guc) static inline void _reset_lrc_desc(struct intel_guc *guc, u32 id) { - struct guc_lrc_desc *desc = __get_lrc_desc(guc, id); + unsigned int size = sizeof(struct guc_lrc_desc); - memset(desc, 0, sizeof(*desc)); + GEM_BUG_ON(id >= GUC_MAX_CONTEXT_ID); + + iosys_map_memset(&guc->lrc_desc_pool_vaddr, id * size, 0, size); } static inline bool ctx_id_mapped(struct intel_guc *guc, u32 id) @@ -2233,7 +2244,7 @@ static void prepare_context_registration_info(struct intel_context *ce) struct intel_engine_cs *engine = ce->engine; struct intel_guc *guc = &engine->gt->uc.guc; u32 ctx_id = ce->guc_id.id; - struct guc_lrc_desc *desc; + struct guc_lrc_desc
Re: [PATCH] drm/i915/guc: Initialize GuC submission locks and queues early
On 2/14/2022 17:11, Daniele Ceraolo Spurio wrote: Move initialization of submission-related spinlock, lists and workers to init_early. This fixes an issue where if the GuC init fails we might still try to get the lock in the context cleanup code. Note that it is safe to call the GuC context cleanup code even if the init failed because all contexts are initialized with an invalid GuC ID, which will cause the GuC side of the cleanup to be skipped, so it is easier to just make sure the variables are initialized than to special case the cleanup to handle the case when they're not. References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932 Signed-off-by: Daniele Ceraolo Spurio Cc: Matthew Brost Cc: John Harrison Reviewed-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index b3a429a92c0da..2160da2c83cbf 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1818,24 +1818,11 @@ int intel_guc_submission_init(struct intel_guc *guc) */ GEM_BUG_ON(!guc->lrc_desc_pool); - xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); - - spin_lock_init(&guc->submission_state.lock); - INIT_LIST_HEAD(&guc->submission_state.guc_id_list); - ida_init(&guc->submission_state.guc_ids); - INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts); - INIT_WORK(&guc->submission_state.destroyed_worker, - destroyed_worker_func); - INIT_WORK(&guc->submission_state.reset_fail_worker, - reset_fail_worker_func); - guc->submission_state.guc_ids_bitmap = bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL); if (!guc->submission_state.guc_ids_bitmap) return -ENOMEM; - spin_lock_init(&guc->timestamp.lock); - INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping); guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) 
* HZ; guc->timestamp.shift = gpm_timestamp_shift(gt); @@ -3831,6 +3818,20 @@ static bool __guc_submission_selected(struct intel_guc *guc) void intel_guc_submission_init_early(struct intel_guc *guc) { + xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); + + spin_lock_init(&guc->submission_state.lock); + INIT_LIST_HEAD(&guc->submission_state.guc_id_list); + ida_init(&guc->submission_state.guc_ids); + INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts); + INIT_WORK(&guc->submission_state.destroyed_worker, + destroyed_worker_func); + INIT_WORK(&guc->submission_state.reset_fail_worker, + reset_fail_worker_func); + + spin_lock_init(&guc->timestamp.lock); + INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping); + guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS; guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc);
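The reasoning in the commit message above can be sketched in user-space C. This is a minimal illustration, not the driver code: every name here (demo_state, demo_context, DEMO_INVALID_ID and the demo_* functions) is hypothetical. It only shows why setting up the lock in an early-init step keeps cleanup safe when the later init step fails, and why a context that never received an ID can skip the device-side part of the cleanup.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INVALID_ID UINT32_MAX	/* hypothetical "no ID assigned" sentinel */

struct demo_state {
	bool lock_ready;	/* stands in for an initialised spinlock */
};

struct demo_context {
	uint32_t id;
};

/* Early init: only sets up locks and lists, cannot fail. */
static void demo_init_early(struct demo_state *s)
{
	s->lock_ready = true;
}

/* Late init: may fail before reaching any lock setup placed here. */
static int demo_init(struct demo_state *s, bool simulate_failure,
		     bool init_lock_here)
{
	if (simulate_failure)
		return -1;
	if (init_lock_here)
		s->lock_ready = true;
	return 0;
}

static void demo_context_init(struct demo_context *c)
{
	c->id = DEMO_INVALID_ID;	/* contexts start without an ID */
}

/*
 * Returns 1 if device-side cleanup ran, 0 if it was skipped because no ID
 * was ever assigned, -1 if the lock was never initialised (the bug case).
 */
static int demo_context_cleanup(struct demo_state *s, struct demo_context *c)
{
	if (!s->lock_ready)
		return -1;
	if (c->id == DEMO_INVALID_ID)
		return 0;
	c->id = DEMO_INVALID_ID;
	return 1;
}

/* Runs the "late init fails, then a context is cleaned up" scenario. */
static int demo_failed_init_cleanup(bool lock_in_early_init)
{
	struct demo_state s = { .lock_ready = false };
	struct demo_context c;

	if (lock_in_early_init)
		demo_init_early(&s);
	demo_context_init(&c);
	(void)demo_init(&s, true, !lock_in_early_init);	/* simulated failure */
	return demo_context_cleanup(&s, &c);
}
```

With the lock set up in the early step, the cleanup path finds a usable lock and simply skips the device-side work; with the lock set up only in the failing step, the cleanup path hits the uninitialised lock.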
Re: [Intel-gfx] [PATCH v2] drm/i915/guc: Do not complain about stale reset notifications
On 2/22/2022 17:39, Ceraolo Spurio, Daniele wrote:
> On 2/11/2022 5:04 PM, john.c.harri...@intel.com wrote:
>> From: John Harrison
>>
>> It is possible for reset notifications to arrive for a context that is in the process of being banned. So don't flag these as an error, just report it as informational (because it is still useful to know that resets are happening even if they are being ignored).
>>
>> v2: Better wording for the message (review feedback from Tvrtko).
>>
>> Signed-off-by: John Harrison
>> ---
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index b3a429a92c0d..3afff24b8f24 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -4022,10 +4022,10 @@ static void guc_handle_context_reset(struct intel_guc *guc,
>>  capture_error_state(guc, ce);
>>  guc_context_replay(ce);
>>  } else {
>> - drm_err(&guc_to_gt(guc)->i915->drm,
>> - "Invalid GuC engine reset notificaion for 0x%04X on %s: banned = %d, blocked = %d",
>> - ce->guc_id.id, ce->engine->name, intel_context_is_banned(ce),
>> - context_blocked(ce));
>> + drm_info(&guc_to_gt(guc)->i915->drm,
>> + "Ignoring context reset notification for 0x%04X on %s: banned = %d, blocked = %d",
>
> The if statement above checks for !banned, so if we're here we're banned for sure, no need to print it as if it was conditional. I'd reword it as something like:
>
> "Ignoring reset notification for banned context 0x%04X ...".
>
> With that:

Hmm. The patch was based on an older tree that had an extra term in the if. Seems like the patch applied cleanly and I didn't check the surrounding code! Will update it to drop the banned and blocked values.

John.

> Reviewed-by: Daniele Ceraolo Spurio
>
> Daniele
>
>> + ce->guc_id.id, ce->engine->name, intel_context_is_banned(ce),
>> + context_blocked(ce));
>>  }
>> }
Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 2/22/2022 01:52, Tvrtko Ursulin wrote: On 18/02/2022 21:33, john.c.harri...@intel.com wrote: From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately Where does 32-bit come from, the GuC side? We already use 64-bits so that something to fix to start with. Yep... Yes, the GuC API is defined as 32bits only and then does a straight multiply by the clock speed with no range checking. We have requested 64bit support but there was push back on the grounds that it is not something the GuC timer hardware supports and such long timeouts are not real world usable anyway. ./gt/uc/intel_guc_fwif.h: u32 execution_quantum; ./gt/uc/intel_guc_submission.c: desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; ./gt/intel_engine_types.h: unsigned long timeslice_duration_ms; timeslice_store/preempt_timeout_store: err = kstrtoull(buf, 0, &duration); So both kconfig and sysfs can already overflow GuC, not only because of tick conversion internally but because at backend level nothing was done for assigning 64-bit into 32-bit. Or I failed to find where it is handled. That's why I'm adding this range check to make sure we don't allow overflows. 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. Btw who is reviewing GuC patches these days - things have somehow gotten pretty quiet in activity and I don't think that's due absence of stuff to improve or fix? Asking since I think I noticed a few already which you posted and then crickets on the mailing list. Too much work to do and not enough engineers to do it all :(. 
Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++ drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 + 3 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e53008b4dd05..2a1e9f36e6f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; + /* Cap timeouts to prevent overflow inside GuC */ + if (intel_guc_submission_is_wanted(&gt->uc.guc)) { + if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) { Hm "wanted".. There's been too much back and forth on the GuC load options over the years to keep track.. intel_engine_uses_guc work sounds like would work and read nicer. I'm not adding a new feature check here. I'm just using the existing one. If we want to rename it yet again then that would be a different patch set. And limit to class instead of applying to all engines looks like a miss. As per follow up email, the class limit is not applied here. + drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %d to prevent possibly overflow\n", + GUC_POLICY_MAX_EXEC_QUANTUM_MS); + engine->props.timeslice_duration_ms = GUC_POLICY_MAX_EXEC_QUANTUM_MS; I am not sure logging such message during driver load is useful. Sounds more like a confused driver which starts with one value and then overrides itself. I'd just silently set the value appropriate for the active backend. Preemption timeout kconfig text already documents the fact timeouts can get overridden at runtime depending on platform+engine. So maybe just add same text to timeslice kconfig. The point is to make people aware if they compile with unsupported config options. 
As far as I know, there is no way to apply range checking or other limits to config defines. Which means that a user would silently get unwanted behaviour. That seems like a bad thing to me. If the driver is confused because the user built it in a confused manner then we should let them know. + } + + if (engine->props.preempt_timeout_ms > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) { + drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %d to prevent possibly overflow\n", + GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS); + engine->props.preempt_timeout_ms = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS; + } + } + engine->defaults = engine->props; /* never to change again */ engine->context_size = intel_engine_context_size(gt, engine->class); diff --git a/drivers/gpu/dr
Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 2/22/2022 16:52, Ceraolo Spurio, Daniele wrote: On 2/18/2022 1:33 PM, john.c.harri...@intel.com wrote: From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++ drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 + 3 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e53008b4dd05..2a1e9f36e6f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; + /* Cap timeouts to prevent overflow inside GuC */ + if (intel_guc_submission_is_wanted(&gt->uc.guc)) { + if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) { + drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %d to prevent possibly overflow\n", I'd drop the word "possibly" + GUC_POLICY_MAX_EXEC_QUANTUM_MS); + engine->props.timeslice_duration_ms = GUC_POLICY_MAX_EXEC_QUANTUM_MS; + } + + if (engine->props.preempt_timeout_ms > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) { + drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %d to prevent possibly overflow\n", + GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS); + engine->props.preempt_timeout_ms = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS; + } + } + engine->defaults = engine->props; /* never to change again */ engine->context_size = intel_engine_context_size(gt, engine->class); 
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c index 967031056202..f57efe026474 100644 --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c @@ -221,6 +221,13 @@ timeslice_store(struct kobject *kobj, struct kobj_attribute *attr, if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)) return -EINVAL; + if (intel_uc_uses_guc_submission(&engine->gt->uc) && + duration > GUC_POLICY_MAX_EXEC_QUANTUM_MS) { + duration = GUC_POLICY_MAX_EXEC_QUANTUM_MS; + drm_info(&engine->i915->drm, "Warning, clamping timeslice duration to %lld to prevent possibly overflow\n", + duration); + } + WRITE_ONCE(engine->props.timeslice_duration_ms, duration); if (execlists_active(&engine->execlists)) @@ -325,6 +332,13 @@ preempt_timeout_store(struct kobject *kobj, struct kobj_attribute *attr, if (timeout > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)) return -EINVAL; + if (intel_uc_uses_guc_submission(&engine->gt->uc) && + timeout > GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS) { + timeout = GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS; + drm_info(&engine->i915->drm, "Warning, clamping pre-emption timeout to %lld to prevent possibly overflow\n", + timeout); + } + WRITE_ONCE(engine->props.preempt_timeout_ms, timeout); if (READ_ONCE(engine->execlists.pending[0])) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index 6a4612a852e2..ad131092f8df 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -248,6 +248,15 @@ struct guc_lrc_desc { #define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 50 +/* + * GuC converts the timeout to clock ticks internally. Different platforms have + * different GuC clocks. Thus, the maximum value before overflow is platform + * dependent. Current worst case scenario is about 110s. So, limit to 100s to be + * safe. 
+ */ +#define GUC_POLICY_MAX_EXEC_QUANTUM_MS (100 * 1000) +#define GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS (100 * 1000) Those values don't seem to be defined in the GuC interface. If I'm correct, IMO we need to ask the GuC team to add them in, because it shouldn't be our responsibility to convert from ms to GuC clocks, considering that the interface is in ms. Not a blocker for this patch. As per other reply, no. GuC doesn't give us any hints or clues on any limits of these values.
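To make the "about 110s" figure in the patch comment above concrete, here is a small stand-alone sketch of the arithmetic. It is an illustration only: the helper names are hypothetical, and the 38.4 MHz tick rate used in the example is an assumption chosen because it lands close to the quoted worst case, not a value stated in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest timeout in milliseconds whose tick count
 * (ms * clock_hz / 1000) still fits in an unsigned 32-bit value.
 */
static uint64_t max_timeout_ms_before_overflow(uint64_t clock_hz)
{
	return ((uint64_t)UINT32_MAX * 1000u) / clock_hz;
}

/* Hypothetical helper: cap a requested timeout at a fixed limit. */
static uint64_t clamp_timeout_ms(uint64_t value_ms, uint64_t limit_ms)
{
	return value_ms > limit_ms ? limit_ms : value_ms;
}
```

Under the assumed 38.4 MHz clock the first helper gives 111848 ms, i.e. just under 112 seconds, which is consistent with choosing a fixed 100 second cap (100 * 1000 ms) that stays below the overflow point with some margin.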
Re: [Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads
On 2/22/2022 01:53, Tvrtko Ursulin wrote:
> On 18/02/2022 21:33, john.c.harri...@intel.com wrote:
>> From: John Harrison
>>
>> Compute workloads are inherently not pre-emptible on current hardware. Thus the pre-emption timeout was disabled as a workaround to prevent unwanted resets. Instead, the hang detection was left to the heartbeat and its (longer) timeout. This is undesirable with GuC submission as the heartbeat is a full GT reset rather than a per engine reset and so is much more destructive. Instead, just bump the pre-emption timeout
>
> Can we have a feature request to allow asking GuC for an engine reset?

For what purpose?

GuC manages the scheduling of contexts across engines. With virtual engines, the KMD has no knowledge of which engine a context might be executing on. Even without virtual engines, the KMD still has no knowledge of which context is currently executing on any given engine at any given time.

There is a reason why hang detection should be left to the entity that is doing the scheduling. Any other entity is second guessing at best.

The reason for keeping the heartbeat around even when GuC submission is enabled is for the case where the KMD/GuC have got out of sync with each other somehow or GuC itself has just crashed. I.e. when no submission at all is working and we need to reset the GuC itself and start over.

John.

> Regards,
>
> Tvrtko
>
>> to a big value. Also, update the heartbeat to allow such a long pre-emption delay in the final heartbeat period.
>>
>> Signed-off-by: John Harrison
>>
>> John Harrison (3):
>>   drm/i915/guc: Limit scheduling properties to avoid overflow
>>   drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts
>>   drm/i915: Improve long running OCL w/a for GuC submission
>>
>>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 37 +--
>>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 16
>>  drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 +++
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +
>>  4 files changed, 73 insertions(+), 3 deletions(-)
Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts
On 2/22/2022 03:19, Tvrtko Ursulin wrote:
> On 18/02/2022 21:33, john.c.harri...@intel.com wrote:
>> From: John Harrison
>>
>> Compute workloads are inherantly not pre-emptible for long periods on current hardware. As a workaround for this, the pre-emption timeout for compute capable engines was disabled. This is undesirable with GuC submission as it prevents per engine reset of hung contexts. Hence the next patch will re-enable the timeout but bumped up by an order of magnititude.
>
> (Some typos above.)

I'm spotting 'inherently' but not anything else.

>> However, the heartbeat might not respect that. Depending upon current activity, a pre-emption to the heartbeat pulse might not even be attempted until the last heartbeat period. Which means that only one
>
> Might not be attempted, but could be if something is running with lower priority. In which case I think special casing the last heartbeat does not feel right because it can end up resetting the engine before it was intended. Like if first heartbeat decides to preempt (the decision is backend specific, could be same prio + timeslicing), and preempt timeout has been set to heartbeat interval * 3, then 2nd heartbeat gets queued up, then 3rd, and so reset is triggered even before the first preempt timeout legitimately expires (or just as it is about to react).
>
> Instead, how about preempt timeout is always considered when calculating when to emit the next heartbeat? End result would be similar to your patch, in terms of avoiding the direct problem, although hang detection would be overall longer (but more correct I think).
>
> And it also means in the next patch you don't have to add coupling between preempt timeout and heartbeat to intel_engine_setup. Instead just some long preempt timeout would be needed. Granted, the decoupling argument is not super strong since then the heartbeat code has the coupling instead, but that still feels better to me. (Since we can say heartbeats only make sense on loaded engines, and so things like preempt timeout can legitimately be considered from there.)
>
> Incidentally, that would be similar to a patch which Chris had a year ago (https://patchwork.freedesktop.org/patch/419783/?series=86841&rev=1) to fix some CI issue.

I'm not following your arguments. Chris' patch is about not having two i915 based resets triggered concurrently - i915 based engine reset and i915 based GT reset. The purpose of this patch is to allow the GuC based engine reset to have a chance to occur before the i915 based GT reset kicks in.

It sounds like your argument above is about making the engine reset slower so that it doesn't happen before the appropriate heartbeat period for that potential reset scenario has expired. I don't see why that is at all necessary or useful.

If an early heartbeat period triggers an engine reset then the heartbeat pulse will go through. The heartbeat will thus see a happy system and not do anything further. If the given period does not trigger an engine reset but still does not get the pulse through (because the pulse is of too low a priority) then we move on to the next period and bump the priority. If the pre-emption has actually already been triggered anyway (and we are just waiting a while for it to timeout) then that's fine. The priority bump will have no effect because the context is already attempting to run. The heartbeat code doesn't care which priority level actually triggers the reset. It just cares whether or not the pulse finally makes it through. And the GuC doesn't care which heartbeat period the i915 is in. All it knows is that it has a request to schedule and whether the current context is pre-empting or not. So if period #1 triggers the pre-emption but the timeout doesn't happen until period #3, who cares? The result is the same as if period #3 triggered the pre-emption and the timeout was shorter. The result being that the hung context is reset, the pulse makes it through and the heartbeat goes to sleep again.

The only period that really matters is the final one. At that point the pulse request is at highest priority and so must trigger a pre-emption request. We then need at least one full pre-emption period (plus some wiggle room for random delays in reset time, context switching, processing messages, etc.) to allow the GuC based timeout and reset to occur. Hence ensuring that the final heartbeat period is at least twice the pre-emption timeout (because 1.25 times is just messy when working with ints!). That guarantees that GuC will get at least one complete opportunity to detect and recover the hang before i915 nukes the universe.

Whereas, bumping all heartbeat periods to be greater than the pre-emption timeout is wasteful and unnecessary. That leads to a total heartbeat time of about a minute. Which is a very long time to wait for a hang to be detected and recovered. Especi
Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 2/23/2022 04:13, Tvrtko Ursulin wrote: On 23/02/2022 02:11, John Harrison wrote: On 2/22/2022 01:52, Tvrtko Ursulin wrote: On 18/02/2022 21:33, john.c.harri...@intel.com wrote: From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately Where does 32-bit come from, the GuC side? We already use 64-bits so that something to fix to start with. Yep... Yes, the GuC API is defined as 32bits only and then does a straight multiply by the clock speed with no range checking. We have requested 64bit support but there was push back on the grounds that it is not something the GuC timer hardware supports and such long timeouts are not real world usable anyway. As long as compute are happy with 100 seconds, then it "should be enough for everbody". :D Compute disable all forms of reset and rely on manual kill. So yes. But even if they aren't. That's all we can do at the moment. If there is a genuine customer requirement for more then we can push for full 64bit software implemented timers in the GuC but until that happens, we don't have much choice. ./gt/uc/intel_guc_fwif.h: u32 execution_quantum; ./gt/uc/intel_guc_submission.c: desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; ./gt/intel_engine_types.h: unsigned long timeslice_duration_ms; timeslice_store/preempt_timeout_store: err = kstrtoull(buf, 0, &duration); So both kconfig and sysfs can already overflow GuC, not only because of tick conversion internally but because at backend level nothing was done for assigning 64-bit into 32-bit. Or I failed to find where it is handled. That's why I'm adding this range check to make sure we don't allow overflows. Yes and no, this fixes it, but the first bug was not only due GuC internal tick conversion. It was present ever since the u64 from i915 was shoved into u32 sent to GuC. 
So even if GuC used the value without additional multiplication, bug was be there. My point being when GuC backend was added timeout_ms values should have been limited/clamped to U32_MAX. The tick discovery is additional limit on top. I'm not disagreeing. I'm just saying that the truncation wasn't noticed until I actually tried using very long timeouts to debug a particular problem. Now that it is noticed, we need some method of range checking and this simple clamp solves all the truncation problems. 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. Btw who is reviewing GuC patches these days - things have somehow gotten pretty quiet in activity and I don't think that's due absence of stuff to improve or fix? Asking since I think I noticed a few already which you posted and then crickets on the mailing list. Too much work to do and not enough engineers to do it all :(. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++ drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 + 3 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e53008b4dd05..2a1e9f36e6f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; + /* Cap timeouts to prevent overflow inside GuC */ + if (intel_guc_submission_is_wanted(>->uc.guc)) { + if (engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS) { Hm "wanted".. There's been too much back and forth on the GuC load options over the years to keep track.. intel_engine_uses_guc work sounds like would work and read nicer. 
I'm not adding a new feature check here. I'm just using the existing one. If we want to rename it yet again then that would be a different patch set. $ grep intel_engine_uses_guc . -rl ./i915_perf.c ./i915_request.c ./selftests/intel_scheduler_helpers.c ./gem/i915_gem_context.c ./gt/intel_context.c ./gt/intel_engine.h ./gt/intel_engine_cs.c ./gt/intel_engine_heartbeat.c ./gt/intel_engine_pm.c ./gt/intel_reset.c ./gt/intel_lrc.c ./gt/selftest_context.c ./gt/selftest_engine_pm.c ./gt/selftest_hangcheck.c ./gt/selftest_mocs.c ./gt/selftest_workarounds.c Sounds better to me than intel_guc_submission_is_wanted. What does the reader know whether "is wanted" translates to "is actually used". Shrug on "is
Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts
On 2/23/2022 05:58, Tvrtko Ursulin wrote: On 23/02/2022 02:45, John Harrison wrote: On 2/22/2022 03:19, Tvrtko Ursulin wrote: On 18/02/2022 21:33, john.c.harri...@intel.com wrote: From: John Harrison Compute workloads are inherantly not pre-emptible for long periods on current hardware. As a workaround for this, the pre-emption timeout for compute capable engines was disabled. This is undesirable with GuC submission as it prevents per engine reset of hung contexts. Hence the next patch will re-enable the timeout but bumped up by an order of magnititude. (Some typos above.) I'm spotting 'inherently' but not anything else. Magnititude! O;) Doh! [snip] Whereas, bumping all heartbeat periods to be greater than the pre-emption timeout is wasteful and unnecessary. That leads to a total heartbeat time of about a minute. Which is a very long time to wait for a hang to be detected and recovered. Especially when the official limit on a context responding to an 'are you dead' query is only 7.5 seconds. Not sure how did you get one minute? 7.5 * 2 (to be safe) = 15. 15 * 5 (number of heartbeat periods) = 75 => 1 minute 15 seconds Even ignoring any safety factor and just going with 7.5 * 5 still gets you to 37.5 seconds which is over a half a minute and likely to race. Regardless, crux of argument was to avoid GuC engine reset and heartbeat reset racing with each other, and to do that by considering the preempt timeout with the heartbeat interval. I was thinking about this scenario in this series: [Please use fixed width font and no line wrap to view.] A) tP = preempt timeout tH = hearbeat interval tP = 3 * tH 1) Background load = I915_PRIORITY_DISPLAY <-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 < [2 * tH] > FULL RESET | \- preemption triggered, tP = 3 * tH --\ \-> preempt timeout would hit here Here we have collateral damage due full reset, since we can't tell GuC to reset just one engine and we fudged tP just to "account" for heartbeats. 
You are missing the whole point of the patch series which is that the last heartbeat period is '2 * tP' not '2 * tH'. + longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2; By making the last period double the pre-emption timeout, it is guaranteed that the FULL RESET stage cannot be hit before the hardware has attempted and timed-out on at least one pre-emption. [snip] <-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 < [2 * tH] > full reset would be here | \- preemption triggered, tP = 3 * tH \ \-> Preempt timeout reset Here is is kind of least worse, but question is why we fudged tP when it gives us nothing good in this case. The point of fudging tP(RCS) is to give compute workloads longer to reach a pre-emptible point (given that EU walkers are basically not pre-emptible). The reason for doing the fudge is not connected to the heartbeat at all. The fact that it causes problems for the heartbeat is an undesired side effect. Note that the use of 'tP(RCS) = tH * 3' was just an arbitrary calculation that gave us something that all interested parties were vaguely happy with. It could just as easily be a fixed, hard coded value of 7.5s but having it based on something configurable seemed more sensible. The other option was 'tP(RCS) = tP * 12' but that felt more arbitrary than basing it on the average heartbeat timeout. As in, three heartbeat periods is about what a normal prio task gets before it gets pre-empted by the heartbeat. So using that for general purpose pre-emptions (e.g. time slicing between multiple user apps) seems reasonable. B) Instead, my idea to account for preempt timeout when calculating when to schedule next hearbeat would look like this: First of all tP can be left at a large value unrelated to tH. Lets say tP = 640ms. tH stays 2.5s. 640ms is not 'large'. The requirement is either zero (disabled) or region of 7.5s. The 640ms figure is the default for non-compute engines. Anything that can run EUs needs to be 'huge'. 
1) Background load = I915_PRIORITY_DISPLAY <-- [tH + tP] --> Pulse1 <-- [tH + tP] --> Pulse2 <-- [tH + tP] --> Pulse3 <-- [tH + tP] --> full reset would be here Sure, this works but each period is now 2.5 + 7.5 = 10s. The full five periods is therefore 50s, which is practically a minute. [snip] Am I missing some requirement or you see another problem with this idea? On a related topic, if GuC engine resets stop working when preempt timeout is set to zero - I think we need to somehow let the user know if they try to tweak it via sysfs. Perhaps go as far as -EINVA
Re: [Intel-gfx] [PATCH 5/8] drm/i915/guc: Move lrc desc setup to where it is needed
On 2/22/2022 17:12, Ceraolo Spurio, Daniele wrote:
> On 2/17/2022 3:52 PM, john.c.harri...@intel.com wrote:
>> From: John Harrison
>>
>> The LRC descriptor was being initialised early on in the context registration sequence. It could then be determined that the actual registration needs to be delayed and the descriptor would be wiped out. This is inefficient, so move the setup to later in the process after the point of no return.
>>
>> Signed-off-by: John Harrison
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 0ab2d1a24bf6..aa74ec74194a 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -2153,6 +2153,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
>>  					     0, loop);
>>  }
>>
>> +static void prepare_context_registration_info(struct intel_context *ce);
>> +
>>  static int register_context(struct intel_context *ce, bool loop)
>>  {
>>  	struct intel_guc *guc = ce_to_guc(ce);
>> @@ -2163,6 +2165,8 @@ static int register_context(struct intel_context *ce, bool loop)
>>  	GEM_BUG_ON(intel_context_is_child(ce));
>>  	trace_intel_context_register(ce);
>>
>> +	prepare_context_registration_info(ce);
>> +
>>  	if (intel_context_is_parent(ce))
>>  		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
>>  						      offset, loop);
>> @@ -2246,7 +2250,6 @@ static void prepare_context_registration_info(struct intel_context *ce)
>>  	struct intel_context *child;
>>
>>  	GEM_BUG_ON(!engine->mask);
>> -	GEM_BUG_ON(!sched_state_is_init(ce));
>>
>>  	/*
>>  	 * Ensure LRC + CT vmas are is same region as write barrier is done
>> @@ -2314,9 +2317,13 @@ static int try_context_registration(struct intel_context *ce, bool loop)
>>  	bool context_registered;
>>  	int ret = 0;
>>
>> +	GEM_BUG_ON(!sched_state_is_init(ce));
>> +
>>  	context_registered = ctx_id_mapped(guc, desc_idx);
>>
>> -	prepare_context_registration_info(ce);
>> +	if (context_registered)
>> +		clr_ctx_id_mapping(guc, desc_idx);
>> +	set_ctx_id_mapping(guc, desc_idx, ce);
>
> I think we can do the clr unconditionally. Also, should we drop the clr/set pair in prepare_context_registration_info? It shouldn't be needed, unless I'm missing a path where we don't pass through here.
>
> Daniele

I don't believe so. The point is that the context id might have changed (it got stolen, re-used, etc. - all the state machine code below can cause aborts and retries and such like if something is pending and the register needs to be delayed). So we need to clear out the old mapping and add a new one to be safe.

Also, I'm not sure if it is safe to do a xa_store to an already used entry as an update or if you are supposed to clear it first? But that's what the code did before and I'm trying to not change any actual behaviour here.

John.

>>
>>  	/*
>>  	 * The context_lookup xarray is used to determine if the hardware
Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 2/24/2022 01:59, Tvrtko Ursulin wrote: On 23/02/2022 19:03, John Harrison wrote: On 2/23/2022 04:13, Tvrtko Ursulin wrote: On 23/02/2022 02:11, John Harrison wrote: On 2/22/2022 01:52, Tvrtko Ursulin wrote: On 18/02/2022 21:33, john.c.harri...@intel.com wrote: From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately Where does 32-bit come from, the GuC side? We already use 64-bits so that something to fix to start with. Yep... Yes, the GuC API is defined as 32bits only and then does a straight multiply by the clock speed with no range checking. We have requested 64bit support but there was push back on the grounds that it is not something the GuC timer hardware supports and such long timeouts are not real world usable anyway. As long as compute are happy with 100 seconds, then it "should be enough for everbody". :D Compute disable all forms of reset and rely on manual kill. So yes. But even if they aren't. That's all we can do at the moment. If there is a genuine customer requirement for more then we can push for full 64bit software implemented timers in the GuC but until that happens, we don't have much choice. Yeah. ./gt/uc/intel_guc_fwif.h: u32 execution_quantum; ./gt/uc/intel_guc_submission.c: desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; ./gt/intel_engine_types.h: unsigned long timeslice_duration_ms; timeslice_store/preempt_timeout_store: err = kstrtoull(buf, 0, &duration); So both kconfig and sysfs can already overflow GuC, not only because of tick conversion internally but because at backend level nothing was done for assigning 64-bit into 32-bit. Or I failed to find where it is handled. That's why I'm adding this range check to make sure we don't allow overflows. Yes and no, this fixes it, but the first bug was not only due GuC internal tick conversion. 
It was present ever since the u64 from i915 was shoved into u32 sent to GuC. So even if GuC used the value without additional multiplication, bug was be there. My point being when GuC backend was added timeout_ms values should have been limited/clamped to U32_MAX. The tick discovery is additional limit on top. I'm not disagreeing. I'm just saying that the truncation wasn't noticed until I actually tried using very long timeouts to debug a particular problem. Now that it is noticed, we need some method of range checking and this simple clamp solves all the truncation problems. Agreed in principle, just please mention in the commit message all aspects of the problem. I think we can get away without a Fixes: tag since it requires user fiddling to break things in unexpected ways. I would though put in a code a clamping which expresses both, something like min(u32, ..GUC LIMIT..). So the full story is documented forever. Or "if > u32 || > ..GUC LIMIT..) return -EINVAL". Just in case GuC limit one day changes but u32 stays. Perhaps internal ticks go away or anything and we are left with plain 1:1 millisecond relationship. Can certainly add a comment along the lines of "GuC API only takes a 32bit field but that is further reduced to GUC_LIMIT due to internal calculations which would otherwise overflow". But if the GuC limit is > u32 then, by definition, that means the GuC API has changed to take a u64 instead of a u32. So there will no u32 truncation any more. So I'm not seeing a need to explicitly test the integer size when the value check covers that. 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. Btw who is reviewing GuC patches these days - things have somehow gotten pretty quiet in activity and I don't think that's due absence of stuff to improve or fix? Asking since I think I noticed a few already which you posted and then crickets on the mailing list. 
Too much work to do and not enough engineers to do it all :(. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++ drivers/gpu/drm/i915/gt/sysfs_engines.c | 14 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 + 3 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e53008b4dd05..2a1e9f36e6f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -389,6 +389,21 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; +
Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts
On 2/24/2022 03:41, Tvrtko Ursulin wrote: On 23/02/2022 20:00, John Harrison wrote: On 2/23/2022 05:58, Tvrtko Ursulin wrote: On 23/02/2022 02:45, John Harrison wrote: On 2/22/2022 03:19, Tvrtko Ursulin wrote: On 18/02/2022 21:33, john.c.harri...@intel.com wrote: From: John Harrison Compute workloads are inherantly not pre-emptible for long periods on current hardware. As a workaround for this, the pre-emption timeout for compute capable engines was disabled. This is undesirable with GuC submission as it prevents per engine reset of hung contexts. Hence the next patch will re-enable the timeout but bumped up by an order of magnititude. (Some typos above.) I'm spotting 'inherently' but not anything else. Magnititude! O;) Doh! [snip] Whereas, bumping all heartbeat periods to be greater than the pre-emption timeout is wasteful and unnecessary. That leads to a total heartbeat time of about a minute. Which is a very long time to wait for a hang to be detected and recovered. Especially when the official limit on a context responding to an 'are you dead' query is only 7.5 seconds. Not sure how did you get one minute? 7.5 * 2 (to be safe) = 15. 15 * 5 (number of heartbeat periods) = 75 => 1 minute 15 seconds Even ignoring any safety factor and just going with 7.5 * 5 still gets you to 37.5 seconds which is over a half a minute and likely to race. Ah because my starting point is there should be no preempt timeout = heartbeat * 3, I just think that's too ugly. Then complain at the hardware designers to give us mid-thread pre-emption back. The heartbeat is only one source of pre-emption events. For example, a user can be running multiple contexts in parallel and expecting them to time slice on a single engine. Or maybe a user is just running one compute task in the background but is doing render work in the foreground. Etc. There was a reason the original hack was to disable pre-emption rather than increase the heartbeat. 
This is simply a slightly less ugly version of the same hack. And unfortunately, the basic idea of the hack is non-negotiable. As per other comments, 'tP(RCS) = tH *3' or 'tP(RCS) = tP(default) * 12' or 'tP(RCS) = 7500' are the available options. Given that the heartbeat is the ever present hard limit, it seems most plausible to base the hack on that. Any of the others works, though. Although I think a explicit hardcoded value is the most ugly. I guess the other option is to add CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE and default that to 7500. Take your pick. But 640ms is not allowed. Regardless, crux of argument was to avoid GuC engine reset and heartbeat reset racing with each other, and to do that by considering the preempt timeout with the heartbeat interval. I was thinking about this scenario in this series: [Please use fixed width font and no line wrap to view.] A) tP = preempt timeout tH = hearbeat interval tP = 3 * tH 1) Background load = I915_PRIORITY_DISPLAY <-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 < [2 * tH] > FULL RESET | \- preemption triggered, tP = 3 * tH --\ \-> preempt timeout would hit here Here we have collateral damage due full reset, since we can't tell GuC to reset just one engine and we fudged tP just to "account" for heartbeats. You are missing the whole point of the patch series which is that the last heartbeat period is '2 * tP' not '2 * tH'. + longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2; By making the last period double the pre-emption timeout, it is guaranteed that the FULL RESET stage cannot be hit before the hardware has attempted and timed-out on at least one pre-emption. Oh well :) that probably means the overall scheme is too odd for me. tp = 3tH and last pulse after 2tP I mean. To be accurate, it is 'tP(RCS) = 3 * tH(default); tH(final) = tP(current) * 2;'. Seems fairly straight forward to me. It's not a recursive definition or anything like that. 
It gives us a total heartbeat timeout that is close to the original version but still allows at least one pre-emption event. [snip] <-- [tH] --> Pulse1 <-- [tH] --> Pulse2 <-- [tH] --> Pulse3 < [2 * tH] > full reset would be here | \- preemption triggered, tP = 3 * tH \ \-> Preempt timeout reset Here is is kind of least worse, but question is why we fudged tP when it gives us nothing good in this case. The point of fudging tP(RCS) is to give compute workloads longer to reach a pre-emptible point (given that EU walkers are basically not pre-emptible). The reason for doing the fudge is not connected to the heartbeat at all. The fact that it ca
Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 2/24/2022 11:19, John Harrison wrote:
> [snip]
> I'll change it to _uses_ and repost, then.

[    7.683149] kernel BUG at drivers/gpu/drm/i915/gt/uc/intel_guc.h:367!

Told you that one went bang.

John.
Re: [Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads
On 2/23/2022 04:00, Tvrtko Ursulin wrote:
> On 23/02/2022 02:22, John Harrison wrote:
>> On 2/22/2022 01:53, Tvrtko Ursulin wrote:
>>> On 18/02/2022 21:33, john.c.harri...@intel.com wrote:
>>>> From: John Harrison
>>>>
>>>> Compute workloads are inherently not pre-emptible on current hardware. Thus the pre-emption timeout was disabled as a workaround to prevent unwanted resets. Instead, the hang detection was left to the heartbeat and its (longer) timeout. This is undesirable with GuC submission as the heartbeat is a full GT reset rather than a per engine reset and so is much more destructive. Instead, just bump the pre-emption timeout
>>>
>>> Can we have a feature request to allow asking GuC for an engine reset?
>>
>> For what purpose?
>
> To allow "stopped heartbeat" to reset the engine, however..
>
>> GuC manages the scheduling of contexts across engines. With virtual engines, the KMD has no knowledge of which engine a context might be executing on. Even without virtual engines, the KMD still has no knowledge of which context is currently executing on any given engine at any given time.
>>
>> There is a reason why hang detection should be left to the entity that is doing the scheduling. Any other entity is second guessing at best.
>>
>> The reason for keeping the heartbeat around even when GuC submission is enabled is for the case where the KMD/GuC have got out of sync with each other somehow or GuC itself has just crashed. I.e. when no submission at all is working and we need to reset the GuC itself and start over.
>
> .. I wasn't really up to speed to know/remember heartbeats are nerfed already in GuC mode.

Not sure what you mean by that claim. Engine resets are handled by GuC because GuC handles the scheduling. You can't do the former if you aren't doing the latter. However, the heartbeat is still present and is still the watchdog by which engine resets are triggered. As per the rest of the submission process, the hang detection and recovery is split between i915 and GuC.

> I am not sure it was the best way since full reset penalizes everyone for one hanging engine. Better question would be why leave heartbeats around to start with, with GuC? If you want to use it to health check GuC, as you say, maybe just send a CT message and expect replies? Then full reset would make sense. It also achieves the goal of not second guessing the submission backend you raise.

Adding yet another hang detection mechanism is yet more complication and is unnecessary when we already have one that works well enough. As above, the heartbeat is still required for sending the pulses that cause pre-emptions and so let GuC detect hangs. It also provides a fallback against a dead GuC by default. So why invent yet another wheel?

> Like it is now, and the need for this series demonstrates it, the whole thing has a pretty poor "impedance" match. Not even sure what intel_guc_find_hung_context is doing in intel_engine_heartbeat.c - why is that not in intel_gt_handle_error at least? Why is heartbeat code special and other callers of intel_gt_handle_error don't need it?

There is no guilty context if the reset was triggered via debugfs or similar. And as stated ad nauseam, i915 is no longer handling the scheduling and so cannot make assumptions about what is or is not running on the hardware at any given time. And obviously, if the reset was initiated by GuC itself then i915 should not be second guessing the guilty context as the GuC notification has already told us who was responsible.

And to be clear, the 'poor impedance match' is purely because we don't have mid-thread pre-emption and so need a stupidly huge timeout on compute capable engines. Whereas, we don't want a total heartbeat timeout of a minute or more. That is the impedance mis-match. If the 640ms was acceptable for RCS then none of this hacky timeout algorithm mush would be needed.

John.

> Regards,
>
> Tvrtko
Re: [Intel-gfx] [PATCH 5/8] drm/i915/guc: Move lrc desc setup to where it is needed
On 2/23/2022 18:03, Ceraolo Spurio, Daniele wrote:
> On 2/23/2022 12:23 PM, John Harrison wrote:
>> On 2/22/2022 17:12, Ceraolo Spurio, Daniele wrote:
>>> On 2/17/2022 3:52 PM, john.c.harri...@intel.com wrote:
>>>> From: John Harrison
>>>>
>>>> The LRC descriptor was being initialised early on in the context registration sequence. It could then be determined that the actual registration needs to be delayed and the descriptor would be wiped out. This is inefficient, so move the setup to later in the process after the point of no return.
>>>>
>>>> Signed-off-by: John Harrison
>>>> ---
>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +--
>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index 0ab2d1a24bf6..aa74ec74194a 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -2153,6 +2153,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
>>>>  					     0, loop);
>>>>  }
>>>>
>>>> +static void prepare_context_registration_info(struct intel_context *ce);
>>>> +
>>>>  static int register_context(struct intel_context *ce, bool loop)
>>>>  {
>>>>  	struct intel_guc *guc = ce_to_guc(ce);
>>>> @@ -2163,6 +2165,8 @@ static int register_context(struct intel_context *ce, bool loop)
>>>>  	GEM_BUG_ON(intel_context_is_child(ce));
>>>>  	trace_intel_context_register(ce);
>>>>
>>>> +	prepare_context_registration_info(ce);
>>>> +
>>>>  	if (intel_context_is_parent(ce))
>>>>  		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
>>>>  						      offset, loop);
>>>> @@ -2246,7 +2250,6 @@ static void prepare_context_registration_info(struct intel_context *ce)
>>>>  	struct intel_context *child;
>>>>
>>>>  	GEM_BUG_ON(!engine->mask);
>>>> -	GEM_BUG_ON(!sched_state_is_init(ce));
>>>>
>>>>  	/*
>>>>  	 * Ensure LRC + CT vmas are is same region as write barrier is done
>>>> @@ -2314,9 +2317,13 @@ static int try_context_registration(struct intel_context *ce, bool loop)
>>>>  	bool context_registered;
>>>>  	int ret = 0;
>>>>
>>>> +	GEM_BUG_ON(!sched_state_is_init(ce));
>>>> +
>>>>  	context_registered = ctx_id_mapped(guc, desc_idx);
>>>>
>>>> -	prepare_context_registration_info(ce);
>>>> +	if (context_registered)
>>>> +		clr_ctx_id_mapping(guc, desc_idx);
>>>> +	set_ctx_id_mapping(guc, desc_idx, ce);
>>>
>>> I think we can do the clr unconditionally. Also, should we drop the clr/set pair in prepare_context_registration_info? It shouldn't be needed, unless I'm missing a path where we don't pass through here.
>>>
>>> Daniele
>>
>> I don't believe so. The point is that the context id might have changed (it got stolen, re-used, etc. - all the state machine code below can cause aborts and retries and such like if something is pending and the register needs to be delayed). So we need to clear out the old mapping and add a new one to be safe.
>>
>> Also, I'm not sure if it is safe to do a xa_store to an already used entry as an update or if you are supposed to clear it first? But that's what the code did before and I'm trying to not change any actual behaviour here.
>
> I was comparing with previous behavior. Before this patch, we only do the setting of the ctx_id here (inside prepare_context_registration_info) and you're not changing any of the abort/retry behavior, so if it was enough before it should be enough now.

Hmm, I think I must have confused myself with the intermediate steps along the way. Yes, it looks like the clr/set in prepare is redundant by the end.

> Regarding the xa ops, we did an unconditional clear before, so it should be ok to just do the same and have the clear and set back to back without checking if the context ID was already in use or not.

Actually, I was thinking you meant to drop the clr completely rather than just drop the condition. Yeah, that sounds fine. Will post an update.

John.

> Daniele
>
>> John.
>>
>>>>
>>>>  	/*
>>>>  	 * The context_lookup xarray is used to determine if the hardware
Re: [PATCH v5 1/4] drm/i915/guc: Add fetch of hwconfig table
On 2/22/2022 02:36, Jordan Justen wrote:
> From: John Harrison
>
> Implement support for fetching the hardware description table from the
> GuC. The call is made twice - once without a destination buffer to
> query the size and then a second time to fill in the buffer.
>
> Note that the table is only available on ADL-P and later platforms.
>
> v5 (of Jordan's posting):
>  * Various changes made by Jordan and recommended by Michal
>    - Makefile ordering
>    - Adjust "struct intel_guc_hwconfig hwconfig" comment
>    - Set Copyright year to 2022 in intel_guc_hwconfig.c/.h
>    - Drop inline from hwconfig_to_guc()
>    - Replace hwconfig param with guc in __guc_action_get_hwconfig()
>    - Move zero size check into guc_hwconfig_discover_size()
>    - Change comment to say zero size offset/size is needed to get size
>    - Add has_guc_hwconfig to devinfo and drop has_table()
>    - Change drm_err to notice in __uc_init_hw() and use %pe
>
> Cc: Michal Wajdeczko
> Signed-off-by: Rodrigo Vivi
> Signed-off-by: John Harrison
> Reviewed-by: Matthew Brost
> Acked-by: Jon Bloomfield
> Signed-off-by: Jordan Justen
> ---
>  drivers/gpu/drm/i915/Makefile | 1 +
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 +
>  .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 4 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h | 3 +
>  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c | 145 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h | 19 +++
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 7 +
>  drivers/gpu/drm/i915/i915_pci.c | 1 +
>  drivers/gpu/drm/i915/intel_device_info.h | 1 +
>  9 files changed, 182 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index e9ce09620eb5..661f1afb51d7 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -188,6 +188,7 @@ i915-y += gt/uc/intel_uc.o \
>  	  gt/uc/intel_guc_ct.o \
>  	  gt/uc/intel_guc_debugfs.o \
>  	  gt/uc/intel_guc_fw.o \
> +	  gt/uc/intel_guc_hwconfig.o \
>  	  gt/uc/intel_guc_log.o \
>  	  gt/uc/intel_guc_log_debugfs.o \
>  	  gt/uc/intel_guc_rc.o \
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> index fe5d7d261797..4a61c819f32b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> @@ -137,6 +137,7 @@ enum intel_guc_action {
>  	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
>  	INTEL_GUC_ACTION_SETUP_PC_GUCRC = 0x3004,
>  	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
> +	INTEL_GUC_ACTION_GET_HWCONFIG = 0x4100,
>  	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
>  	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
>  	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
> index 488b6061ee89..f9e2a6aaef4a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
> @@ -8,6 +8,10 @@
>
>  enum intel_guc_response_status {
>  	INTEL_GUC_RESPONSE_STATUS_SUCCESS = 0x0,
> +	INTEL_GUC_RESPONSE_NOT_SUPPORTED = 0x20,
> +	INTEL_GUC_RESPONSE_NO_ATTRIBUTE_TABLE = 0x201,
> +	INTEL_GUC_RESPONSE_NO_DECRYPTION_KEY = 0x202,
> +	INTEL_GUC_RESPONSE_DECRYPTION_FAILED = 0x204,
>  	INTEL_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000,
>  };
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index f9240d4baa69..2058eb8c3d0c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -13,6 +13,7 @@
>  #include "intel_guc_fw.h"
>  #include "intel_guc_fwif.h"
>  #include "intel_guc_ct.h"
> +#include "intel_guc_hwconfig.h"
>  #include "intel_guc_log.h"
>  #include "intel_guc_reg.h"
>  #include "intel_guc_slpc_types.h"
> @@ -37,6 +38,8 @@ struct intel_guc {
>  	struct intel_guc_ct ct;
>  	/** @slpc: sub-structure containing SLPC related data and objects */
>  	struct intel_guc_slpc slpc;
> +	/** @hwconfig: data related to hardware configuration KLV blob */
> +	struct intel_guc_hwconfig hwconfig;
>
>  	/** @sched_engine: Global engine used to submit requests to GuC */
>  	struct i915_sched_engine *sched_engine;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
> new file mode 100644
> index ..ad289603460c
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include "gt/intel_gt.h"
> +#include "i915_drv.h"
> +#in
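
For readers following the commit message above: the "call is made twice" flow (first with no destination buffer to learn the size, then again to fill the buffer) can be sketched in plain userspace C roughly as below. This is only an illustration under stated assumptions, not the i915 code. mock_get_hwconfig(), fetch_table() and the table bytes are all hypothetical names and data invented for this sketch; the real driver talks to the GuC through its own action interface, which is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the firmware's table; not real hwconfig data. */
static const uint8_t mock_table[] = {
	0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08
};

/*
 * Hypothetical query function (NOT the i915 or GuC API).
 * With dst == NULL and dst_size == 0 it only reports the table size.
 * Otherwise it copies the table into dst and returns the bytes written,
 * or -ENOSPC if the destination is missing or too small.
 */
int mock_get_hwconfig(void *dst, size_t dst_size)
{
	if (!dst && dst_size == 0)
		return (int)sizeof(mock_table);
	if (!dst || dst_size < sizeof(mock_table))
		return -ENOSPC;
	memcpy(dst, mock_table, sizeof(mock_table));
	return (int)sizeof(mock_table);
}

/*
 * Two-call fetch: query the size, allocate, then fill the buffer.
 * Returns a malloc'd copy of the table (caller frees) and stores its
 * size in *out_size, or returns NULL on any failure.
 */
uint8_t *fetch_table(size_t *out_size)
{
	uint8_t *buf;
	int size, ret;

	assert(out_size != NULL);

	/* First call: no destination buffer, just discover the size. */
	size = mock_get_hwconfig(NULL, 0);
	if (size <= 0)
		return NULL;	/* a zero size is treated as "no table" */

	buf = malloc((size_t)size);
	if (!buf)
		return NULL;

	/* Second call: same query, now with a buffer to fill in. */
	ret = mock_get_hwconfig(buf, (size_t)size);
	if (ret != size) {
		free(buf);
		return NULL;
	}

	*out_size = (size_t)size;
	return buf;
}
```

A caller would do something like "size_t n; uint8_t *t = fetch_table(&n);" and free(t) when done. Treating a zero size from the first call as a failure mirrors the "zero size check" mentioned in the v5 changelog, but how the real driver reports that case is not shown in the quoted text.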
Re: [Intel-gfx] [PATCH v5 1/4] drm/i915/guc: Add fetch of hwconfig table
On 2/25/2022 05:26, Tvrtko Ursulin wrote:
> On 25/02/2022 09:44, Michal Wajdeczko wrote:
>> On 25.02.2022 06:03, Jordan Justen wrote:
>>> John Harrison writes:
>>>> On 2/22/2022 02:36, Jordan Justen wrote:
>>>>> From: John Harrison
>>>>>
>>>>> Implement support for fetching the hardware description table from the
>>>>> GuC. The call is made twice - once without a destination buffer to
>>>>> query the size and then a second time to fill in the buffer.
>>>>>
>>>>> Note that the table is only available on ADL-P and later platforms.
>>>>>
>>>>> v5 (of Jordan's posting):
>>>>>  * Various changes made by Jordan and recommended by Michal
>>>>>    - Makefile ordering
>>>>>    - Adjust "struct intel_guc_hwconfig hwconfig" comment
>>>>>    - Set Copyright year to 2022 in intel_guc_hwconfig.c/.h
>>>>>    - Drop inline from hwconfig_to_guc()
>>>>>    - Replace hwconfig param with guc in __guc_action_get_hwconfig()
>>>>>    - Move zero size check into guc_hwconfig_discover_size()
>>>>>    - Change comment to say zero size offset/size is needed to get size
>>>>>    - Add has_guc_hwconfig to devinfo and drop has_table()
>>>>>    - Change drm_err to notice in __uc_init_hw() and use %pe
>>>>>
>>>>> Cc: Michal Wajdeczko
>>>>> Signed-off-by: Rodrigo Vivi
>>>>> Signed-off-by: John Harrison
>>>>> Reviewed-by: Matthew Brost
>>>>> Acked-by: Jon Bloomfield
>>>>> Signed-off-by: Jordan Justen
>>>>> ---
>>>>> +	ret = intel_guc_hwconfig_init(&guc->hwconfig);
>>>>> +	if (ret)
>>>>> +		drm_notice(&i915->drm, "Failed to retrieve hwconfig table: %pe\n",
>>>> Why only drm_notice? As you are keen to point out, the UMDs won't work
>>>> if the table is not available. All the failure paths in your own
>>>> verification function are 'drm_err'. So why is it only a 'notice' if
>>>> there is no table at all?
>>>
>>> This was requested by Michal in my v3 posting:
>>>
>>> https://patchwork.freedesktop.org/patch/472936/?series=99787&rev=3
>>>
>>> I don't think that it should be a failure for i915 if it is unable to
>>> read the table, or if the table read is invalid. I think it should be
>>> up to the UMD to react to the missing hwconfig however they think is
>>> appropriate, but I would like the i915 to guarantee & document the
>>> format returned to userspace to whatever extent is feasible.
>>>
>>> As you point out there is a discrepancy, and I think I should be
>>> consistent with whatever is used here in my "drm/i915/guc: Verify
>>> hwconfig blob matches supported format" patch.
>>>
>>> I guess I'd tend to agree with Michal that "maybe drm_notice since we
>>> continue probe", but I would go along with either if you two want to
>>> discuss further.
>>
>> having consistent message level is a clear benefit but on other hand
>> these other 'errors' may indicate more serious problems related to use
>> of wrong/incompatible firmware that returns corrupted HWconfig (or we
>> use wrong actions), while since we are not using this HWconfig in the

As stated ad nauseam, you can rule out 'corrupted' hwconfig. The GuC
firmware is signed and will not load if it has become corrupted somehow.
Likewise, a 'wrong/incompatible' firmware will refuse to load. So it is
physically impossible for the later verification stage to ever find an
error.

>> driver we don't care that much that we failed to load HWconfig and
>> 'notice' is enough.
>>
>> but I'm fine with all messages being drm_err (as we will not have to
>> change that once again after HWconfig will be mandatory for the driver
>> as well)
>
> I would be against drm_err.
>
> #define KERN_EMERG   KERN_SOH "0" /* system is unusable */
> #define KERN_ALERT   KERN_SOH "1" /* action must be taken immediately */
> #define KERN_CRIT    KERN_SOH "2" /* critical conditions */
> #define KERN_ERR     KERN_SOH "3" /* error conditions */
> #define KERN_WARNING KERN_SOH "4" /* warning conditions */
> #define KERN_NOTICE  KERN_SOH "5" /* normal but significant condition */
> #define KERN_INFO    KERN_SOH "6" /* informational */
> #define KERN_DEBUG   KERN_SOH "7" /* debug-level messages */
>
> From the point of view of the kernel driver, this is not an error to its
> operation. It can at most be a warning, but notice is also fine by me.
> One could argue when reading "normal but significant condition" that it
> is not normal, when it is in fact unexpected, so if people prefer
> warning that is also okay by me. I still lean towards notice because of
> the hands-off nature i915 has with the pass-through of this blob.

From the point of view of the KMD, i915 will load and be 'functional' if
it can't talk to the hardware at all. The UMDs won't work at all but the
driver load will be 'fine'. That's a requirement to be able to get the
user to a software fallback desktop in order to work out why the
hardware isn't working (e.g. no GuC firmware file). I would view this as
similar. The KMD might have loaded but the UMDs are not functional. That
is definitely an