Hi John, On Fri, Mar 29, 2024 at 04:53:05PM -0700, john.c.harri...@intel.com wrote: > From: John Harrison <john.c.harri...@intel.com> > > The previous fix for the circlular lock splat about the busyness > worker wasn't quite complete. Even though the reset-in-progress flag > is cleared at the start of intel_uc_reset_finish, the entire function > is still inside the reset mutex lock. Not sure why the patch appeared > to fix the issue both locally and in CI. However, it is now back > again. > > There is a further complication the wedge code path within > intel_gt_reset() jumps around so much it results in nested > reset_prepare/_finish calls. That is, the call sequence is: > intel_gt_reset > | reset_prepare > | __intel_gt_set_wedged > | | reset_prepare > | | reset_finish > | reset_finish > > The nested finish means that even if the clear of the in-progress flag > was moved to the end of _finish, it would still be clear for the > entire second call. Surprisingly, this does not seem to be causing any > other problems at present. > > As an aside, a wedge on fini does not call the finish functions at > all. The reset_in_progress flag is left set (twice). > > So instead of trying to cancel the worker anywhere at all in the reset > path, just add a cancel to intel_guc_submission_fini instead. Note > that it is not a problem if the worker is still active during a reset. > Either it will run before the reset path starts locking things and > will simply block the reset code for a tiny amount of time. Or it will > run after the locks have been acquired and will early exit due to the > try-lock. > > Also, do not use the reset-in-progress flag to decide whether a > synchronous cancel is safe (from a lockdep perspective) or not. > Instead, use the actual reset mutex state (both the genuine one and > the custom rolled BACKOFF one). > > Fixes: 0e00a8814eec ("drm/i915/guc: Avoid circular locking issue on busyness > flush") > Signed-off-by: John Harrison <john.c.harri...@intel.com> > Cc: Zhanjun Dong <zhanjun.d...@intel.com> > Cc: John Harrison <john.c.harri...@intel.com> > Cc: Andi Shyti <andi.sh...@linux.intel.com> > Cc: Daniel Vetter <dan...@ffwll.ch> > Cc: Daniel Vetter <daniel.vet...@ffwll.ch> > Cc: Rodrigo Vivi <rodrigo.v...@intel.com> > Cc: Nirmoy Das <nirmoy....@intel.com> > Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com> > Cc: Umesh Nerlige Ramappa <umesh.nerlige.rama...@intel.com> > Cc: Andrzej Hajda <andrzej.ha...@intel.com> > Cc: Matt Roper <matthew.d.ro...@intel.com> > Cc: Jonathan Cavitt <jonathan.cav...@intel.com> > Cc: Prathap Kumar Valsan <prathap.kumar.val...@intel.com> > Cc: Alan Previn <alan.previn.teres.ale...@intel.com> > Cc: Madhumitha Tolakanahalli Pradeep > <madhumitha.tolakanahalli.prad...@intel.com> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospu...@intel.com> > Cc: Ashutosh Dixit <ashutosh.di...@intel.com> > Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhad...@intel.com>
Looks good: Reviewed-by: Andi Shyti <andi.sh...@linux.intel.com> Thanks, Andi