Chris Wilson <ch...@chris-wilson.co.uk> writes:

> If we fail to reset the GPU, we declare the machine wedged. However, the
> GPU may well still be running in the background with an in-flight
> request. So despite our efforts in cleaning up the request queue and
> faking the breadcrumb in the HWSP, the GPU may eventually write the
> in-flight seqno there, breaking all of our assumptions and throwing the
> driver into a deep turmoil, wedging beyond wedged.
>
> To avoid this we ideally want to reset the GPU. Since that has already
> failed, make sure the rings have the stop bit set instead. This is part
> of the normal GPU reset sequence, but that is actually disabled by
> igt/gem_eio to force the wedged state. If we assume the worst, we must
> poke at the bit again before we give up.
>
> v2: Move the intel_gpu_reset() from set-wedged in the reset error path
> into i915_gem_set_wedged() itself. Even if the reset fails (e.g. if it is
> disabled by gem_eio), it still tries to make sure the engines are
> stopped. For i915_gem_set_wedged() callers from outside of i915_reset(),
> this should make sure the GPU is disabled while the driver is marked as
> being wedged.
>
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
> Cc: Michał Winiarski <michal.winiar...@intel.com>
> Cc: Michal Wajdeczko <michal.wajdec...@intel.com>
> Cc: Michel Thierry <michel.thie...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 1 -
>  drivers/gpu/drm/i915/i915_gem.c | 3 +++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f03555efc520..3df5193487f3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1995,7 +1995,6 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  error:
>       i915_gem_set_wedged(i915);
>       i915_retire_requests(i915);
> -     intel_gpu_reset(i915, ALL_ENGINES);
>       goto finish;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2fbd622bba30..802df8e1a544 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3246,6 +3246,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>       }
>       i915->caps.scheduler = 0;
>  
> +     /* Even if the GPU reset fails, it should still stop the engines */
> +     intel_gpu_reset(i915, ALL_ENGINES);
> +

The comment is very welcome here, as the modparm.reset usage isn't
so transparent.
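
For anyone following along, here is a toy C model of the behaviour the
comment describes. It is only a sketch: every name in it is hypothetical
and none of it is i915 code. It assumes the reset helper pokes the stop
bit before it checks whether reset is enabled, so a "failed" reset still
leaves the engines stopped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not the i915 API. */
struct toy_gpu {
	bool reset_enabled;   /* stands in for the reset module parameter */
	bool engines_stopped; /* stands in for the per-ring stop bit */
	bool wedged;          /* driver has given up on the GPU */
	int resets_done;      /* number of resets that actually ran */
};

/* Try to reset; the engines are stopped first, whatever the outcome. */
static int toy_gpu_reset(struct toy_gpu *gpu)
{
	assert(gpu);

	gpu->engines_stopped = true;	/* always poke the stop bit */

	if (!gpu->reset_enabled)
		return -ENODEV;		/* reset disabled, e.g. by a test */

	gpu->resets_done++;
	return 0;
}

/* Mark the device wedged and make a last attempt to quiesce it. */
static void toy_set_wedged(struct toy_gpu *gpu)
{
	assert(gpu);

	gpu->wedged = true;

	/* Return value ignored on purpose: we only want the engines stopped. */
	(void)toy_gpu_reset(gpu);
}
```

With reset_enabled left false, calling toy_set_wedged() still leaves
engines_stopped set, which is the side effect the new comment points at.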

Reviewed-by: Mika Kuoppala <mika.kuopp...@linux.intel.com>

>       /*
>        * Make sure no one is running the old callback before we proceed with
>        * cancelling requests and resetting the completion tracking. Otherwise
> -- 
> 2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
