Chris Wilson <ch...@chris-wilson.co.uk> writes:

> Resetting the engine requires us to hold the forcewake wakeref to
> prevent RC6 trying to happen in the middle of the reset sequence. The
> consequence of an unwanted RC6 event in the middle is that random state
> is then saved to the powercontext and restored later, which may
> overwrite the mmio state we need to preserve (e.g. PD_DIR_BASE in the
> legacy ringbuffer reset_ring_common()).
>
> This was noticed in the live_hangcheck selftests when Haswell would
> sporadically fail to restart during igt_reset_queue().
>
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 82a10036fb38..eba23c239aae 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2832,7 +2832,17 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>  {
>       struct drm_i915_gem_request *request = NULL;
>  
> -     /* Prevent the signaler thread from updating the request
> +     /*
> +      * During the reset sequence, we must prevent the engine from
> +      * entering RC6. As the context state is undefined until we restart
> +      * the engine, if it does enter RC6 during the reset, the state
> +      * written to the powercontext is undefined and so we may lose
> +      * GPU state upon resume, i.e. fail to restart after a reset.
> +      */
> +     intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);

We already take a nested forcewake get when actually issuing the hw
commands. I would still keep those gets there and consider changing
them to asserts some day.

Reviewed-by: Mika Kuoppala <mika.kuopp...@linux.intel.com>

> +
> +     /*
> +      * Prevent the signaler thread from updating the request
>        * state (by calling dma_fence_signal) as we are processing
>        * the reset. The write from the GPU of the seqno is
>        * asynchronous and the signaler thread may see a different
> @@ -2843,7 +2853,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>        */
>       kthread_park(engine->breadcrumbs.signaler);
>  
> -     /* Prevent request submission to the hardware until we have
> +     /*
> +      * Prevent request submission to the hardware until we have
>        * completed the reset in i915_gem_reset_finish(). If a request
>        * is completed by one engine, it may then queue a request
>        * to a second via its engine->irq_tasklet *just* as we are
> @@ -3033,6 +3044,8 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
>  {
>       tasklet_enable(&engine->execlists.irq_tasklet);
>       kthread_unpark(engine->breadcrumbs.signaler);
> +
> +     intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
>  }
>  
>  void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
> -- 
> 2.14.2
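
For readers following the "nested get" remark in the review, here is a
minimal sketch of why an inner get/put pair is harmless while an outer
reference is held across the reset. It is a toy userspace model that
assumes forcewake behaves like a plain reference count; the toy_* names
are hypothetical and this is not the i915 implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for the wake state; not the i915 structures. */
struct toy_forcewake {
	unsigned int count;	/* outstanding get() calls */
	int awake;		/* 1 while the hardware is held awake */
};

/* The first get wakes the hardware; nested gets only bump the count. */
static void toy_forcewake_get(struct toy_forcewake *fw)
{
	if (fw->count++ == 0)
		fw->awake = 1;
}

/* Only the final put lets the hardware go back to sleep (RC6). */
static void toy_forcewake_put(struct toy_forcewake *fw)
{
	assert(fw->count > 0);
	if (--fw->count == 0)
		fw->awake = 0;
}

/* Models a hw command path that takes its own nested reference. */
static void toy_issue_hw_command(struct toy_forcewake *fw)
{
	toy_forcewake_get(fw);
	assert(fw->awake);	/* the kind of check an assert could cover */
	toy_forcewake_put(fw);
}
```

With the outer reference taken in the prepare step and dropped in the
finish step, every toy_issue_hw_command() call in between leaves the
model awake, which is the property the patch relies on.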
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
