Chris Wilson <ch...@chris-wilson.co.uk> writes:

> The signal threads may be running concurrently with the GPU reset. The
> completion from the GPU run asynchronous with the reset and two threads
> may see different snapshots of the state, and the signaler may mark a
> request as complete as we try to reset it. We don't tolerate 2 different
> views of the same state and complain if we try to mark a request as
> failed if it is already complete. Disable the signal threads during
> reset to prevent this conflict (even though the conflict implies that
> the state we resetting to is invalid, we have already made our
> decision!).
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuopp...@intel.com>

> Cc: Mika Kuoppala <mika.kuopp...@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c          | 16 +++++++++++++++-
>  drivers/gpu/drm/i915/intel_breadcrumbs.c |  3 +++
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3c2c2296c1dc..730804ce9610 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -35,6 +35,7 @@
>  #include "intel_frontbuffer.h"
>  #include "intel_mocs.h"
>  #include <linux/dma-fence-array.h>
> +#include <linux/kthread.h>
>  #include <linux/reservation.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/slab.h>
> @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private 
> *dev_priv)
>       for_each_engine(engine, dev_priv, id) {
>               struct drm_i915_gem_request *request;
>  
> +             /* Prevent the signaler thread from updating the request
> +              * state (by calling dma_fence_signal) as we are processing
> +              * the reset. The write from the GPU of the seqno is
> +              * asynchronous and the signaler thread may see a different
> +              * value to us and declare the request complete, even though
> +              * the reset routine have picked that request as the active
> +              * (incomplete) request. This conflict is not handled
> +              * gracefully!
> +              */
> +             kthread_park(engine->breadcrumbs.signaler);
> +
>               /* Prevent request submission to the hardware until we have
>                * completed the reset in i915_gem_reset_finish(). If a request
>                * is completed by one engine, it may then queue a request
> @@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private 
> *dev_priv)
>  
>       lockdep_assert_held(&dev_priv->drm.struct_mutex);
>  
> -     for_each_engine(engine, dev_priv, id)
> +     for_each_engine(engine, dev_priv, id) {
>               tasklet_enable(&engine->irq_tasklet);
> +             kthread_unpark(engine->breadcrumbs.signaler);
> +     }
>  }
>  
>  static void nop_submit_request(struct drm_i915_gem_request *request)
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c 
> b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 9fd002bcebb6..f5e05343110a 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg)
>                               break;
>  
>                       schedule();
> +
> +                     if (kthread_should_park())
> +                             kthread_parkme();
>               }
>       } while (1);
>       __set_current_state(TASK_RUNNING);
> -- 
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to