On Mon, Jun 08, 2015 at 06:03:19PM +0100, Tomas Elf wrote:
> When submitting semaphores in execlist mode the hang checker crashes in this
> function because it is only runnable in ring submission mode. The reason this
> is of particular interest to the TDR patch series is because we use semaphores
> as a mean to induce hangs during testing (which is the recommended way to
> induce hangs for gen8+). It's not clear how this is supposed to work in
> execlist mode since:
> 
> 1. This function requires a ring buffer.
> 
> 2. Retrieving a ring buffer in execlist mode requires us to retrieve the
> corresponding context, which we get from a request.
> 
> 3. Retieving a request from the hang checker is not straight-forward since 
> that
> requires us to grab the struct_mutex in order to synchronize against the
> request retirement thread.
> 
> 4. Grabbing the struct_mutex from the hang checker is nothing that we will do
> since that puts us at risk of deadlock since a hung thread might be holding 
> the
> struct_mutex already.
> 
> Therefore it's not obvious how we're supposed to deal with this. For now, 
> we're
> doing an early exit from this function, which avoids any kernel panic 
> situation
> when running our own internal TDR ULT.
> 
> Signed-off-by: Tomas Elf <tomas....@intel.com>

We should have a Testcase: line here which mentions the igt testcase which
provoke this bug. Or we need to fill this gap asap.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_irq.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 46bcbff..40c44fc 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2698,6 +2698,26 @@ semaphore_waits_for(struct intel_engine_cs *ring, u32 
> *seqno)
>       u64 offset = 0;
>       int i, backwards;
>  
> +     /*
> +      * This function does not support execlist mode - any attempt to
> +      * proceed further into this function will result in a kernel panic
> +      * when dereferencing ring->buffer, which is not set up in execlist
> +      * mode.
> +      *
> +      * The correct way of doing it would be to derive the currently
> +      * executing ring buffer from the current context, which is derived
> +      * from the currently running request. Unfortunately, to get the
> +      * current request we would have to grab the struct_mutex before doing
> +      * anything else, which would be ill-advised since some other thread
> +      * might have grabbed it already and managed to hang itself, causing
> +      * the hang checker to deadlock.
> +      *
> +      * Therefore, this function does not support execlist mode in its
> +      * current form. Just return NULL and move on.
> +      */
> +     if (i915.enable_execlists)
> +             return NULL;
> +
>       ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
>       if (!ipehr_is_semaphore_wait(ring->dev, ipehr))
>               return NULL;
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to