i915: Cope with request list state change during error state capture

Chris Wilson Mon, 19 Oct 2015 09:07:20 -0700

On Mon, Oct 19, 2015 at 03:55:48PM +0100, Tomas Elf wrote:
> Since we're not synchronizing the ring request list during error state capture
> the request list state might change between the time the corresponding error
> request list was allocated and dimensioned to the time when the ring request
> list is actually captured into the error state. If this happens then do an
> early exit and be aware that the captured error state might not be fully
> reliable.
> 
> * v2:
> - Chris Wilson: Removed WARN_ON from size check since having the error state
>   request list and the live driver request list diverge like this is a
>   legitimate behaviour.
> 
> - Tomas Elf: Removed update of num_request field since this made no sense.
>   Just exit and move on.
> 
> * Resend:
> - Responded to the wrong mailthread
> 
> Signed-off-by: Tomas Elf <tomas....@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2f04e4f..b08a76b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1071,6 +1071,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
>               list_for_each_entry(request, &ring->request_list, list) {
>                       struct drm_i915_error_request *erq;
>  
> +                     if (count >= error->ring[i].num_requests) {
> +                             /*
> +                              * If the ring request list was changed in
> +                              * between the point where the error request
> +                              * list was created and dimensioned and this
> +                              * point then just exit early to avoid crashes.
> +                              */
> +                             DRM_ERROR("Request list changed size since allocation (%u->%u)\n",
> +                                     error->ring[i].num_requests, count);


The error message simply isn't that interesting. That requests were
added after the gpu hang occurred doesn't affect post-mortem debugging
of the hang, and if it were at all interesting, that information should
be stored in the error state itself.
-Chris
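
For readers following the thread, here is a minimal sketch of the pattern
under discussion: size a snapshot array in one pass, then copy in a second
pass that stops quietly once the array is full, recording the truncation in
the snapshot itself rather than logging an error. Every name here
(struct request, struct snapshot, snapshot_alloc, snapshot_capture, the
truncated flag) is hypothetical and is not taken from the i915 driver. The
sketch is single-threaded, so it only illustrates the bounds check, not the
unsynchronized access the patch has to cope with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request on the live ring request list. */
struct request {
    unsigned int seqno;
    struct request *next;
};

/* Hypothetical stand-in for the per-ring part of the error state. */
struct snapshot {
    unsigned int *seqnos; /* array sized by the counting pass */
    size_t capacity;      /* number of slots allocated */
    size_t count;         /* number of slots actually filled */
    bool truncated;       /* set if the list outgrew the allocation */
};

/* Pass 1: count the list as it is now and allocate that many slots. */
static bool snapshot_alloc(struct snapshot *snap, const struct request *head)
{
    size_t n = 0;

    for (const struct request *rq = head; rq; rq = rq->next)
        n++;

    snap->seqnos = n ? calloc(n, sizeof(*snap->seqnos)) : NULL;
    snap->capacity = snap->seqnos ? n : 0;
    snap->count = 0;
    snap->truncated = false;

    /* An empty list is not a failure; a failed allocation is. */
    return n == 0 || snap->seqnos != NULL;
}

/*
 * Pass 2: copy entries, but never write past the allocation. If the list
 * grew between the two passes, stop early and note it in the snapshot.
 */
static void snapshot_capture(struct snapshot *snap, const struct request *head)
{
    assert(snap->seqnos != NULL || snap->capacity == 0);

    snap->count = 0;
    snap->truncated = false;

    for (const struct request *rq = head; rq; rq = rq->next) {
        if (snap->count >= snap->capacity) {
            snap->truncated = true;
            break;
        }
        snap->seqnos[snap->count++] = rq->seqno;
    }
}
```

Calling snapshot_alloc() on a two-entry list, appending a third entry, and
then calling snapshot_capture() fills both slots and sets truncated without
touching memory beyond the array. Keeping the flag in the snapshot is one
way to follow the suggestion that, if the size change matters at all, it
belongs in the error state and not in the kernel log.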

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH resend v2 3/8] drm/i915: Cope with request list state change during error state capture
