On Sun, 30 Oct 2011 20:12:09 +0100
Daniel Vetter <daniel.vet...@ffwll.ch> wrote:

> In the pre-gem days with non-existing hangcheck and gpu reset code,
> this timeout of 3 seconds was pretty important to avoid stuck
> processes.
> 
> But now we have the hangcheck code in gem that goes to great length
> to ensure that the gpu is really dead before declaring it wedged.
> 
> So there's no need for this timeout anymore. Actually it's even harmful
> because we can bail out too early (e.g. with xscreensaver slip)
> when running giant batchbuffers. And our code isn't robust enough
> to properly unroll any state-changes, we pretty much rely on the gpu
> reset code cleaning up the mess (like cache tracking, fencing state,
> active list/request tracking, ...).
> 
> With this change intel_begin_ring can only fail when the gpu is
> wedged, and it will return -EAGAIN (like wait_request in case the
> gpu reset is still outstanding).
> 
> v2: Chris Wilson noted that on resume timers aren't running and hence
> we won't ever get kicked out of this loop by the hangcheck code. Use
> an insanely large timeout instead for the HAS_GEM case to prevent
> resume bugs from totally hanging the machine.
> 
> Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
> Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ca70e2f..6e28301 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1119,7 +1119,16 @@ int intel_wait_ring_buffer(struct intel_ring_buffer 
> *ring, int n)
>       }
>  
>       trace_i915_ring_wait_begin(ring);
> -     end = jiffies + 3 * HZ;
> +     if (drm_core_check_feature(dev, DRIVER_GEM))
> +             /* With GEM the hangcheck timer should kick us out of the loop,
> +              * leaving it early runs the risk of corrupting GEM state (due
> +              * to running on almost untested codepaths). But on resume
> +              * timers don't work yet, so prevent a complete hang in that
> +              * case by choosing an insanely large timeout. */
> +             end = jiffies + 60 * HZ;
> +     else
> +             end = jiffies + 3 * HZ;
> +
>       do {
>               ring->head = I915_READ_HEAD(ring);
>               ring->space = ring_space(ring);

I didn't really check to see if there is actually an issue here, but
instead of 60, do you want to play nice with timeouts such as
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT (ie. the min of all the timeouts and
60)?

Acked-by: Ben Widawsky <b...@bwidawsk.net>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to