On Sun, 30 Oct 2011 20:12:09 +0100 Daniel Vetter <daniel.vet...@ffwll.ch> wrote:
> In the pre-gem days with non-existing hangcheck and gpu reset code, > this timeout of 3 seconds was pretty important to avoid stuck > processes. > > But now we have the hangcheck code in gem that goes to great length > to ensure that the gpu is really dead before declaring it wedged. > > So there's no need for this timeout anymore. Actually it's even harmful > because we can bail out too early (e.g. with xscreensaver slip) > when running giant batchbuffers. And our code isn't robust enough > to properly unroll any state-changes, we pretty much rely on the gpu > reset code cleaning up the mess (like cache tracking, fencing state, > active list/request tracking, ...). > > With this change intel_begin_ring can only fail when the gpu is > wedged, and it will return -EAGAIN (like wait_request in case the > gpu reset is still outstanding). > > v2: Chris Wilson noted that on resume timers aren't running and hence > we won't ever get kicked out of this loop by the hangcheck code. Use > an insanely large timeout instead for the HAS_GEM case to prevent > resume bugs from totally hanging the machine. > > Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch> > Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/intel_ringbuffer.c | 11 ++++++++++- > 1 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > b/drivers/gpu/drm/i915/intel_ringbuffer.c > index ca70e2f..6e28301 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -1119,7 +1119,16 @@ int intel_wait_ring_buffer(struct intel_ring_buffer > *ring, int n) > } > > trace_i915_ring_wait_begin(ring); > - end = jiffies + 3 * HZ; > + if (drm_core_check_feature(dev, DRIVER_GEM)) > + /* With GEM the hangcheck timer should kick us out of the loop, > + * leaving it early runs the risk of corrupting GEM state (due > + * to running on almost untested codepaths). But on resume > + * timers don't work yet, so prevent a complete hang in that > + * case by choosing an insanely large timeout. */ > + end = jiffies + 60 * HZ; > + else > + end = jiffies + 3 * HZ; > + > do { > ring->head = I915_READ_HEAD(ring); > ring->space = ring_space(ring); I didn't really check to see if there is actually an issue here, but instead of 60, do you want to play nice with timeouts such as CONFIG_DEFAULT_HUNG_TASK_TIMEOUT (ie. the min of all the timeouts and 60)? Acked-by: Ben Widawsky <b...@bwidawsk.net> _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx