On Tue, 27 Sep 2011 22:54:01 +0100 Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> On Tue, 27 Sep 2011 12:38:59 -0700, Ben Widawsky <b...@bwidawsk.net> wrote:
> > If we do this we lose the possibility to kick rings, but not reset the
> > GPU (not that I find that terribly useful). If we do this, it does fire a
> > wq event, but I don't see a problem with that for this case.
> >
> > I think I would rather do this:
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 012732b..803524e 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1698,6 +1698,10 @@ void i915_hangcheck_elapsed(unsigned long data)
> >  		if (dev_priv->hangcheck_count++ > 1) {
> >  			DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
> >
> > +			/* Save off error state before kicking the rings and
> > +			 * possibly ruining the GPU state.
> > +			 */
> > +			i915_handle_error(dev, true);
> >  			if (!IS_GEN2(dev)) {
> >  				/* Is the chip hanging on a WAIT_FOR_EVENT?
> >  				 * If so we can simply poke the RB_WAIT bit
> > @@ -1717,7 +1721,6 @@ void i915_hangcheck_elapsed(unsigned long data)
> >  					goto repeat;
> >  			}
> >
> > -			i915_handle_error(dev, true);
> >  			return;
> >  		}
> >  	} else {
>
> Interesting, if we simply call i915_capture_error_state() rather than move
> the i915_handle_error() earlier we do in fact get the best of both worlds.

We can do this, except i915_handle_error() is called from
i915_driver_irq_handler, so we have to modify that as well. But yeah, I'm fine
with that too, though I don't think it makes much difference either way.

> However, it doesn't address Daniel's statement that kick_rings() provoked
> an unrecoverable hang and so we still need to disable that in order to
> save the error-state. The origin of ring-kicking was to try and recover
> from the modesetting/vsync issues, which apart from the outstanding issue
> in intel_crtc_disable() are behind us. (I hope ;-) We shouldn't be relying
> on i915_reset() and i915.reset=0 tends to be either deliberate or an act of
> desperation so I don't see the issue in also preventing ring-kicking with
> the same parameter. Is there an issue I'm overlooking?

No issue, I just feel that this is redundant to enable_hangcheck, so to me at
least, this just adds extra confusion to an already confusing situation. I
seem to be in the minority though.

To me it's:
reset=0, don't ever try to reset
enable_hangcheck=0, don't ever check if we're hung (i.e. don't reset or kick)

And now it's:
reset=0, don't ever try to reset or kick
enable_hangcheck=0, don't ever check if we're hung (i.e. don't reset or kick)

I'd definitely be in favo(u)r of removing the kick_ring() if it isn't really
useful anymore. It has some forcewake race, if I remember correctly, which I
never bothered to fix.

> -Chris
>

Ben
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
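
The two readings of the module parameters contrasted in the reply above ("To me it's" versus "And now it's") can be written out as a small decision table. This is only a sketch in plain C, not driver code: struct hang_policy, policy_before() and policy_after() are hypothetical names invented for illustration, and the only things taken from the thread are the two parameters and what each setting is said to do.

```c
#include <stdbool.h>

/* Hypothetical type for illustration only; not part of the i915 driver. */
struct hang_policy {
	bool check;	/* run the hangcheck at all */
	bool kick;	/* try kicking the rings on a suspected hang */
	bool reset;	/* attempt a full GPU reset */
};

/* First reading ("To me it's"): reset=0 only suppresses the GPU reset. */
static struct hang_policy policy_before(bool reset, bool enable_hangcheck)
{
	struct hang_policy p = {
		.check = enable_hangcheck,
		.kick  = enable_hangcheck,
		.reset = enable_hangcheck && reset,
	};
	return p;
}

/* Second reading ("And now it's"): reset=0 also suppresses ring kicking. */
static struct hang_policy policy_after(bool reset, bool enable_hangcheck)
{
	struct hang_policy p = {
		.check = enable_hangcheck,
		.kick  = enable_hangcheck && reset,
		.reset = enable_hangcheck && reset,
	};
	return p;
}
```

Under these assumptions the two readings differ in exactly one case: reset=0 with hangcheck enabled, where the first still kicks the rings and the second does not.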