With a broken GPU we expect it to fail during the initial
GPU setup where do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. In this a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.

We can therefore set a timeout on our wait-for-idle that is shorter than
the hangcheck (which may be up to 60s for a declaring a wedged driver)
and so detect the broken GPU much more quickly during driver load (and
so prevent stalling userspace for ages).

Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuopp...@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 91d705a67d38..4ee65b9941e9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5362,11 +5362,10 @@ static int __intel_engines_record_defaults(struct 
drm_i915_private *i915)
        if (err)
                goto err_active;
 
-       err = i915_gem_wait_for_idle(i915,
-                                    I915_WAIT_LOCKED,
-                                    MAX_SCHEDULE_TIMEOUT);
-       if (err)
+       if (i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, HZ / 5)) {
+               err = -EIO; /* Caller will declare us wedged */
                goto err_active;
+       }
 
        assert_kernel_context_is_current(i915);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to