Since we run on system_highpri_wq, we expected the heartbeat to be
scheduled promptly. However, we see delays of over 10ms upsetting our
assertions. Accept this as inevitable and bump the minimum error
threshold to 20ms (from 6 jiffies).

<6> [616.784749] rcs0: Heartbeat delay: 3570us [2802, 9188]
<6> [616.807790] bcs0: Heartbeat delay: 2111us [745, 4372]
<6> [616.853776] vcs0: Heartbeat delay: 6485us [2424, 11637]
<3> [616.859296] vcs0: Heartbeat delay was 6485us, expected less than 6000us
<3> [616.860901] i915/intel_heartbeat_live_selftests: live_heartbeat_fast failed with error -22

v2: More context from CI.

Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
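
For reference, the arithmetic behind the new bound can be sketched in
userspace. This is only an illustration, not part of the patch: the
helpers below are hypothetical stand-ins (the kernel's jiffies_to_usecs()
takes no HZ argument), and the sketch assumes HZ divides 1000000 evenly.
It shows that the 20ms floor only takes effect where 6 jiffies is shorter
than 20ms, as with the 6000us limit seen in the CI log above.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-in for jiffies_to_usecs(), with the tick
 * rate passed explicitly. Assumes hz divides 1000000 evenly.
 */
static unsigned int jiffies_to_usecs_hz(unsigned int j, unsigned int hz)
{
	assert(hz > 0 && 1000000u % hz == 0);
	return j * (1000000u / hz);
}

/* Old bound: 6 jiffies, expressed in microseconds. */
static unsigned int old_threshold_us(unsigned int hz)
{
	return jiffies_to_usecs_hz(6, hz);
}

/* New bound: max(20000us, 6 jiffies), mirroring error_threshold. */
static unsigned int error_threshold_us(unsigned int hz)
{
	const unsigned int floor_us = 20000u;
	const unsigned int six_jiffies_us = jiffies_to_usecs_hz(6, hz);

	return six_jiffies_us > floor_us ? six_jiffies_us : floor_us;
}
```

Under these assumptions, a 1000Hz tick moves the bound from 6000us to
20000us, while 250Hz and 100Hz keep their existing 24000us and 60000us.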

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index b88aa35ad75b..223ab88f7e57 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -197,6 +197,7 @@ static int cmp_u32(const void *_a, const void *_b)
 
 static int __live_heartbeat_fast(struct intel_engine_cs *engine)
 {
+       const unsigned int error_threshold = max(20000u, jiffies_to_usecs(6));
        struct intel_context *ce;
        struct i915_request *rq;
        ktime_t t0, t1;
@@ -254,12 +255,18 @@ static int __live_heartbeat_fast(struct intel_engine_cs *engine)
                times[0],
                times[ARRAY_SIZE(times) - 1]);
 
-       /* Min work delay is 2 * 2 (worst), +1 for scheduling, +1 for slack */
-       if (times[ARRAY_SIZE(times) / 2] > jiffies_to_usecs(6)) {
+       /*
+        * Ideally, the upper bound on min work delay would be something like
+        * 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we
+        * are, even with system_highpri_wq, at the mercy of the CPU scheduler
+        * and may be stuck behind some slow work for many milliseconds, such
+        * as our very own display workers.
+        */
+       if (times[ARRAY_SIZE(times) / 2] > error_threshold) {
                pr_err("%s: Heartbeat delay was %uus, expected less than %dus\n",
                       engine->name,
                       times[ARRAY_SIZE(times) / 2],
-                      jiffies_to_usecs(6));
+                      error_threshold);
                err = -EINVAL;
        }
 
-- 
2.20.1
