On 21/11/2017 15:24, Chris Wilson wrote:
Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
start with a small sleep and exponentially increase the sleep on each
cycle.
A good example of a beneficiary is the GuC MMIO communication channel.
Typically we expect (and so spin) for 10us for a quick response, but this
doesn't cover everything and so sometimes we fall back to the millisecond+
sleep. This incurs a significant delay in time-critical operations like
preemption (igt/gem_exec_latency), which can be improved significantly by
using a small sleep after the spin fails.
We've made this suggestion many times, but had little experimental data
to support adding the complexity.
References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register
waits")
Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Cc: John Harrison <john.c.harri...@intel.com>
Cc: Michał Winiarski <michal.winiar...@intel.com>
Cc: Ville Syrjala <ville.syrj...@linux.intel.com>
---
drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 69aab324aaa1..c1ea9a009eb4 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -50,6 +50,7 @@
*/
#define _wait_for(COND, US, W) ({ \
unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
+ long wait__ = 1; \
int ret__; \
might_sleep(); \
for (;;) { \
@@ -62,7 +63,9 @@
ret__ = -ETIMEDOUT; \
break; \
} \
- usleep_range((W), (W) * 2); \
+ usleep_range(wait__, wait__ * 2); \
+ if (wait__ < (W)) \
+ wait__ <<= 1; \
} \
ret__; \
})
I would start the period at 10us, since a) sleeps shorter than 10us are
not recommended for the usleep family, and b) most callers specify
millisecond timeouts, so polling more often than every 10us is probably
overkill.
Latency sensitive callers like __intel_wait_for_register_us can be
tweaked at the call site to provide what they want.
For the actual GuC MMIO send, it sounds like it should pass 20us to
__intel_wait_for_register_us (going by John's explanation email) to
cover 99% of the cases. The remaining 1% could then be fine with a
10us delay?
Otherwise we are effectively turning _wait_for into a partial busy loop,
or whatever the inefficiency of a <10us usleep amounts to. A handful of
quick loops there makes no practical difference, but it feels a bit
inelegant.
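To put a number on "a handful", here is a rough user-space model of the
sleep schedule, not kernel code. It only mirrors the doubling rule from the
patch (double while the period is below W) and takes the lower bound of each
usleep_range() as the time slept; the jiffies timeout and the COND check are
left out. The names backoff_schedule and count_below are made up for this
sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the patched _wait_for() sleep periods (assumption: simplified,
 * no COND check, no jiffies timeout). The first sleep is start_us and the
 * period doubles after each sleep while it is still below cap_us (W).
 * Stops once budget_us of cumulative minimum sleep time has been used or
 * max_out entries have been written. Returns the number of sleeps.
 */
static size_t backoff_schedule(unsigned long start_us, unsigned long cap_us,
                               unsigned long budget_us,
                               unsigned long *out, size_t max_out)
{
    unsigned long wait = start_us;
    unsigned long elapsed = 0;
    size_t n = 0;

    assert(start_us > 0);

    while (elapsed < budget_us && n < max_out) {
        out[n++] = wait;
        elapsed += wait;    /* lower bound of usleep_range(wait, 2 * wait) */
        if (wait < cap_us)  /* same test as the patch: may overshoot cap_us once */
            wait <<= 1;
    }
    return n;
}

/* Count the sleeps in a schedule that are shorter than threshold_us. */
static size_t count_below(const unsigned long *sched, size_t n,
                          unsigned long threshold_us)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (sched[i] < threshold_us)
            count++;
    return count;
}
```

With W = 1000, a 1us start gives 1, 2, 4, 8, 16, ... so four sleeps fall
below 10us before the period grows past it. With a 10us start the schedule
is 10, 20, 40, ... and none do. Either way the period settles above W
(1024us or 1280us here) because the patch doubles whenever wait__ < W.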
Regards,
Tvrtko