On 21/11/2017 15:24, Chris Wilson wrote:
Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
start with a small sleep and exponentially increase the sleep on each
cycle.

A good example of a beneficiary is the guc mmio communication channel.
Typically we expect (and so spin) for 10us for a quick response, but this
doesn't cover everything and so sometimes we fall back to the millisecond+
sleep. This incurs a significant delay in time-critical operations like
preemption (igt/gem_exec_latency), which can be improved significantly by
using a small sleep after the spin fails.

We've made this suggestion many times, but had little experimental data
to support adding the complexity.

References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Cc: John Harrison <john.c.harri...@intel.com>
Cc: Michał Winiarski <michal.winiar...@intel.com>
Cc: Ville Syrjala <ville.syrj...@linux.intel.com>
---
  drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 69aab324aaa1..c1ea9a009eb4 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -50,6 +50,7 @@
   */
  #define _wait_for(COND, US, W) ({ \
        unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
+       long wait__ = 1;                                                \
        int ret__;                                                      \
        might_sleep();                                                  \
        for (;;) {                                                      \
@@ -62,7 +63,9 @@
                        ret__ = -ETIMEDOUT;                             \
                        break;                                          \
                }                                                       \
-               usleep_range((W), (W) * 2);                             \
+               usleep_range(wait__, wait__ * 2);                       \
+               if (wait__ < (W))                                       \
+                       wait__ <<= 1;                                   \
        }                                                               \
        ret__;                                                          \
  })


I would start the period at 10us since a) <10us is not recommended for the usleep family, and b) most callers specify ms timeouts, so a <10us poll is perhaps overkill.

Latency sensitive callers like __intel_wait_for_register_us can be tweaked at the call site to provide what they want.

For the actual guc mmio send it sounds like it should pass 20us to __intel_wait_for_register_us (referring to John's explanation email) to cover 99% of the cases. And then the remaining 1% could be fine with a 10us delay?

Otherwise we are effectively making _wait_for partially busy-loop, or incur whatever the inefficiency of a <10us usleep is. I mean, it makes no practical difference to do a handful of quick loops there, but it feels a bit inelegant.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx