On Tue, Nov 21, 2017 at 08:59:51PM +0000, Chris Wilson wrote:
> Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> start with a small sleep and exponentially increase the sleep on each
> cycle.
> 
> A good example of a beneficiary is the guc mmio communication channel.
> Typically we expect (and so spin) for 10us for a quick response, but this
> doesn't cover everything, and so sometimes we fall back to the millisecond+
> sleep. This incurs a significant delay in time-critical operations like
> preemption (igt/gem_exec_latency), which can be improved significantly by
> using a small sleep after the spin fails.
> 
> We've made this suggestion many times, but had little experimental data
> to support adding the complexity.
> 
> v2: Bump the minimum usleep to 10us on advice of
> Documentation/timers/timers-howto.txt (Tvrtko)
> v3: Specify min, max range for usleep intervals -- some code may
> crucially depend on the sleep pattern and so want to specify it explicitly.

Since we see the effects for GuC preemption, let's gather some evidence.

(SKL)
intel_guc_send_mmio latency: 100 rounds of gem_exec_latency --r '*-preemption'

drm-tip:
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 44       |                                        |
        16 -> 31         : 1088     |                                        |
        32 -> 63         : 832      |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 12       |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 29899    |*********                               |
      2048 -> 4095       : 131033   |****************************************|

exponential backoff:
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 58       |                                        |
        16 -> 31         : 2009     |                                        |
        32 -> 63         : 153615   |****************************************|
        64 -> 127        : 4360     |*                                       |
       128 -> 255        : 14       |                                        |
       256 -> 511        : 4236     |*                                       |
       512 -> 1023       : 4        |                                        |

s/10us/100us in __intel_wait_for_register_fw fast_timeout (without exponential
backoff):
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 946      |                                        |
        16 -> 31         : 159803   |****************************************|
        32 -> 63         : 3440     |                                        |
        64 -> 127        : 33       |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 74       |                                        |

Same as the previous one, except with a lower GPU clock (let's say max == hw_min):
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 26       |                                        |
        32 -> 63         : 160411   |****************************************|
        64 -> 127        : 3402     |                                        |
       128 -> 255        : 12       |                                        |
       256 -> 511        : 50       |                                        |
       512 -> 1023       : 3        |                                        |
      1024 -> 2047       : 1        |                                        |
      2048 -> 4095       : 2        |                                        |

We're way better with exponential backoff compared to current drm-tip, and
we're almost as good as when spinning for a longer period. Unfortunately, with
GuC the delay depends on the platform (and/or GPU clock?), so without
exponential backoff we're still seeing the long tail if we manage to miss the
fast timeout.
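
For reference, here's a function-style sketch of the backoff loop that the
_wait_for() macro in the patch implements (the non-macro form and the names
below are mine, purely illustrative -- the real change is in the quoted diff):

#include <linux/delay.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>

/* Illustrative, non-macro rendering of the patched _wait_for() loop. */
static int wait_for_cond_backoff(bool (*cond)(void *ctx), void *ctx,
				 unsigned long timeout_us,
				 long wait_min_us, long wait_max_us)
{
	unsigned long timeout = jiffies + usecs_to_jiffies(timeout_us) + 1;
	long wait_us = wait_min_us; /* recommended minimum for usleep is 10us */

	might_sleep();
	for (;;) {
		/*
		 * Sample the deadline before checking the condition, so a
		 * late wakeup (preemption etc.) isn't misreported as timeout.
		 */
		bool expired = time_after(jiffies, timeout);

		if (cond(ctx))
			return 0;
		if (expired)
			return -ETIMEDOUT;

		usleep_range(wait_us, wait_us * 2);
		if (wait_us < wait_max_us)
			wait_us <<= 1; /* 10us, 20us, 40us, ... capped at Wmax */
	}
}

With that shape, wait_for(COND, ms) maps to a 10us..1000us backoff, while
callers that depend on a fixed sleep pattern (e.g. skl_pcode_request() in the
diff) pin min == max at 10us.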

> References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> Cc: John Harrison <john.c.harri...@intel.com>
> Cc: Michał Winiarski <michal.winiar...@intel.com>
> Cc: Ville Syrjala <ville.syrj...@linux.intel.com>

Reviewed-by: Michał Winiarski <michal.winiar...@intel.com>

-Michał

> ---
>  drivers/gpu/drm/i915/intel_drv.h | 11 +++++++----
>  drivers/gpu/drm/i915/intel_pm.c  |  2 +-
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 635a96fcd788..c00441a3d649 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -48,8 +48,9 @@
>   * having timed out, since the timeout could be due to preemption or similar and
>   * we've never had a chance to check the condition before the timeout.
>   */
> -#define _wait_for(COND, US, W) ({ \
> +#define _wait_for(COND, US, Wmin, Wmax) ({ \
>       unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> +     long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \
>       int ret__;                                                      \
>       might_sleep();                                                  \
>       for (;;) {                                                      \
> @@ -62,12 +63,14 @@
>                       ret__ = -ETIMEDOUT;                             \
>                       break;                                          \
>               }                                                       \
> -             usleep_range((W), (W) * 2);                             \
> +             usleep_range(wait__, wait__ * 2);                       \
> +             if (wait__ < (Wmax))                                    \
> +                     wait__ <<= 1;                                   \
>       }                                                               \
>       ret__;                                                          \
>  })
>  
> -#define wait_for(COND, MS)           _wait_for((COND), (MS) * 1000, 1000)
> +#define wait_for(COND, MS)   _wait_for((COND), (MS) * 1000, 10, 1000)
>  
>  /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
>  #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> @@ -116,7 +119,7 @@
>       int ret__; \
>       BUILD_BUG_ON(!__builtin_constant_p(US)); \
>       if ((US) > 10) \
> -             ret__ = _wait_for((COND), (US), 10); \
> +             ret__ = _wait_for((COND), (US), 10, 10); \
>       else \
>               ret__ = _wait_for_atomic((COND), (US), 0); \
>       ret__; \
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index e445ec174831..f07f14ae198d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -9294,7 +9294,7 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
>               ret = 0;
>               goto out;
>       }
> -     ret = _wait_for(COND, timeout_base_ms * 1000, 10);
> +     ret = _wait_for(COND, timeout_base_ms * 1000, 10, 10);
>       if (!ret)
>               goto out;
>  
> -- 
> 2.15.0
> 