On 20-04-16 at 19:09, john.c.harri...@intel.com wrote:
> From: John Harrison <john.c.harri...@intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
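For anyone not steeped in the struct fence API: .enable_signaling is only
called once a waiter actually appears, which is what makes this on-demand
interrupt scheme possible. As a rough sketch (not the literal patch code;
it just reuses the i915_gem_request_enable_interrupt() helper declared
further down), the hook ends up looking something like:

static bool i915_fence_enable_signaling(struct fence *fence)
{
	struct drm_i915_gem_request *req =
		container_of(fence, struct drm_i915_gem_request, fence);

	/* Called with the fence lock held: add the request to the
	 * engine's signal list and turn on the seqno interrupt. */
	i915_gem_request_enable_interrupt(req, true);

	return true;	/* not signalled yet; keep the callback armed */
}
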
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
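To make the 'poke me' mechanics concrete: the notify path boils down to
walking a per-engine list under the fence spinlock and signalling every
request whose seqno has popped out. A sketch, assuming a per-engine list
head (fence_signal_list is an illustrative name here, not necessarily the
one used in the patch):

static void notify_sketch(struct intel_engine_cs *engine)
{
	struct drm_i915_gem_request *req, *next;
	u32 seqno = engine->get_seqno(engine);

	spin_lock(&engine->fence_lock);
	list_for_each_entry_safe(req, next,
				 &engine->fence_signal_list, signal_link) {
		if (!i915_seqno_passed(seqno, req->seqno))
			continue;

		/* Signal and unlink so that later interrupts do not
		 * have to rescan the entire request stack. */
		fence_signal_locked(&req->fence);
		list_del_init(&req->signal_link);
	}
	spin_unlock(&engine->fence_lock);
}
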
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
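So the enable path is essentially 'turn on the IRQ, then run the notify
scan once by hand', closing the window for a request that completed in
between. Sketch only, with error handling and locking simplified (the
fence_locked=true argument assumes the caller already holds the fence
lock, per the declaration below):

static void enable_interrupt_sketch(struct drm_i915_gem_request *req)
{
	struct intel_engine_cs *engine = req->engine;

	if (!req->irq_enabled) {
		WARN_ON(!engine->irq_get(engine));
		req->irq_enabled = true;
	}

	/* The seqno may already have landed before the interrupt was
	 * enabled, so check for completed requests immediately. */
	i915_gem_request_notify(engine, true);
}
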
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation might occur after or during the arrival of the
> completion interrupt.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signaling API immediately on fence creation.
>
> v4: Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <john.c.harri...@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   8 +
>  drivers/gpu/drm/i915/i915_gem.c         | 249 +++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>  6 files changed, 241 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81324ba..9519b11 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2255,7 +2255,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>  struct drm_i915_gem_request {
>       /** Underlying object for implementing the signal/wait stuff. */
>       struct fence fence;
> +     struct list_head signal_link;
> +     struct list_head unsignal_link;
>       struct list_head delayed_free_link;
> +     bool cancelled;
> +     bool irq_enabled;
> +     bool signal_requested;
>  
>       /** On Which ring this request was generated */
>       struct drm_i915_private *i915;
> @@ -2341,6 +2346,9 @@ struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>                      struct intel_context *ctx);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +                                    bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>  
>  int i915_create_fence_timeline(struct drm_device *dev,
>                              struct intel_context *ctx,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9071058..9b1de5f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,6 +40,8 @@
>  
>  #define RQ_BUG_ON(expr)
>  
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>  static void
> @@ -1253,9 +1255,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>       struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>       struct drm_device *dev = engine->dev;
>       struct drm_i915_private *dev_priv = dev->dev_private;
> -     const bool irq_test_in_progress =
> -             ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>       int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +     uint32_t seqno;
>       DEFINE_WAIT(wait);
>       unsigned long timeout_expire;
>       s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1263,9 +1264,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  
>       WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>  
> -     if (list_empty(&req->list))
> -             return 0;
> -
>       if (i915_gem_request_completed(req))
>               return 0;
>  
> @@ -1291,15 +1289,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>       trace_i915_gem_request_wait_begin(req);
>  
>       /* Optimistic spin for the next jiffie before touching IRQs */
> -     ret = __i915_spin_request(req, state);
> -     if (ret == 0)
> -             goto out;
> -
> -     if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -             ret = -ENODEV;
> -             goto out;
> +     if (req->seqno) {
> +             ret = __i915_spin_request(req, state);
> +             if (ret == 0)
> +                     goto out;
>       }
>  
> +     /*
> +      * Enable interrupt completion of the request.
> +      */
> +     fence_enable_sw_signaling(&req->fence);
> +
>       for (;;) {
>               struct timer_list timer;
>  
> @@ -1321,6 +1321,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>                       break;
>               }
>  
> +             if (req->seqno) {
> +                     /*
> +                      * There is quite a lot of latency in the user interrupt
> +                      * path. So do an explicit seqno check and potentially
> +                      * remove all that delay.
> +                      */
> +                     if (req->engine->irq_seqno_barrier)
> +                             req->engine->irq_seqno_barrier(req->engine);
> +                     seqno = engine->get_seqno(engine);
You're using this barrier + get_seqno pattern multiple times; maybe write a
helper function for this?
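Something along these lines, perhaps (the name engine_read_seqno is made
up and the snippet is untested; it only combines the two engine callbacks
already used in the patch):

static u32 engine_read_seqno(struct intel_engine_cs *engine)
{
	/* Give the engine a chance to flush the seqno write to memory
	 * before sampling it, on platforms that need the barrier. */
	if (engine->irq_seqno_barrier)
		engine->irq_seqno_barrier(engine);

	return engine->get_seqno(engine);
}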
> +                     if (i915_seqno_passed(seqno, req->seqno)) {
> +                             ret = 0;
> +                             break;
> +                     }
> +             }
> +
>               if (signal_pending_state(state, current)) {
>                       ret = -ERESTARTSYS;
>                       break;
> @@ -1347,14 +1362,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>                       destroy_timer_on_stack(&timer);
>               }
>       }
> -     if (!irq_test_in_progress)
> -             engine->irq_put(engine);
>  
>       finish_wait(&engine->irq_queue, &wait);
>  
>  out:
>       trace_i915_gem_request_wait_end(req);
>  
> +     if ((ret == 0) && (req->seqno)) {
> +             if (req->engine->irq_seqno_barrier)
> +                     req->engine->irq_seqno_barrier(req->engine);
> +             seqno = engine->get_seqno(engine);
> +             if (i915_seqno_passed(seqno, req->seqno) &&
> +                 !i915_gem_request_completed(req)) {
> +                     /*
> +                      * Make sure the request is marked as completed before
> +                      * returning. NB: Need to acquire the spinlock around
> +                      * the whole call to avoid a race condition where the
> +                      * interrupt handler, running concurrently, could
> +                      * cause this invocation to exit early even though the
> +                      * request has not actually been fully processed yet.
> +                      */
> +                     spin_lock_irq(&req->engine->fence_lock);
> +                     i915_gem_request_notify(req->engine, true);
> +                     spin_unlock_irq(&req->engine->fence_lock);
> +             }
> +     }
> +
>       if (timeout) {
>               s64 tres = *timeout - (ktime_get_raw_ns() - before);
>  
> @@ -1432,6 +1465,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>       list_del_init(&request->list);
>       i915_gem_request_remove_from_client(request);
>  
> +     /*
> +      * In case the request is still in the signal pending list,
> +      * e.g. due to being cancelled by TDR, preemption, etc.
> +      */
> +     if (!list_empty(&request->signal_link)) {
> +             /*
> +              * The request must be marked as cancelled and the underlying
> +              * fence as failed. NB: There is no explicit fence fail API,
> +              * there is only a manual poke and signal.
> +              */
> +             request->cancelled = true;
> +             /* How to propagate to any associated sync_fence??? */
> +             request->fence.status = -EIO;
> +             fence_signal_locked(&request->fence);
> +     }
>
Can this still happen with

commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <ch...@chris-wilson.co.uk>
Date:   Wed Apr 13 17:35:15 2016 +0100

    drm/i915: Late request cancellations are harmful

?