On 02-06-16 at 15:25, Tvrtko Ursulin wrote: > > On 01/06/16 18:07, john.c.harri...@intel.com wrote: >> From: John Harrison <john.c.harri...@intel.com> >> >> The intended usage model for struct fence is that the signalled status >> should be set on demand rather than polled. That is, there should not >> be a need for a 'signaled' function to be called every time the status >> is queried. Instead, 'something' should be done to enable a signal >> callback from the hardware which will update the state directly. In >> the case of requests, this is the seqno update interrupt. The idea is >> that this callback will only be enabled on demand when something >> actually tries to wait on the fence. >> >> This change removes the polling test and replaces it with the callback >> scheme. Each fence is added to a 'please poke me' list at the start of >> i915_add_request(). The interrupt handler then scans through the 'poke >> me' list when a new seqno pops out and signals any matching >> fence/request. The fence is then removed from the list so the entire >> request stack does not need to be scanned every time. Note that the >> fence is added to the list before the commands to generate the seqno >> interrupt are added to the ring. Thus the sequence is guaranteed to be >> race free if the interrupt is already enabled. >> >> Note that the interrupt is only enabled on demand (i.e. when >> __wait_request() is called). Thus there is still a potential race when >> enabling the interrupt as the request may already have completed. >> However, this is simply solved by calling the interrupt processing >> code immediately after enabling the interrupt and thereby checking for >> already completed requests. >> >> Lastly, the ring clean up code has the possibility to cancel >> outstanding requests (e.g. because TDR has reset the ring). These >> requests will never get signalled and so must be removed from the >> signal list manually. This is done by setting a 'cancelled' flag and >> then calling the regular notify/retire code path rather than >> attempting to duplicate the list manipulation and clean up code in >> multiple places. This also avoids any race condition where the >> cancellation request might occur after/during the completion interrupt >> actually arriving. >> >> v2: Updated to take advantage of the request unreference no longer >> requiring the mutex lock. >> >> v3: Moved the signal list processing around to prevent unsubmitted >> requests being added to the list. This was occurring on Android >> because the native sync implementation calls the >> fence->enable_signalling API immediately on fence creation. >> >> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to >> 'link' instead of 'list'. Added support for returning an error code on >> a cancelled fence. Updated list processing to be more efficient/safer >> with respect to spinlocks. >> >> v5: Made i915_gem_request_submit a static as it is only ever called >> from one place. >> >> Fixed up the low latency wait optimisation. The time delay between the >> seqno value being written to memory and the driver's ISR running can be >> significant, at least for the wait request micro-benchmark. This can >> be greatly improved by explicitly checking for seqno updates in the >> pre-wait busy poll loop. Also added some documentation comments to the >> busy poll code. >> >> Fixed up support for the faking of lost interrupts >> (test_irq_rings/missed_irq_rings).
That is, there is an IGT test that >> tells the driver to lose interrupts deliberately and then checks that >> everything still works as expected (albeit much slower). >> >> Updates from review comments: use non-IRQ-save spinlocking, early exit >> on WARN and improved comments (Tvrtko Ursulin). >> >> v6: Updated to newer nightly and resolved conflicts around the >> wait_request busy spin optimisation. Also fixed a race condition >> between this early exit path and the regular completion path. >> >> v7: Updated to newer nightly - lots of ring -> engine renaming plus an >> interface change on get_seqno(). Also added a list_empty() check >> before acquiring spinlocks and doing list processing. >> >> v8: Updated to newer nightly - changes to request clean up code mean >> none of the deferred free mess is needed any more. >> >> v9: Moved the request completion processing out of the interrupt >> handler and into a worker thread (Chris Wilson). >> >> For: VIZ-5190 >> Signed-off-by: John Harrison <john.c.harri...@intel.com> >> Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com> >> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> >> --- >> drivers/gpu/drm/i915/i915_dma.c | 9 +- >> drivers/gpu/drm/i915/i915_drv.h | 11 ++ >> drivers/gpu/drm/i915/i915_gem.c | 248 >> +++++++++++++++++++++++++++++--- >> drivers/gpu/drm/i915/i915_irq.c | 2 + >> drivers/gpu/drm/i915/intel_lrc.c | 5 + >> drivers/gpu/drm/i915/intel_ringbuffer.c | 5 + >> drivers/gpu/drm/i915/intel_ringbuffer.h | 3 + >> 7 files changed, 260 insertions(+), 23 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_dma.c >> b/drivers/gpu/drm/i915/i915_dma.c >> index 07edaed..f8f60bb 100644 >> --- a/drivers/gpu/drm/i915/i915_dma.c >> +++ b/drivers/gpu/drm/i915/i915_dma.c >> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct >> drm_i915_private *dev_priv) >> if (dev_priv->wq == NULL) >> goto out_err; >> >> + dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0); >> + if (dev_priv->req_wq == NULL) >> + goto out_free_wq; >> + > > A single (per-device) ordered workqueue will serialize interrupt processing > across all engines to one thread. Together with the fact that the request worker does > not seem to need a sleeping context, I am thinking that a tasklet per > engine would be much better (see engine->irq_tasklet for an example).
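For reference, a rough sketch of the per-engine tasklet variant being suggested (the request_tasklet field and the hook points below are illustrative, not from this patch; whether softirq context is compatible with how fence_lock is taken is discussed further down):

static void i915_gem_request_tasklet(unsigned long data)
{
	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;

	/* Same body as the proposed worker, just run from softirq context. */
	i915_gem_request_notify(engine, false);
}

	/* engine setup, next to the existing irq_tasklet init */
	tasklet_init(&engine->request_tasklet, i915_gem_request_tasklet,
		     (unsigned long)engine);

	/* notify_ring(), instead of queue_work(dev_priv->req_wq, ...) */
	tasklet_schedule(&engine->request_tasklet);

	/* engine teardown, instead of cancel_work_sync() */
	tasklet_kill(&engine->request_tasklet);

That would keep completion processing per engine instead of serialising all engines behind the single ordered i915-rq workqueue.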
> >> dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0); >> if (dev_priv->hotplug.dp_wq == NULL) >> - goto out_free_wq; >> + goto out_free_req_wq; >> >> dev_priv->gpu_error.hangcheck_wq = >> alloc_ordered_workqueue("i915-hangcheck", 0); >> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct >> drm_i915_private *dev_priv) >> >> out_free_dp_wq: >> destroy_workqueue(dev_priv->hotplug.dp_wq); >> +out_free_req_wq: >> + destroy_workqueue(dev_priv->req_wq); >> out_free_wq: >> destroy_workqueue(dev_priv->wq); >> out_err: >> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct >> drm_i915_private *dev_priv) >> { >> destroy_workqueue(dev_priv->gpu_error.hangcheck_wq); >> destroy_workqueue(dev_priv->hotplug.dp_wq); >> + destroy_workqueue(dev_priv->req_wq); >> destroy_workqueue(dev_priv->wq); >> } >> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h >> b/drivers/gpu/drm/i915/i915_drv.h >> index 69c3412..5a7f256 100644 >> --- a/drivers/gpu/drm/i915/i915_drv.h >> +++ b/drivers/gpu/drm/i915/i915_drv.h >> @@ -1851,6 +1851,9 @@ struct drm_i915_private { >> */ >> struct workqueue_struct *wq; >> >> + /* Work queue for request completion processing */ >> + struct workqueue_struct *req_wq; >> + >> /* Display functions */ >> struct drm_i915_display_funcs display; >> >> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request { >> */ >> struct fence fence; >> struct rcu_head rcu_head; >> + struct list_head signal_link; >> + bool cancelled; >> + bool irq_enabled; >> + bool signal_requested; >> >> /** On Which ring this request was generated */ >> struct drm_i915_private *i915; >> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request { >> struct drm_i915_gem_request * __must_check >> i915_gem_request_alloc(struct intel_engine_cs *engine, >> struct i915_gem_context *ctx); >> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req, >> + bool fence_locked); >> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool >> fence_locked); >> +void i915_gem_request_worker(struct work_struct *work); >> >> static inline bool i915_gem_request_completed(struct drm_i915_gem_request >> *req) >> { >> diff --git a/drivers/gpu/drm/i915/i915_gem.c >> b/drivers/gpu/drm/i915/i915_gem.c >> index 97e3138..83cf9b0 100644 >> --- a/drivers/gpu/drm/i915/i915_gem.c >> +++ b/drivers/gpu/drm/i915/i915_gem.c >> @@ -39,6 +39,8 @@ >> #include <linux/pci.h> >> #include <linux/dma-buf.h> >> >> +static void i915_gem_request_submit(struct drm_i915_gem_request *req); >> + >> static void i915_gem_object_flush_gtt_write_domain(struct >> drm_i915_gem_object *obj); >> static void i915_gem_object_flush_cpu_write_domain(struct >> drm_i915_gem_object *obj); >> static void >> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request >> *req, >> { >> struct intel_engine_cs *engine = i915_gem_request_get_engine(req); >> struct drm_i915_private *dev_priv = req->i915; >> - const bool irq_test_in_progress = >> - ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & >> intel_engine_flag(engine); >> int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE; >> + uint32_t seqno; >> DEFINE_WAIT(wait); >> unsigned long timeout_expire; >> s64 before = 0; /* Only to silence a compiler warning. 
*/ >> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request >> *req, >> >> WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled"); >> >> - if (list_empty(&req->list)) >> - return 0; >> - >> if (i915_gem_request_completed(req)) >> return 0; >> >> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request >> *req, >> trace_i915_gem_request_wait_begin(req); >> >> /* Optimistic spin for the next jiffie before touching IRQs */ >> - ret = __i915_spin_request(req, state); >> - if (ret == 0) >> - goto out; >> - >> - if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) { >> - ret = -ENODEV; >> - goto out; >> + if (req->seqno) { > > This needs a comment I think because it is so unusual and new that req->seqno > == 0 is a special path. To explain why and how it can happen here. > >> + ret = __i915_spin_request(req, state); >> + if (ret == 0) >> + goto out; >> } >> >> + /* >> + * Enable interrupt completion of the request. >> + */ >> + fence_enable_sw_signaling(&req->fence); >> + >> for (;;) { >> struct timer_list timer; >> >> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request >> *req, >> break; >> } >> >> + if (req->seqno) { >> + /* >> + * There is quite a lot of latency in the user interrupt >> + * path. So do an explicit seqno check and potentially >> + * remove all that delay. >> + */ >> + if (req->engine->irq_seqno_barrier) >> + req->engine->irq_seqno_barrier(req->engine); >> + seqno = engine->get_seqno(engine); >> + if (i915_seqno_passed(seqno, req->seqno)) { >> + ret = 0; >> + break; >> + } >> + } >> + >> if (signal_pending_state(state, current)) { >> ret = -ERESTARTSYS; >> break; >> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request >> *req, >> destroy_timer_on_stack(&timer); >> } >> } >> - if (!irq_test_in_progress) >> - engine->irq_put(engine); >> >> finish_wait(&engine->irq_queue, &wait); > > Hm I don't understand why our custom waiting remains? Shouldn't fence_wait > just be called after the optimistic spin, more or less? > >> >> out: >> trace_i915_gem_request_wait_end(req); >> >> + if ((ret == 0) && (req->seqno)) { >> + if (req->engine->irq_seqno_barrier) >> + req->engine->irq_seqno_barrier(req->engine); >> + seqno = engine->get_seqno(engine); >> + if (i915_seqno_passed(seqno, req->seqno) && >> + !i915_gem_request_completed(req)) { >> + /* >> + * Make sure the request is marked as completed before >> + * returning. NB: Need to acquire the spinlock around >> + * the whole call to avoid a race condition with the >> + * interrupt handler is running concurrently and could >> + * cause this invocation to early exit even though the >> + * request has not actually been fully processed yet. >> + */ >> + spin_lock_irq(&req->engine->fence_lock); >> + i915_gem_request_notify(req->engine, true); >> + spin_unlock_irq(&req->engine->fence_lock); >> + } >> + } >> + >> if (timeout) { >> s64 tres = *timeout - (ktime_get_raw_ns() - before); >> >> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct >> drm_i915_gem_request *request) >> { >> trace_i915_gem_request_retire(request); >> >> + if (request->irq_enabled) { >> + request->engine->irq_put(request->engine); >> + request->irq_enabled = false; > > What protects request->irq_enabled? Here versus enable_signalling bit? It can > be called from the external fence users which would take the fence_lock, but > here it does not. 
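For context on the external path: the fence core takes fence->lock, which is engine->fence_lock for these fences, around the ->enable_signaling() callback, so irq_enabled is written under the lock there. Paraphrased from drivers/dma-buf/fence.c (from memory, worth double-checking the exact code):

void fence_enable_sw_signaling(struct fence *fence)
{
	unsigned long flags;

	if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
	    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
		spin_lock_irqsave(fence->lock, flags);
		if (!fence->ops->enable_signaling(fence))
			fence_signal_locked(fence);
		spin_unlock_irqrestore(fence->lock, flags);
	}
}

The retire path quoted above is the one that clears irq_enabled without that lock, which is what the question is about.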
> >> + } > >> + >> /* We know the GPU must have read the request to have >> * sent us the seqno + interrupt, so use the position >> * of tail of the request to update the last known position >> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct >> drm_i915_gem_request *request) >> list_del_init(&request->list); >> i915_gem_request_remove_from_client(request); >> >> + /* >> + * In case the request is still in the signal pending list, >> + * e.g. due to being cancelled by TDR, preemption, etc. >> + */ >> + if (!list_empty(&request->signal_link)) { > > No locking required here? Considering the locked function is used, I'm assuming this function holds the fence_lock.
If not, something's seriously wrong. >> + /* >> + * The request must be marked as cancelled and the underlying >> + * fence as failed. NB: There is no explicit fence fail API, >> + * there is only a manual poke and signal. >> + */ >> + request->cancelled = true; >> + /* How to propagate to any associated sync_fence??? */ ^This way works, so comment can be removed. There's deliberately no way to cancel a fence, it's the same path but with status member set. If you have a fence for another driver, there's really no good way to handle failure. So you have to treat it as if it succeeded. >> + request->fence.status = -EIO; >> + fence_signal_locked(&request->fence); > > And here? > >> + } >> + >> if (request->previous_context) { >> if (i915.enable_execlists) >> intel_lr_context_unpin(request->previous_context, >> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request >> *request, >> */ >> request->postfix = intel_ring_get_tail(ringbuf); >> >> + /* >> + * Add the fence to the pending list before emitting the commands to >> + * generate a seqno notification interrupt. >> + */ >> + i915_gem_request_submit(request); >> + >> if (i915.enable_execlists) >> ret = engine->emit_request(request); >> else { >> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence >> *req_fence) >> struct drm_i915_gem_request *req; >> >> req = container_of(req_fence, typeof(*req), fence); >> + >> + WARN_ON(req->irq_enabled); > > How useful is this? If it went wrong engine irq reference counting would be > bad. Okay no one would notice, but we could then stick some other warns here, > list !list_empy(req->list) and who knows what, which we don't have, so I am > just wondering if this one brings any value. > >> + >> call_rcu(&req->rcu_head, i915_gem_request_free_rcu); >> } >> >> -static bool i915_gem_request_enable_signaling(struct fence *req_fence) >> +/* >> + * The request is about to be submitted to the hardware so add the fence to >> + * the list of signalable fences. >> + * >> + * NB: This does not necessarily enable interrupts yet. That only occurs on >> + * demand when the request is actually waited on. However, adding it to the >> + * list early ensures that there is no race condition where the interrupt >> + * could pop out prematurely and thus be completely lost. The race is merely >> + * that the interrupt must be manually checked for after being enabled. >> + */ >> +static void i915_gem_request_submit(struct drm_i915_gem_request *req) >> { >> - /* Interrupt driven fences are not implemented yet.*/ >> - WARN(true, "This should not be called!"); >> - return true; >> + /* >> + * Always enable signal processing for the request's fence object >> + * before that request is submitted to the hardware. Thus there is no >> + * race condition whereby the interrupt could pop out before the >> + * request has been added to the signal list. Hence no need to check >> + * for completion, undo the list add and return false. >> + */ >> + i915_gem_request_reference(req); >> + spin_lock_irq(&req->engine->fence_lock); >> + WARN_ON(!list_empty(&req->signal_link)); >> + list_add_tail(&req->signal_link, &req->engine->fence_signal_list); >> + spin_unlock_irq(&req->engine->fence_lock); >> + >> + /* >> + * NB: Interrupts are only enabled on demand. Thus there is still a >> + * race where the request could complete before the interrupt has >> + * been enabled. Thus care must be taken at that point. >> + */ >> + >> + /* Have interrupts already been requested? 
*/ >> + if (req->signal_requested) >> + i915_gem_request_enable_interrupt(req, false); > > I am thinking that the fence lock could be held here until the end of the > function and in such way i915_gem_request_enable_interrupt would not need the > fence_locked parameter any more. > > It would probably also be safer with regards to accesing the > req->signal_requested. I am not sure that enable signalling and this > otherwise can't race and miss the signal_requested getting set? > >> +} >> + >> +/* >> + * The request is being actively waited on, so enable interrupt based >> + * completion signalling. >> + */ >> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req, >> + bool fence_locked) >> +{ >> + struct drm_i915_private *dev_priv = req->engine->i915; >> + const bool irq_test_in_progress = >> + ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & >> + intel_engine_flag(req->engine); >> + >> + if (req->irq_enabled) >> + return; >> + >> + if (irq_test_in_progress) >> + return; >> + >> + if (!WARN_ON(!req->engine->irq_get(req->engine))) >> + req->irq_enabled = true; > > The double negation confused me a bit. It is probably not ideal since > WARN_ONs go to the out of line section so in a way it is deliberately > penalising the fast and expected path. I think it would be better to put a > WARN on the else path. > >> + >> + /* >> + * Because the interrupt is only enabled on demand, there is a race >> + * where the interrupt can fire before anyone is looking for it. So >> + * do an explicit check for missed interrupts. >> + */ >> + i915_gem_request_notify(req->engine, fence_locked); >> } >> >> -static bool i915_gem_request_is_completed(struct fence *req_fence) >> +static bool i915_gem_request_enable_signaling(struct fence *req_fence) >> { >> struct drm_i915_gem_request *req = container_of(req_fence, >> typeof(*req), fence); >> + >> + /* >> + * No need to actually enable interrupt based processing until the >> + * request has been submitted to the hardware. At which point >> + * 'i915_gem_request_submit()' is called. So only really enable >> + * signalling in there. Just set a flag to say that interrupts are >> + * wanted when the request is eventually submitted. On the other hand >> + * if the request has already been submitted then interrupts do need >> + * to be enabled now. >> + */ >> + >> + req->signal_requested = true; >> + >> + if (!list_empty(&req->signal_link)) > > In what scenarios is the list_empty check needed? Someone can somehow enable > signalling on a fence not yet submitted? >> + i915_gem_request_enable_interrupt(req, true); >> + >> + return true; >> +} >> + >> +/** >> + * i915_gem_request_worker - request work handler callback. >> + * @work: Work structure >> + * Called in response to a seqno interrupt to process the completed >> requests. >> + */ >> +void i915_gem_request_worker(struct work_struct *work) >> +{ >> + struct intel_engine_cs *engine; >> + >> + engine = container_of(work, struct intel_engine_cs, request_work); >> + i915_gem_request_notify(engine, false); >> +} >> + >> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool >> fence_locked) >> +{ >> + struct drm_i915_gem_request *req, *req_next; >> + unsigned long flags; >> u32 seqno; >> >> - seqno = req->engine->get_seqno(req->engine); >> + if (list_empty(&engine->fence_signal_list)) > > Okay this without the lock still makes me nervous. I'd rather not having to > think about why it is safe and can't miss a wakeup. 
> > Also I would be leaning toward having i915_gem_request_notify and > i915_gem_request_notify__unlocked. With the enable_interrupts simplification > I suggested it think it would look better and be more consistent with the > rest of the driver. > >> + return; >> + >> + if (!fence_locked) >> + spin_lock_irqsave(&engine->fence_lock, flags); > > Not called from hard irq context so can be just spin_lock_irq. > > But if you agree to go with the tasklet it would then be spin_lock_bh. fence is always spin_lock_irq, if this requires _bh then it can't go into the tasklet. >> >> - return i915_seqno_passed(seqno, req->seqno); >> + if (engine->irq_seqno_barrier) >> + engine->irq_seqno_barrier(engine); >> + seqno = engine->get_seqno(engine); >> + >> + list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, >> signal_link) { >> + if (!req->cancelled) { >> + if (!i915_seqno_passed(seqno, req->seqno)) >> + break; > > Merge to one if statement? > >> + } >> + >> + /* >> + * Start by removing the fence from the signal list otherwise >> + * the retire code can run concurrently and get confused. >> + */ >> + list_del_init(&req->signal_link); >> + >> + if (!req->cancelled) >> + fence_signal_locked(&req->fence); > > I forgot how signalling errors to userspace works? Does that still work for > cancelled fences in this series? > >> + >> + if (req->irq_enabled) { >> + req->engine->irq_put(req->engine); >> + req->irq_enabled = false; >> + } >> + >> + i915_gem_request_unreference(req); >> + } >> + >> + if (!fence_locked) >> + spin_unlock_irqrestore(&engine->fence_lock, flags); >> } >> >> static const char *i915_gem_request_get_driver_name(struct fence >> *req_fence) >> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct >> fence *req_fence, >> >> static const struct fence_ops i915_gem_request_fops = { >> .enable_signaling = i915_gem_request_enable_signaling, >> - .signaled = i915_gem_request_is_completed, >> .wait = fence_default_wait, >> .release = i915_gem_request_free, >> .get_driver_name = i915_gem_request_get_driver_name, >> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs >> *engine, >> req->ctx = ctx; >> i915_gem_context_reference(req->ctx); >> >> + INIT_LIST_HEAD(&req->signal_link); >> fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock, >> ctx->engine[engine->id].fence_timeline.fence_context, >> >> i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline)); >> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct >> drm_i915_private *dev_priv, >> i915_gem_request_retire(request); >> } >> >> + /* >> + * Tidy up anything left over. This includes a call to >> + * i915_gem_request_notify() which will make sure that any requests >> + * that were on the signal pending list get also cleaned up. >> + */ >> + i915_gem_retire_requests_ring(engine); > > Hmm.. but this function has just walked the same lists this will, and done > the same processing. Why call this from here? It looks bad to me, the two are > different special cases of the similar thing so I can't see that calling this > from here makes sense. > >> + >> /* Having flushed all requests from all queues, we know that all >> * ringbuffers must now be empty. 
However, since we do not reclaim >> * all space when retiring the request (to prevent HEADs colliding >> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs >> *engine) >> { >> WARN_ON(i915_verify_lists(engine->dev)); >> >> + /* >> + * If no-one has waited on a request recently then interrupts will >> + * not have been enabled and thus no requests will ever be marked as >> + * completed. So do an interrupt check now. >> + */ >> + i915_gem_request_notify(engine, false); > > Would it work to signal the fence from the existing loop a bit above in this > function which already walks the request list in search for completed ones? > Or maybe even in i915_gem_request_retire? > > I am thinking about doing less list walking and better integration with the > core GEM. Downside would be more traffic on the fence_lock, hmm.. not sure > then. It just looks a bit bolted on like this. > > I don't see it being a noticeable cost so perhaps it can stay like this for > now. > >> + >> /* Retire requests first as we use it above for the early return. >> * If we retire requests last, we may use a later seqno and so clear >> * the requests lists without clearing the active list, leading to >> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine) >> { >> INIT_LIST_HEAD(&engine->active_list); >> INIT_LIST_HEAD(&engine->request_list); >> + INIT_LIST_HEAD(&engine->fence_signal_list); >> } >> >> void >> diff --git a/drivers/gpu/drm/i915/i915_irq.c >> b/drivers/gpu/drm/i915/i915_irq.c >> index f780421..a87a3c5 100644 >> --- a/drivers/gpu/drm/i915/i915_irq.c >> +++ b/drivers/gpu/drm/i915/i915_irq.c >> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine) >> trace_i915_gem_request_notify(engine); >> engine->user_interrupts++; >> >> + queue_work(engine->i915->req_wq, &engine->request_work); >> + >> wake_up_all(&engine->irq_queue); > > Yes that is the weird part, why the engine->irq_queue has to remain with this > patch? 
> >> } >> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >> b/drivers/gpu/drm/i915/intel_lrc.c >> index f126bcb..134759d 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs >> *engine) >> >> dev_priv = engine->i915; >> >> + cancel_work_sync(&engine->request_work); >> + >> if (engine->buffer) { >> intel_logical_ring_stop(engine); >> WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0); >> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum >> intel_engine_id id) >> >> INIT_LIST_HEAD(&engine->active_list); >> INIT_LIST_HEAD(&engine->request_list); >> + INIT_LIST_HEAD(&engine->fence_signal_list); >> INIT_LIST_HEAD(&engine->buffers); >> INIT_LIST_HEAD(&engine->execlist_queue); >> spin_lock_init(&engine->execlist_lock); >> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum >> intel_engine_id id) >> tasklet_init(&engine->irq_tasklet, >> intel_lrc_irq_handler, (unsigned long)engine); >> >> + INIT_WORK(&engine->request_work, i915_gem_request_worker); >> + >> logical_ring_init_platform_invariants(engine); >> logical_ring_default_vfuncs(engine); >> logical_ring_default_irqs(engine, info->irq_shift); >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c >> b/drivers/gpu/drm/i915/intel_ringbuffer.c >> index fbd3f12..1641096 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c >> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device >> *dev, >> INIT_LIST_HEAD(&engine->request_list); >> INIT_LIST_HEAD(&engine->execlist_queue); >> INIT_LIST_HEAD(&engine->buffers); >> + INIT_LIST_HEAD(&engine->fence_signal_list); >> spin_lock_init(&engine->fence_lock); >> i915_gem_batch_pool_init(dev, &engine->batch_pool); >> memset(engine->semaphore.sync_seqno, 0, >> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device >> *dev, >> >> init_waitqueue_head(&engine->irq_queue); >> >> + INIT_WORK(&engine->request_work, i915_gem_request_worker); >> + >> ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE); >> if (IS_ERR(ringbuf)) { >> ret = PTR_ERR(ringbuf); >> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs >> *engine) >> >> dev_priv = engine->i915; >> >> + cancel_work_sync(&engine->request_work); >> + >> if (engine->buffer) { >> intel_stop_engine(engine); >> WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) >> == 0); >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h >> b/drivers/gpu/drm/i915/intel_ringbuffer.h >> index 3f39daf..51779b4 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h >> @@ -347,6 +347,9 @@ struct intel_engine_cs { >> u32 (*get_cmd_length_mask)(u32 cmd_header); >> >> spinlock_t fence_lock; >> + struct list_head fence_signal_list; >> + >> + struct work_struct request_work; >> }; >> >> static inline bool _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx