On 2026-03-10 12:57:08 [+0100], Maarten Lankhorst wrote:
> From: Sebastian Andrzej Siewior <[email protected]>
> 
> IRQ-Work (FIFO-1) will be preempted by the threaded-interrupt (FIFO-50)
> and the interrupt will poll on signaler_active while the irq-work can't
> make progress.

The threaded-interrupt is the interrupt.

| On PREEMPT_RT the irq_work can be preempted by the threaded interrupt,
| which will poll for completion while the irq_work routine can't make
| progress.

> Solve this by adding a spinlock to prevent starvation and force
> completion.

| Solve this by adding a spinlock_t to prevent starvation by forcing a
| context switch if the lock is held, instead of polling
| `signaler_active'. On !PREEMPT_RT `signaler_active' can only be
| non-zero if multiple CPUs are involved, and spinning on the lock leads
| to the same result.

> Signed-off-by: Maarten Lankhorst <[email protected]>

If I am the From: then I should have the Signed-off-by, too. Let me do
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>

so it can be picked up.

You did suggest the following:

--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -209,7 +209,7 @@ static void signal_irq_work(struct irq_work *work)
                intel_breadcrumbs_disarm_irq(b);
 
        rcu_read_lock();
-       atomic_inc(&b->signaler_active);
+       spin_lock(&b->signaler_active_sync);
        list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
                struct i915_request *rq;
 
@@ -245,7 +245,7 @@ static void signal_irq_work(struct irq_work *work)
                                i915_request_put(rq);
                }
        }
-       atomic_dec(&b->signaler_active);
+       spin_unlock(&b->signaler_active_sync);
        rcu_read_unlock();
 
        llist_for_each_safe(signal, sn, signal) {
@@ -290,6 +290,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
        init_llist_head(&b->signaled_requests);
 
        spin_lock_init(&b->irq_lock);
+       spin_lock_init(&b->signaler_active_sync);
        init_irq_work(&b->irq_work, signal_irq_work);
 
        b->irq_engine = irq_engine;
@@ -487,8 +488,10 @@ void intel_context_remove_breadcrumbs(struct intel_context *ce,
        if (release)
                intel_context_put(ce);
 
-       while (atomic_read(&b->signaler_active))
-               cpu_relax();
+       while (spin_is_locked(&b->signaler_active_sync)) {
+               spin_lock_irqsave(&b->signaler_active_sync, flags);
+               spin_unlock_irqrestore(&b->signaler_active_sync, flags);

And this does not work because spin_is_locked() keeps returning true and
the loop spins forever. It fails due to a corner case where
spin_is_locked() returns true even though the lock has no owner that
actually holds it locked. This happens when there is a waiter which has
not yet acquired the lock.

So if you are happy with this, we could keep it ;)

+       }
 }
 
 static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index bdf09fd67b6e7..d071f4f1b1910 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -39,7 +39,7 @@ struct intel_breadcrumbs {
        spinlock_t signalers_lock; /* protects the list of signalers */
        struct list_head signalers;
        struct llist_head signaled_requests;
-       atomic_t signaler_active;
+       spinlock_t signaler_active_sync;
 
        spinlock_t irq_lock; /* protects the interrupt from hardirq context */
        struct irq_work irq_work; /* for use from inside irq_lock */


Sebastian
