On 2026-03-05 11:42:19 [+0100], Maarten Lankhorst wrote:
> Hey,
Hi,
> Den 2026-02-26 kl. 15:38, skrev Sebastian Andrzej Siewior:
> > On 2026-02-26 15:19:42 [+0100], To Maarten Lankhorst wrote:
> >> On 2026-02-26 13:07:18 [+0100], To Maarten Lankhorst wrote:
> >>> series somewhere I could pull and check. In meantime I would look what
> >>> causes the lockup on i915.
> >>
> >> I think I got it.
> >
> > This is the atomic sync as-is: the irq-work (FIFO-1) will be preempted
> > by the threaded interrupt (FIFO-50), and the interrupt will poll on
> > signaler_active while the irq-work can't make progress.
> >
> > This will provide the needed sync:
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index a2b413982ce64..337f6e88faf05 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -209,6 +209,7 @@ static void signal_irq_work(struct irq_work *work)
> > intel_breadcrumbs_disarm_irq(b);
> >
> > rcu_read_lock();
> > + spin_lock(&b->signaler_active_sync);
> > atomic_inc(&b->signaler_active);
> > list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
> > struct i915_request *rq;
> > @@ -246,6 +247,7 @@ static void signal_irq_work(struct irq_work *work)
> > }
> > }
> > atomic_dec(&b->signaler_active);
> > + spin_unlock(&b->signaler_active_sync);
> > rcu_read_unlock();
> >
> > llist_for_each_safe(signal, sn, signal) {
> > @@ -290,6 +292,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
> > init_llist_head(&b->signaled_requests);
> >
> > spin_lock_init(&b->irq_lock);
> > + spin_lock_init(&b->signaler_active_sync);
> > init_irq_work(&b->irq_work, signal_irq_work);
> >
> > b->irq_engine = irq_engine;
> > @@ -487,8 +490,11 @@ void intel_context_remove_breadcrumbs(struct intel_context *ce,
> > if (release)
> > intel_context_put(ce);
> >
> > - while (atomic_read(&b->signaler_active))
> > + while (atomic_read(&b->signaler_active)) {
> > + spin_lock(&b->signaler_active_sync);
> > + spin_unlock(&b->signaler_active_sync);
> > cpu_relax();
> > + }
> > }
> >
> > static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> > index bdf09fd67b6e7..28dae32628aab 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> > @@ -40,6 +40,7 @@ struct intel_breadcrumbs {
> > struct list_head signalers;
> > struct llist_head signaled_requests;
> > atomic_t signaler_active;
> > + spinlock_t signaler_active_sync;
> >
> > spinlock_t irq_lock; /* protects the interrupt from hardirq context */
> > struct irq_work irq_work; /* for use from inside irq_lock */
> >
>
> Thinking some more, replacing signaler_active with signaler_active_sync
> might be the best fix. I'm not sure there's much use for parallel
> completion of the same breadcrumb, and using a completion might be too
> heavy-handed.
We have something similar in the timer, tasklet and other "similar" code:
on RT you have preemption and therefore the possibility of another user
on the same CPU, while on !RT it is only possible on a remote CPU.
Using the spinlock_t for synchronisation would restrict it to one-on-one.
The closest API that comes to mind would be a sequence lock: one writer,
multiple readers. So that would be an option you might like ;) if the
pure spinlock_t is off the table.
> Kind regards,
> ~Maarten Lankhorst
Sebastian