On Tue, May 14, 2024 at 9:12 AM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 08.05.2024 09:10, Jens Wiklander wrote:
> > On Fri, May 3, 2024 at 12:32 PM Jan Beulich <jbeul...@suse.com> wrote:
> >> Furthermore, is it guaranteed that the IRQ handler won't interrupt code
> >> fiddling with the domain list? I don't think it is, since
> >> domlist_update_lock isn't acquired in an IRQ-safe manner. Looks like
> >> you need to defer the operation on the domain until softirq or tasklet
> >> context.
> >
> > Thanks for the suggestion, I'm testing it as:
> > static DECLARE_TASKLET(notif_sri_tasklet, notif_sri_action, NULL);
> >
> > static void notif_irq_handler(int irq, void *data)
> > {
> >     tasklet_schedule(&notif_sri_tasklet);
> > }
> >
> > Where notif_sri_action() does what notif_irq_handler() did before
> > (using rcu_lock_domain_by_id()).
> >
> > I have one more question regarding this.
> >
> > Even with the RCU lock if I understand it correctly, it's possible for
> > domain_kill() to tear down the domain. Or as Julien explained it in
> > another thread [3]:
> >> CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive)
> >>
> >> CPU1: call domain_kill()
> >> CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL)
> >>
> >> d->arch.tee is now a dangling pointer
> >>
> >> CPU0: access d->arch.tee
> >>
> >> This implies you may need to gain a global lock (I don't have a better
> >> idea so far) to protect the IRQ handler against domains teardown.
> >
> > I'm trying to address that (now in a tasklet) with:
> >     /*
> >      * domain_kill() calls ffa_domain_teardown() which will free
> >      * d->arch.tee, but not set it to NULL. This can happen while holding
> >      * the RCU lock.
> >      *
> >      * domain_lock() will stop rspin_barrier() in domain_kill(), unless
> >      * we're already past rspin_barrier(), but then will d->is_dying be
> >      * non-zero.
> >      */
> >     domain_lock(d);
> >     if ( !d->is_dying )
> >     {
> >         struct ffa_ctx *ctx = d->arch.tee;
> >
> >         ACCESS_ONCE(ctx->notif.secure_pending) = true;
> >     }
> >     domain_unlock(d);
> >
> > It seems to work, but I'm worried I'm missing something or abusing
> > domain_lock().
>
> Well. Yes, this is one way of dealing with the issue. Yet as you suspect it
> feels like an abuse of domain_lock(); that function would better be avoided
> whenever possible. (It had some very unhelpful uses long ago.)
>
> Another approach would generally be to do respective cleanup not from
> underneath domain_kill(), but complete_domain_destroy(). It's not really
> clear to me which of the two approaches is better in this case.

Thanks for the feedback. I tried moving the freeing of d->arch.tee to
complete_domain_destroy() while keeping the rest of the cleanup as is.
It works as expected, I'll use this for the next version of the patch
set.

Cheers,
Jens

Reply via email to