Hi,

> On 4 Jun 2024, at 12:24, Julien Grall <jul...@xen.org> wrote:
> 
> 
> 
> On 03/06/2024 10:50, Jens Wiklander wrote:
>> Hi Julien,
> 
> Hi Jens,
> 
> 
>> On Mon, Jun 3, 2024 at 11:12 AM Julien Grall <jul...@xen.org> wrote:
>>> 
>>> Hi Jens,
>>> 
>>> On 03/06/2024 10:01, Jens Wiklander wrote:
>>>> On Fri, May 31, 2024 at 4:28 PM Bertrand Marquis
>>>> <bertrand.marq...@arm.com> wrote:
>>>>> 
>>>>> Hi Jens,
>>>>> 
>>>>>> On 29 May 2024, at 09:25, Jens Wiklander <jens.wiklan...@linaro.org> 
>>>>>> wrote:
>>>>>> 
>>>>>> Add support for FF-A notifications, currently limited to an SP (Secure
>>>>>> Partition) sending an asynchronous notification to a guest.
>>>>>> 
>>>>>> Guests and Xen itself are made aware of pending notifications with an
>>>>>> interrupt. The interrupt handler triggers a tasklet to retrieve the
>>>>>> notifications using the FF-A ABI and deliver them to their destinations.
>>>>>> 
>>>>>> Update ffa_partinfo_domain_init() to return error code like
>>>>>> ffa_notif_domain_init().
>>>>>> 
>>>>>> Signed-off-by: Jens Wiklander <jens.wiklan...@linaro.org>
>>>>>> ---
>>>>>> v4->v5:
>>>>>> - Move the freeing of d->arch.tee to the new TEE mediator free_domain_ctx
>>>>>>   callback to make the context accessible during rcu_lock_domain_by_id()
>>>>>>   from a tasklet
>>>>>> - Initialize interrupt handlers for secondary CPUs from the new TEE 
>>>>>> mediator
>>>>>>   init_interrupt() callback
>>>>>> - Restore the ffa_probe() from v3, that is, remove the
>>>>>>   presmp_initcall(ffa_init) approach and use ffa_probe() as usual now 
>>>>>> that we
>>>>>>   have the init_interrupt callback.
>>>>>> - A tasklet is added to handle the Schedule Receiver interrupt. The 
>>>>>> tasklet
>>>>>>   finds each relevant domain with rcu_lock_domain_by_id() which now is 
>>>>>> enough
>>>>>>   to guarantee that the FF-A context can be accessed.
>>>>>> - The notification interrupt handler only schedules the notification
>>>>>>   tasklet mentioned above
>>>>>> 
>>>>>> v3->v4:
>>>>>> - Add another note on FF-A limitations
>>>>>> - Clear secure_pending in ffa_handle_notification_get() if both SP and 
>>>>>> SPM
>>>>>>   bitmaps are retrieved
>>>>>> - ASSERT that ffa_rcu_lock_domain_by_vm_id() isn't passed the vm_id FF-A
>>>>>>   uses for Xen itself
>>>>>> - Replace the get_domain_by_id() call done via ffa_get_domain_by_vm_id() 
>>>>>> in
>>>>>>   notif_irq_handler() with a call to rcu_lock_live_remote_domain_by_id() 
>>>>>> via
>>>>>>   ffa_rcu_lock_domain_by_vm_id()
>>>>>> - Remove spinlock in struct ffa_ctx_notif and use atomic functions as 
>>>>>> needed
>>>>>>   to access and update the secure_pending field
>>>>>> - In notif_irq_handler(), look for the first online CPU instead of 
>>>>>> assuming
>>>>>>   that the first CPU is online
>>>>>> - Initialize FF-A via presmp_initcall() before the other CPUs are online,
>>>>>>   use register_cpu_notifier() to install the interrupt handler
>>>>>>   notif_irq_handler()
>>>>>> - Update commit message to reflect recent updates
>>>>>> 
>>>>>> v2->v3:
>>>>>> - Add a GUEST_ prefix and move FFA_NOTIF_PEND_INTR_ID and
>>>>>>   FFA_SCHEDULE_RECV_INTR_ID to public/arch-arm.h
>>>>>> - Register the Xen SRI handler on each CPU using on_selected_cpus() and
>>>>>>   setup_irq()
>>>>>> - Check that the SGI ID retrieved with FFA_FEATURE_SCHEDULE_RECV_INTR
>>>>>>   doesn't conflict with static SGI handlers
>>>>>> 
>>>>>> v1->v2:
>>>>>> - Addressing review comments
>>>>>> - Change ffa_handle_notification_{bind,unbind,set}() to take struct
>>>>>>   cpu_user_regs *regs as argument.
>>>>>> - Update ffa_partinfo_domain_init() and ffa_notif_domain_init() to return
>>>>>>   an error code.
>>>>>> - Fixing a bug in handle_features() for FFA_FEATURE_SCHEDULE_RECV_INTR.
>>>>>> ---
>>>>>> xen/arch/arm/tee/Makefile       |   1 +
>>>>>> xen/arch/arm/tee/ffa.c          |  72 +++++-
>>>>>> xen/arch/arm/tee/ffa_notif.c    | 409 ++++++++++++++++++++++++++++++++
>>>>>> xen/arch/arm/tee/ffa_partinfo.c |   9 +-
>>>>>> xen/arch/arm/tee/ffa_private.h  |  56 ++++-
>>>>>> xen/arch/arm/tee/tee.c          |   2 +-
>>>>>> xen/include/public/arch-arm.h   |  14 ++
>>>>>> 7 files changed, 548 insertions(+), 15 deletions(-)
>>>>>> create mode 100644 xen/arch/arm/tee/ffa_notif.c
>>>>>> 
>>>> [...]
>>>>>> 
>>>>>> @@ -517,8 +567,10 @@ err_rxtx_destroy:
>>>>>> static const struct tee_mediator_ops ffa_ops =
>>>>>> {
>>>>>>      .probe = ffa_probe,
>>>>>> +    .init_interrupt = ffa_notif_init_interrupt,
>>>>> 
>>>>> With the previous change on init secondary i would suggest to
>>>>> have a ffa_init_secondary here actually calling the 
>>>>> ffa_notif_init_interrupt.
>>>> 
>>>> Yes, that makes sense. I'll update.
>>>> 
>>>>> 
>>>>>>      .domain_init = ffa_domain_init,
>>>>>>      .domain_teardown = ffa_domain_teardown,
>>>>>> +    .free_domain_ctx = ffa_free_domain_ctx,
>>>>>>      .relinquish_resources = ffa_relinquish_resources,
>>>>>>      .handle_call = ffa_handle_call,
>>>>>> };
>>>>>> diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..e8e8b62590b3
>>>>>> --- /dev/null
>>>>>> +++ b/xen/arch/arm/tee/ffa_notif.c
>>>> [...]
>>>>>> +static void notif_vm_pend_intr(uint16_t vm_id)
>>>>>> +{
>>>>>> +    struct ffa_ctx *ctx;
>>>>>> +    struct domain *d;
>>>>>> +    struct vcpu *v;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * vm_id == 0 means a notifications pending for Xen itself, but
>>>>>> +     * we don't support that yet.
>>>>>> +     */
>>>>>> +    if ( !vm_id )
>>>>>> +        return;
>>>>>> +
>>>>>> +    d = ffa_rcu_lock_domain_by_vm_id(vm_id);
>>>>>> +    if ( !d )
>>>>>> +        return;
>>>>>> +
>>>>>> +    ctx = d->arch.tee;
>>>>>> +    if ( !ctx )
>>>>>> +        goto out_unlock;
>>>>> 
>>>>> In both previous cases you are silently ignoring an interrupt
>>>>> due to an internal error.
>>>>> Is this something that we should trace ? maybe just debug ?
>>>>> 
>>>>> Could you add a comment to explain why this could happen
>>>>> (when possible) or not and the possible side effects ?
>>>>> 
>>>>> The second one would be a notification for a domain without
>>>>> FF-A enabled which is ok but i am a bit more wondering on
>>>>> the first one.
>>>> 
>>>> The SPMC must be out of sync in both cases. I've been looking for a
>>>> window where that can happen, but I can't find any. SPMC is called
>>>> with FFA_NOTIFICATION_BITMAP_DESTROY during domain teardown so the
>>>> SPMC shouldn't try to deliver any notifications after that.
>>> 
>>> I don't think I agree with the conclusion. I believe, this can also
>>> happen in normal operation.
>>> 
>>> For example, the SPMC could have trigger the interrupt before
>>> FFA_NOTIFICATION_BITMAP_DESTROY but Xen didn't handle the interrupt (or
>>> run the tasklet) until later.
>> You're right, there is a window. Delayed handling is OK since
>> FFA_NOTIFICATION_INFO_GET_64 is invoked from the tasklet, but there is
>> a window if the tasklet is suspended or another core destroys the
>> domain before the tasklet has called ffa_rcu_lock_domain_by_vm_id().
>> So far it's harmless and I guess we can afford a print.
> 
> I think it would confuse more the user than anything else because this is an 
> expected race. If we wanted to print a message, then I would argue it should 
> be in the case where...
> 
>>> 
>>> This could be at the time where the domain has been fully destroyed or
>>> even when...
>>> 
>>>> In the second case, the domain ID might have been reused for a domain
>>>> without FF-A enabled, but the SPMC should have known that already.
>>> 
>>> ... a new domain has been created. Although, the latter is rather unlikely.
>>> 
>>> So what if the new domain has FFA enabled? Is there any potential
>>> security issue?
>> In this case, we'll inject an NPI in the guest, but when it invokes
>> FFA_NOTIFICATION_GET it will get accurate information from the SPMC.
>> The worst case is a spurious NPI. This shouldn't be a security issue.
> 
> ... we inject the interrupt to the "wrong" domain. But I also understand that 
> it would be difficult for Xen to detect it.
> 
> So I would say no print should be needed. Bertrand, what do you think?

Yes i agree that it could confuse the user.
I would just put comments to explain the situations where it could occur in the 
code.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall


Reply via email to