On 17/03/2025 1:21 pm, Choi, Anderson wrote:
> Jürgen,
>
>> On 17.03.25 06:07, Choi, Anderson wrote:
>>> I'd like to report a Xen panic when shutting down an ARINC653 domain
>>> with the following setup. Note that this is only observed when
>>> CONFIG_DEBUG is enabled.
>>>
>>> [Test environment]
>>> Yocto release : 5.05
>>> Xen release : 4.19 (hash = 026c9fa29716b0ff0f8b7c687908e71ba29cf239)
>>> Target machine : QEMU ARM64
>>> Number of physical CPUs : 4
>>>
>>> [Xen config]
>>> CONFIG_DEBUG = y
>>>
>>> [CPU pool configuration files]
>>> cpupool_arinc0.cfg
>>> - name= "Pool-arinc0"
>>> - sched="arinc653"
>>> - cpus=["2"]
>>>
>>> [Domain configuration file]
>>> dom1.cfg
>>> - vcpus = 1
>>> - pool = "Pool-arinc0"
>>>
>>> $ xl cpupool-cpu-remove Pool-0 2
>>> $ xl cpupool-create -f cpupool_arinc0.cfg
>>> $ xl create dom1.cfg
>>> $ a653_sched -P Pool-arinc0 dom1:100
>>>
>>> **Wait for DOM1 to complete boot.**
>>>
>>> $ xl shutdown dom1
>>>
>>> [xen log]
>>> root@boeing-linux-ref:~# xl shutdown dom1
>>> Shutting down domain 1
>>> root@boeing-linux-ref:~#
>>> (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
>>> (XEN) ----[ Xen-4.19.1-pre  arm64  debug=y  Tainted: I      ]----
>>> (XEN) CPU:    2
>>> (XEN) PC:     00000a000022d2b0 xfree+0x130/0x1a4
>>> (XEN) LR:     00000a000022d2a4
>>> (XEN) SP:     00008000fff77b50
>>> (XEN) CPSR:   00000000200002c9 MODE:64-bit EL2h (Hypervisor, handler)
>>> ...
>>> (XEN) Xen call trace:
>>> (XEN)    [<00000a000022d2b0>] xfree+0x130/0x1a4 (PC)
>>> (XEN)    [<00000a000022d2a4>] xfree+0x124/0x1a4 (LR)
>>> (XEN)    [<00000a00002321f0>] arinc653.c#a653sched_free_udata+0x50/0xc4
>>> (XEN)    [<00000a0000241bc0>] core.c#sched_move_domain_cleanup+0x5c/0x80
>>> (XEN)    [<00000a0000245328>] sched_move_domain+0x69c/0x70c
>>> (XEN)    [<00000a000022f840>] cpupool.c#cpupool_move_domain_locked+0x38/0x70
>>> (XEN)    [<00000a0000230f20>] cpupool_move_domain+0x34/0x54
>>> (XEN)    [<00000a0000206c40>] domain_kill+0xc0/0x15c
>>> (XEN)    [<00000a000022e0d4>] do_domctl+0x904/0x12ec
>>> (XEN)    [<00000a0000277a1c>] traps.c#do_trap_hypercall+0x1f4/0x288
>>> (XEN)    [<00000a0000279018>] do_trap_guest_sync+0x448/0x63c
>>> (XEN)    [<00000a0000262c80>] entry.o#guest_sync_slowpath+0xa8/0xd8
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 2:
>>> (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
>>> (XEN) ****************************************
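>>>
>>> For context, the check that fires is the allocator's sanity assertion in
>>> common/xmalloc_tlsf.c (quoted from the panic message above):
>>>
>>>     ASSERT(!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1));
>>>
>>> a653sched_free_udata() takes sched_priv->lock with spin_lock_irqsave(),
>>> which disables local interrupts. Calling xfree() while that lock is held
>>> on a machine with more than one online CPU therefore leaves both halves
>>> of the '||' false, which is exactly the panic seen here.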
>>>
>>> In commit 19049f8d ("sched: fix locking in a653sched_free_vdata()"),
>>> locking was introduced to prevent a race against the list manipulation,
>>> but it leads to an assertion failure when the ARINC 653 domain is shut
>>> down.
>>> I think this can be fixed by calling xfree() after
>>> spin_unlock_irqrestore() as shown below.
>>>
>>> xen/common/sched/arinc653.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
>>> index 7bf288264c..1615f1bc46 100644
>>> --- a/xen/common/sched/arinc653.c
>>> +++ b/xen/common/sched/arinc653.c
>>> @@ -463,10 +463,11 @@ a653sched_free_udata(const struct scheduler *ops, void *priv)
>>>      if ( !is_idle_unit(av->unit) )
>>>          list_del(&av->list);
>>> -    xfree(av);
>>>      update_schedule_units(ops);
>>>
>>>      spin_unlock_irqrestore(&sched_priv->lock, flags);
>>> +
>>> +    xfree(av);
>>>  }
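>>>
>>> For clarity, here is roughly what a653sched_free_udata() looks like with
>>> the change applied. The function header, locals, and type names are
>>> reconstructed from the hunk context rather than copied from the tree, so
>>> treat this as a sketch:
>>>
>>> /* Sketch only: signature and type names are assumptions based on the
>>>  * hunk context above, not a verbatim copy of arinc653.c. */
>>> static void cf_check
>>> a653sched_free_udata(const struct scheduler *ops, void *priv)
>>> {
>>>     struct a653sched_private *sched_priv = SCHED_PRIV(ops); /* assumed */
>>>     struct a653sched_unit *av = priv;                       /* assumed */
>>>     unsigned long flags;
>>>
>>>     spin_lock_irqsave(&sched_priv->lock, flags);
>>>
>>>     if ( !is_idle_unit(av->unit) )
>>>         list_del(&av->list);
>>>
>>>     update_schedule_units(ops);
>>>
>>>     spin_unlock_irqrestore(&sched_priv->lock, flags);
>>>
>>>     /* The free now happens with the lock dropped and IRQs re-enabled,
>>>      * so the xmalloc_tlsf.c assertion is satisfied. */
>>>     xfree(av);
>>> }
>>>
>>> The list manipulation that the lock protects still happens inside the
>>> critical section; only the memory release, which needs no lock, moves
>>> outside it.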
>>> Could you share your opinion on this?
>> Yes, this seems the right way to fix the issue.
>>
>> Could you please send a proper patch (please have a look at [1] in
>> case you are unsure what a proper patch should look like)?
>>
>> Juergen
>>
>> [1]
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/process/sending-patches.pandoc
> Thanks for your opinion. Let me read through the link and submit the patch.

Other good references are:

https://lore.kernel.org/xen-devel/20250313093157.30450-1-jgr...@suse.com/
https://lore.kernel.org/xen-devel/d8c08c22-ee70-4c06-8fcd-ad44fc0dc...@suse.com/

One of them you will hopefully recognise; the other is another bugfix to the
ARINC653 scheduler, noticed by the Coverity run over the weekend.

~Andrew
