> On 23 Mar 2021, at 19:26, Julien Grall <jul...@xen.org> wrote:
>
>
>
> On 23/03/2021 17:06, Luca Fancellu wrote:
>> Hi all,
>
> Hi,
>
> Please avoid top-posting when replying to a comment. It makes the thread
> more difficult to follow.
>
>> I have an update: after changing the lock introduced by the series from
>> spinlock_t to raw_spinlock_t, switching the lock/unlock calls to the raw_*
>> versions, and keeping the BUG_ON(…) (which we can now do because the raw_*
>> implementation disables interrupts on preempt_rt), the kernel boots correctly.
>> So it seems that the BUG_ON(…) is needed and the unmask function should run
>> with interrupts disabled. Does anyone know why this change worked?
>
> Do you mean why no one spotted the issue before? If so, AFAIK, on vanilla
> Linux, spin_lock is still just a wrapper around raw_spinlock. IOW there is no
> option to replace it with an RT spinlock.
>
> So if you don't apply the RT patches, you would not be able to trigger the
> issue.
>
> As to the fix itself, I think using raw_spinlock_t is the correct thing to do
> because the lock is also used in interrupt context (even with RT enabled).
>
> Would you be able to send a patch?
Yes, I’ll send a patch soon.
>
>>> On 23 Mar 2021, at 15:39, Luca Fancellu <luca.fance...@arm.com> wrote:
>>>
>>> Hi Jason,
>>>
>>> Thanks for your hints. Unfortunately it does not seem to be an init problem:
>>> with the same init configuration I tried 5.10.23 (preempt_rt) without
>>> Juergen's patch series but with the BUG_ON removed, and it boots without
>>> problems. So it seems that applying the series changes something (on a
>>> preempt_rt kernel), and we are trying to figure out what.
>>>
>>>
>>>> On 23 Mar 2021, at 12:36, Jason Andryuk <jandr...@gmail.com> wrote:
>>>>
>>>> On Mon, Mar 22, 2021 at 3:09 PM Luca Fancellu <luca.fance...@arm.com>
>>>> wrote:
>>>>>
>>>>> Hi Juergen,
>>>>>
>>>>> Yes, you are right, it was my mistake. As you said, to remove the BUG_ON(…)
>>>>> this series
>>>>> (https://patchwork.kernel.org/project/xen-devel/cover/20210306161833.4552-1-jgr...@suse.com/)
>>>>> is needed. Since I’m using Yocto, I’m able to build a preempt_rt kernel
>>>>> up to 5.10.23, so I’m applying the series on top of that version and
>>>>> then removing the BUG_ON(…).
>>>>>
>>>>> What I did not expect is that the Dom0 kernel now gets stuck at the
>>>>> “Setting domain 0 name, domid and JSON config…” step and the system seems
>>>>> unresponsive. It looks like a deadlock, but looking through the series we
>>>>> can’t spot anything, and the series was also tested by others in the
>>>>> community.
>
> The deadlock is expected. When you enable RT spinlocks, interrupts will
> not be disabled even when you call spin_lock_irqsave().
>
> As the lock is also used in interrupt context (e.g. with interrupts masked),
> this will lead to a deadlock: the lock can be held with interrupts unmasked,
> so an interrupt handler on the same CPU can try to take it again.
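
A minimal userspace sketch of the scenario described above, not kernel code. `struct cpu_model` and all `model_*` helpers are hypothetical names invented for illustration; they only track, for a single CPU, whether interrupts are enabled and whether the lock is held. The assumption being modelled is the one stated above: the raw_* variant masks interrupts, the RT spinlock variant does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; this is NOT the kernel's data structure. */
struct cpu_model {
    bool irqs_enabled;   /* models the CPU interrupt-enable state */
    bool lock_held;      /* models the lock shared with interrupt context */
};

/* Models raw_spin_lock_irqsave(): mask interrupts, then take the lock. */
static bool model_raw_lock_irqsave(struct cpu_model *cpu)
{
    bool saved = cpu->irqs_enabled;
    cpu->irqs_enabled = false;
    cpu->lock_held = true;
    return saved;
}

/* Models spin_lock_irqsave() with RT spinlocks: interrupts stay enabled. */
static bool model_rt_lock_irqsave(struct cpu_model *cpu)
{
    cpu->lock_held = true;
    return cpu->irqs_enabled;
}

/* Releases the modelled lock and restores the saved interrupt state. */
static void model_unlock_irqrestore(struct cpu_model *cpu, bool saved)
{
    assert(cpu->lock_held);
    cpu->lock_held = false;
    cpu->irqs_enabled = saved;
}

/*
 * Models an interrupt arriving inside the critical section. Returns true if
 * the handler would spin on a lock held by the code it just interrupted.
 */
static bool irq_would_deadlock(const struct cpu_model *cpu)
{
    if (!cpu->irqs_enabled)
        return false;       /* interrupt is deferred until the unlock */
    return cpu->lock_held;  /* handler needs a lock this CPU already holds */
}

/* Runs one modelled critical section with the chosen lock flavour. */
static bool critical_section_deadlocks(bool use_raw_lock)
{
    struct cpu_model cpu = { .irqs_enabled = true, .lock_held = false };
    bool saved = use_raw_lock ? model_raw_lock_irqsave(&cpu)
                              : model_rt_lock_irqsave(&cpu);
    bool deadlock = irq_would_deadlock(&cpu);

    model_unlock_irqrestore(&cpu, saved);
    return deadlock;
}
```

In this model, `critical_section_deadlocks(false)` (the RT spinlock case) reports a deadlock, while `critical_section_deadlocks(true)` (the raw_* case) does not, which matches the behaviour seen after switching to raw_spinlock_t.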
>
> This is quite a common error as developers are not yet used to testing with
> RT. I remember finding a few other instances like that when I worked on RT a
> couple of years ago.
>
> For future reference, I think CONFIG_PROVE_LOCKING=y could help you to detect
> (potential) deadlocks.
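
A minimal kernel config fragment for the suggestion above. Only CONFIG_PROVE_LOCKING comes from this thread; CONFIG_DEBUG_KERNEL is listed because PROVE_LOCKING depends on it. How the fragment gets merged into the build (for example as a .cfg file in a Yocto layer) is left open here.

```
# Lock dependency validator (lockdep); reports show up in the kernel log
CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
```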
>
> Cheers,
>
> --
> Julien Grall