On 2022-Mar-7, at 08:45, Mark Johnston <ma...@freebsd.org> wrote:
> On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
>>
>>> On 7 Mar 2022, at 15:13, Mark Johnston <ma...@freebsd.org> wrote:
>>> ...
>>> A (the?) problem is that the compiler is treating "pc" as an alias
>>> for x18, but the rmlock code assumes that the pcpu pointer is loaded
>>> once, as it dereferences "pc" outside of the critical section. On
>>> arm64, if a context switch occurs between the store at _rm_rlock+144 and
>>> the load at +152, and the thread is migrated to another CPU, then we'll
>>> end up using the wrong CPU ID in the rm->rm_writecpus test.
>>>
>>> I suspect the problem is unique to arm64 as its get_pcpu()
>>> implementation is different from the others in that it doesn't use
>>> volatile-qualified inline assembly. This has been the case since
>>> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762 .
>>>
>>> I haven't been able to reproduce any crashes running poudriere in an
>>> arm64 AWS instance, though. Could you please try the patch below and
>>> confirm whether it fixes your panics? I verified that the apparent
>>> problem described above is gone with the patch.
>>
>> Alternatively (or additionally) we could do something like the following.
>> There are only a few MI users of get_pcpu with the main place being in rm
>> locks.
>>
>> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
>> index 09f6361c651c..59b890e5c2ea 100644
>> --- a/sys/arm64/include/pcpu.h
>> +++ b/sys/arm64/include/pcpu.h
>> @@ -58,7 +58,14 @@ struct pcpu;
>>
>> register struct pcpu *pcpup __asm ("x18");
>>
>> -#define get_pcpu() pcpup
>> +static inline struct pcpu *
>> +get_pcpu(void)
>> +{
>> + struct pcpu *pcpu;
>> +
>> + __asm __volatile("mov %0, x18" : "=&r"(pcpu));
>> + return (pcpu);
>> +}
>>
>> static inline struct thread *
>> get_curthread(void)
>
> Indeed, I think this is probably the best solution.

Is this just a partial revert of:

https://cgit.freebsd.org/src/commit/?id=63c858a04d56

? If so, the updated code probably needs a comment explaining
why it is written the way it is.

It looks like stable/13 picked up the sensitivity to the get_pcpu
details in rmlock in:

https://cgit.freebsd.org/src/commit/?h=stable/13&id=543157870da5

(a 2022-03-04 commit), and stable/13 also has the get_pcpu
misdefinition from:

https://cgit.freebsd.org/src/commit/sys/arm64/include/pcpu.h?h=stable/13&id=63c858a04d56

So an MFC of the fix would be appropriate in order for aarch64
to be reliable in stable/13 (and for 13.1 to be so as well).
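
For anyone following along, here is a rough userland sketch of the
difference between treating the pcpu pointer as an alias for a register
and taking a one-time snapshot of it. It is not kernel code and does not
touch x18: sim_x18, sim_migrate(), get_pcpu_alias(), get_pcpu_snapshot()
and the *_reads_agree() helpers are all names made up for illustration,
and the "migration" is just an assignment standing in for a context
switch to another CPU.

```c
#include <assert.h>

struct pcpu {
    int pc_cpuid;
};

/* Two pretend CPUs, each with its own per-CPU structure. */
static struct pcpu cpus[2] = { { 0 }, { 1 } };

/* Hypothetical stand-in for x18: always points at the "current" CPU. */
static struct pcpu *volatile sim_x18 = &cpus[0];

/* Alias style: every use of the result re-reads the "register". */
#define get_pcpu_alias() (sim_x18)

/* Snapshot style: read the "register" once and return a plain value. */
static inline struct pcpu *
get_pcpu_snapshot(void)
{
    struct pcpu *pc = sim_x18;

    return (pc);
}

/* Hypothetical stand-in for the thread being moved to another CPU. */
static void
sim_migrate(int cpu)
{
    sim_x18 = &cpus[cpu];
}

/* Returns 1 if both CPU ID reads agree across the migration, else 0. */
int
alias_reads_agree(void)
{
    int first = get_pcpu_alias()->pc_cpuid;

    sim_migrate(1);
    int second = get_pcpu_alias()->pc_cpuid; /* sees the new CPU */
    sim_migrate(0);
    return (first == second);
}

/* Same check, but using a pointer captured before the migration. */
int
snapshot_reads_agree(void)
{
    struct pcpu *pc = get_pcpu_snapshot();
    int first = pc->pc_cpuid;

    sim_migrate(1);
    int second = pc->pc_cpuid; /* still the original CPU's data */
    sim_migrate(0);
    return (first == second);
}

/* Runs both cases; the alias reads disagree, the snapshot reads agree. */
void
run_demo(void)
{
    assert(alias_reads_agree() == 0);
    assert(snapshot_reads_agree() == 1);
}
```

In the real kernel the second read happens in compiler-generated code
rather than in anything visible in the C source, which is why the
volatile inline assembly form matters there.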
===
Mark Millard
marklmi at yahoo.com