On Thu Apr 10, 2025 at 7:20 PM BST, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 10:17 AM BST, Andrew Cooper wrote:
>> On 10/04/2025 1:09 am, Jason Andryuk wrote:
>>> On 2025-04-09 13:01, Andrew Cooper wrote:
>>>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>>>> Various bits of cleanup, and support for arm64 Linux builds.
>>>>>
>>>>> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
>>>>>   
>>>>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
>>>>
>>>> Lovely, Linux 6.6.86 is broken for x86 PVH.  It triple faults very
>>>> early on.
>>>>
>>>> Sample log:
>>>> https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450
>>>>
>>>> I guess we'll have to stay on 6.6.56 for now.  (Only affects the final
>>>> patch.)
>>>
>>> This is an AMD system:
>>>
>>> (XEN) [    2.577549] d0v0 Triple fault - invoking HVM shutdown action 1
>>> (XEN) [    2.577557] RIP:    0008:[<0000000001f851d4>]
>>>
>>> The instruction:
>>> ffffffff81f851d4:       0f 01 c1                vmcall
>>>
>>> vmcall is the Intel instruction, and vmmcall is the AMD one, so CPU
>>> detection is malfunctioning.
>>>
>>> (Early PVH is running identity mapped, so it's offset from
>>> ffffffff80000000)
>>>
>>> There are no debug symbols in the vmlinux I extracted from the bzImage
>>> from gitlab, but I can repro locally on 6.6.86.  It's unclear to
>>> me why it's failing.
>>>
>>> Trying:
>>> diff --git i/arch/x86/xen/enlighten.c w/arch/x86/xen/enlighten.c
>>> index 0219f1c90202..fb4ad7fe3e34 100644
>>> --- i/arch/x86/xen/enlighten.c
>>> +++ w/arch/x86/xen/enlighten.c
>>> @@ -123,11 +123,10 @@ noinstr void *__xen_hypercall_setfunc(void)
>>>         if (!boot_cpu_has(X86_FEATURE_CPUID))
>>>                 xen_get_vendor();
>>>
>>> -       if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
>>> -            boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
>>> -               func = xen_hypercall_amd;
>>> -       else
>>> +       if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
>>>                 func = xen_hypercall_intel;
>>> +       else
>>> +               func = xen_hypercall_amd;
>>>
>>>         static_call_update_early(xen_hypercall, func);
>>>
>>> But it still calls xen_hypercall_intel().  So maybe x86_vendor isn't
>>> getting set and ends up as 0 (X86_VENDOR_INTEL)?
>>>
>>> That's as far as I got here.
>>>
>>> Different but related, on mainline master, I also get a fail in
>>> vmcall. There, I see in the disassembly that
>>> __xen_hypercall_setfunc()'s call to xen_get_vendor() is gone.
>>> xen_get_vendor() seems to have been DCE-ed.  There is some new code
>>> that hardcodes features - "x86/cpufeatures: Add {REQUIRED,DISABLED}
>>> feature configs" - which may be responsible.
>>
>> 6.6.74 is broken too.  (That's the revision that the ARM tests want). 
>> So it broke somewhere between .56 and .74 which narrows the bisect a little.
>>
>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
>>
>> In Gitlab, both AMD and Intel are failing in roughly the same way.
>>
>> ~Andrew
>
> I've bisected the tags and it was introduced somewhere between the
> v6.6.66 and the v6.6.67 tags.
>
> The hypercall page was removed very shortly before v6.6.67 was tagged,
> so I have a nagging suspicion...
>
> Cheers,
> Alejandro

The cutoff point is bcf0e2fda80c6 ("x86/xen: remove hypercall page").

Together with Jason's observation it would seem that Linux doesn't guess
the correct instruction (or not early enough) when running as PVH dom0.
On PV it's just "syscall", but on PVH it's a tad more complicated.
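
To make the vendor dependence concrete, here is a minimal user-space
sketch of the decision that has to happen before the first hypercall:
decode the CPUID leaf 0 vendor string and pick vmcall or vmmcall from it.
This is not the kernel's code. classify_vendor(), hypercall_insn() and the
enum are hypothetical names made up for illustration, and the register
values are passed in as arguments rather than read with CPUID so that it
builds and runs anywhere.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; illustration only, not kernel code. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON, VENDOR_UNKNOWN };

/* Unpack one 32-bit register into four bytes, least significant first. */
static void unpack_reg(uint32_t reg, char *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/*
 * CPUID leaf 0 returns the 12-byte vendor string split across
 * EBX, EDX, ECX (in that order).
 */
enum cpu_vendor classify_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char id[13];

    unpack_reg(ebx, id);
    unpack_reg(edx, id + 4);
    unpack_reg(ecx, id + 8);
    id[12] = '\0';

    if (strcmp(id, "GenuineIntel") == 0)
        return VENDOR_INTEL;
    if (strcmp(id, "AuthenticAMD") == 0)
        return VENDOR_AMD;
    if (strcmp(id, "HygonGenuine") == 0)
        return VENDOR_HYGON;
    return VENDOR_UNKNOWN;
}

/*
 * AMD and Hygon use vmmcall; everything else falls through to vmcall,
 * mirroring the if/else in the enlighten.c hunk quoted above.
 */
const char *hypercall_insn(enum cpu_vendor vendor)
{
    if (vendor == VENDOR_AMD || vendor == VENDOR_HYGON)
        return "vmmcall";
    return "vmcall";
}
```

Like the else branch in the code Jason quoted, anything that is not AMD
or Hygon falls through to vmcall here, so a vendor that was never detected
ends up on the Intel path.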

Cheers,
Alejandro
