On 20.03.2025 15:33, Frediano Ziglio wrote:
> On Thu, Mar 6, 2025 at 3:02 PM Frediano Ziglio
> <frediano.zig...@cloud.com> wrote:
>>
>> On Thu, Mar 6, 2025 at 2:26 PM Jan Beulich <jbeul...@suse.com> wrote:
>>>
>>> On 26.02.2025 19:54, Marek Marczykowski-Górecki wrote:
>>>> On Mon, Feb 24, 2025 at 02:31:00PM +0000, Frediano Ziglio wrote:
>>>>> On Mon, Feb 24, 2025 at 1:16 PM Marek Marczykowski-Górecki
>>>>> <marma...@invisiblethingslab.com> wrote:
>>>>>>
>>>>>> On Mon, Feb 24, 2025 at 12:57:13PM +0000, Frediano Ziglio wrote:
>>>>>>> On Fri, Feb 21, 2025 at 8:20 PM Marek Marczykowski-Górecki
>>>>>>> <marma...@invisiblethingslab.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Feb 17, 2025 at 04:26:59PM +0000, Frediano Ziglio wrote:
>>>>>>>>> Although the code is compiled with the -fpic option, data is not
>>>>>>>>> position independent. This causes data pointers to become invalid
>>>>>>>>> if the code is not relocated properly, which is what happens for
>>>>>>>>> efi_multiboot2, which is called by the multiboot entry code.
>>>>>>>>>
>>>>>>>>> Code tested by adding
>>>>>>>>>    PrintErrMesg(L"Test message", EFI_BUFFER_TOO_SMALL);
>>>>>>>>> in efi_multiboot2 before calling efi_arch_edd (this function
>>>>>>>>> can potentially call PrintErrMesg).
>>>>>>>>>
>>>>>>>>> Before the patch (XenServer installation on Qemu, xen replaced
>>>>>>>>> with vanilla xen.gz):
>>>>>>>>>   Booting `XenServer (Serial)'Booting `XenServer (Serial)'
>>>>>>>>>   Test message: !!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
>>>>>>>>>   ExceptionData - 0000000000000000  I:0 R:0 U:0 W:0 P:0 PK:0 SS:0 SGX:0
>>>>>>>>>   RIP  - 000000007EE21E9A, CS  - 0000000000000038, RFLAGS - 0000000000210246
>>>>>>>>>   RAX  - 000000007FF0C1B5, RCX - 0000000000000050, RDX - 0000000000000010
>>>>>>>>>   RBX  - 0000000000000000, RSP - 000000007FF0C180, RBP - 000000007FF0C210
>>>>>>>>>   RSI  - FFFF82D040467CE8, RDI - 0000000000000000
>>>>>>>>>   R8   - 000000007FF0C1C8, R9  - 000000007FF0C1C0, R10 - 0000000000000000
>>>>>>>>>   R11  - 0000000000001020, R12 - FFFF82D040467CE8, R13 - 000000007FF0C1B8
>>>>>>>>>   R14  - 000000007EA33328, R15 - 000000007EA332D8
>>>>>>>>>   DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
>>>>>>>>>   GS   - 0000000000000030, SS  - 0000000000000030
>>>>>>>>>   CR0  - 0000000080010033, CR2 - FFFF82D040467CE8, CR3 - 000000007FC01000
>>>>>>>>>   CR4  - 0000000000000668, CR8 - 0000000000000000
>>>>>>>>>   DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
>>>>>>>>>   DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
>>>>>>>>>   GDTR - 000000007F9DB000 0000000000000047, LDTR - 0000000000000000
>>>>>>>>>   IDTR - 000000007F48E018 0000000000000FFF,   TR - 0000000000000000
>>>>>>>>>   FXSAVE_STATE - 000000007FF0BDE0
>>>>>>>>>   !!!! Find image based on IP(0x7EE21E9A) (No PDB)  (ImageBase=000000007EE20000, EntryPoint=000000007EE23935) !!!!
>>>>>>>>>
>>>>>>>>> After the patch:
>>>>>>>>>   Booting `XenServer (Serial)'Booting `XenServer (Serial)'
>>>>>>>>>   Test message: Buffer too small
>>>>>>>>>   BdsDxe: loading Boot0000 "UiApp" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(462CAA21-7614-4503-836E-8AB6F4662331)
>>>>>>>>>   BdsDxe: starting Boot0000 "UiApp" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(462CAA21-7614-4503-836E-8AB6F4662331)
>>>>>>>>>
>>>>>>>>> This partially rolls back commit 00d5d5ce23e6.
>>>>>>>>>
>>>>>>>>> Fixes: 9180f5365524 ("x86: add multiboot2 protocol support for EFI platforms")
>>>>>>>>> Signed-off-by: Frediano Ziglio <frediano.zig...@cloud.com>
>>>>>>>>
>>>>>>>> I tried testing this patch, but it seems I cannot reproduce the original
>>>>>>>> failure...
>>>>>>>>
>>>>>>>> I did as the commit message suggests here:
>>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/commit/ca3d6911c448eb886990f33d4380b5646617a982
>>>>>>>>
>>>>>>>> With blexit() in PrintErrMesg(), it went back to the bootloader, so I'm
>>>>>>>> sure this code path was reached. But with blexit() commented out, Xen
>>>>>>>> started correctly both with and without this patch... The branch I used
>>>>>>>> is here:
>>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/commits/automation-tests?ref_type=heads
>>>>>>>>
>>>>>>>> Are there some extra conditions to reproduce the issue? Maybe it depends
>>>>>>>> on the compiler version? I guess I can try also on QEMU, but based on
>>>>>>>> the description, I would expect it to crash in any case.
>>>>>>>>
>>>>>>>
>>>>>>> Did you see the correct message in both cases?
>>>>>>> Did you use Grub or direct EFI?
>>>>>>>
>>>>>>> With Grub and without this patch you won't see the message, with grub
>>>>>>> with the patch you see the correct message.
>>>>>>
>>>>>> I did use grub, and I didn't see the message indeed.
>>>>>> But in the case it was supposed to crash (with added PrintErrMesg(),
>>>>>> commented out blexit and without your patch) it did _not_ crash and
>>>>>> continued to normal boot. Is that #PF non-fatal here?
>>>>>>
>>>>>
>>>>> Hi,
>>>>>    I tried again with my test environment.
>>>>> I added the PrintErrMesg line before the efi_arch_edd call and got a #PF;
>>>>> in my case the system hangs. With the fix patch the machine reboots and
>>>>> I can see the message in the logs.
>>>>> I'm trying with Xen starting inside Qemu, EFI firmware, xen.gz
>>>>> compiled as ELF file. Host system is an Ubuntu 22.04.5 LTS. Gcc is
>>>>> version 11.4.
>>>>
>>>> My test was wrong, commenting out blexit made the "mesg" variable unused.
>>>> After fixing that, I can reproduce it on both QEMU and real hardware:
>>>> without your patch it crashes and with your patch it works just fine.
>>>> While there may be more places with a similar issue, this patch clearly
>>>> improves the situation, so:
>>>>
>>>> Acked-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
>>>
>>> This had to be reverted, for breaking the build with old Clang. See the
>>> respective Matrix conversation.
>>
>> To sum up, the failure is:
>>
>>     clang: error: unknown argument: '-fno-jump-tables'
> 
> Now that the minimum clang version supports this option, can this
> change be applied?

Not sure. I for one would expect that we actively reject building with
too old tool chains then, which is yet to be carried out. Plus I think
you'd want to re-submit, with all tags dropped. The change was wrong to
go in at that earlier point, and hence any such tags weren't quite
accurate.
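
A minimal sketch of what actively rejecting a too old tool chain could
look like at the source level, assuming a preprocessor guard is an
acceptable place for the check. The MIN_CLANG_* values and the
TOOLCHAIN_VERSION / toolchain_ok names are hypothetical and not taken
from the Xen tree; only __clang__, __clang_major__ and __clang_minor__
are clang's own predefined macros.

```c
/* Hypothetical minimum version, chosen only for illustration. */
#define MIN_CLANG_MAJOR 11
#define MIN_CLANG_MINOR 0

/* Fold major/minor into one comparable number (assumes minor < 1000). */
#define TOOLCHAIN_VERSION(major, minor) ((major) * 1000 + (minor))

/* Fail the build early instead of hitting an unknown-argument error later. */
#if defined(__clang__) && \
    TOOLCHAIN_VERSION(__clang_major__, __clang_minor__) < \
    TOOLCHAIN_VERSION(MIN_CLANG_MAJOR, MIN_CLANG_MINOR)
# error "clang is too old for this build"
#endif

/* The same comparison as a plain function, so the logic can be exercised. */
static int toolchain_ok(int major, int minor)
{
    return TOOLCHAIN_VERSION(major, minor) >=
           TOOLCHAIN_VERSION(MIN_CLANG_MAJOR, MIN_CLANG_MINOR);
}
```

With a guard of this kind an unsupported compiler stops with one clear
message rather than failing on an individual option such as
-fno-jump-tables. A check in the build system would serve the same
purpose; which of the two fits better is left open here.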

Jan
