On 20.03.2025 21:10, Frediano Ziglio wrote:
> On Thu, Mar 20, 2025 at 3:15 PM Jan Beulich <jbeul...@suse.com> wrote:
>>
>> On 20.03.2025 15:33, Frediano Ziglio wrote:
>>> On Thu, Mar 6, 2025 at 3:02 PM Frediano Ziglio
>>> <frediano.zig...@cloud.com> wrote:
>>>>
>>>> On Thu, Mar 6, 2025 at 2:26 PM Jan Beulich <jbeul...@suse.com> wrote:
>>>>>
>>>>> On 26.02.2025 19:54, Marek Marczykowski-Górecki wrote:
>>>>>> On Mon, Feb 24, 2025 at 02:31:00PM +0000, Frediano Ziglio wrote:
>>>>>>> On Mon, Feb 24, 2025 at 1:16 PM Marek Marczykowski-Górecki
>>>>>>> <marma...@invisiblethingslab.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Feb 24, 2025 at 12:57:13PM +0000, Frediano Ziglio wrote:
>>>>>>>>> On Fri, Feb 21, 2025 at 8:20 PM Marek Marczykowski-Górecki
>>>>>>>>> <marma...@invisiblethingslab.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 17, 2025 at 04:26:59PM +0000, Frediano Ziglio wrote:
>>>>>>>>>>> Although code is compiled with -fpic option data is not position
>>>>>>>>>>> independent. This causes data pointer to become invalid if
>>>>>>>>>>> code is not relocated properly which is what happens for
>>>>>>>>>>> efi_multiboot2 which is called by multiboot entry code.
>>>>>>>>>>>
>>>>>>>>>>> Code tested adding
>>>>>>>>>>> PrintErrMesg(L"Test message", EFI_BUFFER_TOO_SMALL);
>>>>>>>>>>> in efi_multiboot2 before calling efi_arch_edd (this function
>>>>>>>>>>> can potentially call PrintErrMesg).
>>>>>>>>>>>
>>>>>>>>>>> Before the patch (XenServer installation on Qemu, xen replaced
>>>>>>>>>>> with vanilla xen.gz):
>>>>>>>>>>> Booting `XenServer (Serial)'Booting `XenServer (Serial)'
>>>>>>>>>>> Test message: !!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU
>>>>>>>>>>> Apic ID - 00000000 !!!!
>>>>>>>>>>> ExceptionData - 0000000000000000 I:0 R:0 U:0 W:0 P:0 PK:0 SS:0
>>>>>>>>>>> SGX:0
>>>>>>>>>>> RIP - 000000007EE21E9A, CS - 0000000000000038, RFLAGS -
>>>>>>>>>>> 0000000000210246
>>>>>>>>>>> RAX - 000000007FF0C1B5, RCX - 0000000000000050, RDX -
>>>>>>>>>>> 0000000000000010
>>>>>>>>>>> RBX - 0000000000000000, RSP - 000000007FF0C180, RBP -
>>>>>>>>>>> 000000007FF0C210
>>>>>>>>>>> RSI - FFFF82D040467CE8, RDI - 0000000000000000
>>>>>>>>>>> R8 - 000000007FF0C1C8, R9 - 000000007FF0C1C0, R10 -
>>>>>>>>>>> 0000000000000000
>>>>>>>>>>> R11 - 0000000000001020, R12 - FFFF82D040467CE8, R13 -
>>>>>>>>>>> 000000007FF0C1B8
>>>>>>>>>>> R14 - 000000007EA33328, R15 - 000000007EA332D8
>>>>>>>>>>> DS - 0000000000000030, ES - 0000000000000030, FS -
>>>>>>>>>>> 0000000000000030
>>>>>>>>>>> GS - 0000000000000030, SS - 0000000000000030
>>>>>>>>>>> CR0 - 0000000080010033, CR2 - FFFF82D040467CE8, CR3 -
>>>>>>>>>>> 000000007FC01000
>>>>>>>>>>> CR4 - 0000000000000668, CR8 - 0000000000000000
>>>>>>>>>>> DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 -
>>>>>>>>>>> 0000000000000000
>>>>>>>>>>> DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 -
>>>>>>>>>>> 0000000000000400
>>>>>>>>>>> GDTR - 000000007F9DB000 0000000000000047, LDTR - 0000000000000000
>>>>>>>>>>> IDTR - 000000007F48E018 0000000000000FFF, TR - 0000000000000000
>>>>>>>>>>> FXSAVE_STATE - 000000007FF0BDE0
>>>>>>>>>>> !!!! Find image based on IP(0x7EE21E9A) (No PDB)
>>>>>>>>>>> (ImageBase=000000007EE20000, EntryPoint=000000007EE23935) !!!!
>>>>>>>>>>>
>>>>>>>>>>> After the patch:
>>>>>>>>>>> Booting `XenServer (Serial)'Booting `XenServer (Serial)'
>>>>>>>>>>> Test message: Buffer too small
>>>>>>>>>>> BdsDxe: loading Boot0000 "UiApp" from
>>>>>>>>>>> Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(462CAA21-7614-4503-836E-8AB6F4662331)
>>>>>>>>>>> BdsDxe: starting Boot0000 "UiApp" from
>>>>>>>>>>> Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(462CAA21-7614-4503-836E-8AB6F4662331)
>>>>>>>>>>>
>>>>>>>>>>> This partially rollback commit 00d5d5ce23e6.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 9180f5365524 ("x86: add multiboot2 protocol support for EFI
>>>>>>>>>>> platforms")
>>>>>>>>>>> Signed-off-by: Frediano Ziglio <frediano.zig...@cloud.com>
>>>>>>>>>>
>>>>>>>>>> I tried testing this patch, but it seems I cannot reproduce the
>>>>>>>>>> original
>>>>>>>>>> failure...
>>>>>>>>>>
>>>>>>>>>> I did as the commit message suggests here:
>>>>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/commit/ca3d6911c448eb886990f33d4380b5646617a982
>>>>>>>>>>
>>>>>>>>>> With blexit() in PrintErrMesg(), it went back to the bootloader, so
>>>>>>>>>> I'm
>>>>>>>>>> sure this code path was reached. But with blexit() commented out, Xen
>>>>>>>>>> started correctly both with and without this patch... The branch I
>>>>>>>>>> used
>>>>>>>>>> is here:
>>>>>>>>>> https://gitlab.com/xen-project/people/marmarek/xen/-/commits/automation-tests?ref_type=heads
>>>>>>>>>>
>>>>>>>>>> Are there some extra condition to reproduce the issue? Maybe it
>>>>>>>>>> depends
>>>>>>>>>> on the compiler version? I guess I can try also on QEMU, but based on
>>>>>>>>>> the description, I would expect it to crash in any case.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Did you see the correct message in both cases?
>>>>>>>>> Did you use Grub or direct EFI?
>>>>>>>>>
>>>>>>>>> With Grub and without this patch you won't see the message, with grub
>>>>>>>>> with the patch you see the correct message.
>>>>>>>>
>>>>>>>> I did use grub, and I didn't see the message indeed.
>>>>>>>> But in the case it was supposed to crash (with added PrintErrMesg(),
>>>>>>>> commented out blexit and without your patch) it did _not_ crashed and
>>>>>>>> continued to normal boot. Is that #PF non-fatal here?
>>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>> I tried again with my test environment.
>>>>>>> Added the PrintErrMesg line before efi_arch_edd call, I got a #PF, in
>>>>>>> my case the system hangs. With the fix patch machine is rebooting and
>>>>>>> I can see the message in the logs.
>>>>>>> I'm trying with Xen starting inside Qemu, EFI firmware, xen.gz
>>>>>>> compiled as ELF file. Host system is an Ubuntu 22.04.5 LTS. Gcc is
>>>>>>> version 11.4.
>>>>>>
>>>>>> My test was wrong, commenting out blexit made "mesg" variable unused.
>>>>>> After fixing that, I can reproduce it on both QEMU and real hardware:
>>>>>> without your patch it crashes and with your patch it works just fine.
>>>>>> While there may be more places with similar issue, this patch clearly
>>>>>> improves the situation, so:
>>>>>>
>>>>>> Acked-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
>>>>>
>>>>> This had to be reverted, for breaking the build with old Clang. See the
>>>>> respective Matrix conversation.
>>>>
>>>> To sum up the failure is:
>>>>
>>>> clang: error: unknown argument: '-fno-jump-tables'
>>>
>>> Now that the minimum clang version supports this option, can this
>>> change be applied?
>>
>> Not sure. I for one would expect that we actively reject building with
>> too old tool chains then, which is yet to be carried out. Plus I think
>> you'd want to re-submit, with all tags dropped. The change was wrong to
>> go in at that earlier point, and hence any such tags weren't quite
>> accurate.
>
> not sure what you intend with "tags" in the above sentence. Git tags ?

Acks and R-b-s.

> Not sure we need to carry on using old tool chains if we decide to
> bump the minimal versions.

I fear I don't understand this remark in this context. In any event,
Andrew meanwhile has sent a patch to the effect of what my comment was
saying.

Jan