Re: [Xen-devel] [PATCH v3 2/5] x86/boot: Split bootsym() into four types of relocations
On Fri, 2019-08-30 at 17:10 +0200, Jan Beulich wrote:
> On 21.08.2019 18:35, David Woodhouse wrote:
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -699,14 +699,30 @@ trampoline_setup:
> >          cmp     $sym_offs(__trampoline_rel_stop),%edi
> >          jb      1b
> >
> > -        /* Patch in the trampoline segment. */
> > +        mov     $sym_offs(__trampoline32_rel_start),%edi
> > +1:
> > +        mov     %fs:(%edi),%eax
> > +        add     %edx,%fs:(%edi,%eax)
> > +        add     $4,%edi
> > +        cmp     $sym_offs(__trampoline32_rel_stop),%edi
> > +        jb      1b
> > +
> > +        mov     $sym_offs(__bootsym_rel_start),%edi
> > +1:
> > +        mov     %fs:(%edi),%eax
> > +        add     %edx,%fs:(%edi,%eax)
> > +        add     $4,%edi
> > +        cmp     $sym_offs(__bootsym_rel_stop),%edi
> > +        jb      1b
>
> With the smaller sets now - are we risking misbehavior if one
> of the relocation sets ends up empty? This wasn't reasonable to
> expect before, but I think it would be nice to have a build-time
> check rather than a hard to debug crash in case this happens.

Or just code it differently as a while() instead of a do{}while() so
that it actually copes with a zero-length section.

> > --- a/xen/arch/x86/boot/trampoline.S
> > +++ b/xen/arch/x86/boot/trampoline.S
> > @@ -16,21 +16,62 @@
> >   * not guaranteed to persist.
> >   */
> >
> > -/* NB. bootsym() is only usable in real mode, or via BOOT_PSEUDORM_DS. */
> > +/*
> > + * There are four sets of relocations:
> > + *
> > + * bootsym():     Boot-time code relocated to low memory and run only once.
> > + *                Only usable at boot; in real mode or via BOOT_PSEUDORM_DS.
> > + * bootdatasym(): Boot-time BIOS-discovered data, relocated back up to Xen
> > + *                image after discovery.
> > + * trampsym():    Permanent trampoline code relocated into low memory for AP
> > + *                startup and wakeup.
> > + * tramp32sym():  32-bit trampoline code which at boot can be used directly
> > + *                from the Xen image in memory, but which will need to be
> > + *                relocated into low (well, into *mapped*) memory in order
> > + *                to be used for AP startup.
> > + */
> >  #undef bootsym
> >  #define bootsym(s) ((s)-trampoline_start)
> >
> >  #define bootsym_rel(sym, off, opnd...)     \
> >          bootsym(sym),##opnd;               \
> >  111:;                                      \
> > -        .pushsection .trampoline_rel, "a"; \
> > +        .pushsection .bootsym_rel, "a";    \
> >          .long 111b - (off) - .;            \
> >          .popsection
> >
> >  #define bootsym_segrel(sym, off)           \
> >          $0,$bootsym(sym);                  \
> >  111:;                                      \
> > -        .pushsection .trampoline_seg, "a"; \
> > +        .pushsection .bootsym_seg, "a";    \
> > +        .long 111b - (off) - .;            \
> > +        .popsection
> > +
> > +#define bootdatasym(s) ((s)-trampoline_start)
> > +#define bootdatasym_rel(sym, off, opnd...) \
> > +        bootdatasym(sym),##opnd;           \
> > +111:;                                      \
> > +        .pushsection .bootdatasym_rel, "a";\
> > +        .long 111b - (off) - .;            \
> > +        .popsection
> > +
> > +#undef trampsym
>
> Why this and ...
>
> > +#define trampsym(s) ((s)-trampoline_start)
> > +
> > +#define trampsym_rel(sym, off, opnd...)    \
> > +        trampsym(sym),##opnd;              \
> > +111:;                                      \
> > +        .pushsection .trampsym_rel, "a";   \
> > +        .long 111b - (off) - .;            \
> > +        .popsection
> > +
> > +#undef tramp32sym
>
> ... this #undef? You have none ahead of the bootdatasym #define-s,
> and (other than for bootsym) there's no conflicting C level one
> afaics.
>
> > +#define tramp32sym(s) ((s)-trampoline_start)
> > +
> > +#define tramp32sym_rel(sym, off, opnd...)  \
> > +        tramp32sym(sym),##opnd;            \
> > +111:;                                      \
> > +        .pushsection .tramp32sym_rel, "a"; \
> >          .long 111b - (off) - .;            \
> >          .popsection
>
> After your reply to my comment regarding the redundancy here I've
> checked
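[Editorial note: for illustration, here is the logic of those relocation
loops recast in C, restructured as a while() so that an empty table is
simply skipped. The function and variable names are invented for the
sketch, not taken from the Xen tree.]

    #include <stdint.h>

    /*
     * Each table entry is a 32-bit offset, relative to the entry itself,
     * naming a 32-bit field which must be rebased by 'delta'.  Testing
     * the bound before the first iteration (a while(), not the
     * do{}while() the assembly implements) makes a zero-length table a
     * no-op instead of corrupting whatever a bogus first entry names.
     */
    static void apply_relocs(uint32_t *entry, const uint32_t *stop,
                             uint32_t delta)
    {
        while ( entry < stop )
        {
            uint32_t *target = (uint32_t *)((uint8_t *)entry + *entry);

            *target += delta;
            entry++;
        }
    }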
Re: [Xen-devel] [PATCH v3 4/5] x86/boot: Copy 16-bit boot variables back up to Xen image
On Fri, 2019-08-30 at 17:43 +0200, Jan Beulich wrote:
> On 21.08.2019 18:35, David Woodhouse wrote:
> > From: David Woodhouse
> >
> > Ditch the bootsym() access from C code for the variables populated by
> > 16-bit boot code. As well as being cleaner this also paves the way for
> > not having the 16-bit boot code in low memory for no-real-mode or EFI
> > loader boots at all.
> >
> > These variables are put into a separate .data.boot16 section and
> > accessed in low memory during the real-mode boot, then copied back to
> > their native location in the Xen image when real mode has finished.
> >
> > Fix the limit in gdt_48 to admit that trampoline_gdt actually includes
> > 7 entries, since we do now use the seventh (BOOT_FS) in late code so it
> > matters. Andrew has a patch to further tidy up the GDT and initialise
> > accessed bits etc., so I won't go overboard with more than the trivial
> > size fix for now.
> >
> > The bootsym() macro remains in C code purely for the variables which
> > are written for the later AP startup and wakeup trampoline to use.
> >
> > Signed-off-by: David Woodhouse
> > ---
> >  xen/arch/x86/boot/edd.S           |  2 ++
> >  xen/arch/x86/boot/head.S          | 16 +++
> >  xen/arch/x86/boot/mem.S           |  2 ++
> >  xen/arch/x86/boot/trampoline.S    | 33 ---
> >  xen/arch/x86/boot/video.S         | 30 +++-
> >  xen/arch/x86/platform_hypercall.c | 18 -
> >  xen/arch/x86/setup.c              | 22 ++---
> >  xen/arch/x86/xen.lds.S            |  9 -
> >  xen/include/asm-x86/edd.h         |  1 -
> >  9 files changed, 94 insertions(+), 39 deletions(-)
> >
> > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
> > index 434bbbd960..138d04c964 100644
> > --- a/xen/arch/x86/boot/edd.S
> > +++ b/xen/arch/x86/boot/edd.S
> > @@ -163,6 +163,7 @@ edd_done:
> >  .Ledd_mbr_sig_skip:
> >          ret
> >
> > +.pushsection .data.boot16, "aw", @progbits
> >  GLOBAL(boot_edd_info_nr)
> >          .byte   0
> >  GLOBAL(boot_mbr_signature_nr)
> > @@ -171,3 +172,4 @@ GLOBAL(boot_mbr_signature)
> >          .fill   EDD_MBR_SIG_MAX*8,1,0
> >  GLOBAL(boot_edd_info)
> >          .fill   EDD_INFO_MAX * (EDDEXTSIZE + EDDPARMSIZE), 1, 0
> > +.popsection
> > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> > index 4118f73683..6d315020d2 100644
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -725,6 +725,17 @@ trampoline_setup:
> >          cmp     $sym_offs(__bootsym_seg_stop),%edi
> >          jb      1b
> >
> > +        /* Relocations for the boot data section. */
> > +        mov     sym_fs(trampoline_phys),%edx
> > +        add     $(boot_trampoline_end - boot_trampoline_start),%edx
> > +        mov     $sym_offs(__bootdatasym_rel_start),%edi
> > +1:
> > +        mov     %fs:(%edi),%eax
> > +        add     %edx,%fs:(%edi,%eax)
> > +        add     $4,%edi
> > +        cmp     $sym_offs(__bootdatasym_rel_stop),%edi
> > +        jb      1b
> > +
> >          /* Do not parse command line on EFI platform here. */
> >          cmpb    $0,sym_fs(efi_platform)
> >          jnz     1f
> > @@ -762,6 +773,11 @@ trampoline_setup:
> >          mov     $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx
> >          rep movsl %fs:(%esi),%es:(%edi)
> >
> > +        /* Copy boot data template to low memory. */
> > +        mov     $sym_offs(bootdata_start),%esi
> > +        mov     $((bootdata_end - bootdata_start) / 4),%ecx
> > +        rep movsl %fs:(%esi),%es:(%edi)
>
> Afaict neither bootdata_start nor bootdata_end are aligned, and so
> the difference isn't necessarily a multiple of 4. In fact the
> other (preexisting) movsl looks to have the same issue; I wonder
> if we propagate bad EDID data for that reason on certain builds /
> in certain versions.

Hm, I'm not sure I quite realised the distinction between
bootdata_start and __bootdata_start (and likewise _end).
Now that things are placed in the .data.boot16 section by
.pushsection/.popsection, can we rely on the ordering, and on the
globals in the .S files actually being at the start and end? I thought
we *needed* to use the ones in the linker script, and what I should
probably do here is kill bootdata_start/bootdata_end completely and
rely only on the ones from the linker script? Either that, or I should
kill the ones in the linker script completely.
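[Editorial note: if only the linker-script symbols survive, consumers
can take the section bounds from those alone. A minimal sketch,
assuming the usual double-underscore naming convention from xen.lds.S;
the names are assumptions, not quotes from the tree.]

    #include <stddef.h>

    /* Provided by the linker script, not by any .S file (names assumed). */
    extern char __bootdata_start[], __bootdata_end[];

    /* The copy length then comes from the linker symbols alone. */
    static inline size_t bootdata_size(void)
    {
        return __bootdata_end - __bootdata_start;
    }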
Re: [Xen-devel] [PATCH v3 4/5] x86/boot: Copy 16-bit boot variables back up to Xen image
On Mon, 2019-09-02 at 09:44 +0200, Jan Beulich wrote:
> Right, just one pair should survive. And seeing how things work before
> this series I think it indeed should be linker script symbols only.
> And then the ALIGN() ahead of the "start" ones should stay, but there's
> no need for one on the "end" ones (again as is currently the case).

If we don't align the end symbol then we need to go back to rounding up
the length with ((boot_trampoline_end - boot_trampoline_start) + 3) / 4
again though, right?
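[Editorial note: to make that arithmetic concrete, a hedged C sketch.
Rounding the byte count up to whole 32-bit words has the same effect as
aligning the end symbol, at the cost of copying up to three slack
bytes; the symbol names below are placeholders.]

    #include <stddef.h>

    /* Hypothetical section bounds, standing in for the linker symbols. */
    extern char bootdata_start[], bootdata_end[];

    static size_t bootdata_copy_dwords(void)
    {
        size_t bytes = bootdata_end - bootdata_start;

        /* (bytes + 3) / 4 rounds up, so an unaligned end still copies fully. */
        return (bytes + 3) / 4;
    }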
Re: [Xen-devel] [PATCH v3 4/5] x86/boot: Copy 16-bit boot variables back up to Xen image
On Mon, 2019-09-02 at 15:47 +0200, Jan Beulich wrote:
> On 02.09.2019 14:51, David Woodhouse wrote:
> > On Mon, 2019-09-02 at 09:44 +0200, Jan Beulich wrote:
> > > Right, just one pair should survive. And seeing how things work before
> > > this series I think it indeed should be linker script symbols only.
> > > And then the ALIGN() ahead of the "start" ones should stay, but there's
> > > no need for one on the "end" ones (again as is currently the case).
> >
> > If we don't align the end symbol then we need to go back to rounding up
> > the length with ((boot_trampoline_end - boot_trampoline_start) + 3) / 4
> > again though, right?
>
> Wait - we've been talking about the *_rel sections / tables here,
> haven't we? All entries of these tables ought to be of equal size,
> and hence alignment of a table's "end" label automatically matches
> the size of the table entries.

The specific one we were talking about just then was
bootdata_{start,end}, which is the data itself to be copied up/down,
not the relocations.

The _rel sections indeed need no alignment at the end, as you say.
Re: [Xen-devel] [RFC PATCH 1/7] x86/wakeup: Stop using %fs for lidt/lgdt
On Wed, 2019-05-01 at 17:09 +0100, Andrew Cooper wrote:
> I'm afraid testing says no. S3 works fine without this change, and
> resets with it.

Thanks for testing. That's obvious in retrospect — although the wakeup
code is relocated alongside the trampoline code, it runs in real mode
with its own segment, and %ip=0. So the idt_48 and gdt_48 really do
need to use "wakesym". For which they need to be shifted later in
trampoline.S so that they're actually within the range of that segment.

However, neither staging-4.11 nor master are working here even before
the patch, so while I can fix my patch to go back to the existing
breakage and not add my own, I haven't managed to test. This is what I
get:

[root@localhost ~]# echo mem > /sys/power/state
(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 14
(XEN) Removing cpu 1 from runqueue 1
(XEN) No cpus left on runqueue, disabling
(XEN) Entering ACPI S3 state.
QEMU 2.11.1 monitor - type 'help' for more information
(qemu) system_wakeup
(qemu)
(XEN) mce_intel.c:780: MCA Capability: firstbank 1, extended MCE MSR 0, SER
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) Adding cpu 1 to runqueue 0
(XEN) Assertion 'c != old_pool && (c != NULL || old_pool != NULL)' failed at schedule.c:1848
(XEN) [ Xen-4.13-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:    0
(XEN) RIP:    e008:[] schedule_cpu_switch+0x25b/0x318
(XEN) RFLAGS: 00010246   CONTEXT: hypervisor
(XEN) rax: 82d0805c7060   rbx: 0001   rcx: 0031bbf05100
(XEN) rdx: 83023c509840   rsi: 83023c509840   rdi: 0001
(XEN) rbp: 8300bfa8fce0   rsp: 8300bfa8fc80   r8:  0008f000
(XEN) r9:  82d0805c4ec0   r10: bf9f7600   r11: 000979798d09
(XEN) r12: 83023c509840   r13: 82d080470720   r14: 0004
(XEN) r15:                cr0: 8005003b   cr4: 26e0
(XEN) cr3: bfa83000   cr2:
(XEN) fsb: 7f91b5b6c740   gsb: 880075c0   gss:
(XEN) ds: 0018   es:    fs: b800   gs:    ss: e010   cs: e008
(XEN) Xen code around (schedule_cpu_switch+0x25b/0x318):
(XEN)  85 d2 0f 85 f1 fd ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 45 84 c0 75 39 48 8b 46
(XEN) Xen stack trace from rsp=8300bfa8fc80:
(XEN)    7cff40570347 82d0803818aa 82cfb300
(XEN)    82cfb280 0001 0001
(XEN)    83023c509840 82d080596128 0004
(XEN)    8300bfa8fd10 82d080203f3e 0286 0001
(XEN)    0001 82d080596128 8300bfa8fd40 82d080204108
(XEN)    82d0804859e0 82d080485368 82d080485360 82d080485348
(XEN)    8300bfa8fd90 82d0802251bf 0001
(XEN)    8300bfa8fd80 0001 82d0803e9e81
(XEN)    0001 8300bfa8 8300bfa8fdc0 82d0802039d0
(XEN)    8300bfa8 82d080485340 0001 0003
(XEN)    8300bfa8fdf0 82d080203bc9 0003
(XEN)    8300bfa8 26e0 8300bfa8fe40 82d0802dcd03
(XEN)    83023c49b000 0282 83023c51ef90
(XEN)    83023c49b000 8300bfa8
(XEN)    8300bfa8fe60 82d080208355 83023c49b1d8 82d0805c7170
(XEN)    8300bfa8fe80 82d08023f7b4 8300bfa8fe80 82d0805c7160
(XEN)    8300bfa8feb0 82d08023fadc 82d0805c7170
(XEN)    82d08059b880 82d0805b20a0 8300bfa8fef0 82d080272397
(XEN)    83023c4fb000 83023c4fb000 83023c49b000 83023c508000
(XEN)    83023c4b8000 8300bfa8fda8 c900025abd60
(XEN) Xen call trace:
(XEN)    [] schedule_cpu_switch+0x25b/0x318
(XEN)    [] cpupool.c#cpupool_assign_cpu_locked+0x31/0x126
(XEN)    [] cpupool.c#cpu_callback+0xd5/0x31d
(XEN)    [] notifier_call_chain+0x64/0x8f
(XEN)    [] cpu_up+0xca/0xec
(XEN)    [] enable_nonboot_cpus+0x87/0xd9
(XEN)    [] power.c#enter_state_helper+0xd7/0x48f
(XEN)    [] domain.c#continue_hypercall_tasklet_handler+0x4a/0xab
(XEN)    [] tasklet.c#do_tasklet_work+0x7a/0xb2
(XEN)    [] do_tasklet+0x58/0x8a
(XEN)    [] domain.c#idle_loop+0x5e/0xb9
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'c != old_pool && (c != NULL || old_pool != NULL)' failed at schedule.c:1848
(XEN)
(XEN)
(XEN) Reboot in five seconds...
[Xen-devel] [RFC PATCH v2 1/7] x86/wakeup: Stop using %fs for lidt/lgdt
From: David Woodhouse

The wakeup code is now relocated alongside the trampoline code, so %ds
is just fine here.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/wakeup.S | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index f9632eef95..8c52819171 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -40,11 +40,8 @@ ENTRY(wakeup_start)
         movw    %ax, %fs
         movw    $0x0e00 + 'L', %fs:(0x10)
 
-        # boot trampoline is under 1M, and shift its start into
-        # %fs to reference symbols in that area
-        mov     wakesym(trampoline_seg), %fs
-        lidt    %fs:bootsym(idt_48)
-        lgdt    %fs:bootsym(gdt_48)
+        lidt    bootsym(idt_48)
+        lgdt    bootsym(gdt_48)
 
         movw    $1, %ax
         lmsw    %ax             # Turn on CR0.PE
@@ -102,10 +99,6 @@ GLOBAL(video_mode)
         .long 0
 GLOBAL(video_flags)
         .long 0
-trampoline_seg: .word 0
-.pushsection .trampoline_seg, "a"
-        .long trampoline_seg - .
-.popsection
 
 .code32
-- 
2.17.2
[Xen-devel] [RFC PATCH v2 5/7] x86/boot: Rename trampoline_{start, end} to boot_trampoline_{start, end}
From: David Woodhouse

In preparation for splitting the boot and permanent trampolines from
each other. Some of these will change back, but most are boot so do the
plain search/replace that way first, then a subsequent patch will
extract the permanent trampoline code.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/head.S       | 12 ++--
 xen/arch/x86/boot/trampoline.S | 10 +-
 xen/arch/x86/boot/video.S      |  4 ++--
 xen/arch/x86/efi/efi-boot.h    |  4 ++--
 xen/arch/x86/mm.c              |  2 +-
 xen/arch/x86/setup.c           |  4 ++--
 xen/arch/x86/tboot.c           |  6 +++---
 xen/arch/x86/x86_64/mm.c       |  2 +-
 xen/arch/x86/xen.lds.S         |  6 +++---
 xen/include/asm-x86/config.h   |  6 +++---
 10 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 5b4f211a9b..82342769c7 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -753,20 +753,20 @@ trampoline_setup:
         cmpb    $0, sym_fs(skip_realmode)
         jz      1f
         /* If no-real-mode, jump straight to trampoline_protmode_entry */
-        lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
+        lea     trampoline_protmode_entry-boot_trampoline_start(%edi),%eax
         /* EBX == 0 indicates we are the BP (Boot Processor). */
         xor     %ebx,%ebx
         jmp     2f
 1:
         /* Go via 16-bit code in trampoline_boot_cpu_entry */
-        lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
+        lea     trampoline_boot_cpu_entry-boot_trampoline_start(%edi),%eax
 2:
         pushl   $BOOT_CS32
         push    %eax
 
         /* Copy bootstrap trampoline to low memory, below 1MB. */
-        mov     $sym_offs(trampoline_start),%esi
-        mov     $((trampoline_end - trampoline_start) / 4),%ecx
+        mov     $sym_offs(boot_trampoline_start),%esi
+        mov     $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx
         rep movsl %fs:(%esi),%es:(%edi)
 
         /* Jump into the relocated trampoline. */
@@ -778,8 +778,8 @@ cmdline_parse_early:
 reloc:
 #include "reloc.S"
 
-ENTRY(trampoline_start)
+ENTRY(boot_trampoline_start)
 #include "trampoline.S"
-ENTRY(trampoline_end)
+ENTRY(boot_trampoline_end)
 
 #include "x86_64.S"
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index e77b4bea36..0f4a740fcb 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -31,7 +31,7 @@
  *                to be used for AP startup.
  */
 #undef bootsym
-#define bootsym(s) ((s)-trampoline_start)
+#define bootsym(s) ((s)-boot_trampoline_start)
 
 #define bootsym_rel(sym, off, opnd...)     \
         bootsym(sym),##opnd;               \
@@ -47,7 +47,7 @@
         .long 111b - (off) - .;            \
         .popsection
 
-#define bootdatasym(s) ((s)-trampoline_start)
+#define bootdatasym(s) ((s)-boot_trampoline_start)
 #define bootdatasym_rel(sym, off, opnd...) \
         bootdatasym(sym),##opnd;           \
 111:;                                      \
@@ -56,7 +56,7 @@
         .popsection
 
 #undef trampsym
-#define trampsym(s) ((s)-trampoline_start)
+#define trampsym(s) ((s)-boot_trampoline_start)
 
 #define trampsym_rel(sym, off, opnd...)    \
         trampsym(sym),##opnd;              \
@@ -66,7 +66,7 @@
         .popsection
 
 #undef tramp32sym
-#define tramp32sym(s) ((s)-trampoline_start)
+#define tramp32sym(s) ((s)-boot_trampoline_start)
 
 #define tramp32sym_rel(sym, off, opnd...)  \
         tramp32sym(sym),##opnd;            \
@@ -226,7 +226,7 @@ start64:
 
 /* The first page of trampoline is permanent, the rest boot-time only. */
 /* Reuse the boot trampoline on the 1st trampoline page as stack for wakeup. */
-.equ    wakeup_stack, trampoline_start + PAGE_SIZE
+.equ    wakeup_stack, boot_trampoline_start + PAGE_SIZE
 .global wakeup_stack
 
 /* From here on early boot only. */
diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S
index 03907e9e9a..5087c6a4d5 100644
--- a/xen/arch/x86/boot/video.S
+++ b/xen/arch/x86/boot/video.S
@@ -15,8 +15,8 @@
 
 #include "video.h"
 
-/* Scratch space layout: trampoline_end to trampoline_end+0x1000. */
-#define modelist       bootsym(trampoline_end)        /* 2kB (256 entries) */
+/* Scratch space layout: boot_trampoline_end to boot_trampoline_end+0x1000. */
+#define modelist       bootsym(boot_trampoline_end)   /* 2kB (256 entries) */
 #define vesa_glob_info (modelist + 0x800)             /* 1kB */
 #define vesa_mode_info (vesa_glob_info + 0x400)       /* 1kB */
 
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index abc7d3e3b7..f6f435a4c5 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -232,7 +232,7 @@ static void __init noreturn efi_arch_post_exit_boot(void)
     u64 cr4 = XEN_MINIMAL_CR4 & ~X86_CR4_PGE, efer;
 
     efi_arch_relocate_image(__XEN_VIR
[Xen-devel] [RFC PATCH v2 7/7] x86/boot: Do not use trampoline for no-real-mode boot paths
From: David Woodhouse

Where booted from EFI or with no-real-mode, there is no need to stomp
on low memory with the 16-bit boot code. Instead, just go straight to
trampoline_protmode_entry() at its physical location within the Xen
image.

For now, the boot code (including the EFI loader path) still determines
what the trampoline_phys address should be. The trampoline is actually
relocated for that address and copied into low memory, from a
relocate_trampoline() call made from __start_xen().

For subsequent AP startup and wakeup, the 32-bit trampoline can't
trivially be used in-place as that region isn't mapped. So copy it down
to low memory too, having relocated it (again) to work from there.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/acpi/power.c      |  6 +--
 xen/arch/x86/boot/head.S       | 67 --
 xen/arch/x86/boot/trampoline.S | 30 +++
 xen/arch/x86/cpu/common.c      |  2 +-
 xen/arch/x86/cpu/intel.c       |  2 +-
 xen/arch/x86/efi/efi-boot.h    | 31 ++--
 xen/arch/x86/mm.c              |  2 +-
 xen/arch/x86/setup.c           | 43 --
 xen/arch/x86/smpboot.c         |  6 +--
 xen/arch/x86/tboot.c           |  6 +--
 xen/arch/x86/x86_64/mm.c       |  2 +-
 xen/include/asm-x86/acpi.h     |  2 +-
 xen/include/asm-x86/config.h   | 10 ++---
 13 files changed, 117 insertions(+), 92 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index a704c7c340..a1ce8acb21 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -152,9 +152,9 @@ static void acpi_sleep_prepare(u32 state)
         return;
 
     if ( acpi_sinfo.vector_width == 32 )
-        *(uint32_t *)wakeup_vector_va = bootsym_phys(wakeup_start);
+        *(uint32_t *)wakeup_vector_va = trampsym_phys(wakeup_start);
     else
-        *(uint64_t *)wakeup_vector_va = bootsym_phys(wakeup_start);
+        *(uint64_t *)wakeup_vector_va = trampsym_phys(wakeup_start);
 }
 
 static void acpi_sleep_post(u32 state) {}
@@ -388,7 +388,7 @@ static void tboot_sleep(u8 sleep_state)
     g_tboot_shared->acpi_sinfo.wakeup_vector = acpi_sinfo.wakeup_vector;
     g_tboot_shared->acpi_sinfo.vector_width = acpi_sinfo.vector_width;
     g_tboot_shared->acpi_sinfo.kernel_s3_resume_vector =
-        bootsym_phys(wakeup_start);
+        trampsym_phys(wakeup_start);
 
     switch ( sleep_state )
     {
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 7d6c8d3292..c23eeb3aa4 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -696,16 +696,23 @@ trampoline_setup:
         lea     __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi
         mov     %edi,sym_fs(l2_bootmap)
 
-        /* Apply relocations to bootstrap trampoline. */
-        mov     sym_fs(trampoline_phys),%edx
-        mov     $sym_offs(__trampoline_rel_start),%edi
-1:
-        mov     %fs:(%edi),%eax
-        add     %edx,%fs:(%edi,%eax)
-        add     $4,%edi
-        cmp     $sym_offs(__trampoline_rel_stop),%edi
-        jb      1b
+        /* Do not parse command line on EFI platform here. */
+        cmpb    $0,sym_fs(efi_platform)
+        jnz     1f
 
+        /* Bail if there is no command line to parse. */
+        mov     sym_fs(multiboot_ptr),%ebx
+        testl   $MBI_CMDLINE,MB_flags(%ebx)
+        jz      1f
+
+        lea     sym_esi(early_boot_opts),%eax
+        push    %eax
+        pushl   MB_cmdline(%ebx)
+        call    cmdline_parse_early
+
+1:
+        /* Apply relocations to 32-bit trampoline for execution in place. */
+        lea     sym_esi(perm_trampoline_start),%edx
         mov     $sym_offs(__trampoline32_rel_start),%edi
 1:
         mov     %fs:(%edi),%eax
@@ -714,6 +721,21 @@ trampoline_setup:
         cmp     $sym_offs(__trampoline32_rel_stop),%edi
         jb      1b
 
+        cmp     $0,sym_esi(skip_realmode)
+        jz      .Ldo_realmode
+
+        /* Go directly to trampoline_protmode_entry at its physical address */
+        lea     trampoline_protmode_entry-__XEN_VIRT_START(%esi),%eax
+        pushl   $BOOT_CS32
+        push    %eax
+
+        /* EBX == 0 indicates we are the BP (Boot Processor). */
+        xor     %ebx,%ebx
+        retl
+
+.Ldo_realmode:
+        /* Apply relocations to 16-bit boot code. */
+        mov     sym_fs(trampoline_phys),%edx
         mov     $sym_offs(__bootsym_rel_start),%edi
 1:
         mov     %fs:(%edi),%eax
@@ -743,35 +765,12 @@ trampoline_setup:
         cmp     $sym_offs(__bootdatasym_rel_stop),%edi
         jb      1b
 
-        /* Do not parse command line on EFI platform here. */
-        cmpb    $0,sym_fs(efi_platform)
-        jnz     1f
-
-        /* Bail if there is no command line to parse. */
-        mov     sym_fs(multiboot_ptr),%ebx
-        testl   $MBI_CMDLINE,MB_flags(%ebx)
-        jz      1f
-
-        lea     sym_esi(early_boot_opts),%eax
-        push
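[Editorial note: the relocate_trampoline() call mentioned in the commit
message can be sketched in C, assuming the trampsym entries keep the
same self-relative format the assembly loops use; this mirrors the
pattern of the EFI loader's existing trampoline relocation, and the
symbol names here are illustrative, not quoted from the patch.]

    #include <stdint.h>

    extern const int32_t __trampsym_rel_start[], __trampsym_rel_stop[];

    static void relocate_trampoline(unsigned long phys)
    {
        const int32_t *rel;

        /* Each entry names, relative to itself, a 32-bit field to rebase. */
        for ( rel = __trampsym_rel_start; rel < __trampsym_rel_stop; ++rel )
            *(uint32_t *)((long)rel + *rel) += phys;
    }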
[Xen-devel] [RFC PATCH v2 2/7] x86/boot: Remove gratuitous call back into low-memory code
From: David Woodhouse

We appear to have implemented a memcpy() in the low-memory trampoline
which we then call into from __start_xen(), for no adequately defined
reason.

Kill it with fire.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/mem.S    | 27 +--
 xen/arch/x86/setup.c       | 12 
 xen/include/asm-x86/e820.h |  5 ++---
 3 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S
index c6a9bd4d3b..2d61d28835 100644
--- a/xen/arch/x86/boot/mem.S
+++ b/xen/arch/x86/boot/mem.S
@@ -7,7 +7,7 @@ get_memory_map:
 .Lmeme820:
         xorl    %ebx, %ebx                      # continuation counter
-        movw    $bootsym(e820map), %di          # point into the whitelist
+        movw    $bootsym(bios_e820map), %di     # point into the whitelist
                                                 # so we can have the bios
                                                 # directly write into it.
 
@@ -22,8 +22,8 @@ get_memory_map:
         cmpl    $SMAP,%eax                      # check the return is `SMAP'
         jne     .Lmem88
 
-        incw    bootsym(e820nr)
-        cmpw    $E820_BIOS_MAX,bootsym(e820nr)  # up to this many entries
+        incw    bootsym(bios_e820nr)
+        cmpw    $E820_BIOS_MAX,bootsym(bios_e820nr) # up to this many entries
         jae     .Lmem88
 
         movw    %di,%ax
@@ -66,27 +66,10 @@ get_memory_map:
 
         ret
 
-/*
- * Copy E820 map obtained from BIOS to a buffer allocated by Xen.
- * Input: %rdi: target address of e820 entry array
- *        %esi: maximum number of entries to copy
- * Output: %eax: number of entries copied
- */
-.code64
-ENTRY(e820map_copy)
-        mov     %esi, %eax
-        lea     e820map(%rip), %rsi
-        mov     e820nr(%rip), %ecx
-        cmp     %ecx, %eax
-        cmova   %ecx, %eax              # number of entries to move
-        imul    $5, %eax, %ecx
-        rep movsl                       # do the move
-        ret
-
         .align 4
-e820map:
+GLOBAL(bios_e820map)
         .fill E820_BIOS_MAX*20,1,0
-e820nr:
+GLOBAL(bios_e820nr)
         .long 0
 GLOBAL(lowmem_kb)
         .long 0
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index a353d76f9a..5fa7d3b79c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -664,6 +664,18 @@ static char * __init cmdline_cook(char *p, const char *loader_name)
     return p;
 }
 
+static int copy_bios_e820(struct e820entry *map, unsigned int limit)
+{
+    unsigned int n = bootsym(bios_e820nr);
+    if (n > limit)
+        n = limit;
+
+    if (n)
+        memcpy(map, bootsym(bios_e820map), sizeof(*map) * n);
+
+    return n;
+}
+
 void __init noreturn __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;
diff --git a/xen/include/asm-x86/e820.h b/xen/include/asm-x86/e820.h
index ee317b17aa..52916fb75d 100644
--- a/xen/include/asm-x86/e820.h
+++ b/xen/include/asm-x86/e820.h
@@ -37,8 +37,7 @@ extern struct e820map e820_raw;
 
 /* These symbols live in the boot trampoline. */
 extern unsigned int lowmem_kb, highmem_kb;
-unsigned int e820map_copy(struct e820entry *map, unsigned int limit);
-
-#define copy_bios_e820 bootsym(e820map_copy)
+extern struct e820map bios_e820map[];
+extern unsigned int bios_e820nr;
 
 #endif /*__E820_HEADER*/
-- 
2.17.2
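[Editorial note: with the copy now done in plain C, a call site might
look like the following. The surrounding code is assumed rather than
quoted from the patch, though e820_raw and ARRAY_SIZE() are existing
Xen names.]

    /* Illustrative only: populate the raw E820 map from the BIOS one. */
    e820_raw.nr_map = copy_bios_e820(e820_raw.map, ARRAY_SIZE(e820_raw.map));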
[Xen-devel] [RFC PATCH v2 3/7] x86/boot: Only jump into low trampoline code for real-mode boot
From: David Woodhouse

If the no-real-mode flag is set, don't go there at all. This is a
prelude to not even putting it there in the first place.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/head.S       | 10 ++
 xen/arch/x86/boot/trampoline.S |  4 
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index db19ac6fd8..7c30de3671 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -734,7 +734,17 @@ trampoline_setup:
         /* Switch to low-memory stack which lives at the end of trampoline region. */
         mov     sym_fs(trampoline_phys),%edi
         lea     TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp
+        cmpb    $0, sym_fs(skip_realmode)
+        jz      1f
+        /* If no-real-mode, jump straight to trampoline_protmode_entry */
+        lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
+        /* EBX == 0 indicates we are the BP (Boot Processor). */
+        xor     %ebx,%ebx
+        jmp     2f
+1:
+        /* Go via 16-bit code in trampoline_boot_cpu_entry */
         lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
+2:
         pushl   $BOOT_CS32
         push    %eax
 
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 5588c7986a..df0ffd5013 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -187,9 +187,6 @@ start64:
 
         .code32
 trampoline_boot_cpu_entry:
-        cmpb    $0,bootsym_rel(skip_realmode,5)
-        jnz     .Lskip_realmode
-
         /* Load pseudo-real-mode segments. */
         mov     $BOOT_PSEUDORM_DS,%eax
         mov     %eax,%ds
@@ -269,7 +266,6 @@ trampoline_boot_cpu_entry:
         mov     %eax,%gs
         mov     %eax,%ss
 
-.Lskip_realmode:
         /* EBX == 0 indicates we are the BP (Boot Processor). */
         xor     %ebx,%ebx
 
-- 
2.17.2
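[Editorial note: rendered as C pseudocode, illustrative only, the new
control flow in head.S amounts to the following; jump_32bit() is a
made-up name for the pushl/lret sequence.]

    /* Both entry points are offsets into the trampoline in low memory. */
    if ( skip_realmode )
        /* Skip the 16-bit phase; %ebx == 0 marks the boot processor. */
        jump_32bit(trampoline_phys
                   + (trampoline_protmode_entry - trampoline_start));
    else
        jump_32bit(trampoline_phys
                   + (trampoline_boot_cpu_entry - trampoline_start));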
[Xen-devel] [RFC PATCH v2 4/7] x86/boot: Split bootsym() into four types of relocations
From: David Woodhouse

As a first step toward using the low-memory trampoline only when
necessary for a legacy boot without no-real-mode, clean up the
relocations into four separate groups.

• bootsym() is now used only at boot time when no-real-mode isn't set.

• bootdatasym() is for variables containing information discovered by
  the 16-bit boot code. This is currently accessed directly in place in
  low memory by Xen at runtime, but will be copied back to its location
  in high memory to avoid the pointer gymnastics (and because a
  subsequent patch will stop copying the 16-bit boot code into low
  memory at all when it isn't being used).

• trampsym() is for the permanent 16-bit trampoline used for AP startup
  and for wake from sleep. This is not used at boot, and can be copied
  into (properly allocated) low memory once the system is running.

• tramp32sym() is used both at boot and for AP startup/wakeup. During
  boot it can be used in-place, running from the physical address of
  the Xen image. For AP startup it can't, because at that point there
  isn't a full 1:1 mapping of all memory; only the low trampoline page
  is mapped.

No (intentional) functional change yet; just a "cleanup" to allow the
various parts to be treated separately in subsequent patches.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/edd.S        | 16 +++
 xen/arch/x86/boot/head.S       | 22 +++--
 xen/arch/x86/boot/mem.S        | 12 ++---
 xen/arch/x86/boot/trampoline.S | 86 ++
 xen/arch/x86/boot/video.S      |  6 +--
 xen/arch/x86/boot/wakeup.S     | 16 +++
 xen/arch/x86/efi/efi-boot.h    |  8 ++--
 xen/arch/x86/xen.lds.S         | 15 --
 8 files changed, 125 insertions(+), 56 deletions(-)

diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
index 3df712bce1..434bbbd960 100644
--- a/xen/arch/x86/boot/edd.S
+++ b/xen/arch/x86/boot/edd.S
@@ -41,7 +41,7 @@ get_edd:
 # This code is sensitive to the size of the structs in edd.h
 edd_start:
         /* ds:si points at fn48 results. Fn41 results go immediately before. */
-        movw    $bootsym(boot_edd_info)+EDDEXTSIZE, %si
+        movw    $bootdatasym(boot_edd_info)+EDDEXTSIZE, %si
         movb    $0x80, %dl                      # BIOS device 0x80
 
 edd_check_ext:
@@ -56,7 +56,7 @@ edd_check_ext:
         movb    %dl, %ds:-8(%si)                # store device number
         movb    %ah, %ds:-7(%si)                # store version
         movw    %cx, %ds:-6(%si)                # store extensions
-        incb    bootsym(boot_edd_info_nr)       # note that we stored something
+        incb    bootdatasym(boot_edd_info_nr)   # note that we stored something
 
 edd_get_device_params:
         movw    $EDDPARMSIZE, %ds:(%si)         # put size
@@ -97,7 +97,7 @@ edd_legacy_done:
 edd_next:
         incb    %dl                             # increment to next device
         jz      edd_done
-        cmpb    $EDD_INFO_MAX,bootsym(boot_edd_info_nr)
+        cmpb    $EDD_INFO_MAX,bootdatasym(boot_edd_info_nr)
         jb      edd_check_ext
 
 edd_done:
@@ -108,11 +108,11 @@ edd_done:
 .Ledd_mbr_sig_start:
         pushw   %es
         movb    $0x80, %dl                      # from device 80
-        movw    $bootsym(boot_mbr_signature), %bx # store buffer ptr in bx
+        movw    $bootdatasym(boot_mbr_signature), %bx # store buffer ptr in bx
 .Ledd_mbr_sig_read:
         pushw   %bx
-        movw    $bootsym(boot_edd_info), %bx
-        movzbw  bootsym(boot_edd_info_nr), %cx
+        movw    $bootdatasym(boot_edd_info), %bx
+        movzbw  bootdatasym(boot_edd_info_nr), %cx
         jcxz    .Ledd_mbr_sig_default
 .Ledd_mbr_sig_find_info:
         cmpb    %dl, (%bx)
@@ -151,12 +151,12 @@ edd_done:
         jne     .Ledd_mbr_sig_next
         movb    %dl, (%bx)                      # store BIOS drive number
         movl    %ecx, 4(%bx)                    # store signature from MBR
-        incb    bootsym(boot_mbr_signature_nr)  # note that we stored something
+        incb    bootdatasym(boot_mbr_signature_nr) # note that we stored something
         addw    $8, %bx                         # increment sig buffer ptr
 .Ledd_mbr_sig_next:
         incb    %dl                             # increment to next device
         jz      .Ledd_mbr_sig_done
-        cmpb    $EDD_MBR_SIG_MAX, bootsym(boot_mbr_signature_nr)
+        cmpb    $EDD_MBR_SIG_MAX, bootdatasym(boot_mbr_signature_nr)
         jb      .Ledd_mbr_sig_read
 .Ledd_mbr_sig_done:
         popw    %es
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 7c30de3671..5b4f211a9b 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -706,14 +706,30 @@ trampoline_setup:
         cmp     $sym_offs(__trampoline_rel_stop),%edi
         jb      1b
 
-        /* Patch in the trampoline segment. */
+        mov     $sym_offs(__trampoline32_rel_start),%edi
+1:
+        mov     %fs:(%e
[Xen-devel] [RFC PATCH v2 6/7] x86/boot: Copy 16-bit boot variables back up to Xen image
From: David Woodhouse

Ditch the bootsym() access from C code for the variables populated by
16-bit boot code. As well as being cleaner this also paves the way for
not having the 16-bit boot code in low memory for no-real-mode or EFI
loader boots at all.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/edd.S           |  2 ++
 xen/arch/x86/boot/head.S          | 16 +++
 xen/arch/x86/boot/mem.S           |  2 ++
 xen/arch/x86/boot/trampoline.S    | 33 ---
 xen/arch/x86/boot/video.S         | 30 +++-
 xen/arch/x86/platform_hypercall.c | 18 -
 xen/arch/x86/setup.c              | 23 +++--
 xen/arch/x86/xen.lds.S            |  8 +++-
 xen/include/asm-x86/edd.h         |  1 -
 9 files changed, 93 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
index 434bbbd960..138d04c964 100644
--- a/xen/arch/x86/boot/edd.S
+++ b/xen/arch/x86/boot/edd.S
@@ -163,6 +163,7 @@ edd_done:
 .Ledd_mbr_sig_skip:
         ret
 
+.pushsection .data.boot16, "aw", @progbits
 GLOBAL(boot_edd_info_nr)
         .byte   0
 GLOBAL(boot_mbr_signature_nr)
@@ -171,3 +172,4 @@ GLOBAL(boot_mbr_signature)
         .fill   EDD_MBR_SIG_MAX*8,1,0
 GLOBAL(boot_edd_info)
         .fill   EDD_INFO_MAX * (EDDEXTSIZE + EDDPARMSIZE), 1, 0
+.popsection
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 82342769c7..7d6c8d3292 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -732,6 +732,17 @@ trampoline_setup:
         cmp     $sym_offs(__bootsym_seg_stop),%edi
         jb      1b
 
+        /* Relocations for the boot data section. */
+        mov     sym_fs(trampoline_phys),%edx
+        add     $(boot_trampoline_end - boot_trampoline_start),%edx
+        mov     $sym_offs(__bootdatasym_rel_start),%edi
+1:
+        mov     %fs:(%edi),%eax
+        add     %edx,%fs:(%edi,%eax)
+        add     $4,%edi
+        cmp     $sym_offs(__bootdatasym_rel_stop),%edi
+        jb      1b
+
         /* Do not parse command line on EFI platform here. */
         cmpb    $0,sym_fs(efi_platform)
         jnz     1f
@@ -769,6 +780,11 @@ trampoline_setup:
         mov     $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx
         rep movsl %fs:(%esi),%es:(%edi)
 
+        /* Copy boot data template to low memory. */
+        mov     $sym_offs(bootdata_start),%esi
+        mov     $((bootdata_end - bootdata_start + 3) / 4),%ecx
+        rep movsl %fs:(%esi),%es:(%edi)
+
         /* Jump into the relocated trampoline. */
         lret
 
diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S
index aa39608442..86f0fa9af7 100644
--- a/xen/arch/x86/boot/mem.S
+++ b/xen/arch/x86/boot/mem.S
@@ -67,6 +67,7 @@ get_memory_map:
         ret
 
         .align 4
+.pushsection .data.boot16, "aw", @progbits
 GLOBAL(bios_e820map)
         .fill E820_BIOS_MAX*20,1,0
 GLOBAL(bios_e820nr)
@@ -75,3 +76,4 @@ GLOBAL(lowmem_kb)
         .long 0
 GLOBAL(highmem_kb)
         .long 0
+        .popsection
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 0f4a740fcb..fdfee2edb1 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -47,11 +47,15 @@
         .long 111b - (off) - .;            \
         .popsection
 
-#define bootdatasym(s) ((s)-boot_trampoline_start)
+.pushsection .data.boot16, "aw", @progbits
+GLOBAL(bootdata_start)
+.popsection
+
+#define bootdatasym(s) ((s)-bootdata_start+(boot_trampoline_end-boot_trampoline_start))
 #define bootdatasym_rel(sym, off, opnd...) \
         bootdatasym(sym),##opnd;           \
 111:;                                      \
-        .pushsection .bootdatasym_rel, "a";\
+        .pushsection .bootsym_rel, "a";    \
         .long 111b - (off) - .;            \
         .popsection
 
@@ -100,7 +104,7 @@ GLOBAL(trampoline_cpu_started)
         .word   0
 
 idt_48: .word   0, 0, 0 # base = limit = 0
         .word   0
-gdt_48: .word   6*8-1
+gdt_48: .word   7*8-1
         .long   tramp32sym_rel(trampoline_gdt,4)
 
 /* Start of tramp32sym section which can be used in place during boot */
@@ -312,6 +316,23 @@ trampoline_boot_cpu_entry:
         mov     %eax,%gs
         mov     %eax,%ss
 
+        /*
+         * Copy locally-gathered data back up into the Xen physical image
+         */
+        mov     $BOOT_FS,%eax
+        mov     %eax,%es
+
+        mov     $sym_offs(bootdata_end),%ecx
+        mov     $sym_offs(bootdata_start),%edi
+        sub     %edi,%ecx
+        mov     $bootdatasym_rel(bootdata_start,4,%esi)
+        rep movsb %ds:(%esi),%es:(%edi)
+
+        /*
+         * %es still points to BOOT_FS but trampoline_protmode_entry
+         * reloads it anyway.
+         */
+
         /* EBX == 0 indicates we are the BP (Boot Processor). */
         xor     %ebx,%ebx
 
@@ -
[Xen-devel] [RFC PATCH v2 0/7] Clean up x86_64 boot code
Some cleanups for the boot path, originally inspired by an attempt to
avoid scribbling on arbitrarily-chosen low memory.

In the no-real-mode case we don't need to bounce through low memory at
all; we can run the 32-bit trampoline in-place in the Xen image.

The variables containing information which is optionally discovered by
the real-mode boot code can be put back in place in the Xen image and
we can dispense with the bootsym() pointer gymnastics in C code which
access them in low memory.

Next up is killing reloc(), which AFAICT only exists to ensure that the
various breadcrumbs left all over the place by the Multiboot bootloader
aren't scribbled on when we copy the 16-bit boot trampoline into low
memory. Except that it's copying it out of the way... into the top of
low memory. Suspect if I tell kexec to put a segment there, it's going
to fail quite horribly. Which leaves me with no valid reason for
reloc() to be running this early, so I may well kill it with fire too.
I just need to find a safe location for the 16-bit boot code.

v2: Fix wake code. Thanks Andy for testing.

David Woodhouse (7):
  x86/wakeup: Stop using %fs for lidt/lgdt
  x86/boot: Remove gratuitous call back into low-memory code
  x86/boot: Only jump into low trampoline code for real-mode boot
  x86/boot: Split bootsym() into four types of relocations
  x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
  x86/boot: Copy 16-bit boot variables back up to Xen image
  x86/boot: Do not use trampoline for no-real-mode boot paths

 xen/arch/x86/acpi/power.c         |   6 +-
 xen/arch/x86/boot/edd.S           |  18 ++--
 xen/arch/x86/boot/head.S          |  89 --
 xen/arch/x86/boot/mem.S           |  35 +++-
 xen/arch/x86/boot/trampoline.S    | 145 +++---
 xen/arch/x86/boot/video.S         |  36 
 xen/arch/x86/boot/wakeup.S        |  23 ++---
 xen/arch/x86/cpu/common.c         |   2 +-
 xen/arch/x86/cpu/intel.c          |   2 +-
 xen/arch/x86/efi/efi-boot.h       |  31 +--
 xen/arch/x86/mm.c                 |   2 +-
 xen/arch/x86/platform_hypercall.c |  18 ++--
 xen/arch/x86/setup.c              |  72 ---
 xen/arch/x86/smpboot.c            |   6 +-
 xen/arch/x86/tboot.c              |   6 +-
 xen/arch/x86/x86_64/mm.c          |   2 +-
 xen/arch/x86/xen.lds.S            |  27 --
 xen/include/asm-x86/acpi.h        |   2 +-
 xen/include/asm-x86/config.h      |  10 +-
 xen/include/asm-x86/e820.h        |   5 +-
 xen/include/asm-x86/edd.h         |   1 -
 21 files changed, 339 insertions(+), 199 deletions(-)

-- 
2.17.2
Re: [Xen-devel] [RFC PATCH v2 0/7] Clean up x86_64 boot code
Argh, that's the first version again. Sorry.

The fixed version is in
http://git.infradead.org/users/dwmw2/xen.git/shortlog/refs/heads/bootcleanup
but I won't post the whole series again right now.
Re: [Xen-devel] [RFC PATCH 2/7] x86/boot: Remove gratuitous call back into low-memory code
> On 02/05/2019 09:14, Jan Beulich wrote:
>> On 01.05.19 at 13:17, wrote:
>>> We appear to have implemented a memcpy() in the low-memory trampoline
>>> which we then call into from __start_xen(), for no adequately defined
>>> reason.
>>
>> May I suggest that in cases like this you look at the commit
>> introducing the function? It supplies a reason:
>>
>> "Add a new raw e820 table for common purpose and copy the BIOS buffer
>> to it. Doing the copying in assembly avoids the need to export the
>> symbols for the BIOS E820 buffer and number of entries."
>
> I completely agree with David here. That description is completely
> insufficient for justifying why the logic was done that way in the end,
> and I would not have accepted the patch in that form at all.
>
> To be clear. I understand (and agree) why having our main copy of the
> e820 table in the trampoline is a bad thing, and moving the main copy
> out of the trampoline is fine.
>
> What isn't fine is why, in the adjusted world, we call back into what
> appears to be the trampoline, but isn't, to get access to data which the
> calling code can already access. In particular...
>
>> I have to admit that I see value in this, as it avoids introduction
>> of wrong accesses to these variables.
>
> ...this reasoning is bogus. We're either accessing the data itself, or
> the memcpy function, but there is no possible way to programatically
> avoid "wrong" access into the trampoline, because we're still accessing
> it.
>
> Therefore, I don't understand what possible benefit not exporting the
> data has in the first place, and why complicating it with a call to a
> function (which isn't actually executing where it appears to live), is
> considered a benefit.

Note that after bringing these two gratuitously differently handled
symbols in line with everything else in the boot trampoline, a later
patch in the series puts all this stuff into its own data section which
is copied back up to the Xen image at the end of the real mode phase of
boot, and all the pointer gymnastics for all of them go away
completely.

-- 
dwmw2
Re: [Xen-devel] [RFC PATCH 2/7] x86/boot: Remove gratuitous call back into low-memory code
On Thu, 2019-05-02 at 10:23 +0100, Andrew Cooper wrote:
> ...this reasoning is bogus. We're either accessing the data itself, or
> the memcpy function, but there is no possible way to programatically
> avoid "wrong" access into the trampoline, because we're still accessing it.

Just to be clear, now I'm back at a real computer: yes there is a way
to avoid "wrong" access into the trampoline from the 64-bit C code.
It's the 6th patch in my series, for which this one is a necessary
preliminary cleanup.

http://git.infradead.org/users/dwmw2/xen.git/commitdiff/f54044fb3ad5d

It rips out all those horrid bootsym() pointer gymnastics in the 64-bit
code, except for the limited cases where we really do need to put
things in place for the permanent AP/wakeup trampolines.

It puts all these boot-discovered variables into their own section,
then copies them back into the Xen image when the 16-bit boot code has
finished running, to be accessed from there ever after.

Which also paves the way for the no-real-mode boot not actually putting
*anything* into low memory during boot at all. This allows us to clean
up the EFI code to stop having to do that, too.
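[Editorial note: to make that layout concrete, a hedged sketch of where
such a .data.boot16 variable lives while the 16-bit code runs. The
actual copies are done in assembly; the symbols below follow the
description above and are stand-ins, not quotes from the series.]

    #include <stdint.h>

    /* Hypothetical stand-ins for the real trampoline/bootdata symbols. */
    extern char boot_trampoline_start[], boot_trampoline_end[];
    extern char bootdata_start[];

    /* In low memory the boot16 template sits just after the trampoline. */
    static inline uint32_t bootdata_low_addr(uint32_t trampoline_phys,
                                             void *var)
    {
        return trampoline_phys
               + (boot_trampoline_end - boot_trampoline_start)
               + ((char *)var - bootdata_start);
    }

After the real-mode phase the gathered values are copied back over the
template in the Xen image, so runtime C code simply takes the
variable's address as normal.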
Re: [Xen-devel] [PATCH] x86/boot: Annotate the Real Mode entry points
On Thu, 2019-05-02 at 15:08 +0100, Andrew Cooper wrote:
> Sadly it cant.
>
> We have a number of alignment issues (well - confusions at least), most
> obviously that trampoline_{start,end} in the linked Xen image has no
> particular alignment, but the relocated trampoline_start has a 4k
> requirement as a consequence of its alias with trampoline_realmode_entry.
>
> All it takes is one error in the 32bit asm which relocates the
> trampoline and this ends up exploding in a way which can only be
> detected at runtime.

I'm relocating the permanent trampoline from 64-bit C code now, not in
assembly. In the no-real-mode case (ignoring reloc(), qv) nothing is
put into low memory until __start_xen() calls relocate_trampoline().
Re: [Xen-devel] [PATCH] x86/boot: Drop vestigial support for pre-SIPI APICs
On Wed, 2019-06-12 at 06:00 -0600, Jan Beulich wrote:
> > > > Reported-by: David Woodhouse
> > >
> > > Does this mean there was an actual problem resulting from this code
> > > being there, or simply the observation that this code is all dead?
> > > Clarifying one way or the other by half a sentence would be nice.
> >
> > It was more an observation that when you kexec Xen, it blindly writes
> > into the BDA even when that wasn't set up properly by the tools.
> >
> > Arguably that may have been kexec setup problem more than a Xen problem,
> > but after looking at this code, it is obviously that what Xen was doing
> > definitely wasn't right to do unconditionally. It just so happens that
> > it safe to unconditionally drop the logic, rather than to make it
> > dependant on other system properties.
> >
> > I'm not sure how best to phrase this.
>
> Maybe "In practice issues with this were observed only with kexec"?

Not sure that's true either, is it? I found *lots* of issues when doing
kexec, and I should resend that series of boot code cleanups — but this
wasn't one of the ones I remember spotting :)
Re: [Xen-devel] [PATCH] x86/boot: Drop vestigial support for pre-SIPI APICs
On Wed, 2019-06-12 at 13:20 +0100, Andrew Cooper wrote:
> You definitely complained about the BDA on IRC, which is why I started
> looking, but I'm happy to leave you out of the patch if you'd prefer.

I did indeed complain about the BDA, but mostly when we were reading
the amount of memory available from the BDA... when that had been
overwritten by kexec loading data segments over it :)

I don't think I spotted this *particular* issue, so perhaps "Dave made
me start staring at this crap again" instead of literally "Reported by
Dave" is truer...
[Xen-devel] [PATCH v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
From: David Woodhouse

When recursing, a node sometimes disappears. Deal with it and move on
instead of aborting and failing to print the rest of what was
requested.

Either EACCES or ENOENT may occur as the result of race conditions with
updates; any other error should abort as before.

Signed-off-by: David Woodhouse
---
And thus did an extremely sporadic "not going to delete that device
because it already doesn't exist" failure mode become painfully obvious
in retrospect...

 tools/xenstore/xenstore_client.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
index 3afc630ab8..9fcd3d2f9e 100644
--- a/tools/xenstore/xenstore_client.c
+++ b/tools/xenstore/xenstore_client.c
@@ -148,14 +148,20 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms)
     int i;
     unsigned int num, len;
 
+    e = xs_directory(h, XBT_NULL, path, &num);
+    if (e == NULL) {
+        if (cur_depth && (errno == ENOENT || errno == EACCES)) {
+            /* If a node disappears while recursing, silently move on. */
+            return;
+        }
+
+        err(1, "xs_directory (%s)", path);
+    }
+
     newpath = malloc(STRING_MAX);
     if (!newpath)
         err(1, "malloc in do_ls");
 
-    e = xs_directory(h, XBT_NULL, path, &num);
-    if (e == NULL)
-        err(1, "xs_directory (%s)", path);
-
     for (i = 0; i < num; i++) {
Re: [Xen-devel] [PATCH v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
On Wed, 2019-08-07 at 16:37 +0100, Ian Jackson wrote:
> I think it is not safe to continue if we get a permissions error. I
> think that would mean "we were not able to determine whether this node
> exists", not "this node does not exist".

Either way, I interpreted it more as "haha screw you, now I'm not going
to tell you whether any *other* node exists".

This is the test case I actually started with:

(dom0) # for a in `seq 1 1000` ; do xenstore-write /local/domain/2/foo/$a $a ; done

Now simultaneously:

(dom0) # for a in `seq 1 999` ; do xenstore-rm /local/domain/2/foo/$a ; done
(dom2) # while true ; do ./xenstore-ls -p /local/domain/2/foo | grep -c 1000; done

I do expect to see node 1000, every time. With my patch, that works.

> So with a permissions error silently ignored, xenstore-ls might
> silently produce partial output.

Even without the race condition, we get partial output. In this test
case, observe that node four is absent from what dom2 reports, and we
get *two* error messages about node three.

(dom0) # xenstore-ls -p /local/domain/2/foo
one = "one"   . . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)
two = "two"   . . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)
three = "three"   . . . . . . . . . . . . . . . . . . . . . .   (n0)
four = "four"   . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)

(dom2) # xenstore-ls -p /local/domain/2/foo
one = "one"   . . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)
two = "two"   . . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)
three:
xenstore-ls: could not access permissions for three: Permission denied
xenstore-ls: xs_directory (/local/domain/2/foo/three): Permission denied

The tool actually makes three calls for each node it touches.

If the xs_read() fails, it silently prints ":\n" and doesn't report the
errno. If the (optional) xs_get_permissions() fails, it *warns* but
continues, attempting to recurse into the node in question. If the
xs_directory() fails, it aborts and doesn't even continue.

My v2 patch makes the third case (xs_directory()) behave like the first
(xs_read()), and silently continue. It gives me:

(dom2) # ./xenstore-ls -p /local/domain/2/foo
one = "one"   . . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)
two = "two"   . . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)
three:
xenstore-ls: could not access permissions for three: Permission denied
four = "four"   . . . . . . . . . . . . . . . . . . . . . . .   (n0,r2)

(dom2) # ./xenstore-ls /local/domain/2/foo
one = "one"
two = "two"
three:
four = "four"

> Given that the code doesn't have a way to print an error message and
> record the error code or exit status for later, I think the best
> approach is probably exactly David's patch only without the mention of
> EACCES.

Well... here's an incremental straw man patch based on the existing v2,
which will print an error for the *first* operation that fails, and
should cope with race conditions too. Tested with

(dom0) # while true; do xenstore-chmod /local/domain/2/foo/three n0; xenstore-chmod /local/domain/2/foo/three n0 r2 ; done
(dom2) # while true ; do echo = ; sudo ./xenstore-ls -p /local/domain/2/foo; done

diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
index 9fcd3d2f9e..c5d0b18106 100644
--- a/tools/xenstore/xenstore_client.c
+++ b/tools/xenstore/xenstore_client.c
@@ -140,7 +140,7 @@ static int show_whole_path = 0;
 
 #define MIN(a, b) (((a) < (b))? (a) : (b))
 
-static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms)
+static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms, int ignore_errors)
 {
     char **e;
     char *newpath, *val;
@@ -151,7 +151,13 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms)
     e = xs_directory(h, XBT_NULL, path, &num);
     if (e == NULL) {
         if (cur_depth && (errno == ENOENT || errno == EACCES)) {
-            /* If a node disappears while recursing, silently move on. */
+            /*
+             * If a node disappears or becomes inaccessible while traversing,
+             * only print an error if previous operations on this node haven't
+             * done so. Then move on.
+             */
+            if (!ignore_errors)
+                warn("xs_directory (%s)", path);
             return;
         }
 
@@ -197,7 +203,7 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms)
 
         /* Print value */
         if (val == NULL) {
-            printf(":\n");
+            printf(": (%s)", strerror(errno));
         }
         else {
             if (max_width < (linewid + len + TAG_LEN)) {
@@ -222,7 +228,10 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms)
     if (show_perms) {
         perms = xs_get_permissions(h, XBT_NULL, newpath, &nperms);
         if (perms == NULL) {
-            warn("\ncould not
Re: [Xen-devel] [PATCH v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
On Fri, 2019-08-09 at 12:27 +0100, Ian Jackson wrote:
> Can you explain to me what your needs are for all this ? I want
> xenstore-ls to be able to do what you want, but I don't want to impose
> a lot of work on you. I also want the correctness property I mention
> above.

Ultimately, I don't really care about anything but the ENOENT example
that I gave in my previous email:

(dom0) # for a in `seq 1 1000` ; do xenstore-write /local/domain/2/foo/$a $a ; done

Now simultaneously:

(dom0) # for a in `seq 1 999` ; do xenstore-rm /local/domain/2/foo/$a ; done
(dom2) # while true ; do ./xenstore-ls -p /local/domain/2/foo | grep -c 1000; done

I want to see node 1000, reliably, every single time. I absolutely
don't want it aborting because of the other nodes disappearing between
the xs_directory() on the parent, and on a (now removed) child that I
don't even care about.

> My view of the ideal situation would be for xenstore-ls to have
> defined exit status values, like
>
>   0    everything went well
>   1    not used, reserved because some runtimes etc. invent 1
>   2    permissions of at least one node could not be printed
>   3    value of at least one node could not be printed
>   4    children of at least one node could not be listed,
>        nodes may be missing
>   127  catastrophe, all bets are off

Sure.

> And maybe some command line options to control which of these
> conditions print a message to stderr, and/or to turn some of them into
> 0 for the caller's convenience.

That part seems like overkill.

> But this would involve xenstore-ls having a global variable where it
> accumulates the error status, and going through the whole program and
> fixing the error handling (which currently seems not very good to me).

I don't think we need a global variable. The do_ls() function is
recursive and could return an error which then accumulates. The straw
man in my previous patch fixes up most of the error handling and makes
sure we're only printing one error per node, and could fairly easily be
extended to return an error value. Then we just need to define
priorities for ENOENT vs. EACCES (or make it a bitfield, but let's
not).

> If you want to do that, then great, let us talk about the details.
>
> If that's too much work then maybe you can do something now which at
> least goes in the right direction.

I'll send a v3 of the original patch which *only* ignores ENOENT. That
one really shouldn't need to cause a non-zero exit status because it's
equivalent to the case where the child node didn't exist when we called
xs_directory() on the parent. Then we can refine the rest of the
cleanup in a follow-on patch.
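[Editorial note: a sketch of that accumulation, with do_ls() returning
the worst status seen so far. The status names and the worst() helper
are invented here, and the per-node printing is elided.]

    /* Exit statuses per the proposed table; higher means more severe. */
    enum { LS_OK = 0, LS_EPERMS = 2, LS_EVALUE = 3, LS_ECHILD = 4 };

    static int worst(int a, int b) { return a > b ? a : b; }

    static int do_ls(struct xs_handle *h, char *path, int cur_depth,
                     int show_perms)
    {
        int ret = LS_OK;
        unsigned int i, num;
        char *newpath = NULL;   /* built per child by the elided code */
        char **e = xs_directory(h, XBT_NULL, path, &num);

        if (e == NULL)
            /* Children could not be listed: record it, but keep going. */
            return LS_ECHILD;

        for (i = 0; i < num; i++) {
            /* ... existing printing logic, which builds newpath, elided ... */
            ret = worst(ret, do_ls(h, newpath, cur_depth + 1, show_perms));
        }

        free(e);
        return ret;
    }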
Re: [Xen-devel] [PATCH v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
On Fri, 2019-08-09 at 12:48 +0100, David Woodhouse wrote:
> I'll send a v3 of the original patch which *only* ignores ENOENT. That
> one really shouldn't need to cause a non-zero exit status because it's
> equivalent to the case where the child node didn't exist when we called
> xs_directory() on the parent. Then we can refine the rest of the
> cleanup in a follow-on patch.

How does this look?

https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=commitdiff;h=77b922bd027c321ff5c9426d5380f6d47c7022f1
[Xen-devel] [PATCH v2 2/6] x86/boot: Only jump into low trampoline code for real-mode boot
From: David Woodhouse

If the no-real-mode flag is set, don't go there at all. This is a
prelude to not even putting it there in the first place.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/head.S       | 10 ++
 xen/arch/x86/boot/trampoline.S |  4 
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index d78bed394a..e3b42e3263 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -735,7 +735,17 @@ trampoline_setup:
         /* Switch to low-memory stack which lives at the end of trampoline region. */
         mov     sym_fs(trampoline_phys),%edi
         lea     TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp
+        cmpb    $0, sym_fs(skip_realmode)
+        jz      1f
+        /* If no-real-mode, jump straight to trampoline_protmode_entry */
+        lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
+        /* EBX == 0 indicates we are the BP (Boot Processor). */
+        xor     %ebx,%ebx
+        jmp     2f
+1:
+        /* Go via 16-bit code in trampoline_boot_cpu_entry */
         lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
+2:
         pushl   $BOOT_CS32
         push    %eax
 
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 7c6a2328d2..429a088b19 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -194,9 +194,6 @@ gdt_48: .word 6*8-1
 
         .code32
 trampoline_boot_cpu_entry:
-        cmpb    $0,bootsym_rel(skip_realmode,5)
-        jnz     .Lskip_realmode
-
         /* Load pseudo-real-mode segments. */
         mov     $BOOT_PSEUDORM_DS,%eax
         mov     %eax,%ds
@@ -276,7 +273,6 @@ trampoline_boot_cpu_entry:
         mov     %eax,%gs
         mov     %eax,%ss
 
-.Lskip_realmode:
         /* EBX == 0 indicates we are the BP (Boot Processor). */
         xor     %ebx,%ebx
[Xen-devel] [PATCH v2 0/6] Clean up x86_64 boot code
Some cleanups for the boot path, originally inspired by an attempt to avoid scribbling on arbitrarily-chosen low memory.

In the no-real-mode case we don't need to bounce through low memory at all; we can run the 32-bit trampoline in-place in the Xen image. The variables containing information which is optionally discovered by the real-mode boot code can be put back in place in the Xen image, and we can dispense with the bootsym() pointer gymnastics in the C code which accesses them in low memory.

I haven't yet got to reloc(), which I think exists only to ensure that the various breadcrumbs left all over the place by the Multiboot bootloader aren't scribbled on when we copy the 16-bit boot trampoline into low memory. I'd quite like to kill reloc() and pass the original pointer up to 64-bit code to be handled in C. That would require finding a *safe* location to put the 16-bit boot trampoline though, which doesn't already contain anything that the bootloader created for us.

In fact, isn't there already a chance that head.S will choose a location for the trampoline which is already part of a module or contains one of the Multiboot breadcrumbs?

https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=shortlog;h=refs/heads/bootcleanup

v2: Patch #1 of the first series is already merged.
    Fold in minor fixup from Andrew to what is now patch #1.
    Verbally agree to overcome Jan's objections to patch #1, in Chicago.

David Woodhouse (6):
  x86/boot: Remove gratuitous call back into low-memory code
  x86/boot: Only jump into low trampoline code for real-mode boot
  x86/boot: Split bootsym() into four types of relocations
  x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
  x86/boot: Copy 16-bit boot variables back up to Xen image
  x86/boot: Do not use trampoline for no-real-mode boot paths

 xen/arch/x86/acpi/power.c | 6 +-
 xen/arch/x86/boot/edd.S | 18 ++--
 xen/arch/x86/boot/head.S | 89 +-
 xen/arch/x86/boot/mem.S | 35 ++-
 xen/arch/x86/boot/trampoline.S| 146 --
 xen/arch/x86/boot/video.S | 36
 xen/arch/x86/boot/wakeup.S| 12 +--
 xen/arch/x86/cpu/common.c | 2 +-
 xen/arch/x86/cpu/intel.c | 2 +-
 xen/arch/x86/efi/efi-boot.h | 31 +--
 xen/arch/x86/platform_hypercall.c | 18 ++--
 xen/arch/x86/setup.c | 71 ---
 xen/arch/x86/smpboot.c| 6 +-
 xen/arch/x86/tboot.c | 6 +-
 xen/arch/x86/x86_64/mm.c | 2 +-
 xen/arch/x86/xen.lds.S| 27 --
 xen/include/asm-x86/acpi.h| 2 +-
 xen/include/asm-x86/config.h | 10 +-
 xen/include/asm-x86/e820.h| 5 +-
 xen/include/asm-x86/edd.h | 1 -
 20 files changed, 336 insertions(+), 189 deletions(-)
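The trampoline-placement worry in that last paragraph boils down to a range-overlap test against everything the bootloader handed over. A minimal sketch, assuming the module list has already been pulled out of the Multiboot info into a flat array (the struct and names here are invented for illustration, not the actual head.S or reloc() logic):

#include <stdbool.h>
#include <stdint.h>

struct phys_range { uint32_t start, end; };   /* [start, end) physical */

static bool overlaps(uint32_t a0, uint32_t a1, uint32_t b0, uint32_t b1)
{
    return a0 < b1 && b0 < a1;
}

/* Reject a candidate trampoline location which would clobber a module. */
static bool trampoline_location_safe(uint32_t tramp, uint32_t len,
                                     const struct phys_range *mods,
                                     unsigned int nr_mods)
{
    unsigned int i;

    for ( i = 0; i < nr_mods; i++ )
        if ( overlaps(tramp, tramp + len, mods[i].start, mods[i].end) )
            return false;

    return true;
}

The same test would also have to cover the Multiboot info block, command line and any other breadcrumbs, which is exactly the bookkeeping that copying everything via reloc() currently sidesteps.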
[Xen-devel] [PATCH v2 3/6] x86/boot: Split bootsym() into four types of relocations
From: David Woodhouse As a first step toward using the low-memory trampoline only when necessary for a legacy boot without no-real-mode, clean up the relocations into three separate groups. • bootsym() is now used only at boot time when no-real-mode isn't set. • bootdatasym() is for variables containing information discovered by the 16-bit boot code. This is currently accessed directly in place in low memory by Xen at runtime, but will be copied back to its location in high memory to avoid the pointer gymnastics (and because a subsequent patch will stop copying the 16-bit boot code into low memory at all when it isn't being used). • trampsym() is for the permanent 16-bit trampoline used for AP startup and for wake from sleep. This is not used at boot, and can be copied into (properly allocated) low memory once the system is running. • tramp32sym() is used both at boot and for AP startup/wakeup. During boot it can be used in-place, running from the physical address of the Xen image. For AP startup it can't, because at that point there isn't a full 1:1 mapping of all memory; only the low trampoline page is mapped. No (intentional) functional change yet; just a "cleanup" to allow the various parts to be treated separately in subsequent patches. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/edd.S| 16 +++ xen/arch/x86/boot/head.S | 22 +++-- xen/arch/x86/boot/mem.S| 12 ++--- xen/arch/x86/boot/trampoline.S | 85 ++ xen/arch/x86/boot/video.S | 6 +-- xen/arch/x86/boot/wakeup.S | 12 ++--- xen/arch/x86/efi/efi-boot.h| 8 ++-- xen/arch/x86/xen.lds.S | 15 -- 8 files changed, 122 insertions(+), 54 deletions(-) diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 3df712bce1..434bbbd960 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -41,7 +41,7 @@ get_edd: # This code is sensitive to the size of the structs in edd.h edd_start: /* ds:si points at fn48 results. Fn41 results go immediately before. 
*/ -movw$bootsym(boot_edd_info)+EDDEXTSIZE, %si +movw$bootdatasym(boot_edd_info)+EDDEXTSIZE, %si movb$0x80, %dl # BIOS device 0x80 edd_check_ext: @@ -56,7 +56,7 @@ edd_check_ext: movb%dl, %ds:-8(%si)# store device number movb%ah, %ds:-7(%si)# store version movw%cx, %ds:-6(%si)# store extensions -incbbootsym(boot_edd_info_nr) # note that we stored something +incbbootdatasym(boot_edd_info_nr) # note that we stored something edd_get_device_params: movw$EDDPARMSIZE, %ds:(%si) # put size @@ -97,7 +97,7 @@ edd_legacy_done: edd_next: incb%dl # increment to next device jz edd_done -cmpb$EDD_INFO_MAX,bootsym(boot_edd_info_nr) +cmpb$EDD_INFO_MAX,bootdatasym(boot_edd_info_nr) jb edd_check_ext edd_done: @@ -108,11 +108,11 @@ edd_done: .Ledd_mbr_sig_start: pushw %es movb$0x80, %dl # from device 80 -movw$bootsym(boot_mbr_signature), %bx # store buffer ptr in bx +movw$bootdatasym(boot_mbr_signature), %bx # store buffer ptr in bx .Ledd_mbr_sig_read: pushw %bx -movw$bootsym(boot_edd_info), %bx -movzbw bootsym(boot_edd_info_nr), %cx +movw$bootdatasym(boot_edd_info), %bx +movzbw bootdatasym(boot_edd_info_nr), %cx jcxz.Ledd_mbr_sig_default .Ledd_mbr_sig_find_info: cmpb%dl, (%bx) @@ -151,12 +151,12 @@ edd_done: jne .Ledd_mbr_sig_next movb%dl, (%bx) # store BIOS drive number movl%ecx, 4(%bx)# store signature from MBR -incbbootsym(boot_mbr_signature_nr) # note that we stored something +incbbootdatasym(boot_mbr_signature_nr) # note that we stored something addw$8, %bx # increment sig buffer ptr .Ledd_mbr_sig_next: incb%dl # increment to next device jz .Ledd_mbr_sig_done -cmpb$EDD_MBR_SIG_MAX, bootsym(boot_mbr_signature_nr) +cmpb$EDD_MBR_SIG_MAX, bootdatasym(boot_mbr_signature_nr) jb .Ledd_mbr_sig_read .Ledd_mbr_sig_done: popw%es diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index e3b42e3263..07621d1a30 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -707,14 +707,30 @@ trampoline_setup: cmp $sym_offs(__trampoline_rel_stop),%edi jb 1b -/* Patch in the trampoline segment. */ +mov $sym_offs(__trampoline32_rel_start),%edi +1: +mov %fs:(%edi),
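For clarity, here is what those head.S fixup loops amount to in C (a sketch, not code from the tree). Each .long emitted into one of the *_rel sections is the offset, relative to the record itself, of a 32-bit field that needs the relocation base added to it:

#include <stdint.h>

static void apply_rel(uint8_t *image, uint32_t rel_start, uint32_t rel_stop,
                      uint32_t base)
{
    uint32_t off;

    for ( off = rel_start; off < rel_stop; off += 4 )
    {
        int32_t rec = *(int32_t *)(image + off);   /* self-relative offset */

        *(uint32_t *)(image + off + rec) += base;  /* patch the target field */
    }
}

One such walk runs per relocation set, with a base appropriate to where that set's code or data will actually execute.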
[Xen-devel] [PATCH v2 4/6] x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
From: David Woodhouse In preparation for splitting the boot and permanent trampolines from each other. Some of these will change back, but most are boot so do the plain search/replace that way first, then a subsequent patch will extract the permanent trampoline code. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/head.S | 12 ++-- xen/arch/x86/boot/trampoline.S | 10 +- xen/arch/x86/boot/video.S | 4 ++-- xen/arch/x86/efi/efi-boot.h| 4 ++-- xen/arch/x86/setup.c | 4 ++-- xen/arch/x86/tboot.c | 6 +++--- xen/arch/x86/x86_64/mm.c | 2 +- xen/arch/x86/xen.lds.S | 6 +++--- xen/include/asm-x86/config.h | 6 +++--- 9 files changed, 27 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 07621d1a30..556dab127f 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -754,20 +754,20 @@ trampoline_setup: cmpb$0, sym_fs(skip_realmode) jz 1f /* If no-real-mode, jump straight to trampoline_protmode_entry */ -lea trampoline_protmode_entry-trampoline_start(%edi),%eax +lea trampoline_protmode_entry-boot_trampoline_start(%edi),%eax /* EBX == 0 indicates we are the BP (Boot Processor). */ xor %ebx,%ebx jmp 2f 1: /* Go via 16-bit code in trampoline_boot_cpu_entry */ -lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax +lea trampoline_boot_cpu_entry-boot_trampoline_start(%edi),%eax 2: pushl $BOOT_CS32 push%eax /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_offs(trampoline_start),%esi -mov $((trampoline_end - trampoline_start) / 4),%ecx +mov $sym_offs(boot_trampoline_start),%esi +mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx rep movsl %fs:(%esi),%es:(%edi) /* Jump into the relocated trampoline. */ @@ -779,8 +779,8 @@ cmdline_parse_early: reloc: #include "reloc.S" -ENTRY(trampoline_start) +ENTRY(boot_trampoline_start) #include "trampoline.S" -ENTRY(trampoline_end) +ENTRY(boot_trampoline_end) #include "x86_64.S" diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 95a4bef553..26af9c6beb 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -31,7 +31,7 @@ *to be used for AP startup. */ #undef bootsym -#define bootsym(s) ((s)-trampoline_start) +#define bootsym(s) ((s)-boot_trampoline_start) #define bootsym_rel(sym, off, opnd...) \ bootsym(sym),##opnd; \ @@ -47,7 +47,7 @@ .long 111b - (off) - .;\ .popsection -#define bootdatasym(s) ((s)-trampoline_start) +#define bootdatasym(s) ((s)-boot_trampoline_start) #define bootdatasym_rel(sym, off, opnd...) \ bootdatasym(sym),##opnd; \ 111:; \ @@ -56,7 +56,7 @@ .popsection #undef trampsym -#define trampsym(s) ((s)-trampoline_start) +#define trampsym(s) ((s)-boot_trampoline_start) #define trampsym_rel(sym, off, opnd...)\ trampsym(sym),##opnd; \ @@ -66,7 +66,7 @@ .popsection #undef tramp32sym -#define tramp32sym(s) ((s)-trampoline_start) +#define tramp32sym(s) ((s)-boot_trampoline_start) #define tramp32sym_rel(sym, off, opnd...) \ tramp32sym(sym),##opnd;\ @@ -232,7 +232,7 @@ gdt_48: .word 6*8-1 /* The first page of trampoline is permanent, the rest boot-time only. */ /* Reuse the boot trampoline on the 1st trampoline page as stack for wakeup. */ -.equwakeup_stack, trampoline_start + PAGE_SIZE +.equwakeup_stack, boot_trampoline_start + PAGE_SIZE .global wakeup_stack /* From here on early boot only. 
*/ diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S index 03907e9e9a..5087c6a4d5 100644 --- a/xen/arch/x86/boot/video.S +++ b/xen/arch/x86/boot/video.S @@ -15,8 +15,8 @@ #include "video.h" -/* Scratch space layout: trampoline_end to trampoline_end+0x1000. */ -#define modelist bootsym(trampoline_end) /* 2kB (256 entries) */ +/* Scratch space layout: boot_trampoline_end to boot_trampoline_end+0x1000. */ +#define modelist bootsym(boot_trampoline_end) /* 2kB (256 entries) */ #define vesa_glob_info (modelist + 0x800)/* 1kB */ #define vesa_mode_info (vesa_glob_info + 0x400) /* 1kB */ diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 556942482e..fc2ea554b5 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -232,7 +232,7 @@ static void __init noreturn efi_arch_post_exit_boot(void) u64 cr4 = XEN_MINIMAL_CR4 & ~X86_CR4_PGE, efer; efi_arch_relocate_image(__XEN_VIRT_START - xen_phys_start); -memc
[Xen-devel] [PATCH v2 1/6] x86/boot: Remove gratuitous call back into low-memory code
From: David Woodhouse We appear to have implemented a memcpy() in the low-memory trampoline which we then call into from __start_xen(), for no adequately defined reason. Kill it with fire. Signed-off-by: David Woodhouse Acked-by: Andrew Cooper --- v2: Minor fixups from Andrew. xen/arch/x86/boot/mem.S| 27 +-- xen/arch/x86/setup.c | 10 ++ xen/include/asm-x86/e820.h | 5 ++--- 3 files changed, 17 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S index c6a9bd4d3b..2d61d28835 100644 --- a/xen/arch/x86/boot/mem.S +++ b/xen/arch/x86/boot/mem.S @@ -7,7 +7,7 @@ get_memory_map: .Lmeme820: xorl%ebx, %ebx # continuation counter -movw$bootsym(e820map), %di # point into the whitelist +movw$bootsym(bios_e820map), %di # point into the whitelist # so we can have the bios # directly write into it. @@ -22,8 +22,8 @@ get_memory_map: cmpl$SMAP,%eax # check the return is `SMAP' jne .Lmem88 -incwbootsym(e820nr) -cmpw$E820_BIOS_MAX,bootsym(e820nr) # up to this many entries +incwbootsym(bios_e820nr) +cmpw$E820_BIOS_MAX,bootsym(bios_e820nr) # up to this many entries jae .Lmem88 movw%di,%ax @@ -66,27 +66,10 @@ get_memory_map: ret -/* - * Copy E820 map obtained from BIOS to a buffer allocated by Xen. - * Input: %rdi: target address of e820 entry array - *%esi: maximum number of entries to copy - * Output: %eax: number of entries copied - */ -.code64 -ENTRY(e820map_copy) -mov %esi, %eax -lea e820map(%rip), %rsi -mov e820nr(%rip), %ecx -cmp %ecx, %eax -cmova %ecx, %eax # number of entries to move -imul$5, %eax, %ecx -rep movsl # do the move -ret - .align 4 -e820map: +GLOBAL(bios_e820map) .fill E820_BIOS_MAX*20,1,0 -e820nr: +GLOBAL(bios_e820nr) .long 0 GLOBAL(lowmem_kb) .long 0 diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index d2011910fa..decea2e77a 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -672,6 +672,16 @@ static char * __init cmdline_cook(char *p, const char *loader_name) return p; } +static unsigned int __init copy_bios_e820(struct e820entry *map, unsigned int limit) +{ +unsigned int n = min(bootsym(bios_e820nr), limit); + +if ( n ) +memcpy(map, bootsym(bios_e820map), sizeof(*map) * n); + +return n; +} + void __init noreturn __start_xen(unsigned long mbi_p) { char *memmap_type = NULL; diff --git a/xen/include/asm-x86/e820.h b/xen/include/asm-x86/e820.h index ee317b17aa..52916fb75d 100644 --- a/xen/include/asm-x86/e820.h +++ b/xen/include/asm-x86/e820.h @@ -37,8 +37,7 @@ extern struct e820map e820_raw; /* These symbols live in the boot trampoline. */ extern unsigned int lowmem_kb, highmem_kb; -unsigned int e820map_copy(struct e820entry *map, unsigned int limit); - -#define copy_bios_e820 bootsym(e820map_copy) +extern struct e820map bios_e820map[]; +extern unsigned int bios_e820nr; #endif /*__E820_HEADER*/ smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
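The hunk above only adds the helper; the diff doesn't show its caller. The call from __start_xen() would presumably look something like this (illustrative, not part of the patch as posted):

    e820_raw.nr_map = copy_bios_e820(e820_raw.map, ARRAY_SIZE(e820_raw.map));

i.e. the same call shape the old #define routed into the low-memory e820map_copy(), minus the jump into trampoline code.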
[Xen-devel] [PATCH v2 6/6] x86/boot: Do not use trampoline for no-real-mode boot paths
From: David Woodhouse Where booted from EFI or with no-real-mode, there is no need to stomp on low memory with the 16-boot code. Instead, just go straight to trampoline_protmode_entry() at its physical location within the Xen image. For now, the boot code (including the EFI loader path) still determines what the trampoline_phys address should be. The trampoline is actually relocated for that address and copied into low memory, from a relocate_trampoline() call made from __start_xen(). For subsequent AP startup and wakeup, the 32-bit trampoline can't trivially be used in-place as that region isn't mapped. So copy it down to low memory too, having relocated it (again) to work from there. Signed-off-by: David Woodhouse --- xen/arch/x86/acpi/power.c | 6 +-- xen/arch/x86/boot/head.S | 67 -- xen/arch/x86/boot/trampoline.S | 32 xen/arch/x86/cpu/common.c | 2 +- xen/arch/x86/cpu/intel.c | 2 +- xen/arch/x86/efi/efi-boot.h| 31 ++-- xen/arch/x86/setup.c | 43 -- xen/arch/x86/smpboot.c | 6 +-- xen/arch/x86/tboot.c | 6 +-- xen/arch/x86/x86_64/mm.c | 2 +- xen/include/asm-x86/acpi.h | 2 +- xen/include/asm-x86/config.h | 10 ++--- 12 files changed, 118 insertions(+), 91 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index aecc754fdb..d2a355429e 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -152,9 +152,9 @@ static void acpi_sleep_prepare(u32 state) return; if ( acpi_sinfo.vector_width == 32 ) -*(uint32_t *)wakeup_vector_va = bootsym_phys(wakeup_start); +*(uint32_t *)wakeup_vector_va = trampsym_phys(wakeup_start); else -*(uint64_t *)wakeup_vector_va = bootsym_phys(wakeup_start); +*(uint64_t *)wakeup_vector_va = trampsym_phys(wakeup_start); } static void acpi_sleep_post(u32 state) {} @@ -388,7 +388,7 @@ static void tboot_sleep(u8 sleep_state) g_tboot_shared->acpi_sinfo.wakeup_vector = acpi_sinfo.wakeup_vector; g_tboot_shared->acpi_sinfo.vector_width = acpi_sinfo.vector_width; g_tboot_shared->acpi_sinfo.kernel_s3_resume_vector = - bootsym_phys(wakeup_start); + trampsym_phys(wakeup_start); switch ( sleep_state ) { diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 104eb0eb3c..2268ec99ed 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -697,16 +697,23 @@ trampoline_setup: lea __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi mov %edi,sym_fs(l2_bootmap) -/* Apply relocations to bootstrap trampoline. */ -mov sym_fs(trampoline_phys),%edx -mov $sym_offs(__trampoline_rel_start),%edi -1: -mov %fs:(%edi),%eax -add %edx,%fs:(%edi,%eax) -add $4,%edi -cmp $sym_offs(__trampoline_rel_stop),%edi -jb 1b +/* Do not parse command line on EFI platform here. */ +cmpb$0,sym_fs(efi_platform) +jnz 1f +/* Bail if there is no command line to parse. */ +mov sym_fs(multiboot_ptr),%ebx +testl $MBI_CMDLINE,MB_flags(%ebx) +jz 1f + +lea sym_esi(early_boot_opts),%eax +push%eax +pushl MB_cmdline(%ebx) +callcmdline_parse_early + +1: +/* Apply relocations to 32-bit trampoline for execution in place. */ +lea sym_esi(perm_trampoline_start),%edx mov $sym_offs(__trampoline32_rel_start),%edi 1: mov %fs:(%edi),%eax @@ -715,6 +722,21 @@ trampoline_setup: cmp $sym_offs(__trampoline32_rel_stop),%edi jb 1b +cmp $0,sym_esi(skip_realmode) +jz .Ldo_realmode + +/* Go directly to trampoline_protmode_entry at its physical address */ +lea trampoline_protmode_entry-__XEN_VIRT_START(%esi),%eax +pushl $BOOT_CS32 +push%eax + +/* EBX == 0 indicates we are the BP (Boot Processor). */ +xor %ebx,%ebx +retl + +.Ldo_realmode: +/* Apply relocations to 16-bit boot code. 
*/ +mov sym_fs(trampoline_phys),%edx mov $sym_offs(__bootsym_rel_start),%edi 1: mov %fs:(%edi),%eax @@ -744,35 +766,12 @@ trampoline_setup: cmp $sym_offs(__bootdatasym_rel_stop),%edi jb 1b -/* Do not parse command line on EFI platform here. */ -cmpb$0,sym_fs(efi_platform) -jnz 1f - -/* Bail if there is no command line to parse. */ -mov sym_fs(multiboot_ptr),%ebx -testl $MBI_CMDLINE,MB_flags(%ebx) -jz 1f - -lea sym_esi(early_boot_opts),%eax -push%eax -pushl MB_cmdline(
[Xen-devel] [PATCH v2 5/6] x86/boot: Copy 16-bit boot variables back up to Xen image
From: David Woodhouse Ditch the bootsym() access from C code for the variables populated by 16-bit boot code. As well as being cleaner this also paves the way for not having the 16-bit boot code in low memory for no-real-mode or EFI loader boots at all. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/edd.S | 2 ++ xen/arch/x86/boot/head.S | 16 +++ xen/arch/x86/boot/mem.S | 2 ++ xen/arch/x86/boot/trampoline.S| 33 --- xen/arch/x86/boot/video.S | 30 +++- xen/arch/x86/platform_hypercall.c | 18 - xen/arch/x86/setup.c | 22 ++--- xen/arch/x86/xen.lds.S| 8 +++- xen/include/asm-x86/edd.h | 1 - 9 files changed, 93 insertions(+), 39 deletions(-) diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 434bbbd960..138d04c964 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -163,6 +163,7 @@ edd_done: .Ledd_mbr_sig_skip: ret +.pushsection .data.boot16, "aw", @progbits GLOBAL(boot_edd_info_nr) .byte 0 GLOBAL(boot_mbr_signature_nr) @@ -171,3 +172,4 @@ GLOBAL(boot_mbr_signature) .fill EDD_MBR_SIG_MAX*8,1,0 GLOBAL(boot_edd_info) .fill EDD_INFO_MAX * (EDDEXTSIZE + EDDPARMSIZE), 1, 0 +.popsection diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 556dab127f..104eb0eb3c 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -733,6 +733,17 @@ trampoline_setup: cmp $sym_offs(__bootsym_seg_stop),%edi jb 1b +/* Relocations for the boot data section. */ +mov sym_fs(trampoline_phys),%edx +add $(boot_trampoline_end - boot_trampoline_start),%edx +mov $sym_offs(__bootdatasym_rel_start),%edi +1: +mov %fs:(%edi),%eax +add %edx,%fs:(%edi,%eax) +add $4,%edi +cmp $sym_offs(__bootdatasym_rel_stop),%edi +jb 1b + /* Do not parse command line on EFI platform here. */ cmpb$0,sym_fs(efi_platform) jnz 1f @@ -770,6 +781,11 @@ trampoline_setup: mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx rep movsl %fs:(%esi),%es:(%edi) +/* Copy boot data template to low memory. */ +mov $sym_offs(bootdata_start),%esi +mov $((bootdata_end - bootdata_start + 3) / 4),%ecx +rep movsl %fs:(%esi),%es:(%edi) + /* Jump into the relocated trampoline. */ lret diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S index aa39608442..86f0fa9af7 100644 --- a/xen/arch/x86/boot/mem.S +++ b/xen/arch/x86/boot/mem.S @@ -67,6 +67,7 @@ get_memory_map: ret .align 4 +.pushsection .data.boot16, "aw", @progbits GLOBAL(bios_e820map) .fill E820_BIOS_MAX*20,1,0 GLOBAL(bios_e820nr) @@ -75,3 +76,4 @@ GLOBAL(lowmem_kb) .long 0 GLOBAL(highmem_kb) .long 0 + .popsection diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 26af9c6beb..b5517a44bb 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -47,11 +47,15 @@ .long 111b - (off) - .;\ .popsection -#define bootdatasym(s) ((s)-boot_trampoline_start) +.pushsection .data.boot16, "aw", @progbits +GLOBAL(bootdata_start) +.popsection + +#define bootdatasym(s) ((s)-bootdata_start+(boot_trampoline_end-boot_trampoline_start)) #define bootdatasym_rel(sym, off, opnd...) \ bootdatasym(sym),##opnd; \ 111:; \ -.pushsection .bootdatasym_rel, "a";\ +.pushsection .bootsym_rel, "a";\ .long 111b - (off) - .;\ .popsection @@ -227,7 +231,7 @@ start64: .word 0 idt_48: .word 0, 0, 0 # base = limit = 0 .word 0 -gdt_48: .word 6*8-1 +gdt_48: .word 7*8-1 .long tramp32sym_rel(trampoline_gdt,4) /* The first page of trampoline is permanent, the rest boot-time only. 
*/ @@ -318,6 +322,23 @@ trampoline_boot_cpu_entry: mov %eax,%gs mov %eax,%ss +/* + * Copy locally-gathered data back up into the Xen physical image + */ +mov $BOOT_FS,%eax +mov %eax,%es + +mov $sym_offs(bootdata_end),%ecx +mov $sym_offs(bootdata_start),%edi +sub %edi,%ecx +mov $bootdatasym_rel(bootdata_start,4,%esi) +rep movsb %ds:(%esi),%es:(%edi) + +/* + * %es still points to BOOT_FS but trampoline_protmode_entry + * reloads it anyway. + */ + /* EBX == 0 indicates we are the BP (Boot Processor). */ xor %ebx,%ebx @@ -345,8 +366,10 @@ vesa_size: .word
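Written as C for clarity, the copy-back which the new 32-bit code at the end of trampoline_boot_cpu_entry performs is essentially the following (a sketch; the real thing is a rep movsb with %es pointing at the BOOT_FS segment based at the Xen image):

#include <string.h>

extern char bootdata_start[], bootdata_end[];  /* link-time home in the image */

static void copy_bootdata_back(const char *low_bootdata)
{
    /* The 16-bit BIOS calls filled in the low-memory copy; move the
     * results back so later C code reads bios_e820map, boot_edd_info
     * and friends at their native addresses, without any bootsym()
     * gymnastics. */
    memcpy(bootdata_start, low_bootdata, bootdata_end - bootdata_start);
}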
Re: [Xen-devel] [PATCH v2 4/6] x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
On Mon, 2019-08-12 at 11:55 +0200, Jan Beulich wrote:
> On 09.08.2019 17:02, David Woodhouse wrote:
> > From: David Woodhouse
> >
> > In preparation for splitting the boot and permanent trampolines from
> > each other. Some of these will change back, but most are boot so do the
> > plain search/replace that way first, then a subsequent patch will extract
> > the permanent trampoline code.
>
> To be honest I don't view it as helpful to do things in this order.
> If you first re-arranged the ordering of items within the trampoline,
> we'd then not end up with an intermediate state where the labels are
> misleading. Is there a reason things can't sensibly be done the other
> way around?

Obviously I did all this in a working tree first, swore at it a lot and finally got it working, then attempted to split it up into separate meaningful commits which individually made sense. There is plenty of room for subjectivity in the choices I made in that last step.

I'm not sure I quite see why you say the labels are misleading. My intent was to apply labels based on what each object is *used* for, despite the fact that to start with they're all actually in the same place. And then to actually move each different type of symbol into its separate section/location to clean things up.

Is it just the code comments at the start of trampoline.S that you find misleading in the interim stage? Because those *don't* purely talk about what bootsym/bootdatasym/trampsym/tramp32sym are used for; they do say how they are (eventually) relocated.

I suppose I could rip that code comment out of patch #3 completely and add it again in a later commit... or just not add it again. I write code comments in an attempt to be helpful to those who come after me (especially when that's actually myself) but if they're going to cause problems, then maybe they're more hassle than they're worth?
Re: [Xen-devel] [PATCH v2 3/6] x86/boot: Split bootsym() into four types of relocations
Apologies for delayed response; I was away last week and was frowned at every time I so much as looked towards the laptop. On Mon, 2019-08-12 at 11:41 +0200, Jan Beulich wrote: > On 09.08.2019 17:01, David Woodhouse wrote: > > --- a/xen/arch/x86/boot/trampoline.S > > +++ b/xen/arch/x86/boot/trampoline.S > > @@ -16,21 +16,62 @@ > >* not guaranteed to persist. > >*/ > > > > -/* NB. bootsym() is only usable in real mode, or via BOOT_PSEUDORM_DS. */ > > +/* > > + * There are four sets of relocations: > > + * > > + * bootsym(): Boot-time code relocated to low memory and run only once. > > + *Only usable at boot, in real mode or via > > BOOT_PSEUDORM_DS. > > I'm not a native speaker, so my viewing this as ambiguous may be wrong, > but to me it reads as "Only usable at boot or in real mode or via > BOOT_PSEUDORM_DS" when aiui it ought to be "Only usable at boot AND (in > real mode OR via BOOT_PSEUDORM_DS)". In which case how about "Only usable > at boot from real mode or via BOOT_PSEUDORM_DS"? Yes, I suppose I see that ambiguity. But ultimately I file it under the category of "don't hack boot code while drunk". I agree that to the reader of English (native or otherwise), that sentence may have either meaning. But this isn't documentation; it's code comments in an assembler file. Anyone who is actually going to make a meaningful contribution to the boot code, might reasonably be expected to understand that "real mode or using the pseudo-realmode segment selector" are grouped together, and that they must use one or the other of those. At boot time. This is not an attempt at a two-line tutorial on all the pitfalls of touching the boot code/data; via bootsym() or otherwise. It's just a pointer in the right direction. But sure, I'll have a look at fixing it — if I don't feel that what I come up with is too clumsy. > > + * bootdatasym(): Boot-time BIOS-discovered data, relocated back up to Xen > > + *image after discovery. > > + * trampsym():Trampoline code relocated into low memory for AP startup > > + *and wakeup. > > + * tramp32sym(): 32-bit trampoline code which at boot can be used directly > > + *from the Xen image in memory, but which will need to be > > + *relocated into low (well, into *mapped*) memory in order > > + *to be used for AP startup. > > + */ > > #undef bootsym > > #define bootsym(s) ((s)-trampoline_start) > > > > #define bootsym_rel(sym, off, opnd...) \ > > bootsym(sym),##opnd; \ > > 111:; \ > > -.pushsection .trampoline_rel, "a"; \ > > +.pushsection .bootsym_rel, "a";\ > > .long 111b - (off) - .;\ > > .popsection > > > > #define bootsym_segrel(sym, off) \ > > $0,$bootsym(sym); \ > > 111:; \ > > -.pushsection .trampoline_seg, "a"; \ > > +.pushsection .bootsym_seg, "a";\ > > +.long 111b - (off) - .;\ > > +.popsection > > + > > +#define bootdatasym(s) ((s)-trampoline_start) > > +#define bootdatasym_rel(sym, off, opnd...) \ > > +bootdatasym(sym),##opnd; \ > > +111:; \ > > +.pushsection .bootdatasym_rel, "a";\ > > +.long 111b - (off) - .;\ > > +.popsection > > + > > +#undef trampsym > > +#define trampsym(s) ((s)-trampoline_start) > > + > > +#define trampsym_rel(sym, off, opnd...)\ > > +trampsym(sym),##opnd; \ > > +111:; \ > > +.pushsection .trampsym_rel, "a"; \ > > +.long 111b - (off) - .;\ > > +.popsection > > + > > +#undef tramp32sym > > +#define tramp32sym(s) ((s)-trampoline_start) > > + > > +#define tramp32sym_rel(sym, off, opnd...) 
\ > > +tramp32sym(sym),##opnd;\ > > +111:; \ > > +.pushsection .tramp32sym_rel, "a"; \ > > .long 111b - (off) - .;\ > > .popsection > > This repeats the basically same sequence of things several times. > I've not peeked ahead yet to see in how far more differences would > appear later on, but I don't really expect much of a fu
Re: [Xen-devel] [PATCH v2 5/6] x86/boot: Copy 16-bit boot variables back up to Xen image
On Mon, 2019-08-12 at 12:24 +0200, Jan Beulich wrote: > On 09.08.2019 17:02, David Woodhouse wrote: > > --- a/xen/arch/x86/boot/head.S > > +++ b/xen/arch/x86/boot/head.S > > @@ -733,6 +733,17 @@ trampoline_setup: > > cmp $sym_offs(__bootsym_seg_stop),%edi > > jb 1b > > > > +/* Relocations for the boot data section. */ > > +mov sym_fs(trampoline_phys),%edx > > +add $(boot_trampoline_end - boot_trampoline_start),%edx > > +mov $sym_offs(__bootdatasym_rel_start),%edi > > +1: > > +mov %fs:(%edi),%eax > > +add %edx,%fs:(%edi,%eax) > > +add $4,%edi > > +cmp $sym_offs(__bootdatasym_rel_stop),%edi > > +jb 1b > > + > > /* Do not parse command line on EFI platform here. */ > > cmpb$0,sym_fs(efi_platform) > > jnz 1f > > @@ -770,6 +781,11 @@ trampoline_setup: > > mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx > > rep movsl %fs:(%esi),%es:(%edi) > > > > +/* Copy boot data template to low memory. */ > > +mov $sym_offs(bootdata_start),%esi > > +mov $((bootdata_end - bootdata_start + 3) / 4),%ecx > > +rep movsl %fs:(%esi),%es:(%edi) > > The new data arrangement should be described in the commit message. > Also just like for the trampoline copying I think it would be better > if you suitable aligned bootdata_start and bootdata_end, such that > you wouldn't need to add 3 here before dividing by 4. Ack. > > @@ -227,7 +231,7 @@ start64: > > .word 0 > > idt_48: .word 0, 0, 0 # base = limit = 0 > > .word 0 > > -gdt_48: .word 6*8-1 > > +gdt_48: .word 7*8-1 > > .long tramp32sym_rel(trampoline_gdt,4) > > You don't grow trampoline_gdt here, so I think this change is > wrong. And if a change was needed at all (perhaps in the next > patch), then I think it would be better to replace the use of > literal numbers, using the difference of two labels instead > (the "end" lable preferably being a .L-prefixed one). I don't grow it but... count it ☺. I do start using sym_fs() here in places that it wasn't before, so the incorrect size started to *matter* because the BOOT_FS selector wasn't included in the limit. I will make sure I explicitly comment on that in the commit message; no need for a code comment to explain why the limit actually *does* match the size of the table. > > --- a/xen/arch/x86/boot/video.S > > +++ b/xen/arch/x86/boot/video.S > > @@ -15,10 +15,10 @@ > > > > #include "video.h" > > > > -/* Scratch space layout: boot_trampoline_end to > > boot_trampoline_end+0x1000. */ > > -#define modelist bootsym(boot_trampoline_end) /* 2kB (256 entries) > > */ > > -#define vesa_glob_info (modelist + 0x800)/* 1kB */ > > -#define vesa_mode_info (vesa_glob_info + 0x400) /* 1kB */ > > +/* Scratch space layout: bootdata_end to bootdata_end+0x1000. */ > > +#define modelist(t) bootdatasym_rel(bootdata_end,2,t) /* > > 2KiB (256 entries) */ > > +#define vesa_glob_info(t) bootdatasym_rel((bootdata_end+0x800),2,t) /* > > 1KiB */ > > +#define vesa_mode_info(t) bootdatasym_rel((bootdata_end+0xc00),2,t) /* > > 1KiB */ > > > > /* Retrieve Extended Display Identification Data. */ > > #define CONFIG_FIRMWARE_EDID > > @@ -113,7 +113,7 @@ mopar2: movb%al, _param(PARAM_VIDEO_LINES) > > > > # Fetching of VESA frame buffer parameters > > mopar_gr: > > -leawvesa_mode_info, %di > > +leawvesa_mode_info(%di) > > Just as a note, as I can't really see how to improve the situation: > The embedding of the relocation offset (2) in the macros is making > this code even more fragile, as they're now not usable anymore in > an arbitrary way (consider e.g. their use for the memory operand if > an insn which also requires an immediate). 
I think you want to at > least warn about this restriction in the comment above. Yeah. I file that one under "don't touch the VESA code unless you want your brain to dribble out of your ears". Which was basically true before I touched it too, in my defence ☺. > > @@ -291,6 +293,10 @@ SECTIONS > > DECL_SECTION(.data) { > > *(.data.page_aligned) > > *(.data) > > + . = ALIGN(16); > > + __bootdata_start = .; > > + *(.data.boot16) > > + __bootdata_end = .; > > Why 16-byte
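The gdt_48 limit arithmetic discussed earlier in this reply, as C (illustrative only). A GDTR limit is the size of the table in bytes minus one, and trampoline_gdt has seven 8-byte descriptors (null, 32-bit code, 64-bit code, data, the two pseudo-real-mode selectors, and the BOOT_FS Xen-data descriptor at selector 0x30), hence 7*8-1:

#include <stdint.h>

struct __attribute__((packed)) gdtr32 {
    uint16_t limit;  /* table size in bytes, minus one */
    uint32_t base;   /* linear address of trampoline_gdt */
};

enum { TRAMPOLINE_GDT_ENTRIES = 7 };

static const uint16_t tramp_gdt_limit = TRAMPOLINE_GDT_ENTRIES * 8 - 1; /* 55 */

Computing the limit as the difference of two labels, as suggested, gets the assembler to do this sum itself, so the literal entry count can't go stale.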
Re: [Xen-devel] [PATCH v2 6/6] x86/boot: Do not use trampoline for no-real-mode boot paths
On Mon, 2019-08-12 at 12:55 +0200, Jan Beulich wrote: > On 09.08.2019 17:02, David Woodhouse wrote: > > From: David Woodhouse > > > > Where booted from EFI or with no-real-mode, there is no need to stomp > > on low memory with the 16-boot code. Instead, just go straight to > > trampoline_protmode_entry() at its physical location within the Xen > > image. > > > > For now, the boot code (including the EFI loader path) still determines > > what the trampoline_phys address should be. The trampoline is actually > > relocated for that address and copied into low memory, from a > > relocate_trampoline() call made from __start_xen(). > > I assume this talks about the real mode part of the trampoline, as > opposed to the next paragraph? Would be nice if you made this > explicit. This is the permanent real-mode trampoline used for AP startup and wakeup, not the real-mode boot code (which the boot code has to have put there for itself it it wanted it). I will try to make the commit message clearer; thanks for pointing it out. > > For subsequent AP startup and wakeup, the 32-bit trampoline can't > > trivially be used in-place as that region isn't mapped. So copy it > > down to low memory too, having relocated it (again) to work from > > there. > > trampoline_protmode_entry gets entered with CR0.PG=0, i.e. at > that point there's not even the question yet of there being a > mapping. Subsequently idle_pg_table gets loaded into CR3. I wonder > if, rather than relocating the 32-bit part of the trampoline, it > wouldn't be better to install a 1:1 mapping into idle_pg_table. > Such a mapping would need to have the G bits clear in order to > not conflict with PV guest mappings of the same linear addresses. Yeah, I tried making that happen. It made me sad. This seemed to be simpler and less fragile. > > --- a/xen/arch/x86/acpi/power.c > > +++ b/xen/arch/x86/acpi/power.c > > @@ -152,9 +152,9 @@ static void acpi_sleep_prepare(u32 state) > > return; > > > > if ( acpi_sinfo.vector_width == 32 ) > > -*(uint32_t *)wakeup_vector_va = bootsym_phys(wakeup_start); > > +*(uint32_t *)wakeup_vector_va = trampsym_phys(wakeup_start); > > else > > -*(uint64_t *)wakeup_vector_va = bootsym_phys(wakeup_start); > > +*(uint64_t *)wakeup_vector_va = trampsym_phys(wakeup_start); > > } > > > > static void acpi_sleep_post(u32 state) {} > > @@ -388,7 +388,7 @@ static void tboot_sleep(u8 sleep_state) > > g_tboot_shared->acpi_sinfo.wakeup_vector = acpi_sinfo.wakeup_vector; > > g_tboot_shared->acpi_sinfo.vector_width = acpi_sinfo.vector_width; > > g_tboot_shared->acpi_sinfo.kernel_s3_resume_vector = > > - bootsym_phys(wakeup_start); > > + trampsym_phys(wakeup_start); > > Shouldn't changes like these have happened earlier, when you > introduce the (logical only at that point) distinction between > trampoline pieces? That was in assembler code. This is C, which never had to be that involved with the distinction. But now that all the dust has settled, I'm making it consistent, using 'trampsym' for stuff in the permanent trampoline just like the asm code does. > > @@ -97,7 +100,7 @@ GLOBAL(trampoline_realmode_entry) > > cld > > cli > > lidttrampsym(idt_48) > > -lgdttrampsym(gdt_48) > > +lgdtl trampsym(gdt_48) > > Stray / unrelated change (and if needed, then also for lidt)? The difference between 16bit l.dt and 32-bit l.dtl is that the former only loads 24 bits of the actual table address (trampoline_gdt in this case). 
Thus, when trampoline_gdt is being used in-place, as it is during early boot, and *if* the Xen image is loaded higher than 16MiB, lgdt doesn't work. That's half a day of my life I want back. It doesn't matter for lidt because we're just loading an empty limit and pointer there, and we don't care about bits 24-31 of a zero value. > > @@ -236,11 +239,23 @@ gdt_48: .word 7*8-1 > > > > /* The first page of trampoline is permanent, the rest boot-time only. */ > > /* Reuse the boot trampoline on the 1st trampoline page as stack for > > wakeup. */ > > -.equwakeup_stack, boot_trampoline_start + PAGE_SIZE > > +.equwakeup_stack, perm_trampoline_start + PAGE_SIZE > > .global wakeup_stack > > > > +ENTRY(perm_trampoline_end) > > + > > /* From here on early boot only. */ > > > > +ENTRY(b
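The failure mode described there, as a sketch in C: with a 16-bit operand size, lgdt loads only bits 0-23 of the 32-bit base out of the six-byte operand; the high byte is ignored.

#include <stdint.h>

static uint32_t base_loaded_by_16bit_lgdt(uint32_t gdt_phys)
{
    return gdt_phys & 0x00ffffff;  /* bits 24-31 silently dropped */
}

So with the Xen image loaded above 16MiB, a trampoline_gdt at e.g. 0x01234560 would be latched as 0x00234560, while the 32-bit lgdtl form loads the full base. lidt is unaffected only because its operand is all zeroes anyway.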
Re: [Xen-devel] [PATCH] x86/boot: Reposition trampoline data
On Mon, 2019-08-19 at 14:42 +0100, Andrew Cooper wrote: > ... to separate code from data. In particular, > trampoline_realmode_entry's > write to trampoline_cpu_started clobbers the I-cache line containing > trampoline_protmode_entry, which won't be great for AP startup. > > Reformat the comments for trampoline_gdt to reduce their volume. > > No functional change. Please, let's not do this one until my other boot cleanups have landed. It just hurts. I have also reordered some of these for functional reasons, because they are used in different contexts (and end up in completely different trampolines). > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > --- > xen/arch/x86/boot/trampoline.S | 67 ++ > > 1 file changed, 28 insertions(+), 39 deletions(-) > > diff --git a/xen/arch/x86/boot/trampoline.S > b/xen/arch/x86/boot/trampoline.S > index 1761fc1213..1b11b4757a 100644 > --- a/xen/arch/x86/boot/trampoline.S > +++ b/xen/arch/x86/boot/trampoline.S > @@ -59,45 +59,6 @@ GLOBAL(trampoline_realmode_entry) > lmsw%ax # CR0.PE = 1 (enter > protected mode) > ljmpl $BOOT_CS32,$bootsym_rel(trampoline_protmode_entry,6) > > -trampoline_gdt: > -/* 0x: unused */ > -.quad 0x > -/* 0x0008: ring 0 code, 32-bit mode */ > -.quad 0x00cf9b00 > -/* 0x0010: ring 0 code, 64-bit mode */ > -.quad 0x00af9b00 > -/* 0x0018: ring 0 data */ > -.quad 0x00cf9300 > -/* 0x0020: real-mode code @ BOOT_TRAMPOLINE */ > -.long 0x > -.long 0x9b00 > -/* 0x0028: real-mode data @ BOOT_TRAMPOLINE */ > -.long 0x > -.long 0x9300 > -/* > - * 0x0030: ring 0 Xen data, 16 MiB size, base > - * address is computed at runtime. > - */ > -.quad 0x00c093000fff > -.Ltramopline_gdt_end: > - > -.pushsection .trampoline_rel, "a" > -.long trampoline_gdt + BOOT_PSEUDORM_CS + 2 - . > -.long trampoline_gdt + BOOT_PSEUDORM_DS + 2 - . > -.popsection > - > -GLOBAL(trampoline_misc_enable_off) > -.quad 0 > - > -GLOBAL(cpuid_ext_features) > -.long 0 > - > -GLOBAL(trampoline_xen_phys_start) > -.long 0 > - > -GLOBAL(trampoline_cpu_started) > -.byte 0 > - > .code32 > trampoline_protmode_entry: > /* Set up a few descriptors: on entry only CS is guaranteed > good. */ > @@ -186,6 +147,34 @@ idt_48: .word 0, 0, 0 # base = limit = 0 > gdt_48: .word .Ltramopline_gdt_end - trampoline_gdt - 1 > .long bootsym_rel(trampoline_gdt,4) > > +trampoline_gdt: > +.quad 0x /* 0x: unused */ > +.quad 0x00cf9b00 /* 0x0008: ring 0 code, 32-bit > mode */ > +.quad 0x00af9b00 /* 0x0010: ring 0 code, 64-bit > mode */ > +.quad 0x00cf9300 /* 0x0018: ring 0 data */ > +.quad 0x9b00 /* 0x0020: real-mode code @ > BOOT_TRAMPOLINE */ > +.quad 0x9300 /* 0x0028: real-mode data @ > BOOT_TRAMPOLINE */ > +.quad 0x00c093000fff /* 0x0030: ring 0 Xen data, 16M @ > XEN */ > +.Ltramopline_gdt_end: > + > +/* Relocations for trampoline Real Mode segments. */ > +.pushsection .trampoline_rel, "a" > +.long trampoline_gdt + BOOT_PSEUDORM_CS + 2 - . > +.long trampoline_gdt + BOOT_PSEUDORM_DS + 2 - . > +.popsection > + > +GLOBAL(trampoline_misc_enable_off) > +.quad 0 > + > +GLOBAL(cpuid_ext_features) > +.long 0 > + > +GLOBAL(trampoline_xen_phys_start) > +.long 0 > + > +GLOBAL(trampoline_cpu_started) > +.byte 0 > + > /* The first page of trampoline is permanent, the rest boot-time > only. */ > /* Reuse the boot trampoline on the 1st trampoline page as stack for > wakeup. 
*/
> .equ    wakeup_stack, trampoline_start + PAGE_SIZE
Re: [Xen-devel] [PATCH v2 2/6] x86/boot: Only jump into low trampoline code for real-mode boot
On Mon, 2019-08-12 at 11:10 +0200, Jan Beulich wrote:
> On 09.08.2019 17:01, David Woodhouse wrote:
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -735,7 +735,17 @@ trampoline_setup:
> >          /* Switch to low-memory stack which lives at the end of trampoline region. */
> >          mov     sym_fs(trampoline_phys),%edi
> >          lea     TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp
> > +        cmpb    $0, sym_fs(skip_realmode)
> > +        jz      1f
> > +        /* If no-real-mode, jump straight to trampoline_protmode_entry */
> > +        lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
> > +        /* EBX == 0 indicates we are the BP (Boot Processor). */
> > +        xor     %ebx,%ebx
> > +        jmp     2f
> > +1:
> > +        /* Go via 16-bit code in trampoline_boot_cpu_entry */
> >          lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
> > +2:
> >          pushl   $BOOT_CS32
> >          push    %eax
>
> May I suggest to slightly streamline this into
>
>         /* Switch to low-memory stack which lives at the end of trampoline region. */
>         mov     sym_fs(trampoline_phys),%edi
>         lea     TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp
>         /* Go via 16-bit code in trampoline_boot_cpu_entry */
>         lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
>         cmpb    $0,sym_fs(skip_realmode)
>         je      1f
>         /* If no-real-mode, jump straight to trampoline_protmode_entry */
>         lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
>         /* EBX == 0 indicates we are the BP (Boot Processor). */
>         xor     %ebx,%ebx
> 1:
>         pushl   $BOOT_CS32
>         push    %eax
>
> perhaps with the second slightly adapted to it now being an override
> rather than an alternative path?

It's a *temporary* alternative path, and it's gone away by the end of the series. Obviously we do insist on each interim commit being buildable and working. I kind of draw the line at optimising the asm for each intermediate step :)

> Additionally I think it would be nice if the clearing of %ebx wasn't
> replicated here and ...
>
> > --- a/xen/arch/x86/boot/trampoline.S
> > +++ b/xen/arch/x86/boot/trampoline.S
> > @@ -194,9 +194,6 @@ gdt_48: .word 6*8-1
> >
> >  .code32
> >  trampoline_boot_cpu_entry:
> > -        cmpb    $0,bootsym_rel(skip_realmode,5)
> > -        jnz     .Lskip_realmode
> > -
> >          /* Load pseudo-real-mode segments. */
> >          mov     $BOOT_PSEUDORM_DS,%eax
> >          mov     %eax,%ds
> > @@ -276,7 +273,6 @@ trampoline_boot_cpu_entry:
> >          mov     %eax,%gs
> >          mov     %eax,%ss
> >
> > -.Lskip_realmode:
> >          /* EBX == 0 indicates we are the BP (Boot Processor). */
> >          xor     %ebx,%ebx
>
> ... here. Why don't you further do
>
> .code32
> trampoline_protmode_entry_bsp:
>         /* EBX == 0 indicates we are the BSP (Boot Strap Processor). */
>         xor     %ebx,%ebx
> trampoline_protmode_entry:
>
> directing the BSP paths to the new label?

Yeah, I kind of see your point. But that gives us one entry point which clears %ebx... and one which doesn't, so you still have to make sure it's not already zero for the AP startup.

If we ended up with two simple entry points that didn't care about %ebx for trampoline_protmode_entry_bsp and trampoline_protmode_entry_ap then that'd be nice and simple — but I don't like the inconsistency. I think I prefer having to set %ebx explicitly in all three separate callers, over having one entry point that requires it and another that doesn't.
[Xen-devel] [PATCH v3 2/5] x86/boot: Split bootsym() into four types of relocations
From: David Woodhouse As a first step toward using the low-memory trampoline only when necessary for a legacy boot without no-real-mode, clean up the relocations into three separate groups. • bootsym() is now used only at boot time when no-real-mode isn't set. • bootdatasym() is for variables containing information discovered by the 16-bit boot code. This is currently accessed directly in place in low memory by Xen at runtime, but will be copied back to its location in high memory to avoid the pointer gymnastics (and because a subsequent patch will stop copying the 16-bit boot code into low memory at all when it isn't being used). • trampsym() is for the permanent 16-bit trampoline used for AP startup and for wake from sleep. This is not used at boot, and can be copied into (properly allocated) low memory once the system is running. • tramp32sym() is used both at boot and for AP startup/wakeup. During boot it can be used in-place, running from the physical address of the Xen image. For AP startup it can't, because at that point there isn't a full 1:1 mapping of all memory; only the low trampoline page is mapped. No (intentional) functional change yet; just a "cleanup" to allow the various parts to be treated separately in subsequent patches. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/edd.S| 16 +++ xen/arch/x86/boot/head.S | 22 +++-- xen/arch/x86/boot/mem.S| 12 ++--- xen/arch/x86/boot/trampoline.S | 85 ++ xen/arch/x86/boot/video.S | 6 +-- xen/arch/x86/boot/wakeup.S | 16 +++ xen/arch/x86/efi/efi-boot.h| 8 ++-- xen/arch/x86/xen.lds.S | 15 -- 8 files changed, 124 insertions(+), 56 deletions(-) diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 3df712bce1..434bbbd960 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -41,7 +41,7 @@ get_edd: # This code is sensitive to the size of the structs in edd.h edd_start: /* ds:si points at fn48 results. Fn41 results go immediately before. 
*/ -movw$bootsym(boot_edd_info)+EDDEXTSIZE, %si +movw$bootdatasym(boot_edd_info)+EDDEXTSIZE, %si movb$0x80, %dl # BIOS device 0x80 edd_check_ext: @@ -56,7 +56,7 @@ edd_check_ext: movb%dl, %ds:-8(%si)# store device number movb%ah, %ds:-7(%si)# store version movw%cx, %ds:-6(%si)# store extensions -incbbootsym(boot_edd_info_nr) # note that we stored something +incbbootdatasym(boot_edd_info_nr) # note that we stored something edd_get_device_params: movw$EDDPARMSIZE, %ds:(%si) # put size @@ -97,7 +97,7 @@ edd_legacy_done: edd_next: incb%dl # increment to next device jz edd_done -cmpb$EDD_INFO_MAX,bootsym(boot_edd_info_nr) +cmpb$EDD_INFO_MAX,bootdatasym(boot_edd_info_nr) jb edd_check_ext edd_done: @@ -108,11 +108,11 @@ edd_done: .Ledd_mbr_sig_start: pushw %es movb$0x80, %dl # from device 80 -movw$bootsym(boot_mbr_signature), %bx # store buffer ptr in bx +movw$bootdatasym(boot_mbr_signature), %bx # store buffer ptr in bx .Ledd_mbr_sig_read: pushw %bx -movw$bootsym(boot_edd_info), %bx -movzbw bootsym(boot_edd_info_nr), %cx +movw$bootdatasym(boot_edd_info), %bx +movzbw bootdatasym(boot_edd_info_nr), %cx jcxz.Ledd_mbr_sig_default .Ledd_mbr_sig_find_info: cmpb%dl, (%bx) @@ -151,12 +151,12 @@ edd_done: jne .Ledd_mbr_sig_next movb%dl, (%bx) # store BIOS drive number movl%ecx, 4(%bx)# store signature from MBR -incbbootsym(boot_mbr_signature_nr) # note that we stored something +incbbootdatasym(boot_mbr_signature_nr) # note that we stored something addw$8, %bx # increment sig buffer ptr .Ledd_mbr_sig_next: incb%dl # increment to next device jz .Ledd_mbr_sig_done -cmpb$EDD_MBR_SIG_MAX, bootsym(boot_mbr_signature_nr) +cmpb$EDD_MBR_SIG_MAX, bootdatasym(boot_mbr_signature_nr) jb .Ledd_mbr_sig_read .Ledd_mbr_sig_done: popw%es diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index d303379083..e19b31fb85 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -699,14 +699,30 @@ trampoline_setup: cmp $sym_offs(__trampoline_rel_stop),%edi jb 1b -/* Patch in the trampoline segment. */ +mov $sym_offs(__trampoline32_rel_start),%edi +1: +mov %fs:(%e
[Xen-devel] [PATCH v3 0/5] Clean up x86_64 boot code
Some cleanups for the boot path, originally inspired by an attempt to avoid scribbling on arbitrarily-chosen low memory.

In the no-real-mode case we don't need to bounce through low memory at all; we can run the 32-bit trampoline in-place in the Xen image. The variables containing information which is optionally discovered by the real-mode boot code can be put back in place in the Xen image, and we can dispense with the bootsym() pointer gymnastics in the C code which accesses them in low memory.

I haven't yet got to reloc(), which I think exists only to ensure that the various breadcrumbs left all over the place by the Multiboot bootloader aren't scribbled on when we copy the 16-bit boot trampoline into low memory. I'd quite like to kill reloc() and pass the original pointer up to 64-bit code to be handled in C. That would require finding a *safe* location to put the 16-bit boot trampoline though, which doesn't already contain anything that the bootloader created for us.

In fact, isn't there already a chance that head.S will choose a location for the trampoline which is already part of a module or contains one of the Multiboot breadcrumbs?

https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=shortlog;h=refs/heads/bootcleanup

v2: Patch #1 of the first series is already merged.
    Fold in minor fixup from Andrew to what is now patch #1.
    Verbally agree to overcome Jan's objections to patch #1, in Chicago.

v3: Another patch merged from v2 posting. And then there were five.
    Update to staging branch, especially commit c3cfa5b30 ("x86: Restore
    IA32_MISC_ENABLE on wakeup").
    Minor bikeshedding and commit comment improvements.
    Ignore Andy's conflicting cleanups except to use them as an excuse
    for not making the GDT sizing automatic as discussed.

David Woodhouse (5):
  x86/boot: Only jump into low trampoline code for real-mode boot
  x86/boot: Split bootsym() into four types of relocations
  x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
  x86/boot: Copy 16-bit boot variables back up to Xen image
  x86/boot: Do not use trampoline for no-real-mode boot paths

 xen/arch/x86/acpi/power.c | 6 +++---
 xen/arch/x86/boot/edd.S | 18 ++
 xen/arch/x86/boot/head.S | 89 -
 xen/arch/x86/boot/mem.S | 14 --
 xen/arch/x86/boot/trampoline.S| 146 --
 xen/arch/x86/boot/video.S | 36 +++-
 xen/arch/x86/boot/wakeup.S| 16
 xen/arch/x86/cpu/common.c | 2 +-
 xen/arch/x86/cpu/intel.c | 2 +-
 xen/arch/x86/efi/efi-boot.h | 31 +++
 xen/arch/x86/platform_hypercall.c | 18 +-
 xen/arch/x86/setup.c | 68 ++--
 xen/arch/x86/smpboot.c| 6 +++---
 xen/arch/x86/tboot.c | 6 +++---
 xen/arch/x86/x86_64/mm.c | 2 +-
 xen/arch/x86/xen.lds.S| 28 +---
 xen/include/asm-x86/acpi.h| 2 +-
 xen/include/asm-x86/config.h | 12 ++--
 xen/include/asm-x86/edd.h | 1 -
 19 files changed, 331 insertions(+), 172 deletions(-)
[Xen-devel] [PATCH v3 5/5] x86/boot: Do not use trampoline for no-real-mode boot paths
From: David Woodhouse Where booted from EFI or with no-real-mode, there is no need to stomp on low memory with the 16-boot code. Instead, just go straight to trampoline_protmode_entry() at its physical location within the Xen image, having applied suitable relocations. This means that the GDT has to be loaded with lgdtl because the 16-bit lgdt instruction would drop the high 8 bits of the gdt_trampoline address, causing failures if the Xen image was loaded above 16MiB. For now, the boot code (including the EFI loader path) still determines what the trampoline_phys address should be for the permanent 16-bit trampoline (used for AP startup, and wakeup). But that trampoline is now only relocated for that address and copied into low memory later, from a relocate_trampoline() call made from __start_xen(). The permanent trampoline can't trivially use the 32-bit code in place in its physical location in the Xen image, as idle_pg_table doesn't have a full physical mapping. Fixing that is theoretically possible but it's actually much simpler just to relocate the 32-bit code (again) into low memory, alongside the 16-bit code for the permanent trampoline. Signed-off-by: David Woodhouse --- xen/arch/x86/acpi/power.c | 6 +-- xen/arch/x86/boot/head.S | 67 -- xen/arch/x86/boot/trampoline.S | 32 xen/arch/x86/cpu/common.c | 2 +- xen/arch/x86/cpu/intel.c | 2 +- xen/arch/x86/efi/efi-boot.h| 31 ++-- xen/arch/x86/setup.c | 46 +-- xen/arch/x86/smpboot.c | 6 +-- xen/arch/x86/tboot.c | 6 +-- xen/arch/x86/x86_64/mm.c | 2 +- xen/include/asm-x86/acpi.h | 2 +- xen/include/asm-x86/config.h | 12 +++--- 12 files changed, 122 insertions(+), 92 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index aecc754fdb..d2a355429e 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -152,9 +152,9 @@ static void acpi_sleep_prepare(u32 state) return; if ( acpi_sinfo.vector_width == 32 ) -*(uint32_t *)wakeup_vector_va = bootsym_phys(wakeup_start); +*(uint32_t *)wakeup_vector_va = trampsym_phys(wakeup_start); else -*(uint64_t *)wakeup_vector_va = bootsym_phys(wakeup_start); +*(uint64_t *)wakeup_vector_va = trampsym_phys(wakeup_start); } static void acpi_sleep_post(u32 state) {} @@ -388,7 +388,7 @@ static void tboot_sleep(u8 sleep_state) g_tboot_shared->acpi_sinfo.wakeup_vector = acpi_sinfo.wakeup_vector; g_tboot_shared->acpi_sinfo.vector_width = acpi_sinfo.vector_width; g_tboot_shared->acpi_sinfo.kernel_s3_resume_vector = - bootsym_phys(wakeup_start); + trampsym_phys(wakeup_start); switch ( sleep_state ) { diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 6d315020d2..cebad00c13 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -689,16 +689,23 @@ trampoline_setup: lea __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi mov %edi,sym_fs(l2_bootmap) -/* Apply relocations to bootstrap trampoline. */ -mov sym_fs(trampoline_phys),%edx -mov $sym_offs(__trampoline_rel_start),%edi -1: -mov %fs:(%edi),%eax -add %edx,%fs:(%edi,%eax) -add $4,%edi -cmp $sym_offs(__trampoline_rel_stop),%edi -jb 1b +/* Do not parse command line on EFI platform here. */ +cmpb$0,sym_fs(efi_platform) +jnz 1f +/* Bail if there is no command line to parse. */ +mov sym_fs(multiboot_ptr),%ebx +testl $MBI_CMDLINE,MB_flags(%ebx) +jz 1f + +lea sym_esi(early_boot_opts),%eax +push%eax +pushl MB_cmdline(%ebx) +callcmdline_parse_early + +1: +/* Apply relocations to 32-bit trampoline for execution in place. 
*/ +lea sym_esi(perm_trampoline_start),%edx mov $sym_offs(__trampoline32_rel_start),%edi 1: mov %fs:(%edi),%eax @@ -707,6 +714,21 @@ trampoline_setup: cmp $sym_offs(__trampoline32_rel_stop),%edi jb 1b +cmp $0,sym_esi(skip_realmode) +jz .Ldo_realmode + +/* Go directly to trampoline_protmode_entry at its physical address */ +lea trampoline_protmode_entry-__XEN_VIRT_START(%esi),%eax +pushl $BOOT_CS32 +push%eax + +/* EBX == 0 indicates we are the BP (Boot Processor). */ +xor %ebx,%ebx +retl + +.Ldo_realmode: +/* Apply relocations to 16-bit boot code. */ +mov sym_fs(trampoline_phys),%edx mov $sym_offs(__bootsym_rel_start),%edi 1: mov %fs:(%edi),%eax @@ -736,35 +758,12 @@ trampoline_setup:
[Xen-devel] [PATCH v3 4/5] x86/boot: Copy 16-bit boot variables back up to Xen image
From: David Woodhouse Ditch the bootsym() access from C code for the variables populated by 16-bit boot code. As well as being cleaner this also paves the way for not having the 16-bit boot code in low memory for no-real-mode or EFI loader boots at all. These variables are put into a separate .data.boot16 section and accessed in low memory during the real-mode boot, then copied back to their native location in the Xen image when real mode has finished. Fix the limit in gdt_48 to admit that trampoline_gdt actually includes 7 entries, since we do now use the seventh (BOOT_FS) in late code so it matters. Andrew has a patch to further tidy up the GDT and initialise accessed bits etc., so I won't go overboard with more than the trivial size fix for now. The bootsym() macro remains in C code purely for the variables which are written for the later AP startup and wakeup trampoline to use. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/edd.S | 2 ++ xen/arch/x86/boot/head.S | 16 +++ xen/arch/x86/boot/mem.S | 2 ++ xen/arch/x86/boot/trampoline.S| 33 --- xen/arch/x86/boot/video.S | 30 +++- xen/arch/x86/platform_hypercall.c | 18 - xen/arch/x86/setup.c | 22 ++--- xen/arch/x86/xen.lds.S| 9 - xen/include/asm-x86/edd.h | 1 - 9 files changed, 94 insertions(+), 39 deletions(-) diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 434bbbd960..138d04c964 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -163,6 +163,7 @@ edd_done: .Ledd_mbr_sig_skip: ret +.pushsection .data.boot16, "aw", @progbits GLOBAL(boot_edd_info_nr) .byte 0 GLOBAL(boot_mbr_signature_nr) @@ -171,3 +172,4 @@ GLOBAL(boot_mbr_signature) .fill EDD_MBR_SIG_MAX*8,1,0 GLOBAL(boot_edd_info) .fill EDD_INFO_MAX * (EDDEXTSIZE + EDDPARMSIZE), 1, 0 +.popsection diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 4118f73683..6d315020d2 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -725,6 +725,17 @@ trampoline_setup: cmp $sym_offs(__bootsym_seg_stop),%edi jb 1b +/* Relocations for the boot data section. */ +mov sym_fs(trampoline_phys),%edx +add $(boot_trampoline_end - boot_trampoline_start),%edx +mov $sym_offs(__bootdatasym_rel_start),%edi +1: +mov %fs:(%edi),%eax +add %edx,%fs:(%edi,%eax) +add $4,%edi +cmp $sym_offs(__bootdatasym_rel_stop),%edi +jb 1b + /* Do not parse command line on EFI platform here. */ cmpb$0,sym_fs(efi_platform) jnz 1f @@ -762,6 +773,11 @@ trampoline_setup: mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx rep movsl %fs:(%esi),%es:(%edi) +/* Copy boot data template to low memory. */ +mov $sym_offs(bootdata_start),%esi +mov $((bootdata_end - bootdata_start) / 4),%ecx +rep movsl %fs:(%esi),%es:(%edi) + /* Jump into the relocated trampoline. 
*/ lret diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S index c5bc774325..9eec1c6ae2 100644 --- a/xen/arch/x86/boot/mem.S +++ b/xen/arch/x86/boot/mem.S @@ -67,6 +67,7 @@ get_memory_map: ret .align 4 +.pushsection .data.boot16, "aw", @progbits GLOBAL(bios_e820map) .fill E820_BIOS_MAX*20,1,0 GLOBAL(bios_e820nr) @@ -75,3 +76,4 @@ GLOBAL(lowmem_kb) .long 0 GLOBAL(highmem_kb) .long 0 + .popsection diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 71d5424179..c747d70404 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -47,11 +47,15 @@ .long 111b - (off) - .;\ .popsection -#define bootdatasym(s) ((s)-boot_trampoline_start) +.pushsection .data.boot16, "aw", @progbits +GLOBAL(bootdata_start) +.popsection + +#define bootdatasym(s) ((s)-bootdata_start+(boot_trampoline_end-boot_trampoline_start)) #define bootdatasym_rel(sym, off, opnd...) \ bootdatasym(sym),##opnd; \ 111:; \ -.pushsection .bootdatasym_rel, "a";\ +.pushsection .bootsym_rel, "a";\ .long 111b - (off) - .;\ .popsection @@ -227,7 +231,7 @@ start64: .word 0 idt_48: .word 0, 0, 0 # base = limit = 0 .word 0 -gdt_48: .word 6*8-1 +gdt_48: .word 7*8-1 .long tramp32sym_rel(trampoline_gdt,4) /* The first page of trampoline is permanent, the rest boot-time only. */ @@ -318,6 +322,23 @@ trampoline_boot_cpu_entry: mov %eax,%gs mov %eax,%ss +
[Xen-devel] [PATCH v3 1/5] x86/boot: Only jump into low trampoline code for real-mode boot
From: David Woodhouse

If the no-real-mode flag is set, don't go there at all. This is a
prelude to not even putting it there in the first place.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/head.S       | 10 ++++++++++
 xen/arch/x86/boot/trampoline.S |  4 ----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 26b680521d..d303379083 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -727,7 +727,17 @@ trampoline_setup:
         /* Switch to low-memory stack which lives at the end of trampoline region. */
         mov     sym_fs(trampoline_phys),%edi
         lea     TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp
+        cmpb    $0, sym_fs(skip_realmode)
+        jz      1f
+        /* If no-real-mode, jump straight to trampoline_protmode_entry */
+        lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
+        /* EBX == 0 indicates we are the BP (Boot Processor). */
+        xor     %ebx,%ebx
+        jmp     2f
+1:
+        /* Go via 16-bit code in trampoline_boot_cpu_entry */
         lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
+2:
         pushl   $BOOT_CS32
         push    %eax

diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 7c6a2328d2..429a088b19 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -194,9 +194,6 @@ gdt_48: .word 6*8-1
 
         .code32
 trampoline_boot_cpu_entry:
-        cmpb    $0,bootsym_rel(skip_realmode,5)
-        jnz     .Lskip_realmode
-
         /* Load pseudo-real-mode segments. */
         mov     $BOOT_PSEUDORM_DS,%eax
         mov     %eax,%ds
@@ -276,7 +273,6 @@ trampoline_boot_cpu_entry:
         mov     %eax,%gs
         mov     %eax,%ss
 
-.Lskip_realmode:
         /* EBX == 0 indicates we are the BP (Boot Processor). */
         xor     %ebx,%ebx
-- 
2.21.0
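The control flow being added is trivial when written out in C (an illustrative sketch only; the offset values here are invented placeholders for what are really link-time constants):

#include <stdint.h>

/* Illustrative model of the new entry-point selection in head.S. */
static uint32_t boot_entry(uint32_t trampoline_phys, int skip_realmode)
{
    /* Offsets of the two entry points from trampoline_start;
     * the real numbers come from the linker, these are made up. */
    const uint32_t protmode_entry_off = 0x060;
    const uint32_t boot_cpu_entry_off = 0x140;

    /* head.S then pushes $BOOT_CS32 and this address and executes lret,
     * i.e. a far jump to the chosen entry, with %ebx zeroed to mark
     * the boot processor. */
    return trampoline_phys +
           (skip_realmode ? protmode_entry_off : boot_cpu_entry_off);
}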
[Xen-devel] [PATCH v3 3/5] x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
From: David Woodhouse In preparation for splitting the boot and permanent trampolines from each other. Some of these will change back, but most are boot so do the plain search/replace that way first, then a subsequent patch will extract the permanent trampoline code. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/head.S | 12 ++-- xen/arch/x86/boot/trampoline.S | 10 +- xen/arch/x86/boot/video.S | 4 ++-- xen/arch/x86/efi/efi-boot.h| 4 ++-- xen/arch/x86/setup.c | 4 ++-- xen/arch/x86/tboot.c | 6 +++--- xen/arch/x86/x86_64/mm.c | 2 +- xen/arch/x86/xen.lds.S | 6 +++--- xen/include/asm-x86/config.h | 6 +++--- 9 files changed, 27 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index e19b31fb85..4118f73683 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -746,20 +746,20 @@ trampoline_setup: cmpb$0, sym_fs(skip_realmode) jz 1f /* If no-real-mode, jump straight to trampoline_protmode_entry */ -lea trampoline_protmode_entry-trampoline_start(%edi),%eax +lea trampoline_protmode_entry-boot_trampoline_start(%edi),%eax /* EBX == 0 indicates we are the BP (Boot Processor). */ xor %ebx,%ebx jmp 2f 1: /* Go via 16-bit code in trampoline_boot_cpu_entry */ -lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax +lea trampoline_boot_cpu_entry-boot_trampoline_start(%edi),%eax 2: pushl $BOOT_CS32 push%eax /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_offs(trampoline_start),%esi -mov $((trampoline_end - trampoline_start) / 4),%ecx +mov $sym_offs(boot_trampoline_start),%esi +mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx rep movsl %fs:(%esi),%es:(%edi) /* Jump into the relocated trampoline. */ @@ -771,8 +771,8 @@ cmdline_parse_early: reloc: #include "reloc.S" -ENTRY(trampoline_start) +ENTRY(boot_trampoline_start) #include "trampoline.S" -ENTRY(trampoline_end) +ENTRY(boot_trampoline_end) #include "x86_64.S" diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 8537aeb917..71d5424179 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -31,7 +31,7 @@ *to be used for AP startup. */ #undef bootsym -#define bootsym(s) ((s)-trampoline_start) +#define bootsym(s) ((s)-boot_trampoline_start) #define bootsym_rel(sym, off, opnd...) \ bootsym(sym),##opnd; \ @@ -47,7 +47,7 @@ .long 111b - (off) - .;\ .popsection -#define bootdatasym(s) ((s)-trampoline_start) +#define bootdatasym(s) ((s)-boot_trampoline_start) #define bootdatasym_rel(sym, off, opnd...) \ bootdatasym(sym),##opnd; \ 111:; \ @@ -56,7 +56,7 @@ .popsection #undef trampsym -#define trampsym(s) ((s)-trampoline_start) +#define trampsym(s) ((s)-boot_trampoline_start) #define trampsym_rel(sym, off, opnd...)\ trampsym(sym),##opnd; \ @@ -66,7 +66,7 @@ .popsection #undef tramp32sym -#define tramp32sym(s) ((s)-trampoline_start) +#define tramp32sym(s) ((s)-boot_trampoline_start) #define tramp32sym_rel(sym, off, opnd...) \ tramp32sym(sym),##opnd;\ @@ -232,7 +232,7 @@ gdt_48: .word 6*8-1 /* The first page of trampoline is permanent, the rest boot-time only. */ /* Reuse the boot trampoline on the 1st trampoline page as stack for wakeup. */ -.equwakeup_stack, trampoline_start + PAGE_SIZE +.equwakeup_stack, boot_trampoline_start + PAGE_SIZE .global wakeup_stack /* From here on early boot only. 
*/ diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S index 03907e9e9a..5087c6a4d5 100644 --- a/xen/arch/x86/boot/video.S +++ b/xen/arch/x86/boot/video.S @@ -15,8 +15,8 @@ #include "video.h" -/* Scratch space layout: trampoline_end to trampoline_end+0x1000. */ -#define modelist bootsym(trampoline_end) /* 2kB (256 entries) */ +/* Scratch space layout: boot_trampoline_end to boot_trampoline_end+0x1000. */ +#define modelist bootsym(boot_trampoline_end) /* 2kB (256 entries) */ #define vesa_glob_info (modelist + 0x800)/* 1kB */ #define vesa_mode_info (vesa_glob_info + 0x400) /* 1kB */ diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 8aefd7bfb2..fac2b5e12a 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -232,7 +232,7 @@ static void __init noreturn efi_arch_post_exit_boot(void) u64 cr4 = XEN_MINIMAL_CR4 & ~X86_CR4_PGE, efer; efi_arch_relocate_image(__XEN_VIRT_START - xen_phys_start); -memc
Re: [Xen-devel] [PATCH v2 6/6] x86/boot: Do not use trampoline for no-real-mode boot paths
> On 19.08.2019 17:25, David Woodhouse wrote:
>> On Mon, 2019-08-12 at 12:55 +0200, Jan Beulich wrote:
>>> On 09.08.2019 17:02, David Woodhouse wrote:
>>>> @@ -97,7 +100,7 @@ GLOBAL(trampoline_realmode_entry)
>>>>         cld
>>>>         cli
>>>>         lidt    trampsym(idt_48)
>>>> -       lgdt    trampsym(gdt_48)
>>>> +       lgdtl   trampsym(gdt_48)
>>>
>>> Stray / unrelated change (and if needed, then also for lidt)?
>>
>> The difference between the 16-bit l.dt and 32-bit l.dtl is that the
>> former only loads 24 bits of the actual table address (trampoline_gdt
>> in this case).
>>
>> Thus, when trampoline_gdt is being used in-place, as it is during early
>> boot, and *if* the Xen image is loaded higher than 16MiB, lgdt doesn't
>> work. That's half a day of my life I want back.
>
> But isn't this an issue even independent of your series? I.e. doesn't
> this want to be fixed in a separate (to be backported) patch?

Before my series it isn't used in place in the Xen image; it's also
(mostly gratuitously) copied to low memory.

-- 
dwmw2
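The truncation being described is easy to see numerically: with a 16-bit operand size, lgdt loads only a 24-bit base, so a trampoline_gdt linked above 16MiB wraps around. A toy demonstration (plain C; the address value is invented):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Example only: a trampoline_gdt that ended up above 16MiB. */
    uint32_t gdt_base = 0x01234560;

    /* A 16-bit-operand lgdt keeps only the low 24 bits of the base. */
    uint32_t loaded_16 = gdt_base & 0x00ffffff;  /* 0x00234560 - wrong  */
    uint32_t loaded_32 = gdt_base;               /* lgdtl: full 32 bits */

    printf("lgdt  would load base %#010x\n", loaded_16);
    printf("lgdtl would load base %#010x\n", loaded_32);
    return 0;
}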
Re: [Xen-devel] [PATCH v2 5/6] x86/boot: Copy 16-bit boot variables back up to Xen image
> On 19.08.2019 17:25, David Woodhouse wrote:
>> On Mon, 2019-08-12 at 12:24 +0200, Jan Beulich wrote:
>>> On 09.08.2019 17:02, David Woodhouse wrote:
>>>> @@ -227,7 +231,7 @@ start64:
>>>>         .word   0
>>>> idt_48: .word   0, 0, 0 # base = limit = 0
>>>>         .word   0
>>>> -gdt_48: .word   6*8-1
>>>> +gdt_48: .word   7*8-1
>>>>         .long   tramp32sym_rel(trampoline_gdt,4)
>>>
>>> You don't grow trampoline_gdt here, so I think this change is
>>> wrong. And if a change was needed at all (perhaps in the next
>>> patch), then I think it would be better to replace the use of
>>> literal numbers, using the difference of two labels instead
>>> (the "end" label preferably being a .L-prefixed one).
>>
>> I don't grow it but... count it ☺.
>>
>> I do start using sym_fs() here in places where it wasn't used before,
>> so the incorrect size started to *matter* because the BOOT_FS selector
>> wasn't included in the limit.
>
> Oh, I see - a (latent) bug introduced by b28044226e. Should perhaps
> be a separate patch to fix this then (by, as suggested, using an
> "end" label rather than hard coded numbers).

Indeed. Andrew already posted a patch for that, which (along with his
others) I have rebased on top of my tree at
https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=shortlog;h=refs/heads/bootcleanup-andy

-- 
dwmw2
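The arithmetic being argued over is simple to check directly: the GDT limit is the offset of the last valid byte, and a selector's full 8-byte descriptor must lie within it. A quick standalone check in plain C (the 0x30 selector is just 6*8, the seventh entry, which is where BOOT_FS sits per the discussion above):

#include <stdio.h>

int main(void)
{
    unsigned limit_old = 6*8 - 1;   /* 0x2f */
    unsigned limit_new = 7*8 - 1;   /* 0x37 */
    unsigned sel7      = 6*8;       /* selector of the 7th entry (BOOT_FS) */

    /* A descriptor load needs the whole 8 bytes inside the limit. */
    printf("7th entry with old limit: %s\n",
           (sel7 | 7) <= limit_old ? "ok" : "#GP");
    printf("7th entry with new limit: %s\n",
           (sel7 | 7) <= limit_new ? "ok" : "#GP");
    return 0;
}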
Re: [Xen-devel] [PATCH v2 4/6] x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
> On 19.08.2019 17:24, David Woodhouse wrote:
>> On Mon, 2019-08-12 at 11:55 +0200, Jan Beulich wrote:
>>> On 09.08.2019 17:02, David Woodhouse wrote:
>>>> From: David Woodhouse
>>>>
>>>> In preparation for splitting the boot and permanent trampolines from
>>>> each other. Some of these will change back, but most are boot so do
>>>> the plain search/replace that way first, then a subsequent patch will
>>>> extract the permanent trampoline code.
>>>
>>> To be honest I don't view it as helpful to do things in this order.
>>> If you first re-arranged the ordering of items within the trampoline,
>>> we'd then not end up with an intermediate state where the labels are
>>> misleading. Is there a reason things can't sensibly be done the other
>>> way around?
>>
>> Obviously I did all this in a working tree first, swore at it a lot and
>> finally got it working, then attempted to split it up into separate
>> meaningful commits which individually made sense. There is plenty of
>> room for subjectivity in the choices I made in that last step.
>>
>> I'm not sure I quite see why you say the labels are misleading. My
>> intent was to apply labels based on what each object is *used* for,
>> despite the fact that to start with they're all actually in the same
>> place. And then to actually move each different type of symbol into its
>> separate section/location to clean things up.
>>
>> Is it just the code comments at the start of trampoline.S that you find
>> misleading in the interim stage? Because those *don't* purely talk
>> about what bootsym/bootdatasym/trampsym/tramp32sym are used for; they
>> do say how they are (eventually) relocated. I suppose I could rip that
>> code comment out of patch #3 completely and add it again in a later
>> commit... or just add it again. I write code comments in an attempt to
>> be helpful to those who come after me (especially when that's actually
>> myself) but if they're going to cause problems, then maybe they're more
>> hassle than they're worth?
>
> No, it's actually the label names: The "boot" that this patch prefixes
> to them isn't correct until all post-boot (i.e. AP bringup) parts
> have been moved out of the framed block of code.

Hm, OK. AFK at this moment but I'll take another look.

Basically there wasn't a perfect way to label and move things; either
way there were glitches during the transition, and my recollection was
that I preferred this one because it was purely cosmetic and only
lasted for a commit or two. Will see if I can come up with something
nicer within the amount of time it is reasonable to spend on such a
transitional issue.

-- 
dwmw2
Re: [Xen-devel] [PATCH v3 1/5] x86/boot: Only jump into low trampoline code for real-mode boot
On Fri, 2019-08-30 at 16:25 +0200, Jan Beulich wrote:
> On 21.08.2019 18:35, David Woodhouse wrote:
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -727,7 +727,17 @@ trampoline_setup:
> >          /* Switch to low-memory stack which lives at the end of trampoline region. */
> >          mov     sym_fs(trampoline_phys),%edi
> >          lea     TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp
> > +        cmpb    $0, sym_fs(skip_realmode)
> > +        jz      1f
> > +        /* If no-real-mode, jump straight to trampoline_protmode_entry */
> > +        lea     trampoline_protmode_entry-trampoline_start(%edi),%eax
> > +        /* EBX == 0 indicates we are the BP (Boot Processor). */
> > +        xor     %ebx,%ebx
> > +        jmp     2f
> > +1:
> > +        /* Go via 16-bit code in trampoline_boot_cpu_entry */
> >          lea     trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
> > +2:
> >          pushl   $BOOT_CS32
> >          push    %eax
>
> Provided it goes in together with the subsequent change removing this
> double jump again
> Acked-by: Jan Beulich

Thanks.

> Of course it would have been nice if within your addition you'd been
> consistent with adding (or not) blanks after commas separating insn
> operands.

Yeah, I can see how that would be useful. I'll do it as I rebase on top
of the current staging branch and redo the conflict resolution with
Andy's changes.
[Xen-devel] [PATCH] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
From: David Woodhouse

When recursing, a node sometimes disappears. Deal with it and move on
instead of aborting and failing to print the rest of what was
requested.

Signed-off-by: David Woodhouse
---
And thus did an extremely sporadic "not going to delete that device
because it already doesn't exist" failure mode become painfully obvious
in retrospect...

diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
index 3afc630ab8..c089d60a2a 100644
--- a/tools/xenstore/xenstore_client.c
+++ b/tools/xenstore/xenstore_client.c
@@ -153,8 +153,13 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
         err(1, "malloc in do_ls");
 
     e = xs_directory(h, XBT_NULL, path, &num);
-    if (e == NULL)
-        err(1, "xs_directory (%s)", path);
+    if (e == NULL) {
+        if (!cur_depth)
+            err(1, "xs_directory (%s)", path);
+
+        /* If a node disappears while recursing, silently move on. */
+        num = 0;
+    }
 
     for (i = 0; i < num; i++) {
Re: [Xen-devel] [PATCH] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
On Mon, 2019-03-04 at 14:18 +0000, Wei Liu wrote:
> CC Ian as well.
>
> It would be better if you run ./scripts/get_maintainers.pl on your
> patches in the future to CC the correct people.

Will do; thanks.

> On Fri, Mar 01, 2019 at 12:16:56PM +0000, David Woodhouse wrote:
> > From: David Woodhouse
> >
> > When recursing, a node sometimes disappears. Deal with it and move on
> > instead of aborting and failing to print the rest of what was
> > requested.
> >
> > Signed-off-by: David Woodhouse
> > ---
> > And thus did an extremely sporadic "not going to delete that device
> > because it already doesn't exist" failure mode become painfully obvious
> > in retrospect...
> >
> > diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
> > index 3afc630ab8..c089d60a2a 100644
> > --- a/tools/xenstore/xenstore_client.c
> > +++ b/tools/xenstore/xenstore_client.c
> > @@ -153,8 +153,13 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
> >          err(1, "malloc in do_ls");
> >
> >      e = xs_directory(h, XBT_NULL, path, &num);
> > -    if (e == NULL)
> > -        err(1, "xs_directory (%s)", path);
> > +    if (e == NULL) {
> > +        if (!cur_depth)
> > +            err(1, "xs_directory (%s)", path);
> > +
> > +        /* If a node disappears while recursing, silently move on. */
> > +        num = 0;
> > +    }
>
> Can you check if the errno is ENOENT? I would rather not ignore other
> types of errors if possible.

Under what circumstances is it correct for xenstore-ls to abort
immediately and not attempt to print anything more?

I'm sure there are circumstances where the world is so hosed that it'll
keep getting errors and *fail* to print anything more. But to
deliberately bail out? Should that really be restricted to ENOENT?
Re: [Xen-devel] [PATCH] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
On Mon, 2019-03-04 at 15:51 +0100, Juergen Gross wrote:
> On 04/03/2019 15:31, David Woodhouse wrote:
> > On Mon, 2019-03-04 at 14:18 +0000, Wei Liu wrote:
> > > CC Ian as well.
> > >
> > > It would be better if you run ./scripts/get_maintainers.pl on your
> > > patches in the future to CC the correct people.
> >
> > Will do; thanks.
> >
> > > On Fri, Mar 01, 2019 at 12:16:56PM +0000, David Woodhouse wrote:
> > > > From: David Woodhouse
> > > >
> > > > When recursing, a node sometimes disappears. Deal with it and
> > > > move on instead of aborting and failing to print the rest of
> > > > what was requested.
> > > >
> > > > Signed-off-by: David Woodhouse
> > > > ---
> > > > And thus did an extremely sporadic "not going to delete that
> > > > device because it already doesn't exist" failure mode become
> > > > painfully obvious in retrospect...
> > > >
> > > > diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
> > > > index 3afc630ab8..c089d60a2a 100644
> > > > --- a/tools/xenstore/xenstore_client.c
> > > > +++ b/tools/xenstore/xenstore_client.c
> > > > @@ -153,8 +153,13 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
> > > >          err(1, "malloc in do_ls");
> > > >
> > > >      e = xs_directory(h, XBT_NULL, path, &num);
> > > > -    if (e == NULL)
> > > > -        err(1, "xs_directory (%s)", path);
> > > > +    if (e == NULL) {
> > > > +        if (!cur_depth)
> > > > +            err(1, "xs_directory (%s)", path);
> > > > +
> > > > +        /* If a node disappears while recursing, silently move on. */
> > > > +        num = 0;
> > > > +    }
>
> EACCES seems to be another candidate for trying to continue.
>
> EINVAL, ENOMEM and EIO should never occur, so aborting the operation
> would be okay IMO.

If you get one of those errors for a given path, but then everything
else afterwards works fine, do you really *want* to abort and not print
the later ones?

I agree it would be OK — at least tolerable — to abort. But is it
really the most desired outcome?
Re: [Xen-devel] [PATCH] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
On Mon, 2019-03-04 at 15:46 +0000, Wei Liu wrote:
> To me it is just a bit weird to guard with cur_depth -- if you really
> want to continue at all cost, why don't you make it really continue at
> all cost?

There isn't another early exit from the loop. It really does continue
at all costs. The only exception is when cur_depth==0, because that
means xenstore-ls was actually invoked on a path that doesn't exist at
all. In that case it *should* print the 'No such file or directory'
message. And continuing in that case would be meaningless anyway, since
it won't have found any child directories to iterate over, so it would
just silently exit.

> Also you mentioned "a node disappears", I thought that was the case you
> cared about, hence my suggestion.

That's my primary use case, yes. Obviously we should make it do the
right thing in the general case while we're thinking about it. I'm
inclined to go with Juergen's suggestion of ignoring ENOENT and EACCES
(or should that have been EPERM?), yet still aborting for everything
else.
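Put into code, the conclusion of this thread would look something like the following on top of the original patch (an untested sketch; it assumes <errno.h> is included in xenstore_client.c):

    e = xs_directory(h, XBT_NULL, path, &num);
    if (e == NULL) {
        /* A bad path given on the command line is still fatal, as is
         * anything other than a node vanishing (ENOENT) or being
         * unreadable (EACCES) during recursion. */
        if (!cur_depth || (errno != ENOENT && errno != EACCES))
            err(1, "xs_directory (%s)", path);

        /* Otherwise silently move on. */
        num = 0;
    }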
Re: [Xen-devel] Coreboot + Linux + kexec + Xen
On Mon, 2016-08-08 at 18:54 +, Trammell Hudson wrote: > Keir Fraser replied to Ward's follow up question: > > > > Is there a significant difference between booting 3.1.4 and > > > 3.2.1 with kexec in terms of BIOS requirements? > > > > If you specify no-real-mode in both cases then there > > should be no difference. If Xen does not drop back into > > real mode then it cannot make use of any legacy BIOS > > services. > > After some tracing, I determined that this is not enitrely accurate and > likely the cause of Ward's failure to boot 3.2.1 on real hardware. > Xen assumes that there is an Extended BIOS Data Area (EBDA), which is > not the case for the Linux-as-bootloader configuration. The parameters > have not been parsed at this time in the boot code, so it always trusts > that it can find this structure. > > Patching xen/arch/x86/boot/head.S to set trampoline_phys to point to > the end of MB_mem_lower (as it was done in 3.1.0) and also passing in > "no-real-mode" allows 4.6.3 to boot on the Coreboot system with Linux > payload. This appears to be true for the general case of kexec into Xen from Linux (or from Xen, once I get round to dealing with the separate issues that exist there). However... > > diff --recursive -u > /home/hudson/build/clean/xen-4.6.3/xen/arch/x86/boot/head.S > ./xen/arch/x86/boot/head.S > --- /home/hudson/build/clean/xen-4.6.3/xen/arch/x86/boot/head.S > 2016-06-20 08:08:22.0 -0400 > +++ ./xen/arch/x86/boot/head.S2016-08-03 17:56:37.511121443 -0400 > @@ -86,6 +86,8 @@ > cmp $MULTIBOOT_BOOTLOADER_MAGIC,%eax > jne not_multiboot > > +#if 0 > + > /* Set up trampoline segment 64k below EBDA */ > movzwl 0x40e,%eax /* EBDA segment */ > cmp $0xa000,%eax/* sanity check (high) */ > @@ -108,6 +110,12 @@ > shl $10-4,%edx > cmp %eax,%edx /* compare with BDA value */ > cmovb %edx,%eax /* and use the smaller */ > +#else > + // coreboot does not provide an Extended BIOS Data Area pointer > + // just stash things the Multiboot structure, adjusted to bytes > +mov MB_mem_lower(%ebx),%eax > +shl $10-4,%eax > +#endif > > 2: /* Reserve 64kb for the trampoline */ > sub $0x1000,%eax That much is overkill if the page has just been zeroed. This (along with "no-real-mode") suffices to get Xen booting for me: --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -541,6 +541,8 @@ trampoline_bios_setup: cmp $0x100,%edx /* is the multiboot value too small? */ jb 2f /* if so, do not use it */ shl $10-4,%edx +or %ecx,%ecx /* Don't use the BDA value if it's zero */ +cmovz %edx, %ecx cmp %ecx,%edx /* compare with BDA value */ cmovb %edx,%ecx /* and use the smaller */ Not sure if we can trust that it's always actually zero, though. Before I contrive other options like checking it for sanity as we do for the multiboot value, or actually checking (from asm) the command line for no-real-mode and avoiding the BDA completely what *is* the reason we don't trust the multiboot data? Did bootloaders really lie to us? If I'm about to update kexec to support multiboot2 anyway, because I want the relocation support (and I note Xen's MB2 image doesn't provide a non-UEFI entry point, so I'll need to work on that too)... perhaps it makes sense to trust the information we get from MB2 and never look at the BDA in *that* case? smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
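Folding that fix into C for clarity, the placement logic becomes something like this (a rough model only; the real code is assembly in head.S and the function name here is invented):

#include <stdint.h>

/*
 * Rough model of trampoline placement: prefer the BDA's base-memory
 * value, but use multiboot's mem_lower when it is smaller, or when the
 * BDA value is zero (as on coreboot, where nothing fills the BDA in).
 */
static uint32_t choose_trampoline_phys(uint32_t bda_bytes,
                                       uint32_t mb_mem_lower_kb)
{
    uint32_t mb_bytes = mb_mem_lower_kb << 10;
    uint32_t top = bda_bytes;

    /* Ignore a multiboot value that is implausibly small. */
    if ( mb_mem_lower_kb >= 0x100 && (top == 0 || mb_bytes < top) )
        top = mb_bytes;

    return top - 0x10000;   /* reserve 64kB at the top for the trampoline */
}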
[Xen-devel] Xen kexec status
I've been looking at kexec into Xen, and from Xen.

Kexec-tools doesn't support Multiboot v2, and doesn't treat the Xen
image as relocatable. So it loads it at address zero, which causes lots
of amusement:

Firstly, head.S trusts the low memory limit found in the BDA, which has
been scribbled on. Hacking around that and setting no-real-mode does
make kexec into Xen from Linux work.

Secondly, kexec (in xen_kexec_load()) adds a mapping of the 0-1MiB
region, which "overlaps" with where Xen is actually loaded, so *Xen*
refuses the kexec_load hypercall.

For kexec from Xen I also reverted to kexec-tools 2.0.16 as commit
894bea9335f57b62c ("kexec-tools: Perform run-time linking of
libxenctrl.so") seems to have broken things by not always defining
HAVE_LIBXENCTRL when it should. I'll fix that shortly.

Most of the above is relatively simply worked around by hacking the Xen
image to be ET_DYN (so that kexec will relocate it) and then using
kexec --mem-min=0x10. I'll probably implement Multiboot v2 support
in kexec-tools to allow for saner relocation.

We should fix head.S. One option is to recognise when the load address
is zero, and automatically eschew the BDA and trigger the no-real-mode
behaviour when that is the case. Better suggestions welcome.

Should we also avoid having a load segment at offset zero in the image,
so that it doesn't scribble on the BDA by default?

Should we also fix Xen's kexec_load not to refuse overlapping segments
if they are not loaded (bufsz==0)? I'm not quite sure what's going on
there; doesn't this happen with paging disabled anyway, so why would we
need an explicit mapping of RAM?

After that, I'm looking at using Xen as a crash kernel, which means I
really don't want it scribbling on low memory that it hasn't been
explicitly told it can use. First attempt at this is at
http://david.woodhou.se/0001-x86-boot-Use-trampoline_protmode_entry-in-place.patch
but as noted there, it only works for a single processor for now; I'll
fix it as described therein.
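The ET_DYN hack is nothing more than flipping one field in the ELF header; something like this throwaway snippet (illustrative only, not a proposed tool):

#include <elf.h>
#include <stdio.h>

/* Mark an ELF file ET_DYN so that kexec-tools treats it as relocatable.
 * e_type sits at the same offset in Elf32_Ehdr and Elf64_Ehdr, so
 * reading and rewriting the 32-bit header works for both. */
int main(int argc, char **argv)
{
    Elf32_Ehdr ehdr;
    FILE *f;

    if ( argc != 2 || !(f = fopen(argv[1], "r+b")) )
        return 1;
    if ( fread(&ehdr, sizeof(ehdr), 1, f) != 1 )
        return 1;
    ehdr.e_type = ET_DYN;
    rewind(f);
    if ( fwrite(&ehdr, sizeof(ehdr), 1, f) != 1 )
        return 1;
    return fclose(f) ? 1 : 0;
}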
Re: [Xen-devel] Xen kexec status
On Sat, 2019-04-27 at 08:15 +0200, David Woodhouse wrote: > I've been looking at kexec into Xen, and from Xen. > > Kexec-tools doesn't support Multiboot v2, and doesn't treat the Xen > image as relocatable. So it loads it at address zero, which causes lots > of amusement: > > Firstly, head.S trusts the low memory limit found in the BDA, which has > been scribbled on. Hacking around that and setting no-real-mode does > make kexec into Xen from Linux work. > > Secondly, kexec (in xen_kexec_load()) adds a mapping of the 0-1MiB > region, which "overlaps" with where Xen is actually loaded, so *Xen* > refuses the kexec_load hypercall. > > For kexec from Xen I also reverted to kexec-tools 2.0.16 as commit > 894bea9335f57b62c ("kexec-tools: Perform run-time linking of > libxenctrl.so") seems to have broken things by not always defining > HAVE_LIBXENCTRL when it should. I'll fix that shortly. > > Most of the above is relatively simply worked around by hacking the Xen > image to be ET_DYN (so that kexec will relocate it) and then using > kexec --mem-min=0x10. I'll probably implement Multiboot v2 support > in kexec-tools to allow for saner relocation. > > We should fix head.S. One option is to recognise when the load address > is zero, and automatically eschew the BDA and trigger the no-real-mode > behaviour when that is the case. Better suggestions welcome. > > Should we also avoid having a load segment at offset zero in the image, > so that it doesn't scribble on the BDA by default? > > Should we also fix Xen's kexec_load not to refuse overlapping segments > if they are not loaded (bufsz==0)? I'm not quite sure what's going on > there; doesn't this happen with paging disabled anyway, so why would we > need an explicit mapping of RAM? Oh, and then there's this... [dwmw2@localhost kexec-tools]$ sudo mv /dev/xen/hypercall /dev/xen/nothypercall [dwmw2@localhost kexec-tools]$ sudo ./build/sbin/kexec -l /root/xen --mem-min=0x20 [dwmw2@localhost kexec-tools]$ sudo mv /dev/xen/nothypercall /dev/xen/hypercall [dwmw2@localhost kexec-tools]$ sudo ./build/sbin/kexec -l /root/xen --mem-min=0x20 xencall: error: alloc_pages: mmap failed: Invalid argument openat(AT_FDCWD, "/dev/xen/privcmd", O_RDWR|O_CLOEXEC) = 4 openat(AT_FDCWD, "/dev/xen/hypercall", O_RDWR|O_CLOEXEC) = 5 openat(AT_FDCWD, "/dev/xen/privcmd", O_RDWR|O_CLOEXEC) = 6 openat(AT_FDCWD, "/dev/xen/privcmd", O_RDWR|O_CLOEXEC) = 7 openat(AT_FDCWD, "/dev/xen/hypercall", O_RDWR|O_CLOEXEC) = 8 openat(AT_FDCWD, "/dev/xen/privcmd", O_RDWR|O_CLOEXEC) = 9 ioctl(9, _IOC(_IOC_NONE, 0x50, 0x5, 0x10), 0x7ffe34ea3650) = 0 mmap(NULL, 2641920, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = -1 EINVAL (Invalid argument) write(2, "xencall: ", 9xencall: )= 9 write(2, "error: ", 7error: ) = 7 write(2, "alloc_pages: mmap failed", 24alloc_pages: mmap failed) = 24 write(2, ": Invalid argument", 18: Invalid argument) = 18 smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen kexec status
> On 27/04/2019 07:15, David Woodhouse wrote: >> I've been looking at kexec into Xen, and from Xen. >> >> Kexec-tools doesn't support Multiboot v2, and doesn't treat the Xen >> image as relocatable. So it loads it at address zero, which causes lots >> of amusement: > > Which binary are you trying to load? > > xen-syms gets converted into an elf32 which should be linked to run a > 2M. See mkelf32 and XEN_IMG_OFFSET Yeah, looking closer it isn't Xen that's being loaded at zero... >> Firstly, head.S trusts the low memory limit found in the BDA, which has >> been scribbled on. Hacking around that and setting no-real-mode does >> make kexec into Xen from Linux work. > > Do we know what scribbles on it? ... kexec in its wisdom is choosing to put something small (0xfe bytes or so) there. Probably the multiboot info. Telling it --mem-min=0x1000 should suffice. > For better or worse, the IVT needs to remain valid wherever possible to > reduce the number of corner cases where an errant NMI/#MC will take out > the entire system. > >> Secondly, kexec (in xen_kexec_load()) adds a mapping of the 0-1MiB >> region, which "overlaps" with where Xen is actually loaded, so *Xen* >> refuses the kexec_load hypercall. > > ISTR this being necessary for purgatory to function at the time David > did the kexec work, but really it seems like a bug with the > configuration of purgatory. I have fixed this to fill in between any gaps below 1MiB but if we can ditch it, all the better. >> For kexec from Xen I also reverted to kexec-tools 2.0.16 as commit >> 894bea9335f57b62c ("kexec-tools: Perform run-time linking of >> libxenctrl.so") seems to have broken things by not always defining >> HAVE_LIBXENCTRL when it should. I'll fix that shortly. That one appears to have been transient; it works now. >> Most of the above is relatively simply worked around by hacking the Xen >> image to be ET_DYN (so that kexec will relocate it) and then using >> kexec --mem-min=0x10. I'll probably implement Multiboot v2 support >> in kexec-tools to allow for saner relocation. > > I think having MB2 support would be a very good move. It also provides > a better way to pass the UEFI details. > >> We should fix head.S. One option is to recognise when the load address >> is zero, and automatically eschew the BDA and trigger the no-real-mode >> behaviour when that is the case. Better suggestions welcome. >> >> Should we also avoid having a load segment at offset zero in the image, >> so that it doesn't scribble on the BDA by default? > > I don't think we should ever be loading a binary at 0, but it might be > worth having a dedicated kexec entry point which can be more selective > about what it does. Compare the MB bootloader name with "kexec"? I note we already do that for grub. > The EFI and PVH entrypoints already set skip_realmode amongst other > things. > > Another option might be to only use the BDA/EBDA in the absence of any > memory map information. The code seems to be fairly insistently not trusting the MB information. I assumed there were reasons for that... but perhaps we could trust MB2? >> Should we also fix Xen's kexec_load not to refuse overlapping segments >> if they are not loaded (bufsz==0)? I'm not quite sure what's going on >> there; doesn't this happen with paging disabled anyway, so why would we >> need an explicit mapping of RAM? > > Do you have a dump of which segments are attempting to be loaded? TBH, > this sounds like fallout from the earlier issues, but it is also > possible that we've got a bug in the overlap checks. 
Indeed, the overlap is real and it's due to the earlier issues. >> After that, I'm looking at using Xen as a crash kernel, which means I >> really don't want it scribbling on low memory that it hasn't been >> explicitly told it can use. First attempt at this is at >> http://david.woodhou.se/0001-x86-boot-Use-trampoline_protmode_entry-in-place.patch >> but as noted there, it only works for a single processor for now; I'll >> fix it as described therein. > > I think it is well past time to (re)consider and strip down the early > assembly code. There are a number of at-best-questionable things, and > it is extremely thick going. (TBH, I'd also like to replace most of it > with C, but doing that will first require understanding how it actually > all works.) I am... some way to understanding how it works. My current plan is to let head.S relocate *only* the one-time temporary 16-bit boot code, and only if !no-real-mode (where I may yet set no-real-mo
[Xen-devel] [RFC PATCH 2/7] x86/boot: Remove gratuitous call back into low-memory code
From: David Woodhouse

We appear to have implemented a memcpy() in the low-memory trampoline
which we then call into from __start_xen(), for no adequately defined
reason.

Kill it with fire.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/mem.S    | 27 +++++----------------------
 xen/arch/x86/setup.c       | 12 ++++++++++++
 xen/include/asm-x86/e820.h |  5 ++---
 3 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S
index c6a9bd4d3b..2d61d28835 100644
--- a/xen/arch/x86/boot/mem.S
+++ b/xen/arch/x86/boot/mem.S
@@ -7,7 +7,7 @@ get_memory_map:
 .Lmeme820:
         xorl    %ebx, %ebx                      # continuation counter
-        movw    $bootsym(e820map), %di          # point into the whitelist
+        movw    $bootsym(bios_e820map), %di     # point into the whitelist
                                                 # so we can have the bios
                                                 # directly write into it.
 
@@ -22,8 +22,8 @@ get_memory_map:
         cmpl    $SMAP,%eax                      # check the return is `SMAP'
         jne     .Lmem88
 
-        incw    bootsym(e820nr)
-        cmpw    $E820_BIOS_MAX,bootsym(e820nr)  # up to this many entries
+        incw    bootsym(bios_e820nr)
+        cmpw    $E820_BIOS_MAX,bootsym(bios_e820nr) # up to this many entries
         jae     .Lmem88
 
         movw    %di,%ax
@@ -66,27 +66,10 @@ get_memory_map:
 
         ret
 
-/*
- * Copy E820 map obtained from BIOS to a buffer allocated by Xen.
- * Input: %rdi: target address of e820 entry array
- *        %esi: maximum number of entries to copy
- * Output: %eax: number of entries copied
- */
-.code64
-ENTRY(e820map_copy)
-        mov     %esi, %eax
-        lea     e820map(%rip), %rsi
-        mov     e820nr(%rip), %ecx
-        cmp     %ecx, %eax
-        cmova   %ecx, %eax              # number of entries to move
-        imul    $5, %eax, %ecx
-        rep movsl                       # do the move
-        ret
-
 .align 4
-e820map:
+GLOBAL(bios_e820map)
         .fill E820_BIOS_MAX*20,1,0
-e820nr:
+GLOBAL(bios_e820nr)
         .long 0
 GLOBAL(lowmem_kb)
         .long 0

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index a353d76f9a..5fa7d3b79c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -664,6 +664,18 @@ static char * __init cmdline_cook(char *p, const char *loader_name)
     return p;
 }
 
+static int copy_bios_e820(struct e820entry *map, unsigned int limit)
+{
+    unsigned int n = bootsym(bios_e820nr);
+    if (n > limit)
+        n = limit;
+
+    if (n)
+        memcpy(map, bootsym(bios_e820map), sizeof(*map) * n);
+
+    return n;
+}
+
 void __init noreturn __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;

diff --git a/xen/include/asm-x86/e820.h b/xen/include/asm-x86/e820.h
index ee317b17aa..52916fb75d 100644
--- a/xen/include/asm-x86/e820.h
+++ b/xen/include/asm-x86/e820.h
@@ -37,8 +37,7 @@ extern struct e820map e820_raw;
 
 /* These symbols live in the boot trampoline. */
 extern unsigned int lowmem_kb, highmem_kb;
-unsigned int e820map_copy(struct e820entry *map, unsigned int limit);
-
-#define copy_bios_e820 bootsym(e820map_copy)
+extern struct e820map bios_e820map[];
+extern unsigned int bios_e820nr;
 
 #endif /*__E820_HEADER*/
-- 
2.17.2
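As a reminder of what is being copied here: each record the BIOS writes via INT 15h/E820 is 20 bytes, which is where both the E820_BIOS_MAX*20 buffer size and the old imul $5 (five dwords per entry) came from. In C, matching Xen's definition:

#include <stdint.h>

/* One BIOS E820 record as filled in by mem.S: 20 packed bytes. */
struct e820entry {
    uint64_t addr;   /* start of memory segment */
    uint64_t size;   /* size of memory segment */
    uint32_t type;   /* type of memory segment */
} __attribute__((packed));

_Static_assert(sizeof(struct e820entry) == 20,
               "BIOS e820 record is 20 bytes");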
[Xen-devel] [RFC PATCH 4/7] x86/boot: Split bootsym() into four types of relocations
From: David Woodhouse As a first step toward using the low-memory trampoline only when necessary for a legacy boot without no-real-mode, clean up the relocations into three separate groups. • bootsym() is now used only at boot time when no-real-mode isn't set. • bootdatasym() is for variables containing information discovered by the 16-bit boot code. This is currently accessed directly in place in low memory by Xen at runtime, but will be copied back to its location in high memory to avoid the pointer gymnastics (and because a subsequent patch will stop copying the 16-bit boot code into low memory at all when it isn't being used). • trampsym() is for the permanent 16-bit trampoline used for AP startup and for wake from sleep. This is not used at boot, and can be copied into (properly allocated) low memory once the system is running. • tramp32sym() is used both at boot and for AP startup/wakeup. During boot it can be used in-place, running from the physical address of the Xen image. For AP startup it can't, because at that point there isn't a full 1:1 mapping of all memory; only the low trampoline page is mapped. No (intentional) functional change yet; just a "cleanup" to allow the various parts to be treated separately in subsequent patches. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/edd.S| 16 +++ xen/arch/x86/boot/head.S | 22 +++-- xen/arch/x86/boot/mem.S| 12 ++--- xen/arch/x86/boot/trampoline.S | 86 ++ xen/arch/x86/boot/video.S | 6 +-- xen/arch/x86/boot/wakeup.S | 16 +++ xen/arch/x86/efi/efi-boot.h| 8 ++-- xen/arch/x86/xen.lds.S | 15 -- 8 files changed, 125 insertions(+), 56 deletions(-) diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 3df712bce1..434bbbd960 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -41,7 +41,7 @@ get_edd: # This code is sensitive to the size of the structs in edd.h edd_start: /* ds:si points at fn48 results. Fn41 results go immediately before. 
*/ -movw$bootsym(boot_edd_info)+EDDEXTSIZE, %si +movw$bootdatasym(boot_edd_info)+EDDEXTSIZE, %si movb$0x80, %dl # BIOS device 0x80 edd_check_ext: @@ -56,7 +56,7 @@ edd_check_ext: movb%dl, %ds:-8(%si)# store device number movb%ah, %ds:-7(%si)# store version movw%cx, %ds:-6(%si)# store extensions -incbbootsym(boot_edd_info_nr) # note that we stored something +incbbootdatasym(boot_edd_info_nr) # note that we stored something edd_get_device_params: movw$EDDPARMSIZE, %ds:(%si) # put size @@ -97,7 +97,7 @@ edd_legacy_done: edd_next: incb%dl # increment to next device jz edd_done -cmpb$EDD_INFO_MAX,bootsym(boot_edd_info_nr) +cmpb$EDD_INFO_MAX,bootdatasym(boot_edd_info_nr) jb edd_check_ext edd_done: @@ -108,11 +108,11 @@ edd_done: .Ledd_mbr_sig_start: pushw %es movb$0x80, %dl # from device 80 -movw$bootsym(boot_mbr_signature), %bx # store buffer ptr in bx +movw$bootdatasym(boot_mbr_signature), %bx # store buffer ptr in bx .Ledd_mbr_sig_read: pushw %bx -movw$bootsym(boot_edd_info), %bx -movzbw bootsym(boot_edd_info_nr), %cx +movw$bootdatasym(boot_edd_info), %bx +movzbw bootdatasym(boot_edd_info_nr), %cx jcxz.Ledd_mbr_sig_default .Ledd_mbr_sig_find_info: cmpb%dl, (%bx) @@ -151,12 +151,12 @@ edd_done: jne .Ledd_mbr_sig_next movb%dl, (%bx) # store BIOS drive number movl%ecx, 4(%bx)# store signature from MBR -incbbootsym(boot_mbr_signature_nr) # note that we stored something +incbbootdatasym(boot_mbr_signature_nr) # note that we stored something addw$8, %bx # increment sig buffer ptr .Ledd_mbr_sig_next: incb%dl # increment to next device jz .Ledd_mbr_sig_done -cmpb$EDD_MBR_SIG_MAX, bootsym(boot_mbr_signature_nr) +cmpb$EDD_MBR_SIG_MAX, bootdatasym(boot_mbr_signature_nr) jb .Ledd_mbr_sig_read .Ledd_mbr_sig_done: popw%es diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 7c30de3671..5b4f211a9b 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -706,14 +706,30 @@ trampoline_setup: cmp $sym_offs(__trampoline_rel_stop),%edi jb 1b -/* Patch in the trampoline segment. */ +mov $sym_offs(__trampoline32_rel_start),%edi +1: +mov %fs:(%e
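The upshot of the split is one walking mechanism (the apply_relocs() model sketched earlier) applied to four tables, which, once the later patches in the series take effect, get four different base addresses. In outline (an illustrative C fragment; every name in it is invented and the table bounds are left undeclared):

/*
 * Illustrative fragment only -- one mechanism, four tables:
 *
 *   bootsym      -> trampoline_phys (16-bit one-time boot code)
 *   bootdatasym  -> trampoline_phys (BIOS-discovered data, copied back later)
 *   trampsym     -> low memory allocated for the permanent AP/wakeup trampoline
 *   tramp32sym   -> the Xen image itself at boot, or the low copy for AP startup
 */
static void apply_all_reloc_sets(uint8_t *image)
{
    apply_relocs(image, bootsym_rel_start,     bootsym_rel_stop,     trampoline_phys);
    apply_relocs(image, bootdatasym_rel_start, bootdatasym_rel_stop, trampoline_phys);
    apply_relocs(image, trampsym_rel_start,    trampsym_rel_stop,    perm_trampoline_phys);
    apply_relocs(image, tramp32sym_rel_start,  tramp32sym_rel_stop,  tramp32_base);
}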
[Xen-devel] [RFC PATCH 3/7] x86/boot: Only jump into low trampoline code for real-mode boot
From: David Woodhouse If the no-real-mode flag is set, don't go there at all. This is a prelude to not even putting it there in the first place. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/head.S | 10 ++ xen/arch/x86/boot/trampoline.S | 4 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index db19ac6fd8..7c30de3671 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -734,7 +734,17 @@ trampoline_setup: /* Switch to low-memory stack which lives at the end of trampoline region. */ mov sym_fs(trampoline_phys),%edi lea TRAMPOLINE_SPACE+TRAMPOLINE_STACK_SPACE(%edi),%esp +cmpb$0, sym_fs(skip_realmode) +jz 1f +/* If no-real-mode, jump straight to trampoline_protmode_entry */ +lea trampoline_protmode_entry-trampoline_start(%edi),%eax +/* EBX == 0 indicates we are the BP (Boot Processor). */ +xor %ebx,%ebx +jmp 2f +1: +/* Go via 16-bit code in trampoline_boot_cpu_entry */ lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax +2: pushl $BOOT_CS32 push%eax diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 5588c7986a..df0ffd5013 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -187,9 +187,6 @@ start64: .code32 trampoline_boot_cpu_entry: -cmpb$0,bootsym_rel(skip_realmode,5) -jnz .Lskip_realmode - /* Load pseudo-real-mode segments. */ mov $BOOT_PSEUDORM_DS,%eax mov %eax,%ds @@ -269,7 +266,6 @@ trampoline_boot_cpu_entry: mov %eax,%gs mov %eax,%ss -.Lskip_realmode: /* EBX == 0 indicates we are the BP (Boot Processor). */ xor %ebx,%ebx -- 2.17.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [RFC PATCH 1/7] x86/wakeup: Stop using %fs for lidt/lgdt
From: David Woodhouse

The wakeup code is now relocated alongside the trampoline code, so %ds
is just fine here.

Signed-off-by: David Woodhouse
---
 xen/arch/x86/boot/wakeup.S | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index f9632eef95..8c52819171 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -40,11 +40,8 @@ ENTRY(wakeup_start)
         movw    %ax, %fs
         movw    $0x0e00 + 'L', %fs:(0x10)
 
-        # boot trampoline is under 1M, and shift its start into
-        # %fs to reference symbols in that area
-        mov     wakesym(trampoline_seg), %fs
-        lidt    %fs:bootsym(idt_48)
-        lgdt    %fs:bootsym(gdt_48)
+        lidt    bootsym(idt_48)
+        lgdt    bootsym(gdt_48)
 
         movw    $1, %ax
         lmsw    %ax             # Turn on CR0.PE
@@ -102,10 +99,6 @@
 GLOBAL(video_mode)
         .long 0
 GLOBAL(video_flags)
         .long 0
-trampoline_seg: .word 0
-.pushsection .trampoline_seg, "a"
-.long trampoline_seg - .
-.popsection
 
         .code32
-- 
2.17.2
[Xen-devel] [RFC PATCH 6/7] x86/boot: Copy 16-bit boot variables back up to Xen image
From: David Woodhouse Ditch the bootsym() access from C code for the variables populated by 16-bit boot code. As well as being cleaner this also paves the way for not having the 16-bit boot code in low memory for no-real-mode or EFI loader boots at all. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/edd.S | 2 ++ xen/arch/x86/boot/head.S | 16 +++ xen/arch/x86/boot/mem.S | 2 ++ xen/arch/x86/boot/trampoline.S| 33 --- xen/arch/x86/boot/video.S | 30 +++- xen/arch/x86/platform_hypercall.c | 18 - xen/arch/x86/setup.c | 23 +++-- xen/arch/x86/xen.lds.S| 8 +++- xen/include/asm-x86/edd.h | 1 - 9 files changed, 93 insertions(+), 40 deletions(-) diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 434bbbd960..138d04c964 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -163,6 +163,7 @@ edd_done: .Ledd_mbr_sig_skip: ret +.pushsection .data.boot16, "aw", @progbits GLOBAL(boot_edd_info_nr) .byte 0 GLOBAL(boot_mbr_signature_nr) @@ -171,3 +172,4 @@ GLOBAL(boot_mbr_signature) .fill EDD_MBR_SIG_MAX*8,1,0 GLOBAL(boot_edd_info) .fill EDD_INFO_MAX * (EDDEXTSIZE + EDDPARMSIZE), 1, 0 +.popsection diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 82342769c7..7d6c8d3292 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -732,6 +732,17 @@ trampoline_setup: cmp $sym_offs(__bootsym_seg_stop),%edi jb 1b +/* Relocations for the boot data section. */ +mov sym_fs(trampoline_phys),%edx +add $(boot_trampoline_end - boot_trampoline_start),%edx +mov $sym_offs(__bootdatasym_rel_start),%edi +1: +mov %fs:(%edi),%eax +add %edx,%fs:(%edi,%eax) +add $4,%edi +cmp $sym_offs(__bootdatasym_rel_stop),%edi +jb 1b + /* Do not parse command line on EFI platform here. */ cmpb$0,sym_fs(efi_platform) jnz 1f @@ -769,6 +780,11 @@ trampoline_setup: mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx rep movsl %fs:(%esi),%es:(%edi) +/* Copy boot data template to low memory. */ +mov $sym_offs(bootdata_start),%esi +mov $((bootdata_end - bootdata_start + 3) / 4),%ecx +rep movsl %fs:(%esi),%es:(%edi) + /* Jump into the relocated trampoline. */ lret diff --git a/xen/arch/x86/boot/mem.S b/xen/arch/x86/boot/mem.S index aa39608442..86f0fa9af7 100644 --- a/xen/arch/x86/boot/mem.S +++ b/xen/arch/x86/boot/mem.S @@ -67,6 +67,7 @@ get_memory_map: ret .align 4 +.pushsection .data.boot16, "aw", @progbits GLOBAL(bios_e820map) .fill E820_BIOS_MAX*20,1,0 GLOBAL(bios_e820nr) @@ -75,3 +76,4 @@ GLOBAL(lowmem_kb) .long 0 GLOBAL(highmem_kb) .long 0 + .popsection diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 0f4a740fcb..fdfee2edb1 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -47,11 +47,15 @@ .long 111b - (off) - .;\ .popsection -#define bootdatasym(s) ((s)-boot_trampoline_start) +.pushsection .data.boot16, "aw", @progbits +GLOBAL(bootdata_start) +.popsection + +#define bootdatasym(s) ((s)-bootdata_start+(boot_trampoline_end-boot_trampoline_start)) #define bootdatasym_rel(sym, off, opnd...) 
\ bootdatasym(sym),##opnd; \ 111:; \ -.pushsection .bootdatasym_rel, "a";\ +.pushsection .bootsym_rel, "a";\ .long 111b - (off) - .;\ .popsection @@ -100,7 +104,7 @@ GLOBAL(trampoline_cpu_started) .word 0 idt_48: .word 0, 0, 0 # base = limit = 0 .word 0 -gdt_48: .word 6*8-1 +gdt_48: .word 7*8-1 .long tramp32sym_rel(trampoline_gdt,4) /* Start of tramp32sym section which can be used in place during boot */ @@ -312,6 +316,23 @@ trampoline_boot_cpu_entry: mov %eax,%gs mov %eax,%ss +/* + * Copy locally-gathered data back up into the Xen physical image + */ +mov $BOOT_FS,%eax +mov %eax,%es + +mov $sym_offs(bootdata_end),%ecx +mov $sym_offs(bootdata_start),%edi +sub %edi,%ecx +mov $bootdatasym_rel(bootdata_start,4,%esi) +rep movsb %ds:(%esi),%es:(%edi) + +/* + * %es still points to BOOT_FS but trampoline_protmode_entry + * reloads it anyway. + */ + /* EBX == 0 indicates we are the BP (Boot Processor). */ xor %ebx,%ebx @@ -
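The rep movsb at the end of trampoline_boot_cpu_entry is, in C terms, just this (a sketch; the address arithmetic mirrors the bootdatasym() macro above, and the names here are illustrative):

#include <string.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Sketch of the boot-data copy-back.  The low-memory copy of the
 * .data.boot16 variables lives immediately after the boot trampoline,
 * which is what bootdatasym() encodes:
 *
 *   low_addr(s) = trampoline_phys
 *               + (boot_trampoline_end - boot_trampoline_start)
 *               + (s - bootdata_start)
 *
 * Once real mode is done, the whole section is copied back over its
 * home in the Xen image (via BOOT_FS in the real code).
 */
static void copy_bootdata_back(uint8_t *image_bootdata_start,
                               const uint8_t *low_copy, size_t section_len)
{
    memcpy(image_bootdata_start, low_copy, section_len);
}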
[Xen-devel] [RFC PATCH 7/7] x86/boot: Do not use trampoline for no-real-mode boot paths
From: David Woodhouse Where booted from EFI or with no-real-mode, there is no need to stomp on low memory with the 16-boot code. Instead, just go straight to trampoline_protmode_entry() at its physical location within the Xen image. For now, the boot code (including the EFI loader path) still determines what the trampoline_phys address should be. The trampoline is actually relocated for that address and copied into low memory, from a relocate_trampoline() call made from __start_xen(). For subsequent AP startup and wakeup, the 32-bit trampoline can't trivially be used in-place as that region isn't mapped. So copy it down to low memory too, having relocated it (again) to work from there. Signed-off-by: David Woodhouse --- xen/arch/x86/acpi/power.c | 6 +-- xen/arch/x86/boot/head.S | 67 -- xen/arch/x86/boot/trampoline.S | 30 +++ xen/arch/x86/cpu/common.c | 2 +- xen/arch/x86/cpu/intel.c | 2 +- xen/arch/x86/efi/efi-boot.h| 31 ++-- xen/arch/x86/mm.c | 2 +- xen/arch/x86/setup.c | 43 -- xen/arch/x86/smpboot.c | 6 +-- xen/arch/x86/tboot.c | 6 +-- xen/arch/x86/x86_64/mm.c | 2 +- xen/include/asm-x86/acpi.h | 2 +- xen/include/asm-x86/config.h | 10 ++--- 13 files changed, 117 insertions(+), 92 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index a704c7c340..a1ce8acb21 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -152,9 +152,9 @@ static void acpi_sleep_prepare(u32 state) return; if ( acpi_sinfo.vector_width == 32 ) -*(uint32_t *)wakeup_vector_va = bootsym_phys(wakeup_start); +*(uint32_t *)wakeup_vector_va = trampsym_phys(wakeup_start); else -*(uint64_t *)wakeup_vector_va = bootsym_phys(wakeup_start); +*(uint64_t *)wakeup_vector_va = trampsym_phys(wakeup_start); } static void acpi_sleep_post(u32 state) {} @@ -388,7 +388,7 @@ static void tboot_sleep(u8 sleep_state) g_tboot_shared->acpi_sinfo.wakeup_vector = acpi_sinfo.wakeup_vector; g_tboot_shared->acpi_sinfo.vector_width = acpi_sinfo.vector_width; g_tboot_shared->acpi_sinfo.kernel_s3_resume_vector = - bootsym_phys(wakeup_start); + trampsym_phys(wakeup_start); switch ( sleep_state ) { diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 7d6c8d3292..c23eeb3aa4 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -696,16 +696,23 @@ trampoline_setup: lea __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi mov %edi,sym_fs(l2_bootmap) -/* Apply relocations to bootstrap trampoline. */ -mov sym_fs(trampoline_phys),%edx -mov $sym_offs(__trampoline_rel_start),%edi -1: -mov %fs:(%edi),%eax -add %edx,%fs:(%edi,%eax) -add $4,%edi -cmp $sym_offs(__trampoline_rel_stop),%edi -jb 1b +/* Do not parse command line on EFI platform here. */ +cmpb$0,sym_fs(efi_platform) +jnz 1f +/* Bail if there is no command line to parse. */ +mov sym_fs(multiboot_ptr),%ebx +testl $MBI_CMDLINE,MB_flags(%ebx) +jz 1f + +lea sym_esi(early_boot_opts),%eax +push%eax +pushl MB_cmdline(%ebx) +callcmdline_parse_early + +1: +/* Apply relocations to 32-bit trampoline for execution in place. */ +lea sym_esi(perm_trampoline_start),%edx mov $sym_offs(__trampoline32_rel_start),%edi 1: mov %fs:(%edi),%eax @@ -714,6 +721,21 @@ trampoline_setup: cmp $sym_offs(__trampoline32_rel_stop),%edi jb 1b +cmp $0,sym_esi(skip_realmode) +jz .Ldo_realmode + +/* Go directly to trampoline_protmode_entry at its physical address */ +lea trampoline_protmode_entry-__XEN_VIRT_START(%esi),%eax +pushl $BOOT_CS32 +push%eax + +/* EBX == 0 indicates we are the BP (Boot Processor). 
*/ +xor %ebx,%ebx +retl + +.Ldo_realmode: +/* Apply relocations to 16-bit boot code. */ +mov sym_fs(trampoline_phys),%edx mov $sym_offs(__bootsym_rel_start),%edi 1: mov %fs:(%edi),%eax @@ -743,35 +765,12 @@ trampoline_setup: cmp $sym_offs(__bootdatasym_rel_stop),%edi jb 1b -/* Do not parse command line on EFI platform here. */ -cmpb$0,sym_fs(efi_platform) -jnz 1f - -/* Bail if there is no command line to parse. */ -mov sym_fs(multiboot_ptr),%ebx -testl $MBI_CMDLINE,MB_flags(%ebx) -jz 1f - -lea sym_esi(early_boot_opts),%eax -push
[Xen-devel] [RFC PATCH 5/7] x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
From: David Woodhouse In preparation for splitting the boot and permanent trampolines from each other. Some of these will change back, but most are boot so do the plain search/replace that way first, then a subsequent patch will extract the permanent trampoline code. Signed-off-by: David Woodhouse --- xen/arch/x86/boot/head.S | 12 ++-- xen/arch/x86/boot/trampoline.S | 10 +- xen/arch/x86/boot/video.S | 4 ++-- xen/arch/x86/efi/efi-boot.h| 4 ++-- xen/arch/x86/mm.c | 2 +- xen/arch/x86/setup.c | 4 ++-- xen/arch/x86/tboot.c | 6 +++--- xen/arch/x86/x86_64/mm.c | 2 +- xen/arch/x86/xen.lds.S | 6 +++--- xen/include/asm-x86/config.h | 6 +++--- 10 files changed, 28 insertions(+), 28 deletions(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index 5b4f211a9b..82342769c7 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -753,20 +753,20 @@ trampoline_setup: cmpb$0, sym_fs(skip_realmode) jz 1f /* If no-real-mode, jump straight to trampoline_protmode_entry */ -lea trampoline_protmode_entry-trampoline_start(%edi),%eax +lea trampoline_protmode_entry-boot_trampoline_start(%edi),%eax /* EBX == 0 indicates we are the BP (Boot Processor). */ xor %ebx,%ebx jmp 2f 1: /* Go via 16-bit code in trampoline_boot_cpu_entry */ -lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax +lea trampoline_boot_cpu_entry-boot_trampoline_start(%edi),%eax 2: pushl $BOOT_CS32 push%eax /* Copy bootstrap trampoline to low memory, below 1MB. */ -mov $sym_offs(trampoline_start),%esi -mov $((trampoline_end - trampoline_start) / 4),%ecx +mov $sym_offs(boot_trampoline_start),%esi +mov $((boot_trampoline_end - boot_trampoline_start) / 4),%ecx rep movsl %fs:(%esi),%es:(%edi) /* Jump into the relocated trampoline. */ @@ -778,8 +778,8 @@ cmdline_parse_early: reloc: #include "reloc.S" -ENTRY(trampoline_start) +ENTRY(boot_trampoline_start) #include "trampoline.S" -ENTRY(trampoline_end) +ENTRY(boot_trampoline_end) #include "x86_64.S" diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index e77b4bea36..0f4a740fcb 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -31,7 +31,7 @@ *to be used for AP startup. */ #undef bootsym -#define bootsym(s) ((s)-trampoline_start) +#define bootsym(s) ((s)-boot_trampoline_start) #define bootsym_rel(sym, off, opnd...) \ bootsym(sym),##opnd; \ @@ -47,7 +47,7 @@ .long 111b - (off) - .;\ .popsection -#define bootdatasym(s) ((s)-trampoline_start) +#define bootdatasym(s) ((s)-boot_trampoline_start) #define bootdatasym_rel(sym, off, opnd...) \ bootdatasym(sym),##opnd; \ 111:; \ @@ -56,7 +56,7 @@ .popsection #undef trampsym -#define trampsym(s) ((s)-trampoline_start) +#define trampsym(s) ((s)-boot_trampoline_start) #define trampsym_rel(sym, off, opnd...)\ trampsym(sym),##opnd; \ @@ -66,7 +66,7 @@ .popsection #undef tramp32sym -#define tramp32sym(s) ((s)-trampoline_start) +#define tramp32sym(s) ((s)-boot_trampoline_start) #define tramp32sym_rel(sym, off, opnd...) \ tramp32sym(sym),##opnd;\ @@ -226,7 +226,7 @@ start64: /* The first page of trampoline is permanent, the rest boot-time only. */ /* Reuse the boot trampoline on the 1st trampoline page as stack for wakeup. */ -.equwakeup_stack, trampoline_start + PAGE_SIZE +.equwakeup_stack, boot_trampoline_start + PAGE_SIZE .global wakeup_stack /* From here on early boot only. 
*/ diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S index 03907e9e9a..5087c6a4d5 100644 --- a/xen/arch/x86/boot/video.S +++ b/xen/arch/x86/boot/video.S @@ -15,8 +15,8 @@ #include "video.h" -/* Scratch space layout: trampoline_end to trampoline_end+0x1000. */ -#define modelist bootsym(trampoline_end) /* 2kB (256 entries) */ +/* Scratch space layout: boot_trampoline_end to boot_trampoline_end+0x1000. */ +#define modelist bootsym(boot_trampoline_end) /* 2kB (256 entries) */ #define vesa_glob_info (modelist + 0x800)/* 1kB */ #define vesa_mode_info (vesa_glob_info + 0x400) /* 1kB */ diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index abc7d3e3b7..f6f435a4c5 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -232,7 +232,7 @@ static void __init noreturn efi_arch_post_exit_boot(void) u64 cr4 = XEN_MINIMAL_CR4 & ~X86_CR4_PGE, efer; efi_arch_relocate_image(__XEN_VIR
[Xen-devel] [RFC PATCH 0/7] Clean up x86_64 boot code
Some cleanups for the boot path, originally inspired by an attempt to
avoid scribbling on arbitrarily-chosen low memory.

In the no-real-mode case we don't need to bounce through low memory at
all; we can run the 32-bit trampoline in-place in the Xen image. The
variables containing information which is optionally discovered by the
real-mode boot code can be put back in place in the Xen image, and we
can dispense with the bootsym() pointer gymnastics in C code which
access them in low memory.

I haven't yet got to reloc(), which I think exists only to ensure that
the various breadcrumbs left all over the place by the Multiboot
bootloader aren't scribbled on when we copy the 16-bit boot trampoline
into low memory. I'd quite like to kill reloc() and pass the original
pointer up to 64-bit code to be handled in C. That would require finding
a *safe* location to put the 16-bit boot trampoline though, which
doesn't already contain anything that the bootloader created for us.

In fact, isn't there already a chance that head.S will choose a location
for the trampoline which is already part of a module or contains one of
the Multiboot breadcrumbs?

David Woodhouse (7):
  x86/wakeup: Stop using %fs for lidt/lgdt
  x86/boot: Remove gratuitous call back into low-memory code
  x86/boot: Only jump into low trampoline code for real-mode boot
  x86/boot: Split bootsym() into four types of relocations
  x86/boot: Rename trampoline_{start,end} to boot_trampoline_{start,end}
  x86/boot: Copy 16-bit boot variables back up to Xen image
  x86/boot: Do not use trampoline for no-real-mode boot paths

 xen/arch/x86/acpi/power.c         |   6 +-
 xen/arch/x86/boot/edd.S           |  18 ++--
 xen/arch/x86/boot/head.S          |  89 --
 xen/arch/x86/boot/mem.S           |  35 +++-
 xen/arch/x86/boot/trampoline.S    | 145 +++---
 xen/arch/x86/boot/video.S         |  36
 xen/arch/x86/boot/wakeup.S        |  23 ++---
 xen/arch/x86/cpu/common.c         |   2 +-
 xen/arch/x86/cpu/intel.c          |   2 +-
 xen/arch/x86/efi/efi-boot.h       |  31 +--
 xen/arch/x86/mm.c                 |   2 +-
 xen/arch/x86/platform_hypercall.c |  18 ++--
 xen/arch/x86/setup.c              |  72 ---
 xen/arch/x86/smpboot.c            |   6 +-
 xen/arch/x86/tboot.c              |   6 +-
 xen/arch/x86/x86_64/mm.c          |   2 +-
 xen/arch/x86/xen.lds.S            |  27 --
 xen/include/asm-x86/acpi.h        |   2 +-
 xen/include/asm-x86/config.h      |  10 +--
 xen/include/asm-x86/e820.h        |   5 +-
 xen/include/asm-x86/edd.h         |   1 -
 21 files changed, 339 insertions(+), 199 deletions(-)

-- 
2.17.2
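The overlap check that last question is asking for is cheap to express.
A sketch in C, illustrative only: head.S does this work in assembly, and
trampoline_overlaps_module() is a hypothetical helper; module_t is the
Multiboot module descriptor with physical mod_start/mod_end fields:

static bool trampoline_overlaps_module(uint32_t tramp_start, uint32_t tramp_end,
                                       const module_t *mods, unsigned int nr_mods)
{
    unsigned int i;

    for ( i = 0; i < nr_mods; i++ )
    {
        /* Half-open ranges [a,b) and [c,d) overlap iff a < d && c < b. */
        if ( tramp_start < mods[i].mod_end && mods[i].mod_start < tramp_end )
            return true;
    }

    return false;
}

A candidate trampoline range would also need checking against the
Multiboot info structures themselves, not just the modules.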
Re: [Xen-devel] Patch "x86/entry/64: Remove %ebx handling from error_entry/exit" has been added to the 4.9-stable tree
On Wed, 2018-08-22 at 09:19 +0200, gre...@linuxfoundation.org wrote: > This is a note to let you know that I've just added the patch titled > > x86/entry/64: Remove %ebx handling from error_entry/exit > > to the 4.9-stable tree which can be found at: > > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary > > The filename of the patch is: > x86-entry-64-remove-ebx-handling-from-error_entry-exit.patch > and it can be found in the queue-4.9 subdirectory. Can we have it for 4.4 too, please? > [ Note to stable maintainers: this should probably get applied to all > kernels. If you're nervous about that, a more conservative fix to > add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should > also fix the problem. ] Can we assume it's always from kernel? The Xen code definitely seems to handle invoking this from both kernel and userspace contexts. Shouldn't %ebx get set to !(regs->rsp & 3) ? Either way, let's just do it in the stable tree exactly the same way it's done upstream. > - * On entry, EBX is a "return to kernel mode" flag: Re-introduce the typo 'EBS' here, to make the patch apply cleanly to 4.4. It's only removing that line anyway. Or just cherry-pick upstream commit 75ca5b22260ef7 first. smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Patch "x86/entry/64: Remove %ebx handling from error_entry/exit" has been added to the 4.9-stable tree
On Wed, 2018-11-28 at 08:44 -0800, Andy Lutomirski wrote:
> > Can we assume it's always from kernel? The Xen code definitely seems to
> > handle invoking this from both kernel and userspace contexts.
>
> I learned that my comment here was wrong shortly after the patch landed :(

Turns out the only place I see it getting called from is under
__context_switch().

 #7 [8801144a7cf0] new_xen_failsafe_callback at a028028a [kmod_ebxfix]
 #8 [8801144a7d90] xen_hypercall_update_descriptor at 8100114a
 #9 [8801144a7db8] xen_hypercall_update_descriptor at 8100114a
#10 [8801144a7df0] xen_mc_flush at 81006ab9
#11 [8801144a7e30] xen_end_context_switch at 81004e12
#12 [8801144a7e48] __switch_to at 81016582
#13 [8801144a7ea0] __schedule at 815d2b37

That …114a in xen_hypercall_update_descriptor is the 'pop' instruction
right after the syscall; it's happening when Xen is preempting the domain
in the hypercall and then reloads the segment registers to run that vCPU
again later.

[ 44185.225289] WARN: RDX: RSI: RDI: 000abbd76060

The update_descriptor hypercall args (rdi, rsi) were 0xabbd76060 and 0
respectively — it was setting a descriptor at that address, to zero. Xen
then failed to load the selector 0x63 into the %gs register (since that
descriptor has just been wiped?), leaving it zero.

[ 44185.225256] WARN: xen_failsafe_callback from xen_hypercall_update_descriptor+0xa/0x40
[ 44185.225263] WARN: DS: 2b/2b ES: 2b/2b FS: 0/0 GS:0/63

This is on context switch from a 32-bit task to idle. So
xen_failsafe_callback is returning to the "faulting" instruction, with a
comment saying "Retry the IRET", but in fact is just continuing on its
merry way with %gs unexpectedly set to zero.

In fact I think this is probably fine in practice, since it's about to
get explicitly set a few lines further down in __context_switch(). But
it's odd enough, and far enough away from what's actually said by the
comments, that I'm utterly unsure.

In xen_load_tls() we explicitly only do the lazy_load_gs(0) for the
32-bit kernel. Is that really right?
Re: [Xen-devel] Patch "x86/entry/64: Remove %ebx handling from error_entry/exit" has been added to the 4.9-stable tree
On Thu, 2018-12-06 at 10:49 -0800, Andy Lutomirski wrote: > > On Dec 6, 2018, at 9:36 AM, Andrew Cooper wrote: > > Basically - what is happening is that xen_load_tls() is invalidating the > > %gs selector while %gs is still non-NUL. > > > > If this happens to intersect with a vcpu reschedule, %gs (being non-NUL) > > takes precedence over KERNGSBASE, and faults when Xen tries to reload > > it. This results in the failsafe callback being invoked. > > > > I think the correct course of action is to use xen_load_gs_index(0) > > (poorly named - it is a hypercall which does swapgs; mov to %gs; swapgs) > > before using update_descriptor() to invalidate the segment. > > > > That will reset %gs to 0 without touching KERNGSBASE, and can be queued > > in the same multicall as the update_descriptor() hypercall. > > Sounds good to me as long as we skip it on native. Like this? The other option is just to declare that we don't care. On the rare occasion that it does happen to preempt and then take the trap on reloading, xen_failsafe_callback is actually doing the right thing and just leaving %gs as zero. We'd just need to fix the comments so they explicitly note this case is handled there too. At the moment it just says 'Retry the IRET', as I noted before. diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ef05bea7010d..e8b383b24246 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -520,4 +520,15 @@ MULTI_stack_switch(struct multicall_entry *mcl, trace_xen_mc_entry(mcl, 2); } +static inline void +MULTI_set_segment_base(struct multicall_entry *mcl, + int reg, unsigned long value) +{ + mcl->op = __HYPERVISOR_set_segment_base; + mcl->args[0] = reg; + mcl->args[1] = value; + + trace_xen_mc_entry(mcl, 2); +} + #endif /* _ASM_X86_XEN_HYPERCALL_H */ diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 2f6787fc7106..722f1f51e20c 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -527,6 +527,8 @@ static void load_TLS_descriptor(struct thread_struct *t, static void xen_load_tls(struct thread_struct *t, unsigned int cpu) { + xen_mc_batch(); + /* * XXX sleazy hack: If we're being called in a lazy-cpu zone * and lazy gs handling is enabled, it means we're in a @@ -537,24 +539,24 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu) * This will go away as soon as Xen has been modified to not * save/restore %gs for normal hypercalls. * -* On x86_64, this hack is not used for %gs, because gs points -* to KERNEL_GS_BASE (and uses it for PDA references), so we -* must not zero %gs on x86_64 -* * For x86_64, we need to zero %fs, otherwise we may get an * exception between the new %fs descriptor being loaded and -* %fs being effectively cleared at __switch_to(). +* %fs being effectively cleared at __switch_to(). We can't +* just zero %gs, but we do need to clear the selector in +* case of a Xen vCPU context switch before it gets reloaded +* which would also cause a fault. 
 	 */
 	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) {
 #ifdef CONFIG_X86_32
 		lazy_load_gs(0);
 #else
+		struct multicall_space mc = __xen_mc_entry(0);
+		MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
+
 		loadsegment(fs, 0);
 #endif
 	}
 
-	xen_mc_batch();
-
 	load_TLS_descriptor(t, cpu, 0);
 	load_TLS_descriptor(t, cpu, 1);
 	load_TLS_descriptor(t, cpu, 2);
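For context, the queued MULTI_set_segment_base() entry above only takes
effect when the batch is flushed. The batching pattern, as a minimal
sketch using the existing Linux multicall API (example_clear_user_gs_in_batch
is an illustrative name, not code from the patch):

static void example_clear_user_gs_in_batch(void)
{
	struct multicall_space mc;

	xen_mc_batch();				/* open a multicall batch */

	mc = __xen_mc_entry(0);			/* reserve one entry... */
	MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); /* ...and fill it */

	/* further entries, e.g. update_descriptor, may be queued here */

	xen_mc_issue(PARAVIRT_LAZY_CPU);	/* flush: entries run in order */
}

Because the entries execute in order within one hypercall, the %gs clear
is guaranteed to happen before the descriptor it references is wiped.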
Re: [Xen-devel] Patch "x86/entry/64: Remove %ebx handling from error_entry/exit" has been added to the 4.9-stable tree
On Thu, 2018-12-06 at 20:27 +0000, David Woodhouse wrote:
> On Thu, 2018-12-06 at 10:49 -0800, Andy Lutomirski wrote:
> > > On Dec 6, 2018, at 9:36 AM, Andrew Cooper wrote:
> > > Basically - what is happening is that xen_load_tls() is
> > > invalidating the %gs selector while %gs is still non-NUL.
> > >
> > > If this happens to intersect with a vcpu reschedule, %gs (being
> > > non-NUL) takes precedence over KERNGSBASE, and faults when Xen
> > > tries to reload it. This results in the failsafe callback being
> > > invoked.
> > >
> > > I think the correct course of action is to use xen_load_gs_index(0)
> > > (poorly named - it is a hypercall which does swapgs; mov to %gs;
> > > swapgs) before using update_descriptor() to invalidate the segment.
> > >
> > > That will reset %gs to 0 without touching KERNGSBASE, and can be
> > > queued in the same multicall as the update_descriptor() hypercall.
> >
> > Sounds good to me as long as we skip it on native.
>
> Like this?
>
> #else
> +	struct multicall_space mc = __xen_mc_entry(0);
> +	MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
> +
> 	loadsegment(fs, 0);
> #endif

That seems to boot and run, at least.

I'm going to experiment with sticking a SCHEDOP_yield in the batch
*after* the update_descriptor requests, to see if I can trigger the
original problem a bit quicker than my current method — which involves
running a hundred machines for a day or two.

Still wondering if the better fix is just to adjust the comments to admit
that xen_failsafe_callback catches the race condition and fixes it up
perfectly, by just letting the %gs selector be zero for a while?
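The yield injection can be done with a hand-built multicall entry, since
(as far as I can see) there is no MULTI_sched_op() helper. A debug-only
sketch, not part of any posted patch:

static void xen_mc_queue_yield_for_testing(void)
{
	/*
	 * Queue a SCHEDOP_yield so Xen is encouraged to preempt this
	 * vCPU right after the preceding update_descriptor entries,
	 * widening the window for the %gs reload fault.
	 */
	struct multicall_space mc = __xen_mc_entry(0);

	mc.mc->op = __HYPERVISOR_sched_op;
	mc.mc->args[0] = SCHEDOP_yield;
	mc.mc->args[1] = 0;
}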
Re: [Xen-devel] Patch "x86/entry/64: Remove %ebx handling from error_entry/exit" has been added to the 4.9-stable tree
On Fri, 2018-12-07 at 12:18 +, David Woodhouse wrote: > > > #else > > + struct multicall_space mc = __xen_mc_entry(0); > > + MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); > > + > > loadsegment(fs, 0); > > #endif > > That seems to boot and run, at least. > > I'm going to experiment with sticking a SCHEDOP_yield in the batch > *after* the update_descriptor requests, to see if I can trigger the > original problem a bit quicker than my current method — which involves > running a hundred machines for a day or two. That SCHEDOP_yield and some debug output in xen_failsafe_callback shows that it's nice and easy to reproduce now without the above MULTI_set_segment_base() call. And stopped happening when I add the MULTI_set_segment_base() call back again. I'll call that 'tested'. But now we're making a hypercall to clear user %gs even in the case where none of the descriptors have changed; we should probably stop doing that... Testing this (incremental) variant now. --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -688,7 +688,7 @@ static inline bool desc_equal(const struct desc_struct *d1, } static void load_TLS_descriptor(struct thread_struct *t, - unsigned int cpu, unsigned int i) + unsigned int cpu, unsigned int i, int flush_gs) { struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i]; struct desc_struct *gdt; @@ -698,6 +698,15 @@ static void load_TLS_descriptor(struct thread_struct *t, if (desc_equal(shadow, &t->tls_array[i])) return; + /* +* If the current user %gs points to a descriptor we're changing, +* zero it first to avoid taking a fault if Xen preempts this +* vCPU between now and the time that %gs is later loaded with +* the new value. +*/ + if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) + MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); + *shadow = t->tls_array[i]; gdt = get_cpu_gdt_table(cpu); @@ -709,7 +718,7 @@ static void load_TLS_descriptor(struct thread_struct *t, static void xen_load_tls(struct thread_struct *t, unsigned int cpu) { - xen_mc_batch(); + u16 flush_gs = 0; /* * XXX sleazy hack: If we're being called in a lazy-cpu zone @@ -723,17 +732,19 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu) * * For x86_64, we need to zero %fs, otherwise we may get an * exception between the new %fs descriptor being loaded and -* %fs being effectively cleared at __switch_to(). We can't -* just zero %gs, but we do need to clear the selector in -* case of a Xen vCPU context switch before it gets reloaded -* which would also cause a fault. +* %fs being effectively cleared at __switch_to(). +* +* We may also need to zero %gs, if it refers to a descriptor +* which we are clearing. Otherwise a Xen vCPU context switch +* before it gets reloaded could also cause a fault. Since +* clearing the user %gs is another hypercall, do that only if +* it's necessary. 
*/ if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) { #ifdef CONFIG_X86_32 lazy_load_gs(0); #else - struct multicall_space mc = __xen_mc_entry(0); - MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); + savesegment(gs, flush_gs); loadsegment(fs, 0); #endif @@ -741,9 +752,9 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu) xen_mc_batch(); - load_TLS_descriptor(t, cpu, 0); - load_TLS_descriptor(t, cpu, 1); - load_TLS_descriptor(t, cpu, 2); + load_TLS_descriptor(t, cpu, 0, flush_gs); + load_TLS_descriptor(t, cpu, 1, flush_gs); + load_TLS_descriptor(t, cpu, 2, flush_gs); { struct multicall_space mc = __xen_mc_entry(0); smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86/xen: Clear user %gs before updating segment descriptors
During a context switch, if clearing a descriptor which is currently referenced by the old process's user %gs, if Xen preempts the vCPU before %gs is set for the new process, a fault may occur. This fault actually seems to be fairly harmless; xen_failsafe_callback will just return to the "faulting" instruction (the random one after the vCPU was preempted) with the offending segment register zeroed, and then it'll get set again later during the context switch. But it's cleaner to avoid it. If the descriptor referenced by the %gs selector is being touched, then include a request to zero the user %gs in the multicall too. Signed-off-by: David Woodhouse --- arch/x86/include/asm/xen/hypercall.h | 11 arch/x86/xen/enlighten_pv.c | 42 +--- 2 files changed, 43 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ef05bea7010d..e8b383b24246 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -520,4 +520,15 @@ MULTI_stack_switch(struct multicall_entry *mcl, trace_xen_mc_entry(mcl, 2); } +static inline void +MULTI_set_segment_base(struct multicall_entry *mcl, + int reg, unsigned long value) +{ + mcl->op = __HYPERVISOR_set_segment_base; + mcl->args[0] = reg; + mcl->args[1] = value; + + trace_xen_mc_entry(mcl, 2); +} + #endif /* _ASM_X86_XEN_HYPERCALL_H */ diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 2f6787fc7106..e502d12ffd17 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -506,7 +506,7 @@ static inline bool desc_equal(const struct desc_struct *d1, } static void load_TLS_descriptor(struct thread_struct *t, - unsigned int cpu, unsigned int i) + unsigned int cpu, unsigned int i, int flush_gs) { struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i]; struct desc_struct *gdt; @@ -516,6 +516,17 @@ static void load_TLS_descriptor(struct thread_struct *t, if (desc_equal(shadow, &t->tls_array[i])) return; + /* +* If the current user %gs points to a descriptor we're changing, +* zero it first to avoid taking a fault if Xen preempts this +* vCPU between now and the time that %gs is later loaded with +* the new value. +*/ + if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) { + mc = __xen_mc_entry(0); + MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); + } + *shadow = t->tls_array[i]; gdt = get_cpu_gdt_rw(cpu); @@ -527,6 +538,8 @@ static void load_TLS_descriptor(struct thread_struct *t, static void xen_load_tls(struct thread_struct *t, unsigned int cpu) { + u16 flush_gs = 0; + /* * XXX sleazy hack: If we're being called in a lazy-cpu zone * and lazy gs handling is enabled, it means we're in a @@ -537,27 +550,36 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu) * This will go away as soon as Xen has been modified to not * save/restore %gs for normal hypercalls. * -* On x86_64, this hack is not used for %gs, because gs points -* to KERNEL_GS_BASE (and uses it for PDA references), so we -* must not zero %gs on x86_64 -* * For x86_64, we need to zero %fs, otherwise we may get an * exception between the new %fs descriptor being loaded and * %fs being effectively cleared at __switch_to(). +* +* We may also need to zero %gs, if it refers to a descriptor +* which we are clearing. Otherwise a Xen vCPU context switch +* before it gets reloaded could also cause a fault. Since +* clearing the user %gs is another hypercall, do that only if +* it's necessary. 
+* +* Note: These "faults" end up in xen_failsafe_callback(), +* which just returns immediately to the "faulting" instruction +* (i.e. the random one after Xen preempted this vCPU) with +* the offending segment register zeroed. Which is actually +* a perfectly safe thing to happen anyway, as it'll be loaded +* again shortly. So maybe we needn't bother? */ if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) { #ifdef CONFIG_X86_32 lazy_load_gs(0); #else + savesegment(gs, flush_gs); + loadsegment(fs, 0); #endif } - xen_mc_batch(); - - load_TLS_descriptor(t, cpu, 0); - load_TLS_descriptor(t, cpu, 1); - load_TLS_descriptor(t, cpu, 2); + load_TLS_descriptor(t, cpu, 0, flush_gs); + load_TLS_descriptor(t, cpu, 1, flush_gs); + load_
[Xen-devel] [PATCH] x86/xen: Clear user %gs before updating segment descriptors
During a context switch, if clearing a descriptor which is currently referenced by the old process's user %gs, if Xen preempts the vCPU before %gs is set for the new process, a fault may occur. This fault actually seems to be fairly harmless; xen_failsafe_callback will just return to the "faulting" instruction (the random one after the vCPU was preempted) with the offending segment register zeroed, and then it'll get set again later during the context switch. But it's cleaner to avoid it. If the descriptor referenced by the %gs selector is being touched, then include a request to zero the user %gs in the multicall too. Signed-off-by: David Woodhouse --- v2: Don't accidentally remove the call to xen_mc_batch(). arch/x86/include/asm/xen/hypercall.h | 11 arch/x86/xen/enlighten_pv.c | 40 ++-- 2 files changed, 43 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ef05bea7010d..e8b383b24246 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -520,4 +520,15 @@ MULTI_stack_switch(struct multicall_entry *mcl, trace_xen_mc_entry(mcl, 2); } +static inline void +MULTI_set_segment_base(struct multicall_entry *mcl, + int reg, unsigned long value) +{ + mcl->op = __HYPERVISOR_set_segment_base; + mcl->args[0] = reg; + mcl->args[1] = value; + + trace_xen_mc_entry(mcl, 2); +} + #endif /* _ASM_X86_XEN_HYPERCALL_H */ diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 2f6787fc7106..2eb9827dab4b 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -506,7 +506,7 @@ static inline bool desc_equal(const struct desc_struct *d1, } static void load_TLS_descriptor(struct thread_struct *t, - unsigned int cpu, unsigned int i) + unsigned int cpu, unsigned int i, int flush_gs) { struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i]; struct desc_struct *gdt; @@ -516,6 +516,17 @@ static void load_TLS_descriptor(struct thread_struct *t, if (desc_equal(shadow, &t->tls_array[i])) return; + /* +* If the current user %gs points to a descriptor we're changing, +* zero it first to avoid taking a fault if Xen preempts this +* vCPU between now and the time that %gs is later loaded with +* the new value. +*/ + if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) { + mc = __xen_mc_entry(0); + MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); + } + *shadow = t->tls_array[i]; gdt = get_cpu_gdt_rw(cpu); @@ -527,6 +538,8 @@ static void load_TLS_descriptor(struct thread_struct *t, static void xen_load_tls(struct thread_struct *t, unsigned int cpu) { + u16 flush_gs = 0; + /* * XXX sleazy hack: If we're being called in a lazy-cpu zone * and lazy gs handling is enabled, it means we're in a @@ -537,27 +550,38 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu) * This will go away as soon as Xen has been modified to not * save/restore %gs for normal hypercalls. * -* On x86_64, this hack is not used for %gs, because gs points -* to KERNEL_GS_BASE (and uses it for PDA references), so we -* must not zero %gs on x86_64 -* * For x86_64, we need to zero %fs, otherwise we may get an * exception between the new %fs descriptor being loaded and * %fs being effectively cleared at __switch_to(). +* +* We may also need to zero %gs, if it refers to a descriptor +* which we are clearing. Otherwise a Xen vCPU context switch +* before it gets reloaded could also cause a fault. Since +* clearing the user %gs is another hypercall, do that only if +* it's necessary. 
+* +* Note: These "faults" end up in xen_failsafe_callback(), +* which just returns immediately to the "faulting" instruction +* (i.e. the random one after Xen preempted this vCPU) with +* the offending segment register zeroed. Which is actually +* a perfectly safe thing to happen anyway, as it'll be loaded +* again shortly. So maybe we needn't bother? */ if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) { #ifdef CONFIG_X86_32 lazy_load_gs(0); #else + savesegment(gs, flush_gs); + loadsegment(fs, 0); #endif } xen_mc_batch(); - load_TLS_descriptor(t, cpu, 0); - load_TLS_descriptor(t, cpu, 1); - load_TLS_descriptor(t, cpu, 2); + load_TLS_descriptor(t, cpu, 0,
Re: [Xen-devel] xen vtd : set msi guest_masked 0 by default
On Tue, 2016-01-26 at 09:34 +0800, Jianzhong,Chang wrote:
> There are some problems when msi guest_masked is set to 1 by default.
> When guest os is windows 2008 r2 server,
> the device (eg X540-AT2 vf) is not initialized correctly.
> Host will always receive messages like this: "VF Reset msg received from vf".
> Guest has network connectivity issues,
> and cannot correctly receive/send packets.

In other words "the guest doesn't get any interrupts from the NIC".

> So, guest_masked is set to 0 by default.

This seems consistent with the PCI spec, which says that "After reset,
the state of all implemented Mask and Pending bits is 0 (no vectors are
masked and no messages are pending)."

That's what we *used* to have in Xen, before these commits changed it to
assume that IRQs were guest-masked by default:

https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=ad28e42bd1d28d746988ed71654e8aa670629753
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84d6add5593d865736831d150da7c38588f669f6

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -512,7 +512,7 @@ void guest_mask_msi_irq(struct irq_desc *desc, bool_t mask)
>
>  static unsigned int startup_msi_irq(struct irq_desc *desc)
>  {
> -    if ( unlikely(!msi_set_mask_bit(desc, 0, !!(desc->status & IRQ_GUEST))) )
> +    if ( unlikely(!msi_set_mask_bit(desc, 0, 0) ))
>          WARN();
>      return 0;
>  }

In testing, this part actually seems to make the difference in practice.
Interrupts now work, and Windows guests have connectivity again.

> @@ -972,7 +972,7 @@ static int msix_capability_init(struct pci_dev *dev,
>          entry->msi_attrib.entry_nr = msi->entry_nr;
>          entry->msi_attrib.maskbit = 1;
>          entry->msi_attrib.host_masked = 1;
> -        entry->msi_attrib.guest_masked = 1;
> +        entry->msi_attrib.guest_masked = 0;
>          entry->msi_attrib.pos = pos;
>          entry->irq = msi->irq;
>          entry->dev = dev;

That also seems correct though, since it reflects the actual state we
intend to emulate.
Re: [Xen-devel] xen vtd : set msi guest_masked 0 by default
On Mon, 2018-05-21 at 14:10 +0200, Roger Pau Monné wrote:
>
> Hm, I think I might have fixed this issue, see:
>
> https://git.qemu.org/?p=qemu.git;a=commit;h=a8036336609d2e184fc3543a4c439c0ba7d7f3a2

Hm, thanks. We'll look at porting that change to qemu-traditional, which
still doesn't do it.
Re: [Xen-devel] [PATCH] x86/entry/64: Remove %ebx handling from error_entry/exit
On Wed, 2018-08-08 at 10:35 -0700, Sarah Newman wrote: > commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. > > This version applies to v4.9. I think you can kill the 'xorl %ebx,%ebx' from error_entry too but yes, this does want to go to 4.9 and earlier because the 'Fixes:' tag is a bit of a lie — the problem existed before that, at least in theory. > From Andy Lutomirski, original author: > > error_entry and error_exit communicate the user vs kernel status of > the frame using %ebx. This is unnecessary -- the information is in > regs->cs. Just use regs->cs. > > This makes error_entry simpler and makes error_exit more robust. > > It also fixes a nasty bug. Before all the Spectre nonsense, The > xen_failsafe_callback entry point returned like this: > > ALLOC_PT_GPREGS_ON_STACK > SAVE_C_REGS > SAVE_EXTRA_REGS > ENCODE_FRAME_POINTER > jmp error_exit > > And it did not go through error_entry. This was bogus: RBX > contained garbage, and error_exit expected a flag in RBX. > Fortunately, it generally contained *nonzero* garbage, so the > correct code path was used. As part of the Spectre fixes, code was > added to clear RBX to mitigate certain speculation attacks. Now, > depending on kernel configuration, RBX got zeroed and, when running > some Wine workloads, the kernel crashes. This was introduced by: > > commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for > exceptions/interrupts, to reduce speculation attack surface") > > With this patch applied, RBX is no longer needed as a flag, and the > problem goes away. > > I suspect that malicious userspace could use this bug to crash the > kernel even without the offending patch applied, though. > > [Historical note: I wrote this patch as a cleanup before I was aware > of the bug it fixed.] > > [Note to stable maintainers: this should probably get applied to all > kernels.] > > Cc: Brian Gerst > Cc: Borislav Petkov > Cc: Dominik Brodowski > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: Thomas Gleixner > Cc: Boris Ostrovsky > Cc: Juergen Gross > Cc: xen-devel@lists.xenproject.org > Cc: x...@kernel.org > Cc: sta...@vger.kernel.org > Cc: Andy Lutomirski > Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for > exceptions/interrupts, to reduce speculation attack surface") > Reported-and-tested-by: "M. Vefa Bicakci" > Signed-off-by: Andy Lutomirski > Signed-off-by: Sarah Newman > --- > arch/x86/entry/entry_64.S | 19 --- > 1 file changed, 4 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index d58d8dc..0dab47a 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -774,7 +774,7 @@ ENTRY(\sym) > > call\do_sym > > - jmp error_exit /* %ebx: no > swapgs flag */ > + jmp error_exit > .endif > END(\sym) > .endm > @@ -1043,7 +1043,6 @@ END(paranoid_exit) > > /* > * Save all registers in pt_regs, and switch gs if needed. > - * Return: EBX=0: came from user mode; EBX=1: otherwise > */ > ENTRY(error_entry) > cld > @@ -1087,7 +1086,6 @@ ENTRY(error_entry) > * for these here too. > */ > .Lerror_kernelspace: > - incl%ebx > leaqnative_irq_return_iret(%rip), %rcx > cmpq%rcx, RIP+8(%rsp) > je .Lerror_bad_iret > @@ -1119,28 +1117,19 @@ ENTRY(error_entry) > > /* > * Pretend that the exception came from user mode: set up > pt_regs > - * as if we faulted immediately after IRET and clear EBX so > that > - * error_exit knows that we will be returning to user mode. > + * as if we faulted immediately after IRET. 
> */ > mov %rsp, %rdi > callfixup_bad_iret > mov %rax, %rsp > - decl%ebx > jmp .Lerror_entry_from_usermode_after_swapgs > END(error_entry) > > - > -/* > - * On entry, EBX is a "return to kernel mode" flag: > - * 1: already in kernel mode, don't need SWAPGS > - * 0: user gsbase is loaded, we need SWAPGS and standard > preparation for return to usermode > - */ > ENTRY(error_exit) > - movl%ebx, %eax > DISABLE_INTERRUPTS(CLBR_NONE) > TRACE_IRQS_OFF > - testl %eax, %eax > - jnz retint_kernel > + testb $3, CS(%rsp) > + jz retint_kernel > jmp retint_user > END(error_exit) > smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
On Thu, 2018-01-18 at 10:10 -0500, Paul Durrant wrote:
> Lastly the previous code did not properly emulate an EOI if a missed EOI
> was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
> the ISR. The new code instead calls vlapic_EOI_set().

Hm, this *halves* my observed performance running a 32-thread
'diskspd.exe' on a Windows box with attached NVME devices, which makes
me sad.

It's the call to hvm_dpci_msi_eoi() that does it. Commenting out the call
to pt_pirq_iterate() and leaving *just* the domain-global spinlock
bouncing cache lines between all my CPUs, it's already down to 1.6M IOPS
from 2.2M on my test box before it does *anything* at all.

Calling an *inline* version of pt_pirq_iterate so there's no retpoline
for the indirect calls, and I'm down to 1.1M even when I've nopped out
the whole of the _hvm_dpci_msi_eoi function that it's calling. Put it
all back, and I'm down to about 1.0M. So it's worse than halved.

And what's all this for? The code here is making my eyes bleed but I
believe it's for unmaskable MSIs, and these aren't unmaskable.

Tempted to make it all go away by having a per-domain bitmap of vectors
for which all this work is actually required, and bypassing the whole
bloody lot in hvm_dpci_msi_eoi() if the corresponding bit in that bitmap
isn't set.

The hackish version of that (which seems to work, but would probably want
testing with an actual unmaskable MSI in the system, and I have
absolutely no confidence I understand what's going on here) looks
something like this:

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bab3aa3..24df008 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -282,6 +283,7 @@ int pt_irq_create_bind(
     struct hvm_pirq_dpci *pirq_dpci;
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
+    irq_desc_t *desc;
 
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
@@ -422,6 +425,13 @@ int pt_irq_create_bind(
             dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
             pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
+            BUG_ON(!local_irq_is_enabled());
+            desc = pirq_spin_lock_irq_desc(info, NULL);
+            if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
+                set_bit(pirq_dpci->gmsi.gvec,
+                        hvm_domain_irq(d)->unmaskable_msi_vecs);
+            spin_unlock_irq(&desc->lock);
+
             spin_unlock(&d->event_lock);
             pirq_dpci->gmsi.posted = false;
@@ -869,7 +874,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
 
 void hvm_dpci_msi_eoi(struct domain *d, int vector)
 {
-    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
+    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
+         !test_bit(vector, hvm_domain_irq(d)->unmaskable_msi_vecs) )
         return;
 
     spin_lock(&d->event_lock);
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 8a43cb9..d9d4652 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -78,6 +78,7 @@ struct hvm_irq {
     u8 round_robin_prev_vcpu;
 
     struct hvm_irq_dpci *dpci;
+    DECLARE_BITMAP(unmaskable_msi_vecs, 256);
 
     /*
      * Number of wires asserting each GSI.
Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
On Mon, 2018-09-03 at 10:12 +, Paul Durrant wrote: > > I believe APIC assist is intended for fully synthetic interrupts. Hm, if by 'fully synthetic interrupts' you mean vlapic_virtual_intr_delivery_enabled(), then no I think APIC assist doesn't get used in that case at all. > Is it definitely this patch that causes the problem? It was only > intended to fix previous incorrectness but, if this is the culprit, > then it's clearly caused collateral damage in a logically unrelated > area. Not entirely. The performance gain we observed with APIC assist in the first place was basically stolen. It wasn't just bypassing the vmexit for that EOI; it was *so* much faster because it actually didn't ever do the EOI properly at all. You fixed that omission and unsurprisingly it got slower again; most of the apparent benefit of APIC assist is lost. But that's because it was never really doing the right thing in the first place. That EOI handling for unmaskable MSI is really painfully slow, so my hack bypasses it in the common case where it isn't really necessary. FWIW I've done it in my tree with a single per-domain flag rather than a per-vector bitmap now, which makes it slightly simpler. smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
On Wed, 2018-09-05 at 09:36 +0000, Paul Durrant wrote:
> I see. Given that Windows has used APIC assist to circumvent its EOI
> then I wonder whether we can get away with essentially doing the
> same. I.e. for a completed APIC assist found in
> vlapic_has_pending_irq() we simply clear the APIC assist and highest
> vector in the ISR, rather than calling through to vlapic_EOI_set()
> and suffering the overhead. I'll spin up a patch and give it a whirl.

I think that's fine if you don't actually pass unmaskable MSIs through to
the guest in question, but if you *do* then you still need the EOI to
happen properly to "unmask" it.

Hence my approach which is basically doing what you said and bypassing
the expensive part of hvm_dpci_msi_eoi(), but *only* if there's no
unmaskable MSI to worry about.
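The single-per-domain-flag form of that bypass, mentioned earlier in the
thread, is only a small delta from the posted bitmap hack. A sketch
(has_unmaskable_msi is a hypothetical field name; the function body
follows the shape of the existing hvm_dpci_msi_eoi()):

/* In struct hvm_irq, instead of the 256-bit vector bitmap: */
bool has_unmaskable_msi;

/* At bind time in pt_irq_create_bind(), where the irq_desc is known: */
if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
    hvm_domain_irq(d)->has_unmaskable_msi = true;

/* And the fast-path test in the EOI handler: */
void hvm_dpci_msi_eoi(struct domain *d, int vector)
{
    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
         !hvm_domain_irq(d)->has_unmaskable_msi )
        return;

    spin_lock(&d->event_lock);
    pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
    spin_unlock(&d->event_lock);
}

The flag is coarser than the per-vector bitmap — one unmaskable MSI makes
every vector's EOI take the slow path — but it keeps the common case to a
single cheap test with no shared cache line.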
Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
On Wed, 2018-09-05 at 10:40 +, Paul Durrant wrote: > > Actually the neatest approach would be to get information into the > vlapic code as to whether APIC assist is suitable for the given > vector so that the code there can selectively enable it, and then Xen > would know it was safe to avoid fully emulating an EOI for anything > that did have assist enabled. I'm not sure I understand why an assisted EOI should be any different from "normal" EOI. The global lock and indirect function calls and all that stuff that hvm_dpci_msi_eoi() does are *expensive*, for a rare case. Why not bypass all that when it's not needed, for "normal" EOI and APIC-assisted EOI alike? Why have a distinction between the two? smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/2] xen: fix two issues in Xen pv qspinlock handling
On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> The Xen specific queue spinlock wait function has two issues which
> could result in a hanging system.
>
> They have a similar root cause of clearing a pending wakeup of a
> waiting vcpu and later going to sleep waiting for the just cleared
> wakeup event, which of course won't ever happen.
>
> Juergen Gross (2):
>   xen: fix race in xen_qlock_wait()
>   xen: make xen_qlock_wait() nestable
>
>  arch/x86/xen/spinlock.c | 33 -
>  1 file changed, 12 insertions(+), 21 deletions(-)
>
> Cc: waiman.l...@hp.com
> Cc: pet...@infradead.org

LGTM. Both these should be Cc: sta...@vger.kernel.org, yes?

Thanks.
Re: [Xen-devel] [PATCH 2/2] xen: make xen_qlock_wait() nestable
On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> -	/* If irq pending already clear it and return. */
> +	/* Guard against reentry. */
> +	local_irq_save(flags);
> +
> +	/* If irq pending already clear it. */
> 	if (xen_test_irq_pending(irq)) {
> 		xen_clear_irq_pending(irq);
> -		return;
> +	} else if (READ_ONCE(*byte) == val) {
> +		/* Block until irq becomes pending (or a spurious wakeup) */
> +		xen_poll_irq(irq);
> 	}

Does this still allow other IRQs to wake it from xen_poll_irq()?

In the case where process-context code is spinning for a lock without
disabling interrupts, we *should* allow interrupts to occur still...
does this?
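For reference, assembled from the hunk above, the whole wait function in
this version would read roughly as follows. This is a reconstruction:
the lines not shown in the quoted hunk are assumptions based on the
surrounding arch/x86/xen/spinlock.c code, not the literal patch.

static void xen_qlock_wait(u8 *byte, u8 val)
{
	unsigned long flags;
	int irq = __this_cpu_read(lock_kicker_irq);

	/* If kicker interrupt not initialized yet, just spin. */
	if (irq == -1)
		return;

	/* Guard against reentry. */
	local_irq_save(flags);

	/* If irq pending already, clear it. */
	if (xen_test_irq_pending(irq)) {
		xen_clear_irq_pending(irq);
	} else if (READ_ONCE(*byte) == val) {
		/* Block until irq becomes pending (or a spurious wakeup). */
		xen_poll_irq(irq);
	}

	local_irq_restore(flags);
}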
Re: [Xen-devel] [PATCH 2/2] xen: make xen_qlock_wait() nestable
On Wed, 2018-10-10 at 14:30 +0200, Thomas Gleixner wrote:
> On Wed, 10 Oct 2018, David Woodhouse wrote:
> > On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> > > -	/* If irq pending already clear it and return. */
> > > +	/* Guard against reentry. */
> > > +	local_irq_save(flags);
> > > +
> > > +	/* If irq pending already clear it. */
> > > 	if (xen_test_irq_pending(irq)) {
> > > 		xen_clear_irq_pending(irq);
> > > -		return;
> > > +	} else if (READ_ONCE(*byte) == val) {
> > > +		/* Block until irq becomes pending (or a spurious wakeup) */
> > > +		xen_poll_irq(irq);
> > > 	}
> >
> > Does this still allow other IRQs to wake it from xen_poll_irq()?
> >
> > In the case where process-context code is spinning for a lock without
> > disabling interrupts, we *should* allow interrupts to occur still...
> > does this?
>
> Yes. Look at it like idle HLT or WFI. You have to disable interrupt before
> checking the condition and then the hardware or in this case the hypervisor
> has to bring you back when an interrupt is raised.
>
> If that would not work then the check would be racy, because the interrupt
> could hit and be handled after the check and before going into
> HLT/WFI/hypercall and then the thing is out until the next interrupt comes
> along, which might be never.

Right, but in this case we're calling into the hypervisor to poll for one
*specific* IRQ. Everything you say is true for that specific IRQ.

My question is what happens to *other* IRQs. We want them, but are they
masked? I'm staring at the Xen do_poll() code and haven't quite worked
that out...
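For reference, the idle-style pattern Thomas describes looks like this
generic sketch (condition_is_met() is a placeholder predicate; safe_halt()
is the usual x86 helper that does "sti; hlt"):

static void wait_for_condition(void)
{
	local_irq_disable();

	/*
	 * Check and block with interrupts off: an interrupt arriving
	 * after the check cannot be lost, because safe_halt() re-enables
	 * interrupts and halts in one "sti; hlt" sequence, and the sti
	 * only takes effect after the following instruction.
	 */
	if (!condition_is_met())
		safe_halt();		/* wakes on any interrupt */
	else
		local_irq_enable();
}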
Re: [Xen-devel] [PATCH 2/2] xen: make xen_qlock_wait() nestable
> The Xen HV is doing it right. It is blocking the vcpu in do_poll() and
> any interrupt will unblock it.

Great. Thanks for the confirmation.

-- dwmw2
Re: [Xen-devel] [PATCH v8 02/17] x86: Support indirect thunks from assembly code
On Fri, 2018-01-12 at 18:00 +0000, Andrew Cooper wrote:
> +#ifdef CONFIG_INDIRECT_THUNK
> +    /* callq __x86_indirect_thunk_rcx */
> +    ctxt->io_emul_stub[10] = 0xe8;
> +    *(int32_t *)&ctxt->io_emul_stub[11] =
> +        (unsigned long)__x86_indirect_thunk_rcx - (stub_va + 11 + 4);
> +
> +#else

Is that always guaranteed to be within a 32-bit offset? It's from the
stack, isn't it?

Even if it's true now, do we need a sanity check just to make *sure*
things never get changed around and make it untrue?
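Such a check might look like this sketch (names follow the quoted hunk;
ASSERT() stands in for whatever build- or run-time check is judged
appropriate):

long disp = (long)__x86_indirect_thunk_rcx - (long)(stub_va + 11 + 4);

/* The callq rel32 encoding can only reach +/- 2GiB from the stub. */
ASSERT(disp == (int32_t)disp);

ctxt->io_emul_stub[10] = 0xe8;			/* callq rel32 */
*(int32_t *)&ctxt->io_emul_stub[11] = disp;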
Re: [Xen-devel] Guest soft lockups with "xen: make xen_qlock_wait() nestable"
On Thu, 2018-11-08 at 11:18 +0100, Juergen Gross wrote: > Oh, sorry. Of course it does. Dereferencing a percpu variable > directly can't work. How silly of me. > > The attached variant should repair that. Tested to not break booting. Strictly speaking, shouldn't you have an atomic_init() in there somewhere? smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Guest soft lockups with "xen: make xen_qlock_wait() nestable"
On Mon, 2018-11-19 at 08:05 +0100, Juergen Gross wrote:
> On 15/11/2018 00:22, David Woodhouse wrote:
> > On Thu, 2018-11-08 at 11:18 +0100, Juergen Gross wrote:
> > > Oh, sorry. Of course it does. Dereferencing a percpu variable
> > > directly can't work. How silly of me.
> > >
> > > The attached variant should repair that. Tested to not break booting.
> >
> > Strictly speaking, shouldn't you have an atomic_init() in there
> > somewhere?
>
> atomic_t variables initialized with 0 (e.g. static ones) seem not to
> require atomic_init() (or ATOMIC_INIT). Documentation/atomic_t.txt
> doesn't mention the need to use it in this case. So I guess it is a
> matter of taste.

Yeah, we have '#define ATOMIC_INIT(i) { (i) }' fairly much everywhere
now, even on SPARC (not that this code runs on SPARC). So it doesn't
really matter, and it's fairly unlikely that the atomic_t implementation
is going to *change*.

But still, there's no harm in doing the 'tasteful' version:

static DEFINE_PER_CPU(atomic_t, xen_qlock_wait_nest) = ATOMIC_INIT(0);
Re: [Xen-devel] [PATCH v6.5 11/26] x86: Support indirect thunks from assembly code
On Thu, 2018-01-04 at 00:15 +, Andrew Cooper wrote: > + * We've got no usable stack so can't use a RETPOLINE thunk, and are > + * further than +- 2G from the high mappings so couldn't use > JUMP_THUNK > + * even if was a non-RETPOLINE thunk. Futhermore, an LFENCE isn't > + * necesserily safe to use at this point. I count three typos, pedantry about ± and GiB aside. Late night? :) > --- a/xen/arch/x86/extable.c > +++ b/xen/arch/x86/extable.c > @@ -158,7 +158,7 @@ static int __init stub_selftest(void) > memcpy(ptr, tests[i].opc, ARRAY_SIZE(tests[i].opc)); > unmap_domain_page(ptr); > > -asm volatile ( "call *%[stb]\n" > +asm volatile ( "CALL_THUNK %[stb]\n" If you make that %V[stb] then... > +.if CONFIG_INDIRECT_THUNK == 1 > + > +$done = 0 > +.irp reg, rax, rbx, rcx, rdx, rsi, rdi, rbp, r8, r9, r10, r11, r12, > r13, r14, r15 > +.ifeqs "\arg", "%\reg" > +\insn __x86.indirect_thunk.\reg > +$done = 1 > + .exitm > +.endif > +.endr > +.if $done != 1 > +.error "Bad register arg \arg" > +.endif > + ... you don't need this. smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6.5 11/26] x86: Support indirect thunks from assembly code
On Thu, 2018-01-11 at 13:41 +0000, Andrew Cooper wrote:
> On 11/01/18 13:03, David Woodhouse wrote:
> > On Thu, 2018-01-04 at 00:15 +0000, Andrew Cooper wrote:
> > > + * We've got no usable stack so can't use a RETPOLINE thunk, and are
> > > + * further than +- 2G from the high mappings so couldn't use JUMP_THUNK
> > > + * even if was a non-RETPOLINE thunk. Futhermore, an LFENCE isn't
> > > + * necesserily safe to use at this point.
> >
> > I count three typos, pedantry about ± and GiB aside.
> > Late night? :)
>
> Just one of many...
>
> I've found furthermore and necessarily. Where is the 3rd?

* even if IT was a …

> > > -    asm volatile ( "call *%[stb]\n"
> > > +    asm volatile ( "CALL_THUNK %[stb]\n"
> >
> > If you make that %V[stb] then...
> > ... you don't need this.
>
> That's fine in principle, except it isn't compatible with most of the
> compilers we support. To use, the %V has to be hidden behind a
> conditional macro, and I can't think of any remotely-clean way to do
> that.

In at least one incarnation I ended up with something like

#ifndef CONFIG_RETPOLINE
#define CALL_THUNK(reg) "call *%[" #reg "]"
#else
#define CALL_THUNK(reg) "CALL_THUNK %V[" #reg "]"
#endif

Or maybe I just insisted that it was called %[thunk_target] and my
CALL_THUNK C macro doesn't even take an argument. I forget. Late night…
Re: [Xen-devel] [PATCH v8 01/17] x86: Support compiling with indirect branch thunks
On Fri, 2018-01-12 at 18:00 +0000, Andrew Cooper wrote:
> +.macro IND_THUNK_RETPOLINE reg:req
> +    call 2f
> +1:

Linux and GCC have now settled on 'pause;lfence;jmp' here.

> +    lfence
> +    jmp 1b
> +2:
> +    mov %\reg, (%rsp)
> +    ret
> +.endm
> +
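Folding that in, the macro would read roughly as follows (a sketch; only
the pause/lfence pair in the capture loop differs from the quoted
version):

.macro IND_THUNK_RETPOLINE reg:req
    call 2f
1:
    pause               /* speculation trap: be polite to the sibling */
    lfence              /* ... and stop speculative execution here */
    jmp 1b
2:
    mov %\reg, (%rsp)   /* overwrite the return address with the target */
    ret                 /* "return", i.e. jump, to the indirect target */
.endm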
Re: [Xen-devel] [PATCH v8 15/17] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Fri, 2018-01-12 at 18:01 +0000, Andrew Cooper wrote:
> @@ -1736,6 +1736,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>      }
>
>      ctxt_switch_levelling(next);
> +
> +    if ( opt_ibpb )
> +        wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
>  }

If you're doing that without an 'else lfence' don't you need to include a
comment with your proof that it's safe to do so, and that the CPU can't
speculate through a taken conditional branch all the way to a problematic
instruction?

Also... if you're doing that in context_switch() does it do the right
thing with idle? If a CPU switches to the idle domain and then back again
to the same vCPU, does that do the IBPB twice?

For vmx we only really need IBPB when we do VMPTRLD, right?
Re: [Xen-devel] [PATCH v8 15/17] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Mon, 2018-01-15 at 13:02 +0000, Andrew Cooper wrote:
> On 15/01/18 12:54, David Woodhouse wrote:
> > On Fri, 2018-01-12 at 18:01 +0000, Andrew Cooper wrote:
> > > @@ -1736,6 +1736,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
> > >      }
> > >
> > >      ctxt_switch_levelling(next);
> > > +
> > > +    if ( opt_ibpb )
> > > +        wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
> > >  }
> >
> > If you're doing that without an 'else lfence' don't you need to include
> > a comment with your proof that it's safe to do so, and that the CPU can't
> > speculate through a taken conditional branch all the way to a problematic
> > instruction?
>
> What would that gain? A malicious guest could speculate around it, but
> speculation will catch up (at the very latest) when we return to guest,
> which is a serialising event. There's your proof.

I'd just be happier with a blanket policy of *including* that proof in
all cases where we do this kind of runtime conditional branch around
setting IBRS or IBPB. Because if we're in the habit of doing the
'if (foo) wrmsrl()' without it, we *might* miss a case where it's not
actually OK.

(Aside: Is VMLAUNCH actually architecturally guaranteed to be
serialising? That doesn't seem consistent with a long-term goal of
designing hardware to make VMs go faster. Or have we merely extracted a
promise from Intel that *current* hardware will stop speculative
execution when it hits a VMLAUNCH?)

> > Also... if you're doing that in context_switch() does it do the right
> > thing with idle? If a CPU switches to the idle domain and then back
> > again to the same vCPU, does that do the IBPB twice?
>
> Context switches to idle will skip the IBPB because it isn't needed, but
> any switch to non-idle need it. In your example, we should execute just
> a single IBPB.

In my example I think we should not execute IBPB at all. We come from a
given VMCS, sleep for a while, and go back to it. No need for any
flushing whatsoever.

> > For vmx we only really need IBPB when we do VMPTRLD, right?
>
> That is part of IBRS_ATT is it not?

It doesn't go away with IBRS_ALL (as it's now called), but it's the same
for the existing IBRS *and* retpoline. Doing it on VMPTRLD is what Linux
is doing.

(cf. https://lkml.org/lkml/2018/1/13/40 and
https://lkml.org/lkml/2018/1/15/146 and note the AMD SVM caveat.)
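The Linux/KVM shape of that, roughly, from memory of the vmx_vcpu_load()
logic (a sketch, not a verbatim quote of the kernel source):

/*
 * Only when this pCPU switches to a *different* VMCS can a stale
 * branch prediction be attributed to the wrong guest, so only then
 * is the barrier needed.
 */
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
	per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
	vmcs_load(vmx->loaded_vmcs->vmcs);
	indirect_branch_prediction_barrier();	/* IBPB via MSR_IA32_PRED_CMD */
}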
Re: [Xen-devel] [PATCH v8 15/17] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Mon, 2018-01-15 at 14:23 +0100, David Woodhouse wrote: > > > > > > > Also... if you're doing that in context_switch() does it do the right > > > thing with idle? If a CPU switches to the idle domain and then back > > > again to the same vCPU, does that do the IBPB twice? > > > > Context switches to idle will skip the IBPB because it isn't needed, but > > any switch to non-idle need it. In your example, we should execute just > > a single IBPB. > > In my example I think we should not execute IBPB at all. We come from a > given VMCS, sleep for a while, and go back to it. No need for any > flushing whatsoever. msw points out that in my example we *don't* execute IBPB at all, I think. In both switching to idle, and back to the vCPU, we should hit this case and not the 'else' case that does the IBPB: 1710 if ( (per_cpu(curr_vcpu, cpu) == next) || 1711 (is_idle_domain(nextd) && cpu_online(cpu)) ) 1712 { 1713 local_irq_enable(); 1714 } smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 08/17] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} for guests
On Fri, 2018-01-12 at 18:00 +0000, Andrew Cooper wrote:
> @@ -152,14 +163,38 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>  {
>      const struct vcpu *curr = current;
>      struct domain *d = v->domain;
> +    const struct cpuid_policy *cp = d->arch.cpuid;
>      struct msr_domain_policy *dp = d->arch.msr;
>      struct msr_vcpu_policy *vp = v->arch.msr;
>
>      switch ( msr )
>      {
>      case MSR_INTEL_PLATFORM_INFO:
> +        /* Read-only */
>          goto gp_fault;
>
> +    case MSR_SPEC_CTRL:
> +        if ( !cp->feat.ibrsb )
> +            goto gp_fault; /* MSR available? */
> +        if ( val & ~(SPEC_CTRL_IBRS |
> +                     (cp->feat.stibp ? SPEC_CTRL_STIBP : 0)) )

Intel defines the STIBP bit as non-faulting and ignored, even when STIBP
isn't advertised. So you should probably just mask it out if
!cp->feat.stibp.

> +            goto gp_fault; /* Rsvd bit set? */
> +        vp->spec_ctrl.guest = val;
> +        vp->spec_ctrl.host = val;
> +        break;
> +

There's no actual wrmsr there. This is fine, because you're going to do
it on the way back to the guest (albeit in a later patch in the series).
But it probably warrants a comment?
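One way to realise the masking suggestion, as a sketch based only on the
quoted hunk (not the final committed code):

    case MSR_SPEC_CTRL:
        if ( !cp->feat.ibrsb )
            goto gp_fault; /* MSR available? */

        /* STIBP is architecturally ignore-if-absent, not fault-if-absent. */
        if ( !cp->feat.stibp )
            val &= ~SPEC_CTRL_STIBP;

        if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
            goto gp_fault; /* Rsvd bit set? */

        vp->spec_ctrl.guest = val;
        vp->spec_ctrl.host = val;
        break;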
Re: [Xen-devel] [PATCH v8 15/17] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Mon, 2018-01-15 at 22:39 +0100, David Woodhouse wrote: > On Mon, 2018-01-15 at 14:23 +0100, David Woodhouse wrote: > > > > > > > > > > > > > > > > > > > Also... if you're doing that in context_switch() does it do the right > > > > thing with idle? If a CPU switches to the idle domain and then back > > > > again to the same vCPU, does that do the IBPB twice? > > > > > > Context switches to idle will skip the IBPB because it isn't needed, but > > > any switch to non-idle need it. In your example, we should execute just > > > a single IBPB. > > > In my example I think we should not execute IBPB at all. We come from a > > given VMCS, sleep for a while, and go back to it. No need for any > > flushing whatsoever. > > msw points out that in my example we *don't* execute IBPB at all, I > think. > > In both switching to idle, and back to the vCPU, we should hit this > case and not the 'else' case that does the IBPB: > > 1710 if ( (per_cpu(curr_vcpu, cpu) == next) || > 1711 (is_idle_domain(nextd) && cpu_online(cpu)) ) > 1712 { > 1713 local_irq_enable(); > 1714 } I tested that; it doesn't seem to be the case. We end up here with prev being the idle thread, next being the actual vCPU, and per_cpu(curr_vcpu, cpu) being the idle thread too. So we still do the IBPB even when we've just switch from a given vCPU to idle and back again. That's not actually tested on Xen master, but the code here looks very much the same as what I actually did test. smime.p7s Description: S/MIME cryptographic signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 15/17] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Wed, 2018-01-17 at 18:26 +0100, David Woodhouse wrote:
> > In both switching to idle, and back to the vCPU, we should hit this
> > case and not the 'else' case that does the IBPB:
> >
> > 1710         if ( (per_cpu(curr_vcpu, cpu) == next) ||
> > 1711              (is_idle_domain(nextd) && cpu_online(cpu)) )
> > 1712         {
> > 1713             local_irq_enable();
> > 1714         }
>
> I tested that; it doesn't seem to be the case. We end up here with prev
> being the idle thread, next being the actual vCPU, and
> per_cpu(curr_vcpu, cpu) being the idle thread too. So we still do the
> IBPB even when we've just switched from a given vCPU to idle and back
> again.
>
> That's not actually tested on Xen master, but the code here looks very
> much the same as what I actually did test.

This appears to make the excessive IBPBs go away. There might be better
approaches.

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 04e9902..b8a9d54 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -68,6 +68,7 @@
 #include
 #include

+DEFINE_PER_CPU(struct vcpu *, last_vcpu);  /* Last non-idle vCPU */
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);

 static void default_idle(void);
@@ -1745,8 +1746,14 @@ void context_switch(struct vcpu *prev, struct vcpu *next)

         ctxt_switch_levelling(next);

-        if ( opt_ibpb )
+        /* IBPB on switching to a non-idle vCPU, if that vCPU was not
+         * the last one to be scheduled on this pCPU */
+        if ( opt_ibpb && !is_idle_vcpu(next) &&
+             per_cpu(last_vcpu, cpu) != next )
+        {
+            per_cpu(last_vcpu, cpu) = next;
             wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
+        }
     }

     context_saved(prev);
Re: [Xen-devel] [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Fri, 2018-01-19 at 08:07 -0700, Jan Beulich wrote:
> > > > On 19.01.18 at 15:48,  wrote:
> > vcpu pointers are rather more susceptible to false aliasing in the case
> > that the 4k memory allocation behind struct vcpu gets reused.
> >
> > The risks are admittedly minute, but this is a much safer option.
>
> Oh, right, I didn't consider the case of the vCPU (and domain)
> having gone away in the meantime. Mind extending the comment
> to clarify this?

I looked at this and figured I didn't care about aliasing. Sure, we
might not do the IBPB if the stale pointer is aliased and we switch
from a now-dead vCPU to a new vCPU. But good luck exploiting the new
vCPU from an old domain that's now dead!

I actually thought Andrew changed from pointers just because of the
'ick' factor of keeping a known-stale pointer around and using it for
*comparison* but having to carefully avoid ever *dereferencing* it (as
my debugging patch in context_switch() did... oops!)

> > David found that transitions to idle and back to the same vcpu were
> > reliably taking an unnecessary IBPB.

I'll repeat my caveat there — I observed this on our ancient Xen
4.mumble, not current HEAD. But the code here looks remarkably similar;
it hasn't changed for a while.

> I understand that, but there was no explanation whatsoever as
> to why that is.

Looks like we set per_cpu(curr_vcpu) = next every time we switch, even
if we are switching to the idle domain. Which is why I added
per_cpu(last_vcpu) specifically to be updated only when it *isn't* an
idle vCPU.
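To spell out the compare-vs-dereference distinction, here's a toy
example (mine, purely illustrative; not Xen code):

    #include <stdio.h>
    #include <stdlib.h>

    struct vcpu { int id; };

    int main(void)
    {
        struct vcpu *old = malloc(sizeof(*old));
        struct vcpu *last = old;     /* cached, like per_cpu(last_vcpu) */

        free(old);                   /* the vCPU (and domain) go away */

        struct vcpu *next = malloc(sizeof(*next)); /* may reuse the slot */

        /* Comparing the stale value is harmless in practice (formally
         * indeterminate in ISO C, but kernels do this): at worst
         * 'last == next' aliases and we skip one IBPB that, per the
         * argument above, wasn't exploitable anyway. */
        if ( last != next )
            puts("no aliasing: would issue IBPB");
        else
            puts("aliased: would skip IBPB");

        /* Dereferencing 'last' here would be a use-after-free -- the
         * exact trap the debugging patch mentioned above fell into. */
        return 0;
    }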
Re: [Xen-devel] [PATCH v6.5 08/26] x86/entry: Erase guest GPR state on entry to Xen
On Thu, 2018-01-04 at 00:15 +0000, Andrew Cooper wrote:
>
> --- a/xen/include/asm-x86/asm_defns.h
> +++ b/xen/include/asm-x86/asm_defns.h
> @@ -217,22 +217,34 @@ static always_inline void stac(void)
>          addq  $-(UREGS_error_code-UREGS_r15), %rsp
>          cld
>          movq  %rdi,UREGS_rdi(%rsp)
> +        xor   %edi, %edi
>          movq  %rsi,UREGS_rsi(%rsp)
> +        xor   %esi, %esi
>          movq  %rdx,UREGS_rdx(%rsp)
> +        xor   %edx, %edx
>          movq  %rcx,UREGS_rcx(%rsp)
> +        xor   %ecx, %ecx
>          movq  %rax,UREGS_rax(%rsp)
> +        xor   %eax, %eax

You didn't want to erase all 64 bits?
Re: [Xen-devel] [PATCH v6.5 08/26] x86/entry: Erase guest GPR state on entry to Xen
On Mon, 2018-01-22 at 10:18 +0000, Andrew Cooper wrote:
> On 22/01/2018 10:04, David Woodhouse wrote:
> > On Thu, 2018-01-04 at 00:15 +0000, Andrew Cooper wrote:
> > > --- a/xen/include/asm-x86/asm_defns.h
> > > +++ b/xen/include/asm-x86/asm_defns.h
> > > @@ -217,22 +217,34 @@ static always_inline void stac(void)
> > >          addq  $-(UREGS_error_code-UREGS_r15), %rsp
> > >          cld
> > >          movq  %rdi,UREGS_rdi(%rsp)
> > > +        xor   %edi, %edi
> > >          movq  %rsi,UREGS_rsi(%rsp)
> > > +        xor   %esi, %esi
> > >          movq  %rdx,UREGS_rdx(%rsp)
> > > +        xor   %edx, %edx
> > >          movq  %rcx,UREGS_rcx(%rsp)
> > > +        xor   %ecx, %ecx
> > >          movq  %rax,UREGS_rax(%rsp)
> > > +        xor   %eax, %eax
> >
> > You didn't want to erase all 64 bits?
>
> This does erase all 64 bits. (We're in long mode, so the upper 32 bits
> are implicitly zeroed, without an added rex prefix.)

Eww. In the grand scheme of things, I'd rather the assembler knew that
(and happily omitted the rex prefix all by itself to use the more
efficient encoding of the instruction), and not me.
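For anyone following along, the architectural rule in question, with
the encodings (easily verified with any assembler):

    # In 64-bit mode, a write to a 32-bit register zero-extends into
    # the full 64-bit register, so both of these leave %rdi == 0:
    xor   %edi, %edi        # encodes as 31 ff    (2 bytes)
    xor   %rdi, %rdi        # encodes as 48 31 ff (3 bytes, REX.W prefix)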
Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
On Wed, 2018-01-24 at 13:49 +0000, Andrew Cooper wrote:
> On 24/01/18 13:34, Woodhouse, David wrote:
> > I am loath to suggest *more* tweakables, but given the IBPB cost is
> > there any merit in having a mode which does it only if the *domain* is
> > different, regardless of vcpu_id?
>
> This would only be a win if you were regularly cross-scheduling vcpus
> from the same domain, in which case you've probably other issues to be
> worried about.

Of course. If the guest *knows* about HT siblings, that kind of implies
you've told it about the topology and thus you're pinning vCPUs. I
don't think there *is* a world in which what I said makes sense.

> > If a given domain is running on HT siblings, it ought to be doing its
> > own mitigation — setting STIBP for userspace if it wants, ensuring its
> > own kernel is safe by having IBRS set or using retpoline, etc.
>
> ~Andrew
>
> [1] Is this trying to be a subtle hint?

Heh, no. When I get to that bit, and the *reasons* we do that, it'll be
far from subtle. But as with so many other things, NOT THIS WEEK :)
Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
On Fri, 2020-02-07 at 20:27 +0000, Julien Grall wrote:
> Could you please send the series in a separate thread? So we don't mix
> the two discussions (i.e merge and free on boot allocated page) together.

Well, they're all part of the same mess, really.

There are cases where pages end up in free_heap_pages() which were
never vetted by init_heap_pages(). While that normally works just fine
— to the extent that we've never really cared — the hack excluding MFN0
is one of the things that gets 'missed' for such pages.

I was only looking at this because the early vmap support makes it a
tiny bit more likely that some pages will be freed that way after being
given out by the boot allocator, but there were plenty of reasons it
might happen already.

These patches basically legitimise that — we make it OK for
free_heap_pages() to be given a range which was never in the heap in
the first place. And in so doing, we fix the MFN0/zone merge trick even
when (for example) MFN0 was part of the dom0 initramfs and assigned
directly, but gets freed up later.

But sure, having failed to elicit any screams of "don't do it like
that", I'll fix up the things you pointed out and I'll repost it as
part of the series containing the early vmap() support, since that's
why I'm working on it.

Although at this point I'm half tempted to declare that there are *so*
many ways this can happen already, that the tiny little bit that it's
made more likely by the early vmap() support is basically in the noise.
In that case we can separate these patches out as an independent fix
rather than a dependency of early vmap.
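As background for the MFN0 hack mentioned above, here's a standalone
sketch of the arithmetic (my illustration; the real logic lives in
xen/common/page_alloc.c and differs in detail):

    #include <stdio.h>

    /* Illustrative only: zones are distinguished by the position of
     * the MFN's top set bit (MFN 1 -> zone 1, MFNs 2-3 -> zone 2,
     * MFNs 4-7 -> zone 3, ...), and a free block of 2^order pages at
     * 'mfn' merges with its buddy at mfn ^ (1UL << order). */
    static unsigned int zone_of(unsigned long mfn)
    {
        unsigned int z = 0;

        while ( mfn )
        {
            z++;
            mfn >>= 1;
        }
        return z;
    }

    int main(void)
    {
        unsigned int order;

        /* A block starting at MFN 0 would merge across zones every
         * time: {0} with {1}, {0-1} with {2-3}, {0-3} with {4-7}, etc.
         * For blocks starting anywhere else, the only cross-zone buddy
         * is the block at MFN 0 itself, so keeping MFN 0 out of the
         * heap keeps all merging zone-local. */
        for ( order = 0; order < 4; order++ )
        {
            unsigned long buddy = 1UL << order;

            printf("order %u: block at MFN 0 would merge with buddy at %lu (zone %u)\n",
                   order, buddy, zone_of(buddy));
        }
        return 0;
    }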
Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
> I've started looking at the latest version of Paul's series, but I'm
> still struggling to see the picture: There's no true distinction
> between Xen heap and domain heap on x86-64 (except on very large
> systems). Therefore it is unclear to me what "those pages" is actually
> referring to above. Surely new Xen can't be given any pages in use
> _in any way_ by old Xen, no matter whether it's ones assigned to
> domains, or ones used internally to (old) Xen.

Old and new Xen do not coexist. There is a kexec (via kexec_reloc.S and
purgatory) from old to new.

There are some pages which new Xen MUST NOT scribble on, because they
actually belong to the domains being preserved. That includes the EPT
(or at least IOMMU) page tables.

I suppose new Xen also mustn't scribble on the pages in which old Xen
has placed the migration information for those domains either. At
least, not until it's consumed the data.

Anything else, however, is fine for new Xen to scribble on. Fairly much
anything that the old Xen had allocated from its xenheap (and not
subsequently shared to a guest, qv) is no longer required and can be
treated as free memory by the new Xen, which now owns the machine.
Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
> > For live update we need to give a region of memory to the new Xen which
> > it can use for its boot allocator, before it's handled any of the live
> > update records and before it knows which *other* memory is still
> > available for use.
> >
> > In order to do that, the original Xen has to ensure that it *doesn't*
> > use any of that memory region for domain-owned pages which would need
> > to be preserved.
> >
> > So far in the patches I've posted upstream I have cheated, and simply
> > *not* added them to the main heap. Anything allocated before
> > end_boot_allocator() is fine because it is "ephemeral" to the first Xen
> > and doesn't need to be preserved (it's mostly frame tables and a few
> > PTE pages).
> >
> > Paul's work is making it possible to use those pages as xenheap pages,
> > safe in the knowledge that they *won't* end up being mapped to domains,
> > and won't need to be preserved across live update.
>
> I've started looking at the latest version of Paul's series, but I'm
> still struggling to see the picture: There's no true distinction
> between Xen heap and domain heap on x86-64 (except on very large
> systems). Therefore it is unclear to me what "those pages" is actually
> referring to above. Surely new Xen can't be given any pages in use
> _in any way_ by old Xen, no matter whether it's ones assigned to
> domains, or ones used internally to (old) Xen.

Hm, I'm not sure my previous response actually answered your question;
sorry, I've been away all week so it's still Monday morning in my head
right now. Let me try again...

What I said just now is true. The new Xen can use anything that isn't
actually owned by domains, but old Xen is dead and any of its own
internal allocations, Xen page tables and data structures (i.e. most of
what it allocated on its xenheap) have died with it, and those pages
are considered 'free' by the new Xen.

Theoretically, it would be possible for the new Xen to go directly to
that state. The live update data could be processed *early* in the new
Xen before the boot allocator is even running, and new Xen could prime
its boot allocator with the memory ranges that happen to be available
according to the criteria outlined above.

Our initial implementation did that, in fact. It was complex in early
boot, and it didn't scale to more than 512 separate free ranges because
the boot allocator panics if it has more free regions than that.

That's why we settled on the model of reserving a specific region for
the new Xen to use for its boot allocator. Old Xen promises that it
won't put anything into that region that needs to be preserved over
kexec, and then the startup process for the new Xen is much simpler; it
can use that contiguous region for its boot allocations and then
process the live update data in a better environment once things like
vmap() are already available. Then *finally* it can add the rest of the
system memory that *isn't* used by running domains into the buddy
allocator.

So this requires old Xen to promise that it *won't* put anything into
that region of reserved bootmem (aka "those pages") that needs to be
preserved across kexec. That promise is *mostly* equivalent to "will
only allocate xenheap pages from those pages"... except for the fact
that sometimes we allocate a page from the xenheap and share it to
domains. Thus, "don't do that then", and THEN we can say that it's OK
for xenheap allocations to come from the reserved bootmem region, but
not domheap allocations.
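To make the ordering concrete, here's a rough sketch of the new Xen's
startup flow as described (init_boot_pages() and end_boot_allocator()
are the real boot-allocator entry points; the lu_* names and the
handover structure are hypothetical):

    /* Hypothetical sketch of new Xen's startup after a live update
     * kexec; everything prefixed lu_ is illustrative. */
    void __init lu_startup(const struct lu_handover *lu)
    {
        /* 1. Prime the boot allocator with *only* the reserved region;
         *    old Xen guaranteed nothing in it needs preserving. */
        init_boot_pages(lu->reserved_start, lu->reserved_end);

        /* 2. Normal early boot follows: frame tables, idle domain,
         *    page tables and eventually vmap() all allocate from that
         *    one contiguous region. */

        /* 3. With vmap() available, parse the live update records and
         *    mark every page that must survive (guest RAM, EPT/IOMMU
         *    tables, the migration streams themselves) as in use. */
        lu_process_records(lu);

        /* 4. Only now hand everything not claimed in step 3 over to
         *    the heap. */
        end_boot_allocator();
    }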
Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
On Fri, 2020-03-06 at 13:25 +0100, Jan Beulich wrote:
> And likely interrupt remapping tables, device tables, etc. I don't
> have a clear picture on how you want to delineate ones in use in any
> such way from ones indeed free to re-use.

Right. The solution there is two-fold:

For pages in the general population (outside the reserved bootmem), the
responsibility lies with the new Xen. As it processes the live update
information that it receives from the old Xen, it must mark those pages
as in-use so that it doesn't attempt to allocate them.

That's what this bugfix paves the way for — it avoids putting *bad*
pages into the buddy allocator, by setting the page state before the
page is seen by init_heap_pages(), and making init_heap_pages() skip
the pages marked as broken. It's trivial, then, to make
init_heap_pages() *also* skip pages which get marked as "already in
use" when we process the live update data.

The second part, as discussed, is that the old Xen must not put any of
those "needs to be preserved" pages into the reserved bootmem region.
That's what Paul is working on. We stop sharing xenheap pages to
domains, which is part of it — but *also* we need to use the right
allocation for any IOMMU page tables and IRQ remapping tables which
need to be preserved, etc.

That partly comes out of Hongyan's secret hiding work anyway, since we
no longer get to assume that xenheap pages are already mapped *anyway*,
so we might as *well* be allocating those from domheap.
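A hedged sketch of the init_heap_pages() filtering being described (the
state names follow the PGC_state discussion in the sibling thread;
page_state_is() and free_heap_pages() are real Xen interfaces, but the
exact predicates in the committed patch differ):

    /* Illustrative only, not the committed patch: pages whose state
     * was set before the heap ever saw them never reach the buddy
     * lists. */
    static void init_heap_pages(struct page_info *pg,
                                unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages; i++ )
        {
            /* Skip pages pre-marked broken (bad RAM), and pages
             * pre-marked in-use while processing live update records. */
            if ( page_state_is(&pg[i], broken) ||
                 page_state_is(&pg[i], inuse) )
                continue;

            free_heap_pages(&pg[i], 0, false /* need_scrub */);
        }
    }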