Hi tech-kern@,

On Mon, 31 Mar 2025 04:40:30 -0000 (UTC), Pierre Pronchery wrote:
> This post is related to iMil's recent work on PVH support for
> NetBSD/amd64.
> I was unable to use his work to boot on ramdisks directly with QEMU's
> -initrd flag, when using -kernel.
>
> Well, after a deep dive into it, I think I am almost there:
> https://git.edgebsd.org/gitweb/?p=src.git;a=commitdiff;h=629621f41089af50584214a4d32b50ae8ee414f2
>
> This patch:
> - extends sys/arch/amd64/amd64/genassym.cf for additional knowledge of
>   Xen's hvm_start_info (notably nr_modules and modlist_paddr)
> - extends .start_genpvh in locore.S to copy the module entries, and
>   their respective command lines and contents
> - teaches x86_machdep.c to load Xen modules when running as a
>   VM_GUEST_GENPVH guest
>
> The code is not working yet unfortunately.

Well, now it does, with MICROVM on an Intel macOS host:

> $ qemu-system-x86_64 -m 512 -accel hvf -display none -serial stdio \
> -M microvm,rtc=off,acpi=off,pic=off -kernel netbsd-MICROVM -append \
> "console=com rw -v" -initrd ramdisk-cgdroot.fs -action reboot=shutdown \
> -D qemu.log -d cpu_reset,in_asm,guest_errors,unimp \
> -device virtio-blk-device,drive=hd0 \
> -drive file=ld0.img,format=raw,id=hd0
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
> [ 1.0000000] WARNING: system needs entropy for security; see entropy(7)
> [ 1.0000000] [ Kernel symbol table missing! ]
> [ 1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
> [ 1.0000000] 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
> [ 1.0000000] 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
> [ 1.0000000] 2024, 2025
> [ 1.0000000] The NetBSD Foundation, Inc. All rights reserved.
> [ 1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
> [ 1.0000000] The Regents of the University of California. All rights reserved.
>
> [ 1.0000000] NetBSD 10.99.12 (MICROVM) #0: Wed Apr 9 08:52:24 UTC 2025
> [ 1.0000000] mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/MICROVM
> [ 1.0000000] total memory = 511 MB
> [ 1.0000000] avail memory = 480 MB
> [ 1.0000000] KERNBASE=0xffffffff80000000
> [ 1.0000000] modlist_paddr=0xffffffff80a00038
> cmdline_paddr=0xffffffff80ee2075 cmdline="console=com rw -v virtio_mmio.device=512@0xfeb00e00:12"
> [ 1.0000000] Xen module info at boot (0xffffffff80a00038, 1)
> [ 1.0000000] timecounter: Timecounters tick every 10.000 msec
> [ 1.0000000] mainbus0 (root)
> [ 1.0000000] mainbus0: Intel MP Specification (Version 1.4) (QBOOT 000000000000)
> [ 1.0000000] cpu0 at mainbus0 apid 0
> [ 1.0000000] cpu0: Use lfence to serialize rdtsc
> [ 1.0000000] cpu0: QEMU Virtual CPU version 2.5+, id 0x60fb1
> [ 1.0000000] cpu0: node 0, package 0, core 0, smt 0
> [ 1.0000000] mpbios: bus 0 is type ISA
> [ 1.0000000] ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20, 24 pins
> [ 1.0000000] isa0 at mainbus0
> [ 1.0000000] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
> [ 1.0000000] com0: console
> [ 1.0000000] allocated pic ioapic0 type edge pin 4 level 8 to cpu0 slot 0 idt entry 129
> [ 1.0000000] pv0 at mainbus0
> [ 1.0000000] virtio0 at pv0
> [ 1.0000000] virtio0: kernel parameters: console=com rw -v virtio_mmio.device=512@0xfeb00e00:12
> [ 1.0000000] virtio0: viommio: 512@0xfeb00e00:12
> [ 1.0000000] virtio0: VirtIO-MMIO-v1
> [ 1.0000000] virtio0: block device (id 2, rev. 0x00)
> [ 1.0000000] ld0 at virtio0: features: 0x10002e54<INDIRECT_DESC,DISCARD,CONFIG_WCE,TOPOLOGY,FLUSH,BLK_SIZE,GEOMETRY,SEG_MAX>
> [ 1.0000000] ld0: Unknown SIZE_MAX, assuming 65536
> [ 1.0000000] ld0: max 254 segs of max 65536 bytes
> [ 1.0000000] virtio0: allocated 4227072 byte for virtqueue 0 for I/O request, size 1024
> [ 1.0000000] virtio0: using 4194304 byte (262144 entries) indirect descriptors
> [ 1.0000000] allocated pic ioapic0 type level pin 12 level 6 to cpu0 slot 1 idt entry 96
> [ 1.0000000] virtio0: interrupting on -1
> [ 1.0000000] ld0: 1953 MB, 3968 cyl, 16 head, 63 sec, 512 bytes/sect x 4000000 sectors
> [ 1.0000000] virtio1 at pv0
> [ 1.0000000] timecounter: Timecounter "lapic" frequency 1046204000 Hz quality -100
> [ 1.0000000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
> [ 1.0000030] timecounter: Timecounter "TSC" frequency 2410445480 Hz quality -100
> [ 1.0000030] boot device: ld0
> [ 1.0000030] md0: internal 5000 KB image area
> [ 1.0000030] root on md0a dumps on md0b
> [ 1.0000030] root file system type: ffs
> [ 1.0000030] kern.module.path=/stand/amd64/10.99.12/modules
> [ 1.0100030] WARNING: no TOD clock present
> [ 1.0100030] WARNING: using filesystem time
> [ 1.0100030] WARNING: CHECK AND RESET THE DATE!
> [ 1.0100030] warning: no /dev/console
> Created tmpfs /dev (1835008 byte, 3552 inodes)
> Could not mount the boot partition
> erase ^?, werase ^W, kill ^U, intr ^C
> This image contains utilities which may be needed
> to get you out of a pinch.
> #

Your help reviewing this work before I commit it would be very welcome!
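The new genassym.cf symbols are plain offsets and sizes in two structures from Xen's public start_info.h header. The minimal C sketch below transcribes those two layouts (field names as in the Xen header; the comments and the modlist_copy_words() helper are illustrative additions, not NetBSD code) and pins the values the symbols should take on amd64. modlist_copy_words() is a hypothetical helper that mirrors the mull/shrl $2 sequence in locore.S that sizes the rep movsl copy of the module list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as in Xen's public start_info.h (illustrative transcription). */
struct hvm_start_info {
	uint32_t magic;          /* 0x336ec578 */
	uint32_t version;        /* START_INFO_VERSION */
	uint32_t flags;
	uint32_t nr_modules;     /* START_INFO_NR_MODULES */
	uint64_t modlist_paddr;  /* START_INFO_MODLIST_PADDR */
	uint64_t cmdline_paddr;  /* CMDLINE_PADDR in locore.S */
	uint64_t rsdp_paddr;
	uint64_t memmap_paddr;   /* MMAP_PADDR (version >= 1 only) */
	uint32_t memmap_entries; /* MMAP_ENTRIES */
	uint32_t reserved;
};

struct hvm_modlist_entry {
	uint64_t paddr;          /* MODLIST_ENTRY_PADDR */
	uint64_t size;           /* MODLIST_ENTRY_SIZE */
	uint64_t cmdline_paddr;  /* MODLIST_ENTRY_CMDLINE; the patch treats 0 as "none" */
	uint64_t reserved;
};

/* Values the genassym.cf expressions should produce on amd64. */
static_assert(offsetof(struct hvm_start_info, nr_modules) == 12, "nr_modules");
static_assert(offsetof(struct hvm_start_info, modlist_paddr) == 16, "modlist_paddr");
static_assert(offsetof(struct hvm_start_info, cmdline_paddr) == 24, "cmdline_paddr");
static_assert(sizeof(struct hvm_start_info) == 56, "HVM_START_INFO_SIZE");
static_assert(offsetof(struct hvm_modlist_entry, paddr) == 0, "paddr");
static_assert(offsetof(struct hvm_modlist_entry, size) == 8, "size");
static_assert(offsetof(struct hvm_modlist_entry, cmdline_paddr) == 16, "cmdline_paddr");
static_assert(sizeof(struct hvm_modlist_entry) == 32, "HVM_MODLIST_ENTRY_SIZE");

/* Hypothetical helper: nr_modules * HVM_MODLIST_ENTRY_SIZE, shifted right
 * by 2, i.e. the number of 32-bit words that rep movsl has to move. */
static uint32_t
modlist_copy_words(uint32_t nr_modules)
{
	return (nr_modules * (uint32_t)sizeof(struct hvm_modlist_entry)) >> 2;
}
```

Since every entry is 32 bytes, the byte count is always a multiple of 4, so the shift by 2 loses nothing.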
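To make the locore.S changes easier to review, here is a C model of what the VM_GUEST_GENPVH path does with this patch, assuming a flat byte array stands in for physical memory and every address fits in 32 bits. The model packs hvm_start_info, then the module list, then each module's command line (if any) and contents, then the kernel command line, contiguously from __kernel_end, and rewrites each paddr to its new location. The returned byte count corresponds to what locore now computes into %edx, and reserved_end() mirrors the addl %edx / addl $PGOFSET / andl $~PGOFSET rounding, assuming %ebx holds the address of the relocated hvm_start_info at that point. relocate_start_info(), copy_cstr() and reserved_end() are hypothetical names; this is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the kernel's PAGE_SIZE and PGOFSET (4 KiB pages). */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_PGOFSET   (MODEL_PAGE_SIZE - 1)

struct hvm_start_info {
	uint32_t magic, version, flags, nr_modules;
	uint64_t modlist_paddr, cmdline_paddr, rsdp_paddr, memmap_paddr;
	uint32_t memmap_entries, reserved;
};

struct hvm_modlist_entry {
	uint64_t paddr, size, cmdline_paddr, reserved;
};

/* Copy a NUL-terminated string including its NUL, like the
 * movb/movsb/cmp $0 loops in locore.S; returns the new destination. */
static uint32_t
copy_cstr(uint8_t *mem, uint32_t dst, uint32_t src)
{
	size_t len = strlen((const char *)mem + src) + 1;

	memmove(mem + dst, mem + src, len);
	return dst + (uint32_t)len;
}

/* Pack everything from kernel_end onwards; only the low 32 bits of each
 * address are used, as with the movl stores in locore.S.  Returns the
 * number of bytes used. */
static uint32_t
relocate_start_info(uint8_t *mem, uint32_t info_pa, uint32_t kernel_end)
{
	const uint32_t esz = (uint32_t)sizeof(struct hvm_modlist_entry);
	struct hvm_start_info si;
	struct hvm_modlist_entry me;
	uint32_t dst, list, src, i;

	memcpy(&si, mem + info_pa, sizeof(si));
	dst = kernel_end + (uint32_t)sizeof(si);

	/* hvm_modlist_entry[] goes right after hvm_start_info */
	list = dst;
	memmove(mem + list, mem + (uint32_t)si.modlist_paddr, si.nr_modules * esz);
	si.modlist_paddr = list;
	dst += si.nr_modules * esz;

	for (i = 0; i < si.nr_modules; i++) {
		memcpy(&me, mem + list + i * esz, sizeof(me));
		if (me.cmdline_paddr != 0) {	/* 0: no command line */
			src = (uint32_t)me.cmdline_paddr;
			me.cmdline_paddr = dst;
			dst = copy_cstr(mem, dst, src);
		}
		memmove(mem + dst, mem + (uint32_t)me.paddr, (size_t)me.size);
		me.paddr = dst;
		dst += (uint32_t)me.size;
		memcpy(mem + list + i * esz, &me, sizeof(me));
	}

	/* the kernel command line goes last */
	src = (uint32_t)si.cmdline_paddr;
	si.cmdline_paddr = dst;
	dst = copy_cstr(mem, dst, src);

	memcpy(mem + kernel_end, &si, sizeof(si));
	return dst - kernel_end;
}

/* First page boundary at or after the packed data. */
static uint32_t
reserved_end(uint32_t start, uint32_t used)
{
	return (start + used + MODEL_PGOFSET) & ~MODEL_PGOFSET;
}
```

Module contents are packed byte-for-byte with no alignment padding, matching the rep movsb copy, and the model does not handle a source module that overlaps the destination range.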
The patch:

From caa038822350a7f30a7975dc29386c052dca32de Mon Sep 17 00:00:00 2001
From: Pierre Pronchery <khor...@edgebsd.org>
Date: Mon, 31 Mar 2025 04:36:00 +0200
Subject: [PATCH] amd64: add support for -initrd with VM_GUEST_GENPVH

Tested on NetBSD/amd64
---
 sys/arch/amd64/amd64/genassym.cf |  6 +++
 sys/arch/amd64/amd64/locore.S    | 65 +++++++++++++++++++++++++++++---
 sys/arch/amd64/conf/MICROVM      |  4 ++
 sys/arch/x86/x86/x86_machdep.c   | 32 ++++++++++++++++
 4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/sys/arch/amd64/amd64/genassym.cf b/sys/arch/amd64/amd64/genassym.cf
index d8f31cd51a22..c93c79ffb32c 100644
--- a/sys/arch/amd64/amd64/genassym.cf
+++ b/sys/arch/amd64/amd64/genassym.cf
@@ -384,6 +384,12 @@ define SIR_XENIPL_HIGH SIR_XENIPL_HIGH
 define EVTCHN_UPCALL_MASK offsetof(struct vcpu_info, evtchn_upcall_mask)
 define HVM_START_INFO_SIZE sizeof(struct hvm_start_info)
 define START_INFO_VERSION offsetof(struct hvm_start_info, version)
+define START_INFO_MODLIST_PADDR offsetof(struct hvm_start_info, modlist_paddr)
+define START_INFO_NR_MODULES offsetof(struct hvm_start_info, nr_modules)
+define HVM_MODLIST_ENTRY_SIZE sizeof(struct hvm_modlist_entry)
+define MODLIST_ENTRY_CMDLINE offsetof(struct hvm_modlist_entry, cmdline_paddr)
+define MODLIST_ENTRY_PADDR offsetof(struct hvm_modlist_entry, paddr)
+define MODLIST_ENTRY_SIZE offsetof(struct hvm_modlist_entry, size)
 define MMAP_PADDR offsetof(struct hvm_start_info, memmap_paddr)
 define MMAP_ENTRIES offsetof(struct hvm_start_info, memmap_entries)
 define MMAP_ENTRY_SIZE sizeof(struct hvm_memmap_table_entry)
diff --git a/sys/arch/amd64/amd64/locore.S b/sys/arch/amd64/amd64/locore.S
index 6711b572324f..f3db58189b45 100644
--- a/sys/arch/amd64/amd64/locore.S
+++ b/sys/arch/amd64/amd64/locore.S
@@ -1106,10 +1106,60 @@ ENTRY(start_pvh)
 	shrl $2, %ecx
 	rep movsl
 
-	/* Copy cmdline_paddr after hvm_start_info */
+	/* Copy hvm_modlist_entry[] after hvm_start_info */
+	movl $RELOC(__kernel_end), %ebx
+	movl START_INFO_MODLIST_PADDR(%ebx), %esi
+	movl %edi, START_INFO_MODLIST_PADDR(%ebx) /* Set new modlist_paddr in hvm_start_info */
+	movl START_INFO_NR_MODULES(%ebx), %eax /* Get nr_modules */
+	movl $HVM_MODLIST_ENTRY_SIZE, %ecx /* ecx = sizeof(hvm_modlist_entry) */
+	mull %ecx /* eax * ecx => edx:eax */
+	movl %eax, %ecx
+	shrl $2, %ecx
+	rep movsl
+
+	/* Copy the modules after the hvm_modlist_entry[] */
+	xorl %ecx, %ecx /* ecx = i = 0 */
+.modlist_copy:
+	movl $RELOC(__kernel_end), %ebx /* ebx = &hvm_start_info */
+	movl START_INFO_NR_MODULES(%ebx), %eax /* eax = nr_modules */
+	cmpl %eax, %ecx /* if (ecx == nr_modules) */
+	je .modlist_copy_done /* goto modlist_copy_done */
+	push %ecx
+	/* Copy the module */
+	movl START_INFO_MODLIST_PADDR(%ebx), %ebx /* ebx = &hvm_modlist_entry[0] */
+	movl $HVM_MODLIST_ENTRY_SIZE, %eax /* eax = sizeof(hvm_modlist_entry) */
+	mull %ecx /* eax *= ecx */
+	addl %eax, %ebx /* ebx = &hvm_modlist_entry[i] */
+	/* Copy the module's cmdline */
+	movl MODLIST_ENTRY_CMDLINE(%ebx), %esi
+	xorl %eax, %eax
+	movl %eax, MODLIST_ENTRY_CMDLINE(%ebx)
+	cmpl %eax, %esi
+	je .modlist_cmdline_copy_done
+
+	movl %edi, MODLIST_ENTRY_CMDLINE(%ebx) /* Set new cmdline_paddr in hvm_modlist_entry */
+.modlist_cmdline_copy:
+	movb (%esi), %al
+	movsb
+	cmp $0, %al
+	jne .modlist_cmdline_copy
+.modlist_cmdline_copy_done:
+
+	/* Copy the module's content */
+	movl MODLIST_ENTRY_PADDR(%ebx), %esi /* esi = hvm_modlist_entry[i].paddr */
+	movl %edi, MODLIST_ENTRY_PADDR(%ebx) /* Set new paddr in hvm_modlist_entry */
+	movl MODLIST_ENTRY_SIZE(%ebx), %ecx /* ecx = hvm_modlist_entry[i].size */
+	rep movsb
+
+	pop %ecx /* i++ */
+	inc %ecx
+	jmp .modlist_copy
+.modlist_copy_done:
+
+	/* Copy cmdline_paddr after the modules */
+	movl $RELOC(__kernel_end), %ebx
 	movl CMDLINE_PADDR(%ebx), %esi
-	movl $RELOC(__kernel_end), %ecx
-	movl %edi, CMDLINE_PADDR(%ecx) /* Set new cmdline_paddr in hvm_start_info */
+	movl %edi, CMDLINE_PADDR(%ebx) /* Set new cmdline_paddr in hvm_start_info */
 .cmdline_copy:
 	movb (%esi), %al
 	movsb
@@ -1136,11 +1186,17 @@ ENTRY(start_pvh)
 
 	/* announce ourself */
 	movl $VM_GUEST_GENPVH, RELOC(vm_guest)
+	/* determine the amount of data needed */
+	movl %edi, %edx
+	subl $RELOC(__kernel_end), %edx
+
 	jmp .save_hvm_start_paddr
 
 .start_xen32:
 	pop %ebx
 	movl $VM_GUEST_XENPVH, RELOC(vm_guest)
+	/* XXX assume hvm_start_info+dependant structure fits in a single page */
+	movl $PAGE_SIZE, %edx
 
 .save_hvm_start_paddr:
 	/*
@@ -1166,9 +1222,8 @@ ENTRY(start_pvh)
 	movl $RELOC(HYPERVISOR_shared_info_pa),%ebp
 	movl %ebx,(%ebp)
 	movl $0,4(%ebp)
-	/* XXX assume hvm_start_info+dependant structure fits in a single page */
 .add_hvm_start_info_page:
-	addl $PAGE_SIZE, %ebx
+	addl %edx, %ebx
 	addl $PGOFSET,%ebx
 	andl $~PGOFSET,%ebx
 	addl $KERNBASE_LO,%ebx
diff --git a/sys/arch/amd64/conf/MICROVM b/sys/arch/amd64/conf/MICROVM
index 65982d42b4a9..864002a5eb25 100644
--- a/sys/arch/amd64/conf/MICROVM
+++ b/sys/arch/amd64/conf/MICROVM
@@ -23,3 +23,7 @@ machine amd64 x86 xen
 include "arch/x86/conf/MICROVM.common"
 
 options EXEC_ELF64 # exec ELF binaries
+options MODULAR # new style module(7) framework
+
+options MEMORY_DISK_HOOKS # enable md specific hooks
+options MEMORY_DISK_DYNAMIC # enable dynamic resizing
diff --git a/sys/arch/x86/x86/x86_machdep.c b/sys/arch/x86/x86/x86_machdep.c
index ab5ffaf35410..7f3d2308ba46 100644
--- a/sys/arch/x86/x86/x86_machdep.c
+++ b/sys/arch/x86/x86/x86_machdep.c
@@ -215,6 +215,32 @@ mm_md_physacc(paddr_t pa, vm_prot_t prot)
 }
 
 #ifdef MODULAR
+#ifdef XEN
+void x86_add_xen_modules(void);
+void x86_add_xen_modules(void)
+{
+	uint32_t i;
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+	struct hvm_modlist_entry *modlist;
+#endif
+
+	if (hvm_start_info->nr_modules == 0) {
+		aprint_verbose("No Xen module info at boot\n");
+		return;
+	}
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+	modlist = (void *)((uintptr_t)hvm_start_info->modlist_paddr + KERNBASE);
+#endif
+	for (i = 0; i < hvm_start_info->nr_modules; i++) {
+		/* XXX can be a filesystem image or ELF module or splashscreen */
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+		md_root_setconf(
+		    (void *)((uintptr_t)modlist[i].paddr + KERNBASE),
+		    modlist[i].size);
+#endif
+	}
+}
+#endif
 /*
  * Push any modules loaded by the boot loader.
  */
@@ -224,6 +250,12 @@ module_init_md(void)
 	struct btinfo_modulelist *biml;
 	struct bi_modulelist_entry *bi, *bimax;
 
+#ifdef XEN
+	if (vm_guest_is_pvh()) {
+		x86_add_xen_modules();
+	}
+#endif /* XEN */
+
 	biml = lookup_bootinfo(BTINFO_MODULELIST);
 	if (biml == NULL) {
 		aprint_debug("No module info at boot\n");
-- 
2.48.1

Cheers & HTH,
-- 
khorben