Hi tech-kern@,

On Mon, 31 Mar 2025 04:40:30 -0000 (UTC), Pierre Pronchery wrote:

> This post is related to iMil's recent work on PVH support for
> NetBSD/amd64.
> I was unable to use his work to boot on ramdisks directly with QEMU's
> -initrd flag, when using -kernel.
> 
> Well after a deep dive into it, I think I am almost there:
> https://git.edgebsd.org/gitweb/?p=src.git;a=commitdiff;h=629621f41089af50584214a4d32b50ae8ee414f2
> 
> This patch:
> - extends sys/arch/amd64/amd64/genassym.cf for additional knowledge of
>   Xen's hvm_start_info (notably nr_modules and modlist_paddr)
> - extends .start_genpvh in locore.S to copy the module entries, and their
>   respective command lines and contents
> - teaches x86_machdep.c to load Xen modules when running as a
>   VM_GUEST_GENPVH guest
> 
> The code is not working yet unfortunately.

Well, now it does; with MICROVM, on an Intel-macOS host:

> $ qemu-system-x86_64 -m 512 -accel hvf -display none -serial stdio \
>   -M microvm,rtc=off,acpi=off,pic=off -kernel netbsd-MICROVM -append \
>   "console=com rw -v" -initrd ramdisk-cgdroot.fs -action reboot=shutdown \
>   -D qemu.log -d cpu_reset,in_asm,guest_errors,unimp \
>   -device virtio-blk-device,drive=hd0 \
>   -drive file=ld0.img,format=raw,id=hd0
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
> [   1.0000000] WARNING: system needs entropy for security; see entropy(7)
> [   1.0000000] [ Kernel symbol table missing! ]
> [   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
> [   1.0000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
> [   1.0000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
> [   1.0000000]     2024, 2025
> [   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
> [   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
> [   1.0000000]     The Regents of the University of California.  All rights reserved.
> 
> [   1.0000000] NetBSD 10.99.12 (MICROVM) #0: Wed Apr  9 08:52:24 UTC 2025
> [   1.0000000]  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/MICROVM
> [   1.0000000] total memory = 511 MB
> [   1.0000000] avail memory = 480 MB
> [   1.0000000] KERNBASE=0xffffffff80000000
> [   1.0000000] modlist_paddr=0xffffffff80a00038 cmdline_paddr=0xffffffff80ee2075 cmdline="console=com rw -v virtio_mmio.device=512@0xfeb00e00:12"
> [   1.0000000] Xen module info at boot (0xffffffff80a00038, 1)
> [   1.0000000] timecounter: Timecounters tick every 10.000 msec
> [   1.0000000] mainbus0 (root)
> [   1.0000000] mainbus0: Intel MP Specification (Version 1.4) (QBOOT    000000000000)
> [   1.0000000] cpu0 at mainbus0 apid 0
> [   1.0000000] cpu0: Use lfence to serialize rdtsc
> [   1.0000000] cpu0: QEMU Virtual CPU version 2.5+, id 0x60fb1
> [   1.0000000] cpu0: node 0, package 0, core 0, smt 0
> [   1.0000000] mpbios: bus 0 is type ISA   
> [   1.0000000] ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20, 24 pins
> [   1.0000000] isa0 at mainbus0
> [   1.0000000] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
> [   1.0000000] com0: console
> [   1.0000000] allocated pic ioapic0 type edge pin 4 level 8 to cpu0 slot 0 idt entry 129
> [   1.0000000] pv0 at mainbus0
> [   1.0000000] virtio0 at pv0
> [   1.0000000] virtio0: kernel parameters: console=com rw -v virtio_mmio.device=512@0xfeb00e00:12
> [   1.0000000] virtio0: viommio: 512@0xfeb00e00:12
> [   1.0000000] virtio0: VirtIO-MMIO-v1
> [   1.0000000] virtio0: block device (id 2, rev. 0x00)
> [   1.0000000] ld0 at virtio0: features: 0x10002e54<INDIRECT_DESC,DISCARD,CONFIG_WCE,TOPOLOGY,FLUSH,BLK_SIZE,GEOMETRY,SEG_MAX>
> [   1.0000000] ld0: Unknown SIZE_MAX, assuming 65536
> [   1.0000000] ld0: max 254 segs of max 65536 bytes
> [   1.0000000] virtio0: allocated 4227072 byte for virtqueue 0 for I/O request, size 1024
> [   1.0000000] virtio0: using 4194304 byte (262144 entries) indirect descriptors
> [   1.0000000] allocated pic ioapic0 type level pin 12 level 6 to cpu0 slot 1 idt entry 96
> [   1.0000000] virtio0: interrupting on -1
> [   1.0000000] ld0: 1953 MB, 3968 cyl, 16 head, 63 sec, 512 bytes/sect x 4000000 sectors
> [   1.0000000] virtio1 at pv0
> [   1.0000000] timecounter: Timecounter "lapic" frequency 1046204000 Hz quality -100
> [   1.0000000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
> [   1.0000030] timecounter: Timecounter "TSC" frequency 2410445480 Hz quality -100
> [   1.0000030] boot device: ld0
> [   1.0000030] md0: internal 5000 KB image area
> [   1.0000030] root on md0a dumps on md0b
> [   1.0000030] root file system type: ffs
> [   1.0000030] kern.module.path=/stand/amd64/10.99.12/modules
> [   1.0100030] WARNING: no TOD clock present
> [   1.0100030] WARNING: using filesystem time
> [   1.0100030] WARNING: CHECK AND RESET THE DATE!
> [   1.0100030] warning: no /dev/console
> Created tmpfs /dev (1835008 byte, 3552 inodes)
> Could not mount the boot partition
> erase ^?, werase ^W, kill ^U, intr ^C
> This image contains utilities which may be needed
> to get you out of a pinch.
> # 
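
For anyone cross-checking the addresses in the log above: the modlist and
cmdline values appear to be printed as kernel virtual addresses, and
x86_add_xen_modules() in the patch applies the same translation by adding
KERNBASE to each physical address. A minimal sketch of that arithmetic
follows; pvh_pa_to_va() and pvh_va_to_pa() are hypothetical names used only
for illustration, and KERNBASE is simply the value shown in the log.

```c
#include <stdint.h>

/* Value printed as KERNBASE in the boot log above. */
#define SKETCH_KERNBASE	0xffffffff80000000ULL

/*
 * Hypothetical helper mirroring the "(uintptr_t)paddr + KERNBASE" casts
 * in x86_add_xen_modules().
 */
static uint64_t
pvh_pa_to_va(uint64_t pa)
{
	return pa + SKETCH_KERNBASE;
}

/* Inverse direction, handy when reading addresses back from the log. */
static uint64_t
pvh_va_to_pa(uint64_t va)
{
	return va - SKETCH_KERNBASE;
}
```

With the logged value, pvh_va_to_pa(0xffffffff80a00038) gives 0xa00038. That
would be consistent with the module list being placed directly after the
copy of hvm_start_info, if __kernel_end is 0xa00000 and the structure is 0x38
bytes long, but neither of those two values is shown in the log.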

Your help in reviewing this work before I commit it would be very welcome!
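
To make the review a little easier, here is a rough C model of the order in
which the locore.S changes copy things after hvm_start_info: first the
hvm_modlist_entry[] array, then each module's command line and contents, and
finally the kernel command line. It is only a sketch: relocate_boot_data()
and copy_string() are made-up names, "mem" stands in for physical memory, and
the structure layouts are reproduced from memory of Xen's public
start_info.h, so please check them against the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layouts as in Xen's start_info.h (from memory; verify before use). */
struct hvm_start_info {
	uint32_t magic, version, flags, nr_modules;
	uint64_t modlist_paddr, cmdline_paddr, rsdp_paddr, memmap_paddr;
	uint32_t memmap_entries, reserved;
};

struct hvm_modlist_entry {
	uint64_t paddr, size, cmdline_paddr, reserved;
};

/* Copy a NUL-terminated string to mem[dst]; return the offset past it. */
static uint32_t
copy_string(uint8_t *mem, uint32_t dst, uint64_t src)
{
	size_t n = strlen((const char *)(mem + src)) + 1;

	memmove(mem + dst, mem + src, n);
	return dst + (uint32_t)n;
}

/*
 * Hypothetical model of the copy order in start_pvh. info_off is where
 * hvm_start_info was already copied (__kernel_end in the patch). Returns
 * the first free offset, i.e. the final value of %edi.
 */
static uint32_t
relocate_boot_data(uint8_t *mem, uint32_t info_off)
{
	struct hvm_start_info *si;
	struct hvm_modlist_entry *ml;
	uint32_t dst, at, i;
	size_t listsz;

	assert(info_off % 8 == 0);	/* keeps the struct accesses aligned */
	si = (struct hvm_start_info *)(mem + info_off);
	dst = info_off + (uint32_t)sizeof(*si);
	ml = (struct hvm_modlist_entry *)(mem + dst);
	listsz = si->nr_modules * sizeof(*ml);

	/* 1. hvm_modlist_entry[] goes right after hvm_start_info. */
	memmove(ml, mem + si->modlist_paddr, listsz);
	si->modlist_paddr = dst;
	dst += (uint32_t)listsz;

	/* 2. Per module: optional command line, then the contents. */
	for (i = 0; i < si->nr_modules; i++) {
		if (ml[i].cmdline_paddr != 0) {
			at = dst;
			dst = copy_string(mem, dst, ml[i].cmdline_paddr);
			ml[i].cmdline_paddr = at;
		}
		memmove(mem + dst, mem + ml[i].paddr, ml[i].size);
		ml[i].paddr = dst;
		dst += (uint32_t)ml[i].size;
	}

	/* 3. The kernel command line comes last. */
	if (si->cmdline_paddr != 0) {	/* guard added by this sketch */
		at = dst;
		dst = copy_string(mem, dst, si->cmdline_paddr);
		si->cmdline_paddr = at;
	}
	return dst;
}
```

One difference to keep in mind: the sketch uses memmove(), which tolerates
overlapping ranges, whereas the assembly copies forward with rep movs and a
byte loop. The assembly therefore relies on each source either not
overlapping its destination or lying above it; whether the loader always
guarantees that placement is not verified here.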

The patch:

From caa038822350a7f30a7975dc29386c052dca32de Mon Sep 17 00:00:00 2001
From: Pierre Pronchery <khor...@edgebsd.org>
Date: Mon, 31 Mar 2025 04:36:00 +0200
Subject: [PATCH] amd64: add support for -initrd with VM_GUEST_GENPVH

Tested on NetBSD/amd64
---
 sys/arch/amd64/amd64/genassym.cf |  6 +++
 sys/arch/amd64/amd64/locore.S    | 65 +++++++++++++++++++++++++++++---
 sys/arch/amd64/conf/MICROVM      |  4 ++
 sys/arch/x86/x86/x86_machdep.c   | 32 ++++++++++++++++
 4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/sys/arch/amd64/amd64/genassym.cf b/sys/arch/amd64/amd64/genassym.cf
index d8f31cd51a22..c93c79ffb32c 100644
--- a/sys/arch/amd64/amd64/genassym.cf
+++ b/sys/arch/amd64/amd64/genassym.cf
@@ -384,6 +384,12 @@ define SIR_XENIPL_HIGH             SIR_XENIPL_HIGH
 define EVTCHN_UPCALL_MASK      offsetof(struct vcpu_info, evtchn_upcall_mask)
 define HVM_START_INFO_SIZE     sizeof(struct hvm_start_info)
 define START_INFO_VERSION      offsetof(struct hvm_start_info, version)
+define START_INFO_MODLIST_PADDR        offsetof(struct hvm_start_info, modlist_paddr)
+define START_INFO_NR_MODULES   offsetof(struct hvm_start_info, nr_modules)
+define HVM_MODLIST_ENTRY_SIZE  sizeof(struct hvm_modlist_entry)
+define MODLIST_ENTRY_CMDLINE   offsetof(struct hvm_modlist_entry, cmdline_paddr)
+define MODLIST_ENTRY_PADDR     offsetof(struct hvm_modlist_entry, paddr)
+define MODLIST_ENTRY_SIZE      offsetof(struct hvm_modlist_entry, size)
 define MMAP_PADDR              offsetof(struct hvm_start_info, memmap_paddr)
 define MMAP_ENTRIES            offsetof(struct hvm_start_info, memmap_entries)
 define MMAP_ENTRY_SIZE         sizeof(struct hvm_memmap_table_entry)
diff --git a/sys/arch/amd64/amd64/locore.S b/sys/arch/amd64/amd64/locore.S
index 6711b572324f..f3db58189b45 100644
--- a/sys/arch/amd64/amd64/locore.S
+++ b/sys/arch/amd64/amd64/locore.S
@@ -1106,10 +1106,60 @@ ENTRY(start_pvh)
        shrl $2, %ecx
        rep movsl
 
-       /* Copy cmdline_paddr after hvm_start_info */
+       /* Copy hvm_modlist_entry[] after hvm_start_info */
+       movl $RELOC(__kernel_end), %ebx
+       movl START_INFO_MODLIST_PADDR(%ebx), %esi
+       movl %edi, START_INFO_MODLIST_PADDR(%ebx)   /* Set new modlist_paddr in hvm_start_info */
+       movl START_INFO_NR_MODULES(%ebx), %eax /* Get nr_modules */
+       movl $HVM_MODLIST_ENTRY_SIZE, %ecx /* ecx = sizeof(hvm_modlist_entry) */
+       mull %ecx                        /* eax * ecx => edx:eax */
+       movl %eax, %ecx
+       shrl $2, %ecx
+       rep movsl
+
+       /* Copy the modules after the hvm_modlist_entry[] */
+       xorl %ecx, %ecx                 /* ecx = i = 0 */
+       .modlist_copy:
+       movl $RELOC(__kernel_end), %ebx /* ebx = &hvm_start_info */
+       movl START_INFO_NR_MODULES(%ebx), %eax /* eax = nr_modules */
+       cmpl %eax, %ecx                 /* if (ecx == nr_modules) */
+       je .modlist_copy_done           /*   goto modlist_copy_done */
+       push %ecx
+       /* Copy the module */
+       movl START_INFO_MODLIST_PADDR(%ebx), %ebx /* ebx = &hvm_modlist_entry[0] */
+       movl $HVM_MODLIST_ENTRY_SIZE, %eax /* eax = sizeof(hvm_modlist_entry) */
+       mull %ecx                       /* eax *= ecx */
+       addl %eax, %ebx                 /* ebx = &hvm_modlist_entry[i] */
+       /* Copy the module's cmdline */
+       movl MODLIST_ENTRY_CMDLINE(%ebx), %esi
+       xorl %eax, %eax
+       movl %eax, MODLIST_ENTRY_CMDLINE(%ebx)
+       cmpl %eax, %esi
+       je .modlist_cmdline_copy_done
+
+       movl %edi, MODLIST_ENTRY_CMDLINE(%ebx)  /* Set new cmdline_paddr in hvm_modlist_entry */
+       .modlist_cmdline_copy:
+       movb (%esi), %al
+       movsb
+       cmp $0, %al
+       jne .modlist_cmdline_copy
+       .modlist_cmdline_copy_done:
+
+       /* Copy the module's content */
+       movl MODLIST_ENTRY_PADDR(%ebx), %esi /* esi = hvm_modlist_entry[i].paddr */
+       movl %edi, MODLIST_ENTRY_PADDR(%ebx) /* Set new paddr in hvm_modlist_entry */
+       movl MODLIST_ENTRY_SIZE(%ebx), %ecx /* ecx = hvm_modlist_entry[i].size */
+       rep movsb
+
+       pop %ecx                        /* i++ */
+       inc %ecx
+       jmp .modlist_copy
+       .modlist_copy_done:
+
+       /* Copy cmdline_paddr after the modules */
+       movl $RELOC(__kernel_end), %ebx
        movl CMDLINE_PADDR(%ebx), %esi
-       movl $RELOC(__kernel_end), %ecx
-       movl %edi, CMDLINE_PADDR(%ecx)  /* Set new cmdline_paddr in hvm_start_info */
+       movl %edi, CMDLINE_PADDR(%ebx)  /* Set new cmdline_paddr in hvm_start_info */
        .cmdline_copy:
        movb (%esi), %al
        movsb
@@ -1136,11 +1186,17 @@ ENTRY(start_pvh)
        /* announce ourself */
        movl    $VM_GUEST_GENPVH, RELOC(vm_guest)
 
+       /* determine the amount of data needed */
+       movl    %edi, %edx
+       subl    $RELOC(__kernel_end), %edx
+
        jmp .save_hvm_start_paddr
 
 .start_xen32:
        pop %ebx
        movl    $VM_GUEST_XENPVH, RELOC(vm_guest)
+       /* XXX assume hvm_start_info+dependant structure fits in a single page */
+       movl    $PAGE_SIZE, %edx
 
 .save_hvm_start_paddr:
        /*
@@ -1166,9 +1222,8 @@ ENTRY(start_pvh)
        movl    $RELOC(HYPERVISOR_shared_info_pa),%ebp
        movl    %ebx,(%ebp)
        movl    $0,4(%ebp)
-       /* XXX assume hvm_start_info+dependant structure fits in a single page */
 .add_hvm_start_info_page:
-       addl    $PAGE_SIZE, %ebx
+       addl    %edx, %ebx
        addl    $PGOFSET,%ebx
        andl    $~PGOFSET,%ebx
        addl    $KERNBASE_LO,%ebx
diff --git a/sys/arch/amd64/conf/MICROVM b/sys/arch/amd64/conf/MICROVM
index 65982d42b4a9..864002a5eb25 100644
--- a/sys/arch/amd64/conf/MICROVM
+++ b/sys/arch/amd64/conf/MICROVM
@@ -23,3 +23,7 @@ machine amd64 x86 xen
 include         "arch/x86/conf/MICROVM.common"
 
 options         EXEC_ELF64      # exec ELF binaries
+options        MODULAR         # new style module(7) framework
+
+options        MEMORY_DISK_HOOKS       # enable md specific hooks
+options        MEMORY_DISK_DYNAMIC     # enable dynamic resizing
diff --git a/sys/arch/x86/x86/x86_machdep.c b/sys/arch/x86/x86/x86_machdep.c
index ab5ffaf35410..7f3d2308ba46 100644
--- a/sys/arch/x86/x86/x86_machdep.c
+++ b/sys/arch/x86/x86/x86_machdep.c
@@ -215,6 +215,32 @@ mm_md_physacc(paddr_t pa, vm_prot_t prot)
 }
 
 #ifdef MODULAR
+#ifdef XEN
+void x86_add_xen_modules(void);
+void x86_add_xen_modules(void)
+{
+       uint32_t i;
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+       struct hvm_modlist_entry *modlist;
+#endif
+
+       if (hvm_start_info->nr_modules == 0) {
+               aprint_verbose("No Xen module info at boot\n");
+               return;
+       }
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+       modlist = (void *)((uintptr_t)hvm_start_info->modlist_paddr + KERNBASE);
+#endif
+       for (i = 0; i < hvm_start_info->nr_modules; i++) {
+               /* XXX can be a filesystem image or ELF module or splashscreen */
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+               md_root_setconf(
+                   (void *)((uintptr_t)modlist[i].paddr + KERNBASE),
+                   modlist[i].size);
+#endif
+       }
+}
+#endif
 /*
  * Push any modules loaded by the boot loader.
  */
@@ -224,6 +250,12 @@ module_init_md(void)
        struct btinfo_modulelist *biml;
        struct bi_modulelist_entry *bi, *bimax;
 
+#ifdef XEN
+       if (vm_guest_is_pvh()) {
+               x86_add_xen_modules();
+       }
+#endif /* XEN */
+
        biml = lookup_bootinfo(BTINFO_MODULELIST);
        if (biml == NULL) {
                aprint_debug("No module info at boot\n");
-- 
2.48.1
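
One detail that may deserve a second look is the last locore.S hunk. Instead
of assuming that hvm_start_info and everything hanging off it fit in a single
page, %edx now carries the number of bytes actually copied (%edi minus
__kernel_end), and the sum is rounded up to a page boundary. A rough C
equivalent is below, assuming %ebx holds the start of the copied data at that
point; first_free_page() and the SKETCH_ constants are names invented for
this illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* amd64 base page size */
#define SKETCH_PGOFSET		(SKETCH_PAGE_SIZE - 1)

/*
 * Hypothetical equivalent of:
 *	addl %edx, %ebx ; addl $PGOFSET, %ebx ; andl $~PGOFSET, %ebx
 * Returns the first page-aligned address at or after start + len.
 */
static uint32_t
first_free_page(uint32_t start, uint32_t len)
{
	return (start + len + SKETCH_PGOFSET) & ~SKETCH_PGOFSET;
}
```

With the previous fixed length of PAGE_SIZE, a ramdisk of a few megabytes
would extend well past the reserved area, which is presumably why the XXX
comment now only applies to the Xen PVH path.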

Cheers & HTH,
-- 
khorben
