On Fri, 8 Apr 2022 at 15:15, Peter Maydell <peter.mayd...@linaro.org> wrote:
>
> This patchset implements emulation of GICv4 in our TCG GIC and ITS
> models, and makes the virt board use it where appropriate.
>
> Tested with a Linux kernel passing through a virtio-blk device
> to an inner Linux VM with KVM/QEMU. (NB that to get the outer
> Linux kernel to actually use the new GICv4 functionality you
> need to pass it "kvm-arm.vgic_v4_enable=1", as the kernel
> will not use it by default.)

I guess I might as well post my notes here about how I set up that
test environment. These are a bit too scrappy (and rather specific
about a niche thing) to be proper documentation, but having them in
the list archives might be helpful in future...

===nested-setup.txt===
How to set up an environment to test QEMU's emulation of
virtualization, with PCI passthrough of a virtio-blk-pci device to
the L2 guest

(1) Set up a Debian aarch64 guest (the instructions in the old blog post
https://translatedcode.wordpress.com/2017/07/24/installing-debian-on-qemus-64-bit-arm-virt-board/
still work; I used Debian bullseye for my testing).

(2) Copy hda.qcow2 to hda-for-inner.qcow2; run the L1 guest using
the 'runme' script. Caution: the virtio devices need to be in this
order (hda.qcow2, network, hda-for-inner.qcow2), because systemd in
the guest names the ethernet interface based on which PCI slot it
goes in.

(3) In the L1 guest, first we need to fix up hda-for-inner.qcow2 so
that it has different UUIDs and partition UUIDs from hda.qcow2.
You'll need to make sure you have the blkid, gdisk, tune2fs,
swaplabel utilities installed in the guest.

swapoff -a   # L1 guest might have swapped onto /dev/vdb3 by accident

# print current partition IDs; you'll see that vda and vdb currently
# share IDs for their partitions, and we must change those for vdb
blkid

# first change the PARTUUIDs with gdisk; this is the answer from
# https://askubuntu.com/questions/1250224/how-to-change-partuuid
gdisk /dev/vdb
x   # change to experts menu
c   # change partition ID
1   # for partition 1
R   # pick a random ID
c   # ditto for partitions 2, 3
2
R
c
3
R
m   # back to main menu
w   # write partition table
q   # quit

# change UUIDs; from https://unix.stackexchange.com/questions/12858/how-to-change-filesystem-uuid-2-same-uuid
tune2fs -U random /dev/vdb1
tune2fs -U random /dev/vdb2
swaplabel -U $(uuidgen) /dev/vdb3

# Check the UUIDs and PARTUUIDs are now all changed:
blkid

# Now update the fstab in the L2 filesystem:
mount /dev/vdb2 /mnt
# Finally, edit /mnt/etc/fstab to set the UUID values for /, /boot
# and swap to the new ones for /dev/vdb's partitions
vi /mnt/etc/fstab   # or editor of your choice
umount /mnt

# shutdown the L1 guest now, to ensure that all the changes to that
# qcow2 file are committed
shutdown -h now

(4) Copy necessary files into the L1 guest's filesystem; you can run
the L1 guest and run scp there to copy from your host machine (there
is an example scp invocation below), or any other method you like.
You'll need:
 - the vmlinuz (same one being used for L1)
 - the initrd
 - some scripts [runme-inner, runme-inner-nopassthru, reassign-vdb]
 - a copy of hda-for-inner.qcow2 (probably best to copy it to a
   temporary file while the L1 guest is not running, then copy that
   into the guest)
 - the qemu-system-aarch64 you want to use as the L2 QEMU
   (I cross-compiled this on my x86-64 host. The packaged Debian
   bullseye qemu-system-aarch64 will also work if you don't need
   to use a custom QEMU for L2.)
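For the scp route, something like this (run inside the L1 guest)
should do the trick, assuming your host machine is running an ssh
server. 10.0.2.2 is the address of the host as seen from QEMU's
default user-mode networking; 'myuser' and '/path/to/testdir' are
just placeholders for your own host login and wherever the files
live on the host:

# inside the L1 guest; repeat/extend for the rest of the files above
scp myuser@10.0.2.2:/path/to/testdir/vmlinuz-5.10.0-9-arm64 \
    myuser@10.0.2.2:/path/to/testdir/initrd.img-5.10.0-9-arm64 \
    myuser@10.0.2.2:/path/to/testdir/runme-inner \
    ~/
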
(5) Now you can run the L2 guest without using PCI passthrough like this:

./runme-inner-nopassthru ./qemu-system-aarch64

(6) And you can run the L2 guest with PCI passthrough like this:

# you only need to run reassign-vdb once for any given run of the
# L1 guest, to give the PCI device to vfio-pci rather than to the
# L1 virtio driver. After that you can run the L2 QEMU multiple times.
./reassign-vdb
./runme-inner ./qemu-system-aarch64

Notes:

I have set up the various 'runme' scripts so that L1 has a mux of
stdio and the monitor, which means that you can kill it with ^A-x,
and ^C will be delivered to the L1 guest. The L2 guest has plain
'-serial stdio', which means that ^C will kill the L2 guest.

The 'runme' scripts expect their first argument to be the path to the
QEMU you want to run; any further arguments are extra arguments to
that QEMU. So you can do things like:

# pass more arguments to QEMU, here disabling the ITS
./runme ~/qemu-system-aarch64 -machine its=off
# run QEMU under gdb
./runme gdb --args ~/qemu-system-aarch64 -machine its=off

The 'runme' scripts should be in the same directory as the kernel etc
files they go with; but you don't need to be in that directory to run
them.
===endit===

===runme===
#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"
QEMU="$@"

# Run with GICv3 and the disk image with a nested copy in it
# (for testing EL2/GICv3-virt emulation)

: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}
: ${DISK:=$TESTDIR/hda.qcow2}
: ${INNERDISK:=$TESTDIR/hda-for-inner.qcow2}

# Note that the virtio-net-pci must be the 2nd PCI device,
# because otherwise the network interface name it gets will
# not match /etc/network/interfaces.

# set up with -serial mon:stdio so we can ^C the inner QEMU

IOMMU_ADDON=',iommu_platform=on,disable-modern=off,disable-legacy=on'

${QEMU} \
    -cpu cortex-a57 \
    -machine type=virt \
    -machine gic-version=max \
    -machine virtualization=true \
    -machine iommu=smmuv3 \
    -m 1024M \
    -kernel "${KERNEL}" -initrd "${INITRD}" \
    -drive if=none,id=mydrive,file="${DISK}",format=qcow2 \
    -device virtio-blk-pci,drive=mydrive \
    -netdev user,id=mynet \
    -device virtio-net-pci,netdev=mynet \
    -drive if=none,id=innerdrive,file="${INNERDISK}",format=qcow2 \
    -device virtio-blk-pci,drive=innerdrive"$IOMMU_ADDON" \
    -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2 kvm-arm.vgic_v4_enable=1' \
    -chardev socket,id=monitor,host=127.0.0.1,port=4444,server=on,wait=off,telnet=on \
    -mon chardev=monitor,mode=readline \
    -display none -serial mon:stdio
===endit===

===reassign-vdb===
#!/bin/sh -e
# Script to detach the /dev/vdb PCI device from the virtio-blk driver
# and hand it to vfio-pci

PCIDEV=0000:00:03.0

echo -n "$PCIDEV" > /sys/bus/pci/drivers/virtio-pci/unbind
modprobe vfio-pci
echo vfio-pci > /sys/bus/pci/devices/"$PCIDEV"/driver_override
echo -n "$PCIDEV" > /sys/bus/pci/drivers/vfio-pci/bind
===endit===

===runme-inner===
#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"
QEMU="$@"

# run the inner guest, passing it the passthrough PCI device

: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}

# set up with -serial stdio so we can ^C the inner QEMU
# use -net none to work around the default virtio-net-pci
# network device wanting to load efi-virtio.rom, which the
# L1 guest's debian package puts somewhere other than where
# our locally compiled qemu-system-aarch64 wants to find it.
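# the vfio-pci 'host' address below is the PCI function that
# reassign-vdb detached from the L1 virtio driver, i.e. the
# hda-for-inner disk being handed through to the L2 guest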
${QEMU} \
    -cpu cortex-a57 \
    -enable-kvm \
    -machine type=virt \
    -machine gic-version=3 \
    -m 256M \
    -kernel "${KERNEL}" -initrd "${INITRD}" \
    -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
    -display none -serial stdio \
    -device vfio-pci,host=0000:00:03.0,id=pci0 \
    -net none
===endit===

===runme-inner-nopassthru===
#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"
QEMU="$@"

# run the inner guest, passing it a disk image

: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}
: ${DISK:=$TESTDIR/hda-for-inner.qcow2}

# set up with -serial stdio so we can ^C the inner QEMU
# use -net none to work around the default virtio-net-pci
# network device wanting to load efi-virtio.rom, which the
# L1 guest's debian package puts somewhere other than where
# our locally compiled qemu-system-aarch64 wants to find it.
${QEMU} \
    -cpu cortex-a57 \
    -enable-kvm \
    -machine type=virt \
    -machine gic-version=3 \
    -m 256M \
    -kernel "${KERNEL}" -initrd "${INITRD}" \
    -drive if=none,id=mydrive,file="${DISK}",format=qcow2 \
    -device virtio-blk-pci,drive=mydrive \
    -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
    -display none -serial stdio \
    -net none
===endit===

--
PMM