On Wed, Apr 30, 2025 at 01:13:48PM +0200, Thomas Huth wrote: > On 30/04/2025 00.30, Peter Xu wrote: > > On Tue, Apr 29, 2025 at 05:21:40PM +0200, Thomas Huth wrote: > > > From: Thomas Huth <th...@redhat.com> > > > > > > For automatic tests, we need reference files from older QEMU versions. > > > QEMU 7.2 is a long term stable release, so it's a good candidate for > > > checking whether the migration could still work correctly. Let's add the > > > files from that version that have been taken with the "-dump-vmstate" > > > parameter of QEMU (compiled with single machines and the configure switch > > > "--without-default-devices" to keep the json files reasonable small). > > > > > > Some devices also have been removed manually from the json files, e.g. > > > the "pci-bridge" (which can be disabled in later QEMU versions via > > > Kconfig), > > > and some Linux-related devices like "scsi-block" and "scsi-generic" and > > > KVM-related devices. Without removing them, we might get errors otherwise > > > if these devices have not been compiled into the destination QEMU build. > > > > > > Signed-off-by: Thomas Huth <th...@redhat.com> > > > --- > > > MAINTAINERS | 1 + > > > .../aarch64/virt-7.2.json | 2571 +++++++++++++ > > > .../vmstate-static-checker/m68k/virt-7.2.json | 2936 ++++++++++++++ > > > .../ppc64/pseries-7.2.json | 1068 ++++++ > > > .../s390x/s390-ccw-virtio-7.2.json | 475 +++ > > > .../x86_64/pc-q35-7.2.json | 3402 +++++++++++++++++ > > > 6 files changed, 10453 insertions(+) > > > create mode 100644 > > > tests/data/vmstate-static-checker/aarch64/virt-7.2.json > > > create mode 100644 tests/data/vmstate-static-checker/m68k/virt-7.2.json > > > create mode 100644 > > > tests/data/vmstate-static-checker/ppc64/pseries-7.2.json > > > create mode 100644 > > > tests/data/vmstate-static-checker/s390x/s390-ccw-virtio-7.2.json > > > create mode 100644 > > > tests/data/vmstate-static-checker/x86_64/pc-q35-7.2.json > > > > This looks like an improvement indeed, it so far only covers machine type > > 7.2, rather than all machine types. I used to run these by hands before > > each release.. and sometimes I forgot. Before me, I am aware at least Dave > > used to run also during softfreezes. > > Since I had to strip down the dump files quite a bit (compiling QEMU with > --without-default-devices etc.) to avoid errors when the current build is > limited, it's maybe best if you continue checking manually at least once > during the soft-freeze with a non-stripped json file.
Yep, will do (if I won't forget to..). > > > One thing I am wondering is if we can do it the same way as the compat > > migration test in CI, so that we always compare with a base (which is the > > previous release binary), then move the base after each release. > > > > After all, due to migration-compat-common CI job, we always build the base > > version of QEMU already in build-previous-qemu, so we already have two > > binaries at hand. IIUC we only need one new job to generate the two JSON > > blobs, and feed them to the checker on both directions. > > > > If that is the case, it might have a benefit that we can cover all the > > machines as long as listed on both binaries, then check all of them? > > > > Meanwhile, if we keep the ball rolling for each release (by boosting the > > base QEMU binary version), IIUC it guarantees all the new binaries will > > make sure to provide compatible VMSDs all across since we start running the > > job. > > > > Would that work? > > I can have a try, though we might need to teach the checker script some more > things first, e.g. it's getting confused if we clean up old and unnecessary > fields like in commit 445d3facffe82788b880107c0849dab9505b33d9 ... Right, we definitely want to have such job allow_failure set.. And this is also not the only way to fail it, e.g. I just ran it on the latest 10.0 for q35-9.2 machine type, then I spot two more failures besides what you fixed in this series. x1:bin [migration-staging]$ ../scripts/vmstate-static-checker.py -s ./json.9.2 -d ./json.10.0 Section "sysbus-ahci" does not exist in dest Section "hpet", Description "hpet": expected field "num_timers", got "num_timers_save"; skipping rest The first one is caused by a Kconfig change - CONFIG_AHCI used to be selected via hw/ide on x86, now since there's the new ahci-sysbus.c since 1b26146e898086 in 10.0, vmstate_sysbus_ahci isn't available anymore on x86, but only selected by hw/arm/Kconfig even if I built the binary using exactly the same setup. The 2nd one is a rename which is similarly benign. I'll throw a patch soon for that. If one step further.. maybe we could also cache the checker's output for each (arch, machine type) tuple, then we mark the job fail only if the output changes. That may help to warn maintainers that something changed the VMSD so one can have a look. But we could start from simple.. -- Peter Xu