Peter Xu <pet...@redhat.com> writes:

> On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
>> On Mon, 9 Sept 2024 at 14:51, Hyman Huang <yong.hu...@smartx.com> wrote:
>> >
>> > Despite the fact that the responsive CPU throttle is enabled,
>> > the dirty sync count may not always increase because this is
>> > an optimization that might not happen in any situation.
>> >
>> > This test case is just making sure it doesn't interfere with any
>> > current functionality.
>> >
>> > Signed-off-by: Hyman Huang <yong.hu...@smartx.com>
>>
>> tests/qtest/migration-test already runs 75 different
>> subtests, takes up a massive chunk of our "make check"
>> time, and is very commonly a "times out" test on some
>> of our CI jobs. It runs on five different guest CPU
>> architectures, each one of which takes between 2 and
>> 5 minutes to complete the full migration-test.
>>
>> Do we really need to make it even bigger?
>
> I'll try to find some time in the next few weeks looking into this to see
> whether we can further shrink migration test times after previous attempts
> from Dan. At least a low hanging fruit is we should indeed put some more
> tests into g_test_slow(), and this new test could also be a candidate (then
> we can run "-m slow" for migration PRs only).

I think we could (using -m slow or any other method) separate tests that
are generic enough that every CI run should benefit from them from tests
that are only useful once someone starts touching migration code. I'd say
very few fall in the former category and most in the latter.

For an idea of where migration bugs lie, I took a look at what has been
fixed since 2022:

 # bugs | device/subsystem/arch
----------------------------------
     54 | migration
     10 | vfio
      6 | ppc
      3 | virtio-gpu
      2 | pcie_sriov, tpm_emulator, vdpa, virtio-rng-pci
      1 | arm, block, gpio, lasi, pci, s390, scsi-disk, virtio-mem, TCG

From these, ignoring the migration bugs, the migration-tests cover some
of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't been merged yet,
but once it is, virtio-gpu would be covered and we could investigate
adding some of the others.

For actual migration code issues:

 # bugs | (sub)subsystem | kind
----------------------------------------------
     13 | multifd        | correctness/races
      8 | ram            | correctness
      8 | rdma:          | general programming
      7 | qmp            | new api bugs
      5 | postcopy       | races
      4 | file:          | leaks
      3 | return path    | races
      3 | fd_cleanup     | races
      2 | savevm, aio/coroutines
      1 | xbzrle, colo, dirtyrate, exec:, windows, iochannel, qemufile,
        | arch (ppc64le)

Here, the migration-tests cover well: multifd, ram, qmp, postcopy, file,
return path, fd_cleanup, iochannel, qemufile, xbzrle.

My suggestion is that we run, per arch:

"/precopy/tcp/plain"
"/precopy/tcp/tls/psk/match"
"/postcopy/plain"
"/postcopy/preempt/plain"
"/postcopy/preempt/recovery/plain"
"/multifd/tcp/plain/cancel"
"/multifd/tcp/uri/plain/none"

and x86 gets extra:

"/precopy/unix/suspend/live"
"/precopy/unix/suspend/notlive"
"/dirty_ring" (the other dirty_* tests are too slow)

All the rest go behind a knob that people touching migration code will
enable.

wdyt?

1- allows adding devices to QEMU cmdline for migration-test
https://lore.kernel.org/r/20240523201922.28007-4-faro...@suse.de
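
To make the proposed split concrete, here is a minimal sketch in plain C
of the selection logic, assuming the quick lists are exactly the ones
suggested above. The function name should_run_migration_test() and its
'full' parameter are hypothetical, not existing QEMU code; 'full' stands
in for whatever knob gets picked (e.g. GLib's g_test_slow(), enabled with
"-m slow" as mentioned upthread).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tests proposed to run on every architecture in every CI run. */
static const char *const quick_all_arches[] = {
    "/precopy/tcp/plain",
    "/precopy/tcp/tls/psk/match",
    "/postcopy/plain",
    "/postcopy/preempt/plain",
    "/postcopy/preempt/recovery/plain",
    "/multifd/tcp/plain/cancel",
    "/multifd/tcp/uri/plain/none",
};

/* Extra tests proposed for x86 only (other dirty_* tests are too slow). */
static const char *const quick_x86_extra[] = {
    "/precopy/unix/suspend/live",
    "/precopy/unix/suspend/notlive",
    "/dirty_ring",
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return true if 'path' matches one of the 'n' entries in 'list'. */
static bool in_list(const char *path, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(path, list[i]) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical helper: decide whether a migration-test subtest should be
 * registered. 'full' represents the knob that people touching migration
 * code would enable; when it is set, every test runs.
 */
bool should_run_migration_test(const char *path, bool is_x86, bool full)
{
    if (full) {
        return true;
    }
    if (in_list(path, quick_all_arches, ARRAY_LEN(quick_all_arches))) {
        return true;
    }
    /* x86 additionally runs the suspend and dirty_ring tests. */
    return is_x86 &&
           in_list(path, quick_x86_extra, ARRAY_LEN(quick_x86_extra));
}
```

Each test registration could then be wrapped in such a check, so the
default run stays small while a migration PR run with the knob enabled
still exercises the full suite.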