Thomas Huth <th...@redhat.com> writes:

> On 18/07/2023 14.55, Milan Zamazal wrote:
>> Thomas Huth <th...@redhat.com> writes:
>>
>>> On 11/07/2023 01.02, Michael S. Tsirkin wrote:
>>>> From: Milan Zamazal <mzama...@redhat.com>
>>>>
>>>> We don't have a virtio-scmi implementation in QEMU and only support a
>>>> vhost-user backend. This is very similar to virtio-gpio and we add the
>>>> same set of tests, just passing some vhost-user messages over the
>>>> control socket.
>>>>
>>>> Signed-off-by: Milan Zamazal <mzama...@redhat.com>
>>>> Acked-by: Thomas Huth <th...@redhat.com>
>>>> Message-Id: <20230628100524.342666-4-mzama...@redhat.com>
>>>> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
>>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>>>> ---
>>>>  tests/qtest/libqos/virtio-scmi.h |  34 ++++++
>>>>  tests/qtest/libqos/virtio-scmi.c | 174 +++++++++++++++++++++++++++++++
>>>>  tests/qtest/vhost-user-test.c    |  44 ++++++++
>>>>  MAINTAINERS                      |   1 +
>>>>  tests/qtest/libqos/meson.build   |   1 +
>>>>  5 files changed, 254 insertions(+)
>>>>  create mode 100644 tests/qtest/libqos/virtio-scmi.h
>>>>  create mode 100644 tests/qtest/libqos/virtio-scmi.c
>>>
>>> Hi!
>>>
>>> I'm seeing some random failures with this new scmi test, so far only
>>> on non-x86 systems, e.g.:
>>>
>>> https://app.travis-ci.com/github/huth/qemu/jobs/606246131#L4774
>>>
>>> It also reproduces on a s390x host here, but only if I run "make check
>>> -j$(nproc)" - if I run the tests single-threaded, the qos-test passes
>>> there. Seems like there is a race somewhere in this test?
>>
>> Hmm, it's basically the same as virtio-gpio.c test, so it should be
>> OK.
>> Is it possible that the two tests (virtio-gpio.c & virtio-scmi.c)
>> interfere with each other in some way? Is there possibly a way to
>> serialize them to check?
>
> I think within one qos-test, the sub-tests are already run
> serialized.

I see, OK.

> But there might be multiple qos-tests running in parallel, e.g. one
> for the aarch64 target and one for the ppc64 target. And indeed, I can
> reproduce the problem on my x86 laptop by running this in one terminal
> window:
>
> for ((x=0;x<1000;x++)); do \
>  QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>  G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>  QTEST_QEMU_BINARY=./qemu-system-ppc64 \
>  MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>  tests/qtest/qos-test -p \
>  /ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile \
>  || break ; \
> done
>
> And this in another terminal window at the same time:
>
> for ((x=0;x<1000;x++)); do \
>  QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>  G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>  QTEST_QEMU_BINARY=./qemu-system-aarch64 \
>  MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>  tests/qtest/qos-test -p \
>  /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile \
>  || break ; \
> done
>
> After a while, the aarch64 test broke with:
>
> /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile:
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost_set_vring_call failed 22
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost_set_vring_call failed 22
> qemu-system-aarch64: Failed to write msg. Wrote -1 instead of 20.
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
> msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier
> && dev->msix_vector_release_notifier' failed.
> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU
> death from signal 6 (Aborted) (core dumped)
> **
> ERROR:../../devel/qemu/tests/qtest/qos-test.c:191:subprocess_run_one_test:
> child process
> (/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile/subprocess
> [488457]) failed unexpectedly
> Aborted (core dumped)

Interesting, good discovery.

> Can you also reproduce it this way?

Unfortunately not. I ran the loops several times and everything passed.
I tried to compile and run it in a different distro container and it
passed too. I also haven't been successful in getting any idea how the
processes could influence each other.

What OS and what QEMU configure flags did you use to compile and run it?

Thanks,
Milan
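
For anyone else trying to reproduce this: the two quoted loops need two terminal windows. A minimal sketch that drives both from one POSIX shell is below. The function names (repeat_until_fail, run_scmi_test, run_both) are hypothetical and not part of QEMU; the environment values and test paths are copied unchanged from the quoted commands, and the script assumes it is sourced from the QEMU build directory.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration. Environment
# values and test paths are copied from the loops quoted in this thread.

PPC64_TEST=/ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile
AARCH64_TEST=/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile

# Run the command "$2..." up to $1 times; return 1 at the first failure.
repeat_until_fail() {
    _n=$1
    shift
    _i=0
    while [ "$_i" -lt "$_n" ]; do
        "$@" || { echo "iteration $_i failed: $*" >&2; return 1; }
        _i=$((_i + 1))
    done
    return 0
}

# Run one qos-test case with QEMU binary $1 and test path $2.
run_scmi_test() {
    QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
    G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
    QTEST_QEMU_BINARY="$1" \
    MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
    tests/qtest/qos-test -p "$2"
}

# Start both loops in the background with $1 iterations each and
# return non-zero if either of them stopped on a failure.
run_both() {
    repeat_until_fail "$1" run_scmi_test ./qemu-system-ppc64 "$PPC64_TEST" &
    _ppc_pid=$!
    repeat_until_fail "$1" run_scmi_test ./qemu-system-aarch64 "$AARCH64_TEST" &
    _arm_pid=$!
    _rc=0
    wait "$_ppc_pid" || _rc=1
    wait "$_arm_pid" || _rc=1
    return "$_rc"
}
```

After sourcing the file, "run_both 1000" would start both loops at once. As in the quoted commands, each loop stops at its own first failure while the other keeps running.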