On Fri, Jul 05, 2024 at 10:39:33AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 03, 2024 at 06:49:30PM GMT, Michael S. Tsirkin wrote:
> > On Tue, Jun 18, 2024 at 12:00:30PM +0200, Stefano Garzarella wrote:
> > > As discussed with Michael and Markus [1], this version also includes the
> > > patch on which v7 depended to simplify the merge in Michael's tree.
> > >
> > > The series is all reviewed, so if there are no new changes required, I
> > > would ask to merge it.
> >
> > I dropped patches 9 and 10 for now since otherwise make vm-build-freebsd
> > fails.
> >
> > Pls figure it out and resend just 9 and 10.
>
> I replicated locally, but I can't understand why it only happens in certain
> architectures, in my case on loongarch64, ppc64, and riscv32:
>
> 326/846 qemu:qtest+qtest-loongarch64 / qtest-loongarch64/qos-test
> ERROR 116.10s killed by signal 6 SIGABRT
> 337/846 qemu:qtest+qtest-ppc64 / qtest-ppc64/qos-test
> ERROR 115.10s killed by signal 6 SIGABRT
> 339/846 qemu:qtest+qtest-riscv32 / qtest-riscv32/qos-test
> ERROR 107.65s killed by signal 6 SIGABRT
>
> I focused on ppc64 running `gmake --output-sync -j6 check-qtest-ppc64` in
> the FreeBSD VM and it fails every time.
> In particular, the test that fails
> is the `vhost-user/reconnect` test, in fact disabling it this way, the
> qos-test tests always pass:
>
> diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
> index 0fa8951c9f..c3d686f0ee 100644
> --- a/tests/qtest/vhost-user-test.c
> +++ b/tests/qtest/vhost-user-test.c
> @@ -1118,9 +1119,11 @@ static void register_vhost_user_test(void)
>                   "virtio-net",
>                   test_migrate, &opts);
>
> +#if 0
>      opts.before = vhost_user_test_setup_reconnect;
>      qos_add_test("vhost-user/reconnect", "virtio-net",
>                   test_reconnect, &opts);
> +#endif
>
>      opts.before = vhost_user_test_setup_connect_fail;
>      qos_add_test("vhost-user/connect-fail", "virtio-net",
>
> Analyzing the test, what happens is that after the disconnection, the test
> doesn't receive VHOST_USER_SET_MEM_TABLE message, so the second
> `wait_for_fds()` fails after the 5 sec timeout (increasing it doesn't help),
> not having received the fds.
>
> diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
> index 0fa8951c9f..c3d686f0ee 100644
> --- a/tests/qtest/vhost-user-test.c
> +++ b/tests/qtest/vhost-user-test.c
> @@ -976,6 +976,7 @@ static void test_reconnect(void *obj, void *arg,
> QGuestAllocator *alloc)
>      g_source_set_callback(src, reconnect_cb, s, NULL);
>      g_source_attach(src, s->context);
>      g_source_unref(src);
> +    // THIS one is failing
>      g_assert(wait_for_fds(s));
>      wait_for_rings_started(s, 2);
>  }
>
> This is the test log (note: IIUC QEMU failures happen after the test exits
> on the assertion, so it could mean that the chardev reconnected
> correctly):
>
> ▶ 28/30
> /ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/reconnect
> - ERROR:../src/tests/qtest/qos-test.c:191:subprocess_run_one_test: child
> process
> (/ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/reconnect/subprocess
> [54991]) failed unexpectedly
> FAIL
> ▶ 28/30
> ERROR
> [28-30/30] 🌒 qemu:qtest+qtest-ppc64 / qtest-ppc64/qmp-cmd-test
> [28-30/30] 🌓 qemu:qtest+qtest-ppc64 / qtest-ppc64/migration-test
> 28/30 qemu:qtest+qtest-ppc64 / qtest-ppc64/qos-test
> ERROR 21.53s killed by signal 6 SIGABRT
> >>> PYTHON=/usr/home/qemu/qemu-test.OD8v2L/build/pyvenv/bin/python3.9
> ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1
> G_TEST_DBUS_DAEMON=/usr/home/qemu/qemu-test.OD8v2L/src/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-ppc64 MALLOC_PERTURB_=141
> QTEST_QEMU_IMG=./qemu-img
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1
> /usr/home/qemu/qemu-test.OD8v2L/build/tests/qtest/qos-test --tap -k
> ―――――――――――――――――――――――――――――――――――――――― ✀
> ――――――――――――――――――――――――――――――――――――――――
> stderr:
> Vhost user backend fails to broadcast fake RARP
> qemu-system-ppc64: -chardev
> socket,id=chr-reconnect,path=/tmp/vhost-test-Z5VMQ2/reconnect.sock,server=on:
> info: QEMU waiting for connection on:
> disconnected:unix:/tmp/vhost-test-Z5VMQ2/reconnect.sock,server=on
> **
> ERROR:../src/tests/qtest/vhost-user-test.c:255:wait_for_fds: assertion
> failed: (s->fds_num)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost VQ 0 ring restore failed: -22: Invalid argument
> (22)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost_set_vring_endian failed: Invalid argument (22)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost VQ 1 ring restore failed: -22: Invalid argument
> (22)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost_set_vring_endian failed: Invalid argument (22)
> qemu-system-ppc64: Failed to write msg. Wrote -1 instead of 12.
> qemu-system-ppc64: vhost_backend_init failed: Protocol error
> qemu-system-ppc64: failed to init vhost_net for queue 0
> **
> ERROR:../src/tests/qtest/qos-test.c:191:subprocess_run_one_test: child
> process
> (/ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/reconnect/subprocess
> [54991]) failed unexpectedly
> (test program exited with status code -6)
>
>
> I would think of some endianness problem, but it's strange that it only
> happens in the reconnect test.

loongarch64 is LE and I think so is riscv in practice.

> Next week I'll try to figure out why this is
> systematic only on some architectures, does anyone have any ideas?
>
> Thanks,
> Stefano