Thomas Huth <th...@redhat.com> wrote:
> On 03/02/2023 22.14, Juan Quintela wrote:
>> Peter Maydell <peter.mayd...@linaro.org> wrote:
>>> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <th...@redhat.com> wrote:
>>>>
>>>> On 03/02/2023 13.08, Kevin Wolf wrote:
>>>>> On 03.02.2023 at 12:23, Thomas Huth wrote:
>>>>>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>>>>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>>>>>>>> We can get rid of the build-coroutine-sigaltstack job by moving
>>>>>>>> the configure flags that should be tested here to other jobs:
>>>>>>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>>>>>>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>>>>>
>>>>>>> The biggest user of coroutines is the block layer. So we probably
>>>>>>> ought to have coroutines aligned with a job that triggers the
>>>>>>> 'make check-block' for iotests. IIUC, the without-defaults
>>>>>>> job won't do that. How about, arbitrarily, using either the
>>>>>>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>>>>>>> are closely related, so getting sigaltstack vs ucontext coverage
>>>>>>> between them is a good win, and they both trigger the block jobs
>>>>>>> IIUC.
>>>>>>
>>>>>> I gave it a try with the ubuntu job, but this apparently trips up the
>>>>>> iotests:
>>>>>>
>>>>>> https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>>>>>
>>>>>> Does anybody have a clue what could be going wrong here?
>>>>>
>>>>> I'm not sure how changing the coroutine backend could cause it, but
>>>>> primarily this looks like an assertion failure in migration code.
>>>>>
>>>>> Dave, Juan, any ideas what this assertion checks and why it could be
>>>>> failing?
>>>>
>>>> Ah, I think it's the bug that will be fixed by:
>>>>
>>>>
>>>> https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quint...@redhat.com/
>>>>
>>>> The fix hasn't hit the master branch yet (I think), and I had another patch
>>>> in my CI that disables the aarch64 binary in that runner, so the iotests
>>>> suddenly have been executed with the alpha binary there --> migration
>>>> fails.
>>>>
>>>> So never mind, it will be fixed as soon as Juan's pull request gets
>>>> included.
>>>
>>> The migration tests have been flaky for a while now,
>>> including setups where host and guest page sizes are the same.
>>> (For instance, my x86 macos box pretty reliably sees failures
>>> when the machine is under load.)
>> I *thought* that we had fixed all of those.
>> But it is difficult for me to know because:
>> - It only happens when one runs "make check"
>> - running ./migration-test has never failed for me
>> - When it fails (and it has been a while since it has failed for me)
>> it is impossible for me to detect what is going on, and as said, I have
>> never been able to reproduce running only migration-test.
>> I will try to run several at the same time and see if it happens.
>> And as Thomas said, I *think* that the fix that Peter Xu posted
>> should fix this issue. Famous last words.
>
> The patch from Peter should fix my problems that I triggered via the
> iotests - but the migration-qtest is still unstable independent from
> that issue, I think. See for example the latest staging pipeline:
>
> https://gitlab.com/qemu-project/qemu/-/pipelines/767961842
>
> The migration qtest failed in both the x86-freebsd-build and the
> ubuntu-20.04-s390x-all pipeline.
>
> Thomas

31/659 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test ERROR 48.23s killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>> QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-aarch64
>>> MALLOC_PERTURB_=124
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>>> /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
>>> --tap -k
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
TAP parsing error: Too few tests run (expected 41, got 12)
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I don't know what to do with this:

- this is aarch64 tcg
- this *works* on f37, or at least I can't reproduce any error with
  make check on my box, and I *think* my configuration is quite
  extensive (as far as I know everything that can be compiled in
  Fedora with packages in the distro):

configure file:

/mnt/code/qemu/full/configure
    --enable-trace-backends=log --prefix=/usr
    --sysconfdir=/etc/sysconfig/ --audio-drv-list=pa,alsa
    --with-coroutine=ucontext --with-git-submodules=validate
    --enable-alsa --enable-attr --enable-auth-pam --enable-avx2
    --enable-avx512f --enable-bochs --enable-bpf --enable-brlapi
    --disable-bsd-user --enable-bzip2 --enable-cap-ng --enable-capstone
    --disable-cfi --disable-cfi-debug --enable-cloop --disable-cocoa
    --enable-containers --disable-coreaudio --enable-coroutine-pool
    --enable-crypto-afalg --enable-curl --enable-curses
    --enable-dbus-display --enable-debug-info --disable-debug-mutex
    --disable-debug-stack-usage --disable-debug-tcg --enable-dmg
    --enable-docs --disable-dsound --enable-fdt --enable-fuse
    --enable-fuse-lseek --disable-fuzzing --disable-gcov --disable-gcrypt
    --enable-gettext --enable-gio --enable-glusterfs --enable-gnutls
    --disable-gprof --enable-gtk --enable-guest-agent
    --disable-guest-agent-msi --disable-hax --disable-hvf --enable-iconv
    --enable-install-blobs --enable-jack --enable-keyring --enable-kvm
    --enable-l2tpv3 --enable-libdaxctl --enable-libiscsi --enable-libnfs
    --enable-libpmem --enable-libssh --enable-libudev --enable-libusb
    --enable-linux-aio --enable-linux-io-uring --enable-linux-user
    --enable-live-block-migration --disable-lto --disable-lzfse
    --enable-lzo --disable-malloc-trim --enable-membarrier
    --enable-module-upgrades --enable-modules --enable-mpath
    --enable-multiprocess --disable-netmap --enable-nettle --enable-numa
    --disable-nvmm --enable-opengl --enable-oss --enable-pa
    --enable-parallels --enable-pie --enable-plugins --enable-png
    --disable-profiler --enable-pvrdma --enable-qcow1 --enable-qed
    --disable-qom-cast-debug --enable-rbd --enable-rdma
    --enable-replication --enable-rng-none --disable-safe-stack
    --disable-sanitizers --enable-stack-protector --enable-sdl
    --enable-sdl-image --enable-seccomp --enable-selinux --enable-slirp
    --enable-slirp-smbd --enable-smartcard --enable-snappy
    --enable-sparse --enable-spice --enable-spice-protocol
    --enable-system --enable-tcg --disable-tcg-interpreter
    --enable-tools --enable-tpm --disable-tsan --disable-u2f
    --enable-usb-redir --enable-user --disable-vde --enable-vdi
    --enable-vhost-crypto --enable-vhost-kernel --enable-vhost-net
    --enable-vhost-user --enable-vhost-user-blk-server
    --enable-vhost-vdpa --enable-virglrenderer --enable-virtfs
    --enable-virtiofsd --enable-vnc --enable-vnc-jpeg --enable-vnc-sasl
    --enable-vte --enable-vvfat --enable-werror --disable-whpx
    --enable-xen --enable-xen-pci-passthrough --enable-xkbcommon
    --enable-zstd

- It gives a segmentation fault. Nothing else.

Can we get at least a backtrace to work from there?

Thanks, Juan.