Peter Maydell <peter.mayd...@linaro.org> wrote:
> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <th...@redhat.com> wrote:
>>
>> On 03/02/2023 13.08, Kevin Wolf wrote:
>> > On 03.02.2023 at 12:23, Thomas Huth wrote:
>> >> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>> >>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>> >>>> We can get rid of the build-coroutine-sigaltstack job by moving
>> >>>> the configure flags that should be tested here to other jobs:
>> >>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>> >>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>> >>>
>> >>> The biggest user of coroutines is the block layer. So we probably
>> >>> ought to have coroutines aligned with a job that triggers the
>> >>> 'make check-block' for iotests. IIUC, the without-defaults
>> >>> job won't do that. How about, arbitrarily, using either the
>> >>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>> >>> are closely related, so getting sigaltstack vs ucontext coverage
>> >>> between them is a good win, and they both trigger the block jobs
>> >>> IIUC.
>> >>
>> >> I gave it a try with the ubuntu job, but this apparently trips up the
>> >> iotests:
>> >>
>> >> https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>> >>
>> >> Does anybody have a clue what could be going wrong here?
>> >
>> > I'm not sure how changing the coroutine backend could cause it, but
>> > primarily this looks like an assertion failure in migration code.
>> >
>> > Dave, Juan, any ideas what this assertion checks and why it could be
>> > failing?
>>
>> Ah, I think it's the bug that will be fixed by:
>>
>> https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quint...@redhat.com/
>>
>> The fix hasn't hit the master branch yet (I think), and I had another patch
>> in my CI that disables the aarch64 binary in that runner, so the iotests
>> suddenly have been executed with the alpha binary there --> migration fails.
>>
>> So never mind, it will be fixed as soon as Juan's pull request gets included.
>
> The migration tests have been flaky for a while now,
> including setups where host and guest page sizes are the same.
> (For instance, my x86 macos box pretty reliably sees failures
> when the machine is under load.)

I *thought* that we had fixed all of those. But it is difficult for me
to know because:

- It only happens when one runs "make check"
- Running ./migration-test has never failed for me
- When it fails (and it has been a while since it last failed for me),
  it is impossible for me to work out what is going on and, as I said,
  I have never been able to reproduce it by running only migration-test.

I will try to run several at the same time and see if it happens.

And as Thomas said, I *think* that the fix that Peter Xu posted should
fix this issue. Famous last words.

Later, Juan.