On Fri, 25 Mar 2022 at 08:04, Juan Quintela <quint...@redhat.com> wrote:
>
> Laurent Vivier <lviv...@redhat.com> wrote:
> > Perhaps Juan or Thomas can help too (added to cc)
> >
> > Is this a regression?
>
> It looks like a bug in QEMU as it doesn't move from cancelling to cancelled.
I had a repeat of this hang (same machine), so here's the debug info I
wasn't able to gather the first time round.

> >> [Inferior 1 (process 2771497) detached]
> >> ===========================================================
> >> PROCESS: 2772862
> >> gitlab-+ 2772862 2771497 99 Mar23 ? 18:45:28 ./qemu-system-i386
> >> -qtest unix:/tmp/qtest-2771497.sock -qtest-log /dev/null -chardev
> >> socket,path=/tmp/qtest-2771497.qmp,id=char0 -mon
> >> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> >> source,debug-threads=on -m 150M -serial
> >> file:/tmp/migration-test-f6G71L/src_serial -drive
> >> file=/tmp/migration-test-f6G71L/bootsect,format=raw -accel qtest
>
> Source of migration thread.
>
> >> [New LWP 2772864]
> >> [New LWP 2772866]
> >> [New LWP 2772867]
> >> [New LWP 2772915]
> >> [Thread debugging using libthread_db enabled]
> >> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> >> 0x000003ff94ef1c9c in __ppoll (fds=0x2aa179a6f30, nfds=5,
> >> timeout=<optimized out>, timeout@entry=0x3fff557b588,
> >> sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
> >> 44   ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
> >> Thread 5 (Thread 0x3ff1b7f6900 (LWP 2772915)):
> >> #0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0,
> >> expected=0, futex_word=0x2aa1881f634) at
> >> ../sysdeps/nptl/futex-internal.h:320
> >> #1  do_futex_wait (sem=sem@entry=0x2aa1881f630, abstime=0x0,
> >> clockid=0) at sem_waitcommon.c:112
> >> #2  0x000003ff95011870 in __new_sem_wait_slow
> >> (sem=sem@entry=0x2aa1881f630, abstime=0x0, clockid=0) at
> >> sem_waitcommon.c:184
> >> #3  0x000003ff9501190e in __new_sem_wait (sem=sem@entry=0x2aa1881f630)
> >> at sem_wait.c:42
> >> #4  0x000002aa165b1416 in qemu_sem_wait (sem=sem@entry=0x2aa1881f630)
> >> at ../util/qemu-thread-posix.c:358
> >> #5  0x000002aa16023434 in multifd_send_sync_main (f=0x2aa17993760) at
> >> ../migration/multifd.c:610
> >> #6  0x000002aa162a8f18 in ram_save_iterate (f=0x2aa17993760,
> >> opaque=<optimized out>) at ../migration/ram.c:3049
> >> #7  0x000002aa1602bafc in qemu_savevm_state_iterate (f=0x2aa17993760,
> >> postcopy=<optimized out>) at ../migration/savevm.c:1296
> >> #8  0x000002aa1601fe4e in migration_iteration_run (s=0x2aa17748010) at
> >> ../migration/migration.c:3607
> >> #9  migration_thread (opaque=opaque@entry=0x2aa17748010) at
> >> ../migration/migration.c:3838
> >> #10 0x000002aa165b05c2 in qemu_thread_start (args=<optimized out>) at
> >> ../util/qemu-thread-posix.c:556
> >> #11 0x000003ff95007e66 in start_thread (arg=0x3ff1b7f6900) at
> >> pthread_create.c:477
> >> #12 0x000003ff94efcbf6 in thread_start () at
> >> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
>
> Migration main thread in multifd_send_sync_main(), waiting for the
> semaphore in
>
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
>
>         trace_multifd_send_sync_main_wait(p->id);
>         qemu_sem_wait(&p->sem_sync);
>     }
>
> Knowing the value of i would be great.  See the end of the email, I
> think it is going to be 0.

gdb says i is 1, though possibly the compiler has enthusiastically
reordered the 'i++' above the qemu_sem_wait().

I also tried to get gdb to tell me the value of
migrate_multifd_channels(), but that was a mistake: gdb's attempt to
execute code in the debuggee to answer that question did not work and
left it in a broken state, so I had to kill it.
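The only idea I've had so far (a sketch only, untested, and assuming
qemu_sem_timedwait() and error_report() are acceptable in this path) is
to make that wait loop poll with a timeout so it can at least tell us,
without gdb, which channel never posts sem_sync:

    /*
     * Hypothetical debug-only variant of the wait loop in
     * multifd_send_sync_main(); would need "qemu/error-report.h".
     * Poll the semaphore instead of blocking forever, and complain
     * about the channel we are stuck on every ten seconds.
     */
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        trace_multifd_send_sync_main_wait(p->id);
        while (qemu_sem_timedwait(&p->sem_sync, 10 * 1000) < 0) {
            error_report("multifd_send_sync_main: still waiting for "
                         "sem_sync on channel %d of %d",
                         i, migrate_multifd_channels());
        }
    }

That would confirm the stuck channel index, but it still wouldn't tell
us why the sender thread for that channel never posted the semaphore.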
Is there something we can put into either QEMU or the test case that
will let us get some better information when this happens again?

thanks
-- PMM