** Description changed: + Command: + + qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 + + Hangs indefinitely approximately 30% of the runs. + + ---- + + Workaround: + + qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 + + Run "qemu-img convert" with "a single coroutine" to avoid this issue. + + ---- + + (gdb) thread 1 + ... + (gdb) bt + #0 0x0000ffffbf1ad81c in __GI_ppoll + #1 0x0000aaaaaabcf73c in ppoll + #2 qemu_poll_ns + #3 0x0000aaaaaabd0764 in os_host_main_loop_wait + #4 main_loop_wait + ... + + (gdb) thread 2 + ... + (gdb) bt + #0 syscall () + #1 0x0000aaaaaabd41cc in qemu_futex_wait + #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) + #3 0x0000aaaaaabed05c in call_rcu_thread + #4 0x0000aaaaaabd34c8 in qemu_thread_start + #5 0x0000ffffbf25c880 in start_thread + #6 0x0000ffffbf1b6b9c in thread_start () + + (gdb) thread 3 + ... + (gdb) bt + #0 0x0000ffffbf11aa20 in __GI___sigtimedwait + #1 0x0000ffffbf2671b4 in __sigwait + #2 0x0000aaaaaabd1ddc in sigwait_compat + #3 0x0000aaaaaabd34c8 in qemu_thread_start + #4 0x0000ffffbf25c880 in start_thread + #5 0x0000ffffbf1b6b9c in thread_start + + ---- + + (gdb) run + Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 + ./disk01.ext4.qcow2 ./output.qcow2 + + [New Thread 0xffffbec5ad90 (LWP 72839)] + [New Thread 0xffffbe459d90 (LWP 72840)] + [New Thread 0xffffbdb57d90 (LWP 72841)] + [New Thread 0xffffacac9d90 (LWP 72859)] + [New Thread 0xffffa7ffed90 (LWP 72860)] + [New Thread 0xffffa77fdd90 (LWP 72861)] + [New Thread 0xffffa6ffcd90 (LWP 72862)] + [New Thread 0xffffa67fbd90 (LWP 72863)] + [New Thread 0xffffa5ffad90 (LWP 72864)] + + [Thread 0xffffa5ffad90 (LWP 72864) exited] + [Thread 0xffffa6ffcd90 (LWP 72862) exited] + [Thread 0xffffa77fdd90 (LWP 72861) exited] + [Thread 0xffffbdb57d90 (LWP 72841) exited] + [Thread 0xffffa67fbd90 (LWP 72863) exited] + [Thread 0xffffacac9d90 (LWP 72859) exited] + [Thread 0xffffa7ffed90 (LWP 72860) exited] + + <HUNG w/ 3 threads in the stack trace showed before> + """ + + All the tasks left are blocked in a system call, so no task left to call + qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock + thread #1 (doing poll() in a pipe with thread #2). + + Those 7 threads exit before disk conversion is complete (sometimes in + the beginning, sometimes at the end). + + ---- + + [ Original Description ] + On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command: qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2 Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images. Once hung, attaching gdb gives the following backtrace: (gdb) bt - #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, - timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) - at ../sysdeps/unix/sysv/linux/ppoll.c:39 - #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, - __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 - #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, - timeout=timeout@entry=-1) at util/qemu-timer.c:322 + #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, + timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) + at ../sysdeps/unix/sysv/linux/ppoll.c:39 + #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, + __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 + #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, + timeout=timeout@entry=-1) at util/qemu-timer.c:322 #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) - at util/main-loop.c:233 + at util/main-loop.c:233 #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497 #5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980 #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456 #7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975 Reproduced w/ latest QEMU git (@ 53744e0a182)
** Description changed: Command: - qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 + qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Hangs indefinitely approximately 30% of the runs. ---- Workaround: qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Run "qemu-img convert" with "a single coroutine" to avoid this issue. ---- (gdb) thread 1 ... (gdb) bt #0 0x0000ffffbf1ad81c in __GI_ppoll #1 0x0000aaaaaabcf73c in ppoll #2 qemu_poll_ns #3 0x0000aaaaaabd0764 in os_host_main_loop_wait #4 main_loop_wait ... (gdb) thread 2 ... (gdb) bt #0 syscall () #1 0x0000aaaaaabd41cc in qemu_futex_wait #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) #3 0x0000aaaaaabed05c in call_rcu_thread #4 0x0000aaaaaabd34c8 in qemu_thread_start #5 0x0000ffffbf25c880 in start_thread #6 0x0000ffffbf1b6b9c in thread_start () (gdb) thread 3 ... (gdb) bt #0 0x0000ffffbf11aa20 in __GI___sigtimedwait #1 0x0000ffffbf2671b4 in __sigwait #2 0x0000aaaaaabd1ddc in sigwait_compat #3 0x0000aaaaaabd34c8 in qemu_thread_start #4 0x0000ffffbf25c880 in start_thread #5 0x0000ffffbf1b6b9c in thread_start ---- (gdb) run Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2 [New Thread 0xffffbec5ad90 (LWP 72839)] [New Thread 0xffffbe459d90 (LWP 72840)] [New Thread 0xffffbdb57d90 (LWP 72841)] [New Thread 0xffffacac9d90 (LWP 72859)] [New Thread 0xffffa7ffed90 (LWP 72860)] [New Thread 0xffffa77fdd90 (LWP 72861)] [New Thread 0xffffa6ffcd90 (LWP 72862)] [New Thread 0xffffa67fbd90 (LWP 72863)] [New Thread 0xffffa5ffad90 (LWP 72864)] [Thread 0xffffa5ffad90 (LWP 72864) exited] [Thread 0xffffa6ffcd90 (LWP 72862) exited] [Thread 0xffffa77fdd90 (LWP 72861) exited] [Thread 0xffffbdb57d90 (LWP 72841) exited] [Thread 0xffffa67fbd90 (LWP 72863) exited] [Thread 0xffffacac9d90 (LWP 72859) exited] [Thread 0xffffa7ffed90 (LWP 72860) exited] <HUNG w/ 3 threads in the stack trace showed before> """ All the tasks left are blocked in a system call, so no task left to call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock thread #1 (doing poll() in a pipe with thread #2). Those 7 threads exit before disk conversion is complete (sometimes in the beginning, sometimes at the end). ---- [ Original Description ] On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command: qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2 Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images. Once hung, attaching gdb gives the following backtrace: (gdb) bt #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322 #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233 #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497 #5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980 #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456 #7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975 Reproduced w/ latest QEMU git (@ 53744e0a182) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1805256 Title: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Status in QEMU: In Progress Status in qemu package in Ubuntu: In Progress Bug description: Command: qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Hangs indefinitely approximately 30% of the runs. ---- Workaround: qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Run "qemu-img convert" with "a single coroutine" to avoid this issue. ---- (gdb) thread 1 ... (gdb) bt #0 0x0000ffffbf1ad81c in __GI_ppoll #1 0x0000aaaaaabcf73c in ppoll #2 qemu_poll_ns #3 0x0000aaaaaabd0764 in os_host_main_loop_wait #4 main_loop_wait ... (gdb) thread 2 ... (gdb) bt #0 syscall () #1 0x0000aaaaaabd41cc in qemu_futex_wait #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) #3 0x0000aaaaaabed05c in call_rcu_thread #4 0x0000aaaaaabd34c8 in qemu_thread_start #5 0x0000ffffbf25c880 in start_thread #6 0x0000ffffbf1b6b9c in thread_start () (gdb) thread 3 ... (gdb) bt #0 0x0000ffffbf11aa20 in __GI___sigtimedwait #1 0x0000ffffbf2671b4 in __sigwait #2 0x0000aaaaaabd1ddc in sigwait_compat #3 0x0000aaaaaabd34c8 in qemu_thread_start #4 0x0000ffffbf25c880 in start_thread #5 0x0000ffffbf1b6b9c in thread_start ---- (gdb) run Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2 [New Thread 0xffffbec5ad90 (LWP 72839)] [New Thread 0xffffbe459d90 (LWP 72840)] [New Thread 0xffffbdb57d90 (LWP 72841)] [New Thread 0xffffacac9d90 (LWP 72859)] [New Thread 0xffffa7ffed90 (LWP 72860)] [New Thread 0xffffa77fdd90 (LWP 72861)] [New Thread 0xffffa6ffcd90 (LWP 72862)] [New Thread 0xffffa67fbd90 (LWP 72863)] [New Thread 0xffffa5ffad90 (LWP 72864)] [Thread 0xffffa5ffad90 (LWP 72864) exited] [Thread 0xffffa6ffcd90 (LWP 72862) exited] [Thread 0xffffa77fdd90 (LWP 72861) exited] [Thread 0xffffbdb57d90 (LWP 72841) exited] [Thread 0xffffa67fbd90 (LWP 72863) exited] [Thread 0xffffacac9d90 (LWP 72859) exited] [Thread 0xffffa7ffed90 (LWP 72860) exited] <HUNG w/ 3 threads in the stack trace showed before> """ All the tasks left are blocked in a system call, so no task left to call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock thread #1 (doing poll() in a pipe with thread #2). Those 7 threads exit before disk conversion is complete (sometimes in the beginning, sometimes at the end). ---- [ Original Description ] On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command: qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2 Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images. Once hung, attaching gdb gives the following backtrace: (gdb) bt #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322 #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233 #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497 #5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980 #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456 #7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975 Reproduced w/ latest QEMU git (@ 53744e0a182) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions