Lukas Jünger <lukas.jun...@greensocs.com> writes:

> On 10/18/21 13:18, Alex Bennée wrote:
>> Lukas Jünger <lukas.jun...@greensocs.com> writes:
>>
>>> On 9/7/21 13:43, Alex Bennée wrote:
>>>> Lukas Jünger <lukas.jun...@greensocs.com> writes:
>>>>
>>>>> Hi all,
>>>>> <snip>
>>> It seems like there is a race condition with the tcg threads.
>>> The plugin exit handler is run with atexit(). While the exit callback
>>> is freeing memory, tcg is still running and memory callbacks write
>>> to the already freed data structures in the plugin causing the segfault.
>>> I tested this with the current master branch and this cmdline:
>> We fixed a bug in linux-user that was leading to the exit callbacks
>> being called (see qemu_plugin_user_exit).
>>
>>> bin/qemu-system-riscv64 -machine virt -nographic -bios fw_jump.elf
>>> -kernel Image -plugin path/to/libcache.so -d plugin -D log.txt
>>>
>>> I wonder if we could somehow wait for the tcg to exit before executing
>>> plugin exit cb. Do you have an idea?
>> It should be because I don't see how TCG would still be running when we
>> run the atexit handler. It literally shouldn't be called until QEMU
>> itself calls exit which should be well after the TCG has stopped running
>> (see pause_all_vcpus).
>> Any chance you could replicate and generate a backtrace that shows this
>> happening?
>>
>
> I'm on Fedora 34, running:
>
> gdb --args bin/qemu-system-riscv64 -machine virt -nographic -bios
> /home/lukas/work/greensocs/projects/sifive/buildroot-2021.05.1/BUILD/images/fw_jump.elf
> -kernel
> /home/lukas/work/greensocs/projects/sifive/buildroot-2021.05.1/BUILD/images/Image
> -plugin BUILD/contrib/plugins/libcache.so -d plugin -D foo.log
>
> I get:
>
> Thread 1 "qemu-system-ris" received signal SIGSEGV, Segmentation fault.
> 0x00007ffff76a9571 in unlink_chunk.constprop () from /lib64/libc.so.6
> Missing separate debuginfos, use: dnf debuginfo-install
> bzip2-libs-1.0.8-6.fc34.x86_64 glib2-2.68.4-1.fc34.x86_64
> glibc-2.33-20.fc34.x86_64 libblkid-2.36.2-1.fc34.x86_64
> libfdt-1.6.1-1.fc34.x86_64 libffi-3.1-28.fc34.x86_64
> libgcc-11.2.1-1.fc34.x86_64 libgcrypt-1.9.3-3.fc34.x86_64
> libgpg-error-1.42-1.fc34.x86_64 libmount-2.36.2-1.fc34.x86_64
> libpng-1.6.37-10.fc34.x86_64 ncurses-libs-6.2-4.20200222.fc34.x86_64
> pcre-8.44-3.fc34.1.x86_64 pcre2-10.36-4.fc34.x86_64
> pixman-0.40.0-3.fc34.x86_64 zlib-1.2.11-26.fc34.x86_64
> (gdb) thread apply all bt
>
> Thread 3 (Thread 0x7ffff6a85640 (LWP 669129) "qemu-system-ris"):
> #0  0x00007ffff7bdd1ad in g_mutex_lock () at /lib64/libglib-2.0.so.0
> #1  0x00007ffff7fc0e19 in vcpu_mem_access (vcpu_index=0, info=131121,
>     vaddr=18446743936379926112, userdata=0x7fff6812dfb0) at
>     /home/lukas/work/greensocs/projects/sifive/upstream_qemu/contrib/plugins/cache.c:395
> #2  0x00007fff7021377c in code_gen_buffer ()
> #3  0x0000555555c4daf1 in cpu_tb_exec (cpu=0x555556880570,
>     itb=0x7fffb020bc40, tb_exit=0x7ffff6a84834) at
>     ../../../accel/tcg/cpu-exec.c:353
> #4  0x0000555555c4e8f2 in cpu_loop_exec_tb (cpu=0x555556880570,
>     tb=0x7fffb020bc40, last_tb=0x7ffff6a84840, tb_exit=0x7ffff6a84834) at
>     ../../../accel/tcg/cpu-exec.c:829
> #5  0x0000555555c4ecd7 in cpu_exec (cpu=0x555556880570) at
>     ../../../accel/tcg/cpu-exec.c:987
> #6  0x0000555555c703ca in tcg_cpus_exec (cpu=0x555556880570) at
>     ../../../accel/tcg/tcg-accel-ops.c:67
> #7  0x0000555555c706dc in mttcg_cpu_thread_fn (arg=0x555556880570) at
>     ../../../accel/tcg/tcg-accel-ops-mttcg.c:70
> #8  0x0000555555e2b806 in qemu_thread_start (args=0x5555568a60b0) at
>     ../../../util/qemu-thread-posix.c:556
> #9  0x00007ffff77f9299 in start_thread () at /lib64/libpthread.so.0
> #10 0x00007ffff7721353 in clone () at /lib64/libc.so.6
>
> Thread 2 (Thread 0x7ffff7408640 (LWP 669128) "qemu-system-ris"):
> #0  0x00007ffff771be0d in syscall () at /lib64/libc.so.6
> #1  0x0000555555e2b468 in qemu_futex_wait (f=0x55555663da00
>     <rcu_gp_event>, val=4294967295) at
>     /home/lukas/work/greensocs/projects/sifive/upstream_qemu/include/qemu/futex.h:29
> #2  0x0000555555e2b653 in qemu_event_wait (ev=0x55555663da00
>     <rcu_gp_event>) at ../../../util/qemu-thread-posix.c:481
> #3  0x0000555555e364a4 in wait_for_readers () at ../../../util/rcu.c:135
> #4  0x0000555555e36620 in synchronize_rcu () at ../../../util/rcu.c:171
> #5  0x0000555555e367d3 in call_rcu_thread (opaque=0x0) at
>     ../../../util/rcu.c:265
> #6  0x0000555555e2b806 in qemu_thread_start (args=0x555556648860) at
>     ../../../util/qemu-thread-posix.c:556
> #7  0x00007ffff77f9299 in start_thread () at /lib64/libpthread.so.0
> #8  0x00007ffff7721353 in clone () at /lib64/libc.so.6
>
> Thread 1 (Thread 0x7ffff740a0c0 (LWP 669123) "qemu-system-ris"):
> #0  0x00007ffff76a9571 in unlink_chunk.constprop () at /lib64/libc.so.6
> #1  0x00007ffff76a99e1 in _int_free () at /lib64/libc.so.6
> #2  0x00007ffff76ad7c8 in free () at /lib64/libc.so.6
> #3  0x00007ffff7b9424d in g_free () at /lib64/libglib-2.0.so.0
> #4  0x00007ffff7fc11ad in cache_free (cache=0x5555566a1c60) at
>     /home/lukas/work/greensocs/projects/sifive/upstream_qemu/contrib/plugins/cache.c:478
> #5  0x00007ffff7fc1231 in caches_free (caches=0x5555568517e0) at
>     /home/lukas/work/greensocs/projects/sifive/upstream_qemu/contrib/plugins/cache.c:494
> #6  0x00007ffff7fc18f3 in plugin_exit (id=14580660273623469927, p=0x0)
>     at
>     /home/lukas/work/greensocs/projects/sifive/upstream_qemu/contrib/plugins/cache.c:616
> #7  0x0000555555c6eb0b in plugin_cb__udata (ev=QEMU_PLUGIN_EV_ATEXIT)
>     at ../../../plugins/core.c:156
> #8  0x0000555555c6f7b7 in qemu_plugin_atexit_cb () at
>     ../../../plugins/core.c:480
> #9  0x00007ffff7660af7 in __run_exit_handlers () at /lib64/libc.so.6
> #10 0x00007ffff7660ca0 in on_exit () at /lib64/libc.so.6
> #11 0x0000555555d78166 in mux_proc_byte (chr=0x555556823400,
>     d=0x555556823400, ch=120) at ../../../chardev/char-mux.c:160

Hmm it seems the problem is being able to slam out of QEMU in the mux
handler. Ideally it should be the start of triggering a graceful
shutdown probably with something like

  qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);

> Hope that helps?

-- 
Alex Bennée
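
To illustrate the ordering a graceful shutdown would give us, here is a
minimal standalone sketch using plain pthreads. It is not QEMU code: every
name in it (PluginState, vcpu_loop, plugin_exit_cb, graceful_shutdown,
run_demo) is hypothetical and only stands in for the vCPU thread firing
memory callbacks and for the plugin's exit callback. The point is that the
exit path only sets a flag, the worker is joined, and the free happens
afterwards, whereas calling exit() directly runs the atexit handlers while
the worker may still be inside the callback.

```c
/* Illustrative sketch only: none of these names are QEMU APIs. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for the plugin's heap-allocated cache bookkeeping. */
typedef struct {
    pthread_mutex_t lock;
    unsigned long accesses;
} PluginState;

static PluginState *state;
static atomic_bool shutdown_requested;
static pthread_t vcpu_thread;

/* Stands in for a vCPU thread that keeps firing memory callbacks. */
static void *vcpu_loop(void *arg)
{
    (void)arg;
    while (!atomic_load(&shutdown_requested)) {
        pthread_mutex_lock(&state->lock);
        state->accesses++;
        pthread_mutex_unlock(&state->lock);
    }
    return NULL;
}

/* Stands in for the plugin exit callback: frees all plugin state. */
static void plugin_exit_cb(void)
{
    pthread_mutex_destroy(&state->lock);
    free(state);
    state = NULL;
}

/*
 * Request the shutdown, wait for the worker to stop, and only then run
 * the exit callback. Returns the number of accesses recorded.
 */
static unsigned long graceful_shutdown(void)
{
    unsigned long seen;

    assert(state != NULL);
    atomic_store(&shutdown_requested, true);
    pthread_join(vcpu_thread, NULL);   /* no callbacks can run after this */
    seen = state->accesses;
    plugin_exit_cb();                  /* safe: nobody else touches state */
    return seen;
}

/* Returns 0 when the worker ran and the state was released cleanly. */
static int run_demo(void)
{
    unsigned long seen = 0;

    state = calloc(1, sizeof(*state));
    if (!state) {
        return -1;
    }
    pthread_mutex_init(&state->lock, NULL);
    atomic_store(&shutdown_requested, false);
    if (pthread_create(&vcpu_thread, NULL, vcpu_loop, NULL) != 0) {
        plugin_exit_cb();
        return -1;
    }
    /* Wait until the worker has recorded at least one access. */
    while (seen == 0) {
        pthread_mutex_lock(&state->lock);
        seen = state->accesses;
        pthread_mutex_unlock(&state->lock);
        sched_yield();
    }
    seen = graceful_shutdown();
    return (seen > 0 && state == NULL) ? 0 : 1;
}
```

Calling run_demo() from a small main() and building with -pthread should be
enough to try it. Moving the plugin_exit_cb() call ahead of the
pthread_join() recreates the same kind of use-after-free seen in the
backtrace above.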