Public bug reported: I have been working on a series of patches for ARM big-endian system mode support, using QEMU as a bare-metal simulator for the GDB test suite. At some point I realised that these tests were not running reliably on the QEMU master branch, even without my patches applied. (I.e., in little-endian mode.)
Running QEMU under GDB in the test harness via Valgrind, using something akin to: (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...] leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of the form: ==52333== Process terminating with default action of signal 11 (SIGSEGV) ==52333== Access not within mapped region at address 0x24 ==52333== at 0x1D55F2: tb_page_remove (translate-all.c:1026) ==52333== by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119) ==52333== by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519) ==52333== by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714) ==52333== by 0x1CBA7F: breakpoint_invalidate (exec.c:704) ==52333== by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869) ==52333== by 0x1CBF97: cpu_breakpoint_remove (exec.c:857) ==52333== by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717) ==52333== by 0x219E35: gdb_handle_packet (gdbstub.c:1035) ==52333== by 0x21AF62: gdb_read_byte (gdbstub.c:1459) ==52333== by 0x21B096: gdb_chr_receive (gdbstub.c:1672) ==52333== by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419) These crashes didn't happen on a 2.6-era QEMU, so I bisected and discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg: Make tb_flush() thread safe") appears to be the thing that triggers this intermittent failure. Reverting the patch on the branch tip makes the crashes go away. Unfortunately I don't currently have a way to trigger the segfaults outside of Mentor Graphics's test infrastructure, which I can't share. Does anyone know a reason that this might be happening, or suggestions of how I might further debug this? Maybe a missed tb flush in the gdb stub code, somewhere? Thanks! Julian ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1647683 Title: Bad interaction between tb flushing & gdb stub Status in QEMU: New Bug description: I have been working on a series of patches for ARM big-endian system mode support, using QEMU as a bare-metal simulator for the GDB test suite. At some point I realised that these tests were not running reliably on the QEMU master branch, even without my patches applied. (I.e., in little-endian mode.) Running QEMU under GDB in the test harness via Valgrind, using something akin to: (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...] leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of the form: ==52333== Process terminating with default action of signal 11 (SIGSEGV) ==52333== Access not within mapped region at address 0x24 ==52333== at 0x1D55F2: tb_page_remove (translate-all.c:1026) ==52333== by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119) ==52333== by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519) ==52333== by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714) ==52333== by 0x1CBA7F: breakpoint_invalidate (exec.c:704) ==52333== by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869) ==52333== by 0x1CBF97: cpu_breakpoint_remove (exec.c:857) ==52333== by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717) ==52333== by 0x219E35: gdb_handle_packet (gdbstub.c:1035) ==52333== by 0x21AF62: gdb_read_byte (gdbstub.c:1459) ==52333== by 0x21B096: gdb_chr_receive (gdbstub.c:1672) ==52333== by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419) These crashes didn't happen on a 2.6-era QEMU, so I bisected and discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg: Make tb_flush() thread safe") appears to be the thing that triggers this intermittent failure. Reverting the patch on the branch tip makes the crashes go away. Unfortunately I don't currently have a way to trigger the segfaults outside of Mentor Graphics's test infrastructure, which I can't share. Does anyone know a reason that this might be happening, or suggestions of how I might further debug this? Maybe a missed tb flush in the gdb stub code, somewhere? Thanks! Julian To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions