[Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub

Julian Brown Tue, 06 Dec 2016 04:19:53 -0800

Public bug reported:

I have been working on a series of patches for ARM big-endian system
mode support, using QEMU as a bare-metal simulator for the GDB test
suite. At some point I realised that these tests were not running
reliably on the QEMU master branch, even without my patches applied.
(I.e., in little-endian mode.)


Running QEMU under GDB in the test harness via Valgrind, using something
akin to:

(gdb) target remote | valgrind --tool=memcheck qemu-system-arm [...]

leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of
the form:

==52333== Process terminating with default action of signal 11 (SIGSEGV)
==52333==  Access not within mapped region at address 0x24
==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

These crashes didn't happen on a 2.6-era QEMU, so I bisected and found
that commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg: Make
tb_flush() thread safe") appears to trigger this intermittent failure.
Reverting that commit on the branch tip makes the crashes go away.

Unfortunately I don't currently have a way to trigger the segfaults
outside of Mentor Graphics's test infrastructure, which I can't share.

Does anyone know why this might be happening, or have suggestions for
how I might debug it further? Maybe a missed tb flush somewhere in the
gdb stub code?

Thanks!

Julian

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1647683

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions
