Daniel P. Berrangé <berra...@redhat.com> writes:

> On Tue, Aug 05, 2025 at 07:57:38PM +0300, Manos Pitsidianakis wrote:
>> On Tue, Aug 5, 2025 at 7:49 PM Daniel P. Berrangé <berra...@redhat.com>
>> wrote:
>> >
>> > On Tue, Aug 05, 2025 at 07:22:14PM +0300, Manos Pitsidianakis wrote:
>> > > On Tue, Aug 5, 2025 at 7:00 PM Daniel P. Berrangé <berra...@redhat.com>
>> > > wrote:
>> > > >
>> > > > On Tue, Aug 05, 2025 at 12:19:26PM +0300, Manos Pitsidianakis wrote:
>> > > > > Add a backtrace_on_error meson feature (enabled with
>> > > > > --enable-backtrace-on-error) that compiles system binaries with
>> > > > > -rdynamic option and prints a function backtrace on error to stderr.
>> > > > >
>> > > > > Example output by adding an unconditional error_setg on error_abort
>> > > > > in hw/arm/boot.c:
>> > > > >
>> > > > > ./qemu-system-aarch64(+0x13b4a2c) [0x55d015406a2c]
>> > > > > ./qemu-system-aarch64(+0x13b4abd) [0x55d015406abd]
>> > > > > ./qemu-system-aarch64(+0x13b4d49) [0x55d015406d49]
>> > > > > ./qemu-system-aarch64(error_setg_internal+0xe7) [0x55d015406f62]
>> > > > > ./qemu-system-aarch64(arm_load_dtb+0xbf) [0x55d014d7686f]
>> > > > > ./qemu-system-aarch64(+0xd2f1d8) [0x55d014d811d8]
>> > > > > ./qemu-system-aarch64(notifier_list_notify+0x44) [0x55d01540a282]
>> > > > > ./qemu-system-aarch64(qdev_machine_creation_done+0xa0) [0x55d01476ae17]
>> > > > > ./qemu-system-aarch64(+0xaa691e) [0x55d014af891e]
>> > > > > ./qemu-system-aarch64(qmp_x_exit_preconfig+0x72) [0x55d014af8a5d]
>> > > > > ./qemu-system-aarch64(qemu_init+0x2a89) [0x55d014afb657]
>> > > > > ./qemu-system-aarch64(main+0x2f) [0x55d01521e836]
>> > > > > /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7f3033d67ca8]
>> > > > > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f3033d67d65]
>> > > > > ./qemu-system-aarch64(_start+0x21) [0x55d0146814f1]
>> > > > >
>> > > > > Unexpected error in arm_load_dtb() at ../hw/arm/boot.c:529:
>> > > >
>> > > > From an end-user POV, IMHO the error messages need to be good enough
>> > > > that such backtraces aren't needed to understand the problem. For
>> > > > developers, GDB can give much better backtraces (file+line numbers,
>> > > > plus parameters plus local variables) in the ideally rare cases that
>> > > > the error message alone has insufficient info. So I'm not really
>> > > > convinced that programs (in general, not just QEMU) should try to
>> > > > create backtraces themselves.
>> > >
>> > > I don't think there's value in replacing gdb debugging with this, I
>> > > agree. I think it has value for "fire and forget" uses, when errors
>> > > happen unexpectedly and are hard to replicate and you only end up with
>> > > log entries and no easy way to debug it.
>> >
>> > If the log entry with the error message is useless for devs, then it
>> > is even worse for end users... who will be copying that message into
>> > bug reports anyway. This patch doesn't feel like something we could
>> > enable in formal builds in the distro, so we still need better error
>> > reporting without it, such that user bug reports are actionable.
>> >
>> > Was there a specific place where you found things hard to debug
>> > from the error message alone ? I'm sure we have plenty of examples
>> > of errors that can be improved, but wondering if there are some
>> > general patterns we're doing badly that would be a good win
>> > to improve ?
>>
>> Some months ago I was debugging a MemoryRegion use-after-free and used
>> this code to figure out that the free was called from RCU context
>> instead of the main thread.
>
> We give useful names to many (but not necessarily all) threads that we
> spawn. Perhaps we should call pthread_getname_np() to fetch the current
> thread name, and use that as a prefix on the error message we print
> out, as a bit of extra context ?
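For illustration only, a minimal sketch of that thread-name prefix idea, assuming glibc on Linux (pthread_getname_np() is a GNU extension, hence _GNU_SOURCE) and a hypothetical helper format_error_with_thread() that is not part of QEMU's existing error reporting code:

```c
#define _GNU_SOURCE            /* needed for pthread_getname_np() in glibc */
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT QEMU's error_report(): format an error message
 * into buf, prefixed with the calling thread's name, e.g. "[worker] msg".
 * Returns -1 on error or if even the prefix does not fit; otherwise the
 * snprintf-style total length (which may exceed len - 1 on truncation).
 */
int format_error_with_thread(char *buf, size_t len, const char *fmt, ...)
{
    char name[16] = "";        /* Linux thread names: at most 16 bytes incl. NUL */
    va_list ap;
    int prefix;
    int body;

    assert(buf != NULL && len > 0);

    /* Fall back to a placeholder if the name cannot be read or is empty. */
    if (pthread_getname_np(pthread_self(), name, sizeof(name)) != 0 ||
        name[0] == '\0') {
        strcpy(name, "unnamed");
    }

    prefix = snprintf(buf, len, "[%s] ", name);
    if (prefix < 0 || (size_t)prefix >= len) {
        return -1;
    }

    /* Append the caller's formatted message after the "[name] " prefix. */
    va_start(ap, fmt);
    body = vsnprintf(buf + prefix, len - (size_t)prefix, fmt, ap);
    va_end(ap);

    return body < 0 ? -1 : prefix + body;
}
```

A real reporter would then write buf to stderr. On Linux a thread that was never explicitly named usually reports the name inherited from its creator (often the program name), so the prefix is only as informative as the naming done at thread creation.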

Do we always have sensible names for threads, or only if we enable the
option?

> Obviously not as much info as a full stack trace, but that is something
> we could likely enable unconditionally without any overheads to worry
> about, so a likely incremental win.

The place where it comes in useful is when we get bug reports from users
who have crashed QEMU in an embedded Docker container and can't give us a
reasonable reproducer. If we can encourage such users to enable this
option (or maybe make it part of --enable-debug-info) then we could get a
slightly more useful backtrace for those bugs.

I agree most sane configurations (i.e. a distro) would just attach gdb and
use whatever symbol resolution the distro provides.

>
> With regards,
> Daniel

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro