Daniel P. Berrangé <berra...@redhat.com> writes:

> On Tue, Aug 05, 2025 at 07:57:38PM +0300, Manos Pitsidianakis wrote:
>> On Tue, Aug 5, 2025 at 7:49 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
>> >
>> > On Tue, Aug 05, 2025 at 07:22:14PM +0300, Manos Pitsidianakis wrote:
>> > > On Tue, Aug 5, 2025 at 7:00 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
>> > > >
>> > > > On Tue, Aug 05, 2025 at 12:19:26PM +0300, Manos Pitsidianakis wrote:
>> > > > > Add a backtrace_on_error meson feature (enabled with
>> > > > > --enable-backtrace-on-error) that compiles system binaries with
>> > > > > -rdynamic option and prints a function backtrace on error to stderr.
>> > > > >
>> > > > > Example output by adding an unconditional error_setg on error_abort
>> > > > > in hw/arm/boot.c:
>> > > > >
>> > > > >   ./qemu-system-aarch64(+0x13b4a2c) [0x55d015406a2c]
>> > > > >   ./qemu-system-aarch64(+0x13b4abd) [0x55d015406abd]
>> > > > >   ./qemu-system-aarch64(+0x13b4d49) [0x55d015406d49]
>> > > > >   ./qemu-system-aarch64(error_setg_internal+0xe7) [0x55d015406f62]
>> > > > >   ./qemu-system-aarch64(arm_load_dtb+0xbf) [0x55d014d7686f]
>> > > > >   ./qemu-system-aarch64(+0xd2f1d8) [0x55d014d811d8]
>> > > > >   ./qemu-system-aarch64(notifier_list_notify+0x44) [0x55d01540a282]
>> > > > >   ./qemu-system-aarch64(qdev_machine_creation_done+0xa0) [0x55d01476ae17]
>> > > > >   ./qemu-system-aarch64(+0xaa691e) [0x55d014af891e]
>> > > > >   ./qemu-system-aarch64(qmp_x_exit_preconfig+0x72) [0x55d014af8a5d]
>> > > > >   ./qemu-system-aarch64(qemu_init+0x2a89) [0x55d014afb657]
>> > > > >   ./qemu-system-aarch64(main+0x2f) [0x55d01521e836]
>> > > > >   /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7f3033d67ca8]
>> > > > >   /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f3033d67d65]
>> > > > >   ./qemu-system-aarch64(_start+0x21) [0x55d0146814f1]
>> > > > >
>> > > > >   Unexpected error in arm_load_dtb() at ../hw/arm/boot.c:529:
>> > > >
>> > > > From an end-user POV, IMHO the error messages need to be good enough
>> > > > that such backtraces aren't needed to understand the problem. For
>> > > > developers, GDB can give much better backtraces (file+line numbers,
>> > > > plus parameters plus local variables) in the ideally rare cases that
>> > > > the error message alone has insufficient info. So I'm not really
>> > > > convinced that programs (in general, not just QEMU) should try to
>> > > > create backtraces themselves.
>> > >
>> > > I don't think there's value in replacing gdb debugging with this, I
>> > > agree. I think it has value for "fire and forget" uses, when errors
>> > > happen unexpectedly and are hard to replicate and you only end up with
>> > > log entries and no easy way to debug it.
>> >
>> > If the log entry with the error message is useless for devs, then it
>> > is even worse for end users... who will be copying that message into
>> > bug reports anyway. This patch doesn't feel like something we could
>> > enable in formal builds in the distro, so we still need better error
>> > reporting without it, such that user bug reports are actionable.
>> >
>> > Was there a specific place where you found things hard to debug
>> > from the error message alone ?  I'm sure we have plenty of examples
>> > of errors that can be improved, but wondering if there are some
>> > general patterns we're doing badly that would be a good win
>> > to improve ?
>> 
>> Some months ago I was debugging a MemoryRegion use-after-free and used
>> this code to figure out that the free was called from RCU context
>> instead of the main thread.
>
> We give useful names to many (but not necessarily all) threads that we
> spawn. Perhaps we should call pthread_getname_np() to fetch the current
> thread name, and use that as a prefix on the error message we print
> out, as a bit of extra context ?

Do we always have sensible names for threads or only if we enable the
option?
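
For illustration, a minimal sketch of the thread-name prefix idea, assuming
Linux/glibc, where pthread_getname_np() is a GNU extension. The helper name
format_with_thread_name() is hypothetical and is not existing QEMU code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU API): write "[thread-name] msg" into buf.
 * Returns the snprintf() result, i.e. the length the full string needs.
 */
static int format_with_thread_name(char *buf, size_t len, const char *msg)
{
    /* Linux limits thread names to 16 bytes including the trailing NUL. */
    char name[16] = "";

    assert(buf != NULL && msg != NULL);

    /* Fall back to a placeholder if the name cannot be fetched. */
    if (pthread_getname_np(pthread_self(), name, sizeof(name)) != 0) {
        strcpy(name, "?");
    }
    return snprintf(buf, len, "[%s] %s", name, msg);
}
```

An error path would call format_with_thread_name(buf, sizeof(buf), msg)
before printing. A thread that was never given a name still reports whatever
default the platform assigns, so the prefix is only as informative as the
names we set.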

> Obviously not as much info as a full stack trace, but that is something
> we could likely enable unconditionally without any overheads to worry
> about, so a likely incremental win.

The place where it comes in useful is when we get bug reports from users
who have crashed QEMU in an embedded docker container and can't give us a
reasonable reproducer. If we can encourage such users to enable this
option (or maybe make it part of --enable-debug-info) then we could get
a slightly more useful backtrace for those bugs.

I agree most sane configurations (i.e. a distro) would just attach gdb
and use whatever symbol resolution the distro provides.
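
For reference, a rough sketch of the kind of in-process backtrace being
discussed. This assumes glibc's backtrace() and backtrace_symbols_fd() from
<execinfo.h>; the quoted example output resembles their format, but this is
not the code from the patch, and dump_backtrace() is a hypothetical name:

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/*
 * Hypothetical helper (not the function from the patch): write the current
 * call stack to fd, one frame per line. Returns the number of frames.
 */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n;

    assert(fd >= 0);

    /* Collect up to MAX_FRAMES return addresses from the current stack. */
    n = backtrace(frames, MAX_FRAMES);

    /* Writes directly to fd without calling malloc(). */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Calling dump_backtrace(STDERR_FILENO) from an error path prints frames as
"binary(symbol+offset) [address]". Function names only resolve when the
symbols are in the dynamic symbol table, which is why the binary has to be
linked with -rdynamic; other frames show up as a bare "+offset".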

>
> With regards,
> Daniel

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
