On 8/30/23 15:00, Ard Biesheuvel wrote:
> On Tue, 29 Aug 2023 at 16:37, Laszlo Ersek <ler...@redhat.com> wrote:
>>
>> On 8/29/23 15:29, Ard Biesheuvel wrote:
>>> Laszlo reports that the efi_gdb.py script fails to produce a full
>>> backtrace when attaching it to an ARM firmware build that has halted on
>>> an unhandled exception.
>>>
>>> The reason is that the asm code that processes the exception was not
>>> implemented with this in mind, and therefore lacks any handling of it.
>>>
>>> So let's add this: create a dummy frame record suitable for chasing the
>>> frame pointer, and add the CFI metadata to describe where the return
>>> value can be found on the stack.
>>>
>>> When using a GCC5 build, this produces a stack trace such as
>>>
>>> (gdb) bt
>>> #0 0x000000007fd4537c in CpuDeadLoop () at
>>> /home/ardb/build/edk2/MdePkg/Library/BaseLib/CpuDeadLoop.c:30
>>> #1 0x000000007fd454f8 in DebugAssert (
>>> FileName=FileName@entry=0x7fd4a8a8 <MmioWrite64Internal+3604>
>>> "/home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c",
>>> LineNumber=LineNumber@entry=343,
>>> Description=Description@entry=0x7fd4a896 <MmioWrite64Internal+3586>
>>> "((BOOLEAN)(0==1))")
>>> at
>>> /home/ardb/build/edk2/MdePkg/Library/BaseDebugLibSerialPort/DebugLib.c:235
>>> #2 0x000000007fd479ec in DefaultExceptionHandler
>>> (ExceptionType=<optimized out>, SystemContext=...)
>>> at
>>> /home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c:343
>>> #3 0x000000007fd48eb8 in ExceptionHandlersEnd ()
>>> #4 0x000000007fcde944 in QemuLoadKernelImage (ImageHandle=<synthetic
>>> pointer>) at
>>> /home/ardb/build/edk2/OvmfPkg/Library/GenericQemuLoadImageLib/GenericQemuLoadImageLib.c:201
>>> #5 TryRunningQemuKernel () at
>>> /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/QemuKernel.c:46
>>> #6 PlatformBootManagerAfterConsole () at
>>> /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c:1139
>>> #7 BdsEntry (This=<optimized out>) at
>>> /home/ardb/build/edk2/MdeModulePkg/Universal/BdsDxe/BdsEntry.c:931
>>> #8 0x000000007ffd0018 in ?? ()
>>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>>>
>>> when QemuLoadKernelImage() has been tweaked to trigger an exception, as
>>> is shown by GDB when walking the call stack:
>>>
>>> | 0x7fcde938 <BdsEntry+3292> b.ne 0x7fcdf134 <BdsEntry+5336> //
>>> b.any
>>> | 0x7fcde93c <BdsEntry+3296> mov x0, #0x40
>>> // #64
>>> | 0x7fcde940 <BdsEntry+3300> bl 0x7fcd7aec <DebugPrint>
>>> | > 0x7fcde944 <BdsEntry+3304> brk #0x4d2
>>> | 0x7fcde948 <BdsEntry+3308> bl 0x7fce4354
>>> <ConnectDevicesFromQemu>
>>> | 0x7fcde94c <BdsEntry+3312> tbz x0, #63, 0x7fcde954
>>> <BdsEntry+3320>
>>> | 0x7fcde950 <BdsEntry+3316> bl 0x7fcd844c
>>> <EfiBootManagerConnectAll>
>>> | 0x7fcde954 <BdsEntry+3320> bl 0x7fcd990c
>>> <EfiBootManagerRefreshAllBootOption
>>>
>>> Unfortunately, CLANGDWARF does not seem entirely happy with this
>>> arrangement: it identifies the call frame where the exception
>>> originated, but does not show any frames above that. (This could be
>>> related to the fact that the exception code uses a separate exception
>>> stack for handling synchronous exceptions)
>>
>> First of all, thanks for writing this patch so incredibly quickly. :)
>>
>
> My pleasure.
>
>> Second, something must be off with my gdb.
>>
>> Before your patch, I kept experimenting with manually resetting FP, SP,
>> and LR to the values printed in the register dump, using gdb "set"
>> commands. Strangely, that did result in complete pre-exception stack
>> traces, but *only sometimes*. Most of the time gdb complains about
>> "corrupted stack". And I just can't figure out what distinguishes the
>> broken from the functional "bt" commands -- I did walk the allegedly
>> corrupt stack manually, and there is nothing corrupt in the FP and LR
>> parts of the stack frames. They all chain nicely and point to valid
>> instructions, respectively. I don't know what it is that gdb doesn't like.
>>
>
> I suspect that gdb is filled with heuristics and tweaks, and uses a
> combination of the frame records, the actual value of LR and the
> unwind data to figure out what the call stack looks like.

That's what I feared :/

>
>> Third, when I test your patch, I seem to experience precisely what you
>> describe under CLANGDWARF -- it shows the faulting frame (the frame just
>> before the exception), but nothing before it! And I'm not building with
>> clang :(
>>
>
> Shame. Unfortunately, I don't have a lot of time to spend on this
> right now, but it is something I have been wanting to fix forever so
> hopefully I'll get back to it at some point.
>

I'm grateful that you wrote v1! :)

Thank you!
Laszlo

View/Reply Online (#108146): https://edk2.groups.io/g/devel/message/108146