Re: [edk2-devel] I think we need a Panic API...

Ni, Ray Wed, 09 Feb 2022 18:54:38 -0800

Andrew,
I agree Panic is useful because ASSERT is a NOP in the  release build.

Can you explain a bit more on ProcessSystemContext?

-----Original Message-----
From: Andrew Fish <af...@apple.com> 
Sent: Thursday, February 3, 2022 12:37 PM
To: edk2-devel-groups-io <devel@edk2.groups.io>
Cc: Ni, Ray <ray...@intel.com>
Subject: I think we need a Panic API...

Mike Kinney mentioned at the TianoCore Stewards Meeting that it might be a good 
idea to add a panic API to the edk2. I agree. 

Some background…. While ASSERT is a powerful tool I think we have a tendency to 
misuse it at times in our TianoCore code:
1) Sometime we do ASSERT when we should be error checking. This pattern 
probably goes back to the days when we had to fit EFI + Legacy BIOS in 512 KiB 
ROMs. I actually noticed with clang some of the ASSERTs we have that check for 
null actually generate Undefined Behavior and clang emits a trap. That is for a 
future mail to discuss. 
2) Sometimes we use the ASSERTs to indicate catastrophic failure. Which is all 
nice on a DEBUG build to get a nice message but not very helpful on a RELEASE 
build when ASSERT is turned off. 

Thus the idea for a Panic API would be to capture fatal errors. For example if 
DRAM fails to train you are kind of stuck. There are also software states that 
make you dead in the water so you may as well panic. Since my VP runs CoreOS 
I’ve got a few opinions on what this infrastructure should look like. I also 
happen to have implemented this in a proprietary way. 

So 1st rule of panic is don’t talk about panic. No sorry that is fight club. 
The 1st rule of panic is don’t assume the reset of the software stack 
functions. So we need the option of pluming up the panic as BASE code. Given 
that we probably want an independent library that produces the panic function. 
A platform can chose to implement the panic flow at a higher level and use some 
existing infrastructure like report status code, but the architecture needs to 
empower a BASE implementation that does not require any other infrastructure. 
The 2nd rule of panic is people are interested in what happened when the system 
panicked so if at first you get a panic string over time you will want to 
collect more and more info. So we need a scalable way to get data out. 3rd rule 
of panic is there may actually be some hardware to talk to so don’t skimp on 
sending data. I think QEMU supports panic devices, and you might be able to wax 
poetic to a security coprocessor to tell the tail of your EFI panic. I almost 
forgot the 4th rule of panic, sometimes you want to panic for a friend. Sounds 
funny but think about how the exception handler works today you can kind of end 
up taking nested exceptions. So the unexpected exception handler is a good 
place to panic from, but you don’t want to panic in the context of the 
exception handler you want to point to the code that faulted. 

Ok given all that here is my 1st pass at an API.

VOID
EFIAPI
Panic (
  IN EFI_SYSTEM_CONTEXT  SystemContext OPTIONAL,
  IN CHAR16                             *PanicString
  );

VOID
EFIAPI
PanicEx (
  IN EFI_SYSTEM_CONTEXT  SystemContext OPTIONAL,
  IN BOOLEAN                          ProcessSystemContext,
  IN CHAR16                             *PanicString
  );

VOID
EFIAPI
PanicExData (
  IN VOID *Data,
  IN UINT DataSize
  );

VOID
EFIAPI
PanicExDone (
  VOID
  );

The simple form is:
Panic (NULL, L”MRC: Memory training failed due to XYZ thing-a-ma-bob failure”);

The extended form example from the exception handler:
PanicEx (SystemContext, FALSE, L”Exception 13: GP fault address: 
0xAFAFAFAFAFAFAFAF”);
PanicExData (StackTrace, SizeOfStackTrace);
PanicExData (FirmwareVersion, SizeFirmwareVersion);
PanicExDone ();

The reason for having multiple calls to PanicExData() is you might be limited 
to a smallish buffer on the stack that you want to iterate over to do more 
complex logging. The 4th rule of panic implies the agent getting the data may 
have a lot more resources than the panicking code, so you might be able to send 
a lot of data.  

The idea behind ProcessSystemContext is if you are called from arbitrary code 
you might what to process SystemContext in a way similar to the unexpected 
exception handler. If you are in the unexpected exception handler maybe you 
want to let that code do the processing for you? I guess this implies we may 
want to make some worker libs for processing exception data and building stack 
frames etc. But that is more of an optimization after we add the Panic API.

The other things we could think about adding to the PanicEx is a UUID that 
could start the stream of data (or follow the panic string) and it could define 
the encoding of the panic data. This would make it easy to post process the 
panic data. Even if the panic data was just text it would point at which 
automated parser you could use to extract useful info out of the text stream. 
If you want to log locally, think DEBUG print then maybe you want more type 
info about Data. So maybe you want UINT8 vs CHAR8 vs CHAR16 API options for 
PanicExData(). 

I thought I’d get the conversation started, looking forward to seeing what 
others think.

Thanks,

Andrew Fish

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#86539): https://edk2.groups.io/g/devel/message/86539
Mute This Topic: https://groups.io/mt/88877106/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] I think we need a Panic API...

Reply via email to