Re: [edk2-devel] I think we need a Panic API...

Andrew Fish via groups.io Thu, 10 Feb 2022 17:02:34 -0800


> On Feb 9, 2022, at 9:33 PM, Ni, Ray <ray...@intel.com> wrote:
> 
> Jeff,
> I understand what “EFI_SYSTEM_CONTEXT” is. I am curious of the need of 
> “BOOLEAN ProcessSystemContext”.
>


Ray,

ProcessSystemContext may not be needed… 

My thinking was if you pass ProcessSystemContext == TRUE there is some implied 
processing of the SystemContext by the PanicLib. If you pass FALSE then there 
is no processing. For example if you called panic from DumpImageAndCpuContent() 
[1] in the CpuExceptionHandleLib then the logic exists  in 
DumpImageAndCpuContent() to decode the SystemContext so you don’t want the 
PanicLib to do it for you. 

It might be possible to pass NULL for SystemContext and get the same behavior. 
The only time you might need SystemContext if there was some kind of binary 
panic header that encoded system state that only the Panic code knew about. 

[1] 
https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c#L416

Thanks,

Andrew Fish

> From: devel@edk2.groups.io <mailto:devel@edk2.groups.io> 
> <devel@edk2.groups.io <mailto:devel@edk2.groups.io>> On Behalf Of Jeff Fan
> Sent: Thursday, February 10, 2022 11:37 AM
> To: devel@edk2.groups.io <mailto:devel@edk2.groups.io>; Ni, Ray 
> <ray...@intel.com <mailto:ray...@intel.com>>; 'Andrew Fish' <af...@apple.com 
> <mailto:af...@apple.com>>
> Subject: Re: [edk2-devel] I think we need a Panic API...
>  
> Ray,
>  
> EFI_SYSTEM_CONTEXT was defined in MdePkg/Include/Protocol/DebugSupport.h
>  
> Jeff
>  
> fanjianf...@byosoft.com.cn <mailto:fanjianf...@byosoft.com.cn>
>  
> From: Ni, Ray <mailto:ray...@intel.com>
> Date: 2022-02-10 10:54
> To: Andrew Fish <mailto:af...@apple.com>; edk2-devel-groups-io 
> <mailto:devel@edk2.groups.io>
> Subject: Re: [edk2-devel] I think we need a Panic API...
> Andrew,
> I agree Panic is useful because ASSERT is a NOP in the  release build.
>  
> Can you explain a bit more on ProcessSystemContext?
>  
> -----Original Message-----
> From: Andrew Fish <af...@apple.com <mailto:af...@apple.com>>
> Sent: Thursday, February 3, 2022 12:37 PM
> To: edk2-devel-groups-io <devel@edk2.groups.io <mailto:devel@edk2.groups.io>>
> Cc: Ni, Ray <ray...@intel.com <mailto:ray...@intel.com>>
> Subject: I think we need a Panic API...
>  
> Mike Kinney mentioned at the TianoCore Stewards Meeting that it might be a 
> good idea to add a panic API to the edk2. I agree.
>  
> Some background…. While ASSERT is a powerful tool I think we have a tendency 
> to misuse it at times in our TianoCore code:
> 1) Sometime we do ASSERT when we should be error checking. This pattern 
> probably goes back to the days when we had to fit EFI + Legacy BIOS in 512 
> KiB ROMs. I actually noticed with clang some of the ASSERTs we have that 
> check for null actually generate Undefined Behavior and clang emits a trap. 
> That is for a future mail to discuss.
> 2) Sometimes we use the ASSERTs to indicate catastrophic failure. Which is 
> all nice on a DEBUG build to get a nice message but not very helpful on a 
> RELEASE build when ASSERT is turned off. 
>  
> Thus the idea for a Panic API would be to capture fatal errors. For example 
> if DRAM fails to train you are kind of stuck. There are also software states 
> that make you dead in the water so you may as well panic. Since my VP runs 
> CoreOS I’ve got a few opinions on what this infrastructure should look like. 
> I also happen to have implemented this in a proprietary way.
>  
> So 1st rule of panic is don’t talk about panic. No sorry that is fight club. 
> The 1st rule of panic is don’t assume the reset of the software stack 
> functions. So we need the option of pluming up the panic as BASE code. Given 
> that we probably want an independent library that produces the panic 
> function. A platform can chose to implement the panic flow at a higher level 
> and use some existing infrastructure like report status code, but the 
> architecture needs to empower a BASE implementation that does not require any 
> other infrastructure. The 2nd rule of panic is people are interested in what 
> happened when the system panicked so if at first you get a panic string over 
> time you will want to collect more and more info. So we need a scalable way 
> to get data out. 3rd rule of panic is there may actually be some hardware to 
> talk to so don’t skimp on sending data. I think QEMU supports panic devices, 
> and you might be able to wax poetic to a security coprocessor to tell the 
> tail of your EFI panic. I almost forgot the 4th rule of panic, sometimes you 
> want to panic for a friend. Sounds funny but think about how the exception 
> handler works today you can kind of end up taking nested exceptions. So the 
> unexpected exception handler is a good place to panic from, but you don’t 
> want to panic in the context of the exception handler you want to point to 
> the code that faulted. 
>  
> Ok given all that here is my 1st pass at an API.
>  
> VOID
> EFIAPI
> Panic (
>   IN EFI_SYSTEM_CONTEXT  SystemContext OPTIONAL,
>   IN CHAR16                             *PanicString
>   );
>  
>  
> VOID
> EFIAPI
> PanicEx (
>   IN EFI_SYSTEM_CONTEXT  SystemContext OPTIONAL,
>   IN BOOLEAN                          ProcessSystemContext,
>   IN CHAR16                             *PanicString
>   );
>  
> VOID
> EFIAPI
> PanicExData (
>   IN VOID *Data,
>   IN UINT DataSize
>   );
>  
> VOID
> EFIAPI
> PanicExDone (
>   VOID
>   );
>  
> The simple form is:
> Panic (NULL, L”MRC: Memory training failed due to XYZ thing-a-ma-bob 
> failure”);
>  
> The extended form example from the exception handler:
> PanicEx (SystemContext, FALSE, L”Exception 13: GP fault address: 
> 0xAFAFAFAFAFAFAFAF”);
> PanicExData (StackTrace, SizeOfStackTrace);
> PanicExData (FirmwareVersion, SizeFirmwareVersion);
> PanicExDone ();
>  
> The reason for having multiple calls to PanicExData() is you might be limited 
> to a smallish buffer on the stack that you want to iterate over to do more 
> complex logging. The 4th rule of panic implies the agent getting the data may 
> have a lot more resources than the panicking code, so you might be able to 
> send a lot of data. 
>  
> The idea behind ProcessSystemContext is if you are called from arbitrary code 
> you might what to process SystemContext in a way similar to the unexpected 
> exception handler. If you are in the unexpected exception handler maybe you 
> want to let that code do the processing for you? I guess this implies we may 
> want to make some worker libs for processing exception data and building 
> stack frames etc. But that is more of an optimization after we add the Panic 
> API.
>  
> The other things we could think about adding to the PanicEx is a UUID that 
> could start the stream of data (or follow the panic string) and it could 
> define the encoding of the panic data. This would make it easy to post 
> process the panic data. Even if the panic data was just text it would point 
> at which automated parser you could use to extract useful info out of the 
> text stream. If you want to log locally, think DEBUG print then maybe you 
> want more type info about Data. So maybe you want UINT8 vs CHAR8 vs CHAR16 
> API options for PanicExData().
>  
> I thought I’d get the conversation started, looking forward to seeing what 
> others think.
>  
> Thanks,
>  
> Andrew Fish
>  
>  
>  
>  
> 



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#86597): https://edk2.groups.io/g/devel/message/86597
Mute This Topic: https://groups.io/mt/88877106/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] I think we need a Panic API...

Reply via email to