Laszlo, Michael,

When a timer interrupt happens, the calling flow is:

[Timer Interrupt #1] The CPU IDT handler calls into LocalApicTimerDxe::TimerInterruptHandler(), which does the following:
[Timer Interrupt #1] 1. RaiseTPL (HIGH) from APPLICATION, which disables CPU interrupts.
[Timer Interrupt #1] 2. Send the APIC EOI (ACK the received interrupt so the APIC can continue to generate interrupts).
[Timer Interrupt #1] 3. Call DxeCore::CoreTimerTick().
[Timer Interrupt #1] 4. RestoreTPL (APPLICATION) from HIGH. (All callbacks registered at NOTIFY and CALLBACK will run.)
[Timer Interrupt #1] 4.1. When there are callbacks registered at NOTIFY, the current TPL is set to NOTIFY and interrupts are enabled. CoreDispatchEventNotifies() is called to run the NOTIFY callbacks.

[Timer Interrupt #2] Immediately after interrupts are enabled, the CPU runs into LocalApicTimerDxe::TimerInterruptHandler() again, but the stack has not been fully popped back to its initial state.
[Timer Interrupt #2] 1. RaiseTPL (HIGH) from NOTIFY, which disables CPU interrupts.
[Timer Interrupt #2] 2. Send the APIC EOI (ACK the received interrupt so the APIC can continue to generate interrupts).
[Timer Interrupt #2] 3. Call DxeCore::CoreTimerTick().
[Timer Interrupt #2] 4. RestoreTPL (NOTIFY) from HIGH. No callback runs, because no callback can be registered at a TPL higher than NOTIFY. At the end of RestoreTPL(), CPU interrupts are enabled.

[Timer Interrupt #3] Immediately after interrupts are enabled, the CPU runs into LocalApicTimerDxe::TimerInterruptHandler() again, but the stack has not been fully popped back to its initial state.
[Timer Interrupt #3] 1. RaiseTPL (HIGH) from NOTIFY, which disables CPU interrupts.
[Timer Interrupt #3] 2. Send the APIC EOI (ACK the received interrupt so the APIC can continue to generate interrupts).
[Timer Interrupt #3] 3. Call DxeCore::CoreTimerTick().
[Timer Interrupt #3] 4. RestoreTPL (NOTIFY) from HIGH. No callback runs, because no callback can be registered at a TPL higher than NOTIFY. At the end of RestoreTPL(), CPU interrupts are enabled.

[Timer Interrupt #4] Immediately after interrupts are enabled, the CPU runs into LocalApicTimerDxe::TimerInterruptHandler() again, but the stack has not been fully popped back to its initial state.
[Timer Interrupt #4] ...
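To make steps 4 and 4.1 concrete, here is a minimal, hypothetical C model of the RaiseTPL()/RestoreTPL() behavior described above. It is not the DxeCore implementation: the globals `current_tpl`, `interrupts_enabled`, `pending[]` and `dispatched_at_notify`, and the functions `raise_tpl()`/`restore_tpl()`, are illustrative names; only the TPL numeric values come from the UEFI specification. The model shows that restoring from HIGH to APPLICATION runs pending NOTIFY callbacks with interrupts already enabled (the window in which Timer Interrupt #2 can nest), while restoring from HIGH to NOTIFY dispatches nothing and simply re-enables interrupts at the end.

```c
#include <assert.h>
#include <stdbool.h>

/* TPL values from the UEFI specification. */
enum {
  TPL_APPLICATION = 4,
  TPL_CALLBACK    = 8,
  TPL_NOTIFY      = 16,
  TPL_HIGH_LEVEL  = 31
};

/* Hypothetical toy state; these are not DxeCore's variables. */
static unsigned current_tpl = TPL_APPLICATION;
static bool     interrupts_enabled = true;
static unsigned pending[TPL_HIGH_LEVEL + 1];  /* pending notifies per TPL */
static unsigned dispatched_at_notify;         /* NOTIFY callbacks run so far */

/* Step 1: raising to HIGH masks CPU interrupts. */
static void raise_tpl(unsigned new_tpl)
{
  assert(new_tpl >= current_tpl);
  if (new_tpl >= TPL_HIGH_LEVEL) {
    interrupts_enabled = false;
  }
  current_tpl = new_tpl;
}

/* Steps 4 and 4.1: dispatch only events whose TPL is above new_tpl. */
static void restore_tpl(unsigned new_tpl)
{
  assert(new_tpl <= current_tpl);
  for (unsigned tpl = TPL_HIGH_LEVEL; tpl > new_tpl; tpl--) {
    while (pending[tpl] > 0) {
      current_tpl = tpl;
      if (tpl < TPL_HIGH_LEVEL) {
        /* Interrupts are live while callbacks run: a timer tick
           arriving here nests on top of this stack frame. */
        interrupts_enabled = true;
      }
      pending[tpl]--;
      if (tpl == TPL_NOTIFY) {
        dispatched_at_notify++;
      }
    }
  }
  current_tpl = new_tpl;
  if (new_tpl < TPL_HIGH_LEVEL) {
    interrupts_enabled = true;   /* "at the end of RestoreTPL()" */
  }
}
```

For example, with `pending[TPL_NOTIFY] = 2`, calling `raise_tpl(TPL_HIGH_LEVEL)` and then `restore_tpl(TPL_APPLICATION)` leaves `dispatched_at_notify == 2` with interrupts enabled, whereas the nested sequence `raise_tpl(TPL_HIGH_LEVEL); restore_tpl(TPL_NOTIFY);` dispatches nothing.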
The above flow shows endless re-entrance of the timer interrupt handler. But my question is: the above flow can only happen on a real platform when the 4 steps below take more time than the timer period (usually 10 ms):

[Timer Interrupt #2] 1. RaiseTPL (HIGH) from NOTIFY, which disables CPU interrupts.
[Timer Interrupt #2] 2. Send the APIC EOI (ACK the received interrupt so the APIC can continue to generate interrupts).
[Timer Interrupt #2] 3. Call DxeCore::CoreTimerTick().
[Timer Interrupt #2] 4. RestoreTPL (NOTIFY) from HIGH. No callback runs, because no callback can be registered at a TPL higher than NOTIFY. At the end of RestoreTPL(), CPU interrupts are enabled.

But in my opinion, that is impossible.

Thanks,
Ray

> -----Original Message-----
> From: Laszlo Ersek <ler...@redhat.com>
> Sent: Tuesday, January 16, 2024 11:37 PM
> To: devel@edk2.groups.io; mc...@ipxe.org; kra...@redhat.com
> Cc: Pedro Falcato <pedro.falc...@gmail.com>; Ni, Ray <ray...@intel.com>;
> Kinney, Michael D <michael.d.kin...@intel.com>; Desimone, Nathaniel L
> <nathaniel.l.desim...@intel.com>; Kumar, Rahul R
> <rahul.r.ku...@intel.com>
> Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe:
> Duplicate OvmfPkg/LocalApicTimerDxe driver
>
> On 1/16/24 16:16, Michael Brown wrote:
> > On 16/01/2024 14:34, Laszlo Ersek wrote:
> >> On 1/16/24 10:48, Michael Brown wrote:
> >> IOW, my impression is that NestedInterruptTplLib can certainly handle
> >> all scenarios thrown at it, but where it really matters is in the face
> >> of an interrupt storm (not just "normal nesting"), and a storm is
> >> unlikely (or even impossible?) on physical hardware.
> >>
> >> ... Oh, scratch that. "Interrupt storm" simply means that interrupts are
> >> being delivered at a rate higher than the handler routine can service
> >> them. IOW, the "storm" is not that interrupts are delivered *very
> >> rapidly* in an absolute sense.
> >> If interrupts are delivered at normal
> >> frequency, but the handler is too slow to service *even that rate*, then
> >> that also qualifies as "storm", because the nesting depth will *keep
> >> growing*. It's not really the growth rate that matters; what matters is
> >> the *trend*, i.e., the fact that there *is* growth (the stack gets
> >> deeper and deeper). The stack might not overflow immediately, and if the
> >> handler speeds up (for whatever reason), the stack might recover, but
> >> there is nothing to prevent an overflow.
> >>
> >> So, in the end, I think you've convinced me.
> >
> > :)
> >
> >>> I'm happy to send a patch to migrate NestedInterruptTplLib to
> >>> MdeModulePkg, so that it can be consumed outside of OvmfPkg. Shall I do
> >>> this?
> >>
> >> Sounds like a valid idea to me.
> >>
> >> Could be greatly supported by a test case (to be run on the bare metal)
> >> installing a slow handler that *eventually* exhausted the stack, when
> >> not using NestedInterruptTplLib.
> >>
> >> (FWIW, IIRC, the UEFI spec warns about this -- it says something like,
> >> "return from TPL_HIGH as soon as you can, otherwise the system will
> >> become unstable".)
> >>
> >> Sorry for the wall of text, I find this very difficult to reason about.
> >
> > I also find it very difficult to reason about, which is why
> > NestedInterruptRestoreTpl() has 126 lines of comments providing a
> > semi-formal proof of correctness for a mere 15 statements of C code!
> >
> > In particular, I find it difficult to reason about when it would be safe
> > for a platform to *not* use NestedInterruptTplLib. It's clearly
> > empirically difficult to trigger stack underflow via an interrupt
> > "storm" on physical hardware, but I'm not convinced it's impossible.
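Laszlo's definition of a storm (the handler cannot keep up with even a normal-rate timer) and Ray's timing condition can both be explored with a toy timing model. Every assumption here is hypothetical and chosen for illustration: frame #1 stays in its NOTIFY callbacks for the whole run; each nested handler sends EOI when it starts and keeps interrupts masked for a fixed `handler_us`; a tick that fires while masked is latched and delivered as soon as RestoreTPL() re-enables interrupts; unwinding costs nothing. `max_nesting_depth()` is an illustrative name, not edk2 code.

```c
#include <assert.h>

/*
 * Toy timing model (hypothetical, not edk2 code) of the nesting in the
 * Timer Interrupt #1..#4 flow. Times are in microseconds.
 * Assumptions:
 *  - Frame #1 runs NOTIFY callbacks with interrupts enabled for the whole run.
 *  - A nested handler sends EOI when it starts and keeps interrupts masked
 *    for handler_us, until the end of RestoreTPL(NOTIFY).
 *  - A tick that fires while masked is latched and delivered the moment
 *    interrupts are re-enabled, on top of the not-yet-popped frame.
 *  - A handler that finds no latched tick returns, and the stack unwinds
 *    back to frame #1 instantly.
 */
static unsigned max_nesting_depth(unsigned period_us, unsigned handler_us,
                                  unsigned horizon_us)
{
  assert(period_us > 0);
  unsigned depth = 1;                   /* frame for Timer Interrupt #1 */
  unsigned max_depth = depth;
  unsigned long long now = period_us;   /* Timer Interrupt #2 fires */

  while (now < horizon_us) {
    depth++;                            /* a nested handler starts at `now` */
    if (depth > max_depth) {
      max_depth = depth;
    }
    unsigned long long reenable = now + handler_us;
    unsigned long long next_tick = (now / period_us + 1) * period_us;
    if (next_tick <= reenable) {
      now = reenable;                   /* latched tick nests immediately */
    } else {
      depth = 1;                        /* handler returns; stack unwinds */
      now = next_tick;
    }
  }
  return max_depth;
}
```

With a 10 ms period, `max_nesting_depth(10000, 50, 1000000)` stays at 2, while `max_nesting_depth(10000, 12000, 1000000)` grows with the simulated horizon: it is the growth trend, not the absolute interrupt rate, that makes it a storm. In this model the boundary case `handler_us == period_us` already nests, because the next tick fires exactly as interrupts are re-enabled.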
> >
> > I find it mentally easier to rely on the hard guarantee that
> > NestedInterruptTplLib provides: that nested interrupts will continue to
> > be delivered but that the number of interrupt-induced stack frames is
> > bounded by the (small, finite) number of distinct TPL levels in existence.
> >
> > While developing NestedInterruptTplLib, I did hack together a test case
> > for a slow handler that would deliberately induce an interrupt storm,
> > since I needed this to test that my code was working. When triggered,
> > this test would cause the machine to effectively hang due to servicing
> > an endless storm of timer interrupts. Before NestedInterruptTplLib, the
> > stack would soon underflow and would typically cause a reboot (or other
> > crash). With NestedInterruptTplLib the machine would continue to
> > service interrupts indefinitely.
> >
> > How might such a test case be included in upstream EDK2? I'm
> > peripherally aware of EDK2 test infrastructure such as UEFI SCT, but
> > I've never interacted with it yet.
>
> I'm vaguely aware of a unit test framework inside edk2, but the best I
> can give you is just this link:
>
> https://github.com/tianocore/edk2/tree/master/UnitTestFrameworkPkg#unit-test-framework-package
>
> There are some files under the directory "MdeModulePkg/Test" too;
> git-log on that subdir, and perhaps the MdeModulePkg maintainers, might
> provide more pointers.
>
> The end of the readme linked above says to ask Bret, Mike and Sean, as well.
>
> Laszlo
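The guarantee Michael describes, that the number of interrupt-induced stack frames is bounded by the number of distinct TPL levels, can be contrasted with the unbounded flow in a toy storm model. The `bounded` rule below is a hypothetical simplification chosen only for illustration (push a frame only if it interrupts a strictly higher TPL than the frame beneath it, otherwise return with interrupts still disabled); it is not the algorithm in NestedInterruptTplLib, whose reasoning is laid out in the comments of NestedInterruptRestoreTpl(). `storm_max_depth()` and `MAX_FRAMES` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

enum { TPL_APPLICATION = 4, TPL_CALLBACK = 8, TPL_NOTIFY = 16 };

#define MAX_FRAMES 256   /* stands in for a finite stack */

/*
 * Toy storm model (hypothetical, not NestedInterruptTplLib's algorithm):
 * every time a handler re-enables interrupts at reenable_tpl, another tick
 * is delivered at once. Frame 0 is Timer Interrupt #1, which interrupted
 * TPL_APPLICATION.
 *   bounded == false: every tick pushes a new frame, as in Ray's flow.
 *   bounded == true:  a frame is pushed only if it interrupts a strictly
 *                     higher TPL than the frame beneath it; otherwise the
 *                     top handler returns with interrupts still disabled
 *                     and the tick is taken after that frame has unwound.
 */
static unsigned storm_max_depth(bool bounded, unsigned reenable_tpl,
                                unsigned ticks)
{
  unsigned interrupted[MAX_FRAMES];    /* TPL each frame interrupted */
  unsigned depth = 1;
  unsigned max_depth = 1;

  assert(reenable_tpl > TPL_APPLICATION);
  interrupted[0] = TPL_APPLICATION;

  for (unsigned i = 0; i < ticks; i++) {
    if (!bounded || reenable_tpl > interrupted[depth - 1]) {
      assert(depth < MAX_FRAMES);      /* the model's "stack exhaustion" */
      interrupted[depth++] = reenable_tpl;
      if (depth > max_depth) {
        max_depth = depth;
      }
    } else {
      depth--;                         /* defer: unwind the top frame */
    }
  }
  return max_depth;
}
```

In this model, `storm_max_depth(false, TPL_NOTIFY, 100)` returns 101 (one new frame per tick), whereas `storm_max_depth(true, TPL_NOTIFY, 100)` never exceeds 2. Under the strictly-increasing rule the depth can never exceed one frame per distinct TPL, which is the shape of the bound Michael describes; in the unbounded case the assert on `MAX_FRAMES` plays the role of the stack exhaustion discussed in the thread.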