On 1/16/24 10:48, Michael Brown wrote: > On 16/01/2024 08:47, Gerd Hoffmann wrote: >> I think the reason is that the next timer interrupt arriving while the >> handler is running still is *much* more likely in virtual machines >> because the vCPU does not get 100% of the of the physical CPU time >> slice. > > The interrupt handler can run for an arbitrary length of time, because > the call to RestoreTPL() can end up calling an application callback > which in turn waits on further timer interrupts. > > This is a legitimate use case within UEFI, so all timer interrupt > handlers (not just in virtual machines) need to allow for the > possibility that nested interrupts will occur.
The more naive, original interrupt handler implementation (i.e., the one without NestedInterruptTplLib) still "allowed" for nesting, simply by virtue of letting itself be interrupted, if I remember correctly. That in itself didn't cause a problem; the problem arose when this reentrant interruption got limitlessly deep, due to interrupts having been queued on the host side, and then being injected as a "storm" in the guest. The naive handler impl. effectively translated the host-side "interrupt queue" to a "guest side stack". "queue length" became "stack depth", leading to stack overflow. Thus, even the original (more naive) handler could work fine (for nesting too) as long as there was no storm (put differently, as long as queue length, aka stack depth, were small); is that correct? (I admit that I can't really recall the details at this point.) The sophistication of NestedInterruptTplLib is that it supports nesting while *resisting* a storm (= preventing infinite nesting by careful masking of interrupt delivery, IIRC). It does not eagerly slurp all pending (queued) interrupts into the handler stack. IOW, my impression is that NestedInterruptTplLib can certainly handle all scenarios thrown at it, but where it really matters is in the face of an interrupt storm (not just "normal nesting"), and a storm is unlikely (or even impossible?) on physical hardware. ... Oh, scratch that. "Interrupt storm" simply means that interrupts are being delivered at a rate higher than the handler routine can service them. IOW, the "storm" is not that interrupts are delivered *very rapidly* in an absoulte sense. If interrupts are delivered at normal frequency, but the handler is too slow to service *even that rate*, then that also qualifies as "storm", because the nesting depth will *keep growing*. It's not really the growth rate that matters; what matter is the *trend*, i.e., the fact that there *is* growth (the stack gets deeper and deeper). The stack might not overflow immediately, and if the handler speeds up (for whatever reason), the stack might recover, but there is nothing to prevent an overflow. So, in the end, I think you've convinced me. > >> So on OVMF we will continue to need NestedInterruptTplLib. I like the >> idea to have a Null version of the library, so OVMF does not need its >> own version of the driver just because of NestedInterruptTplLib. > > I agree that there should not need to be two versions of LocalApicTimerDxe. > > I would suggest moving NestedInterruptTplLib from OvmfPkg to MdeModulePkg. > > The code is complex, but for the caller the complexity is completely > hidden behind the calls to NestedInterruptRaiseTPL() and > NestedInterruptRestoreTPL(), which straightforwardly replace what would > otherwise be (unsafe) calls to RaiseTPL() and RestoreTPL(), as shown in > > https://github.com/tianocore/edk2/commit/a086f4a#diff-1418ec21e24e4ce2f3d621113585e3bea11fbcfa3b77d54046e61be7928e0c92 > > For a new CPU architecture, the only requirement is to provide a short > fragment (~5 lines) of code that can clear the interrupts-enabled flag > in the EFI_SYSTEM_CONTEXT, as shown in > OvmfPkg/Library/NestedInterruptTplLib/Iret.c. > > I'm happy to send a patch to migrate NestedInterruptTplLib to > MdeModulePkg, so that it can be consumed outside of OvmfPkg. Shall I do > this? Sounds like a valid idea to me. Could be greatly supported by a test case (to be run on the bare metal) installing a slow handler that *eventually* exhausted the stack, when not using NestedInterruptTplLib. (FWIW, IIRC, the UEFI spec warns about this -- it says something like, "return from TPL_HIGH as soon as you can, otherwise the system will become unstable".) Sorry for the wall of text, I find this very difficult to reason about. Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113902): https://edk2.groups.io/g/devel/message/113902 Mute This Topic: https://groups.io/mt/103734961/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/9847357/21656/1706620634/xyzzy [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-