On Mon, 8 May 2023 at 08:46, Laszlo Ersek <ler...@redhat.com> wrote: > > On 5/6/23 01:57, Ard Biesheuvel wrote: > > On Sat, 6 May 2023 at 01:27, Michael Brown <mc...@ipxe.org> wrote: > >> > >> On 05/05/2023 19:56, Laszlo Ersek wrote: > >>> I don't like the patch. For two reasons: > >>> > >>> (1) It papers over the actual issue. The problem should be fixed where > >>> it is, if possible. > >> > >> Agreed, but (as you have shown in > >> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in > >> Windows code rather than in EDK2 code. If the goal is to allow these > >> buggy Windows builds to still be used with OVMF, then the only option is > >> to paper over the issue. We should do this only if it can be proven > >> safe to do so, of course. > >> > >>> (2) With the patch applied, NestedInterruptRaiseTPL() can return > >>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, > >>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] > >>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as > >>> "InterruptedTPL". > >>> > >>> I believe that this in turn may invalidate at least one comment in > >>> NestedInterruptRestoreTPL(): > >>> > >>> // > >>> // Call RestoreTPL() to allow event notifications to be > >>> // dispatched. This will implicitly re-enable interrupts. > >>> // > >>> gBS->RestoreTPL (InterruptedTPL); > >>> > >>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally > >>> anyways. > >> > >> I agree that the comment is invalidated, but as far as I can tell the > >> logic remains safe. > >> > >> I will put together a patch to update the comments in > >> NestedInterruptTplLib to address the possibility of an interrupt > >> occurring (illegally) at TPL_HIGH_LEVEL. > >> > >>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen > >>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: > >>> NestedInterruptTplLib is technically correct in all circumstances, but > >>> in practice it happens to be too strict.) > >>> > >>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe > >>> variant that effectively has commits a086f4a63bc0 and a24fbd606125 > >>> reverted. (We should keep 9bf473da4c1d.) This returns us to > >>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a) > >>> does not try to be smart about nested interrupts, therefore one that is > >>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec > >>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen, > >>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be > >>> launched on Xen.) > >> > >> I'm less keen on this because it reduces the runtime exposure of a very > >> complex piece of code, and will effectively cause that code to become > >> unmaintained. > >> > >> It's also satisfying (to me) that NestedInterruptTplLib provides a > >> provable upper bound on stack consumption due to interrupts, which can't > >> be guaranteed by the simpler pre-239b50a86370 scheme. > >> > >> Could we defer judgement until after I've fully reasoned through (and > >> documented) how NestedInterruptTplLib will work in the presence of > >> interrupts occurring at TPL_HIGH_LEVEL? > >> > > > > Would it be feasible for our firmware implementation to disable the > > timer interrupt at the timer end as well? > > > > E.g., > > > > RaiseTPL(HIGH):: > > > > CLI > > disarm timer > > > > > > RestoreTPL:: > > > > <complain if HIGH and interrupts enabled at CPU side> > > re-arm timer > > STI > > > > I can be entirely wrong here, but: > > - we looked for a solution (or workaround) to the original problem that > stays within the boundaries of OvmfPkg, so sinking tweaks into the core > TPL manipulation functions isn't ideal > > - regarding the TimerInterruptHandler() function(s) that do live in > OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I > understand as sort of equivalent to your suggestion, as unless/until you > signal EOI, no more interrupts will be *generated*), but those had not > helped. The EOI was either too early and so we got the unbounded > nesting, or it was too late, and no interrupts were generated while (for > example) TPL_CALLBACK code would depend on timers with CheckEvent. See > bug 4162 -- that was what prompted Michael to revert the EOI placement > tweak and to implement NestedInterruptLib. > > Apologies if there are further interpretations of disarming the timer > that I'm missing! >
No, I think you've summed it up. And in any case, it is not OVMF's job to police what the loader is doing to that extent - we should flag it as a violation but beyond that, I don't think we should try to interfere with this non-compliant behavior. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#104358): https://edk2.groups.io/g/devel/message/104358 Mute This Topic: https://groups.io/mt/98656860/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-