On Mon, 8 May 2023 at 08:46, Laszlo Ersek <ler...@redhat.com> wrote:
>
> On 5/6/23 01:57, Ard Biesheuvel wrote:
> > On Sat, 6 May 2023 at 01:27, Michael Brown <mc...@ipxe.org> wrote:
> >>
> >> On 05/05/2023 19:56, Laszlo Ersek wrote:
> >>> I don't like the patch. For two reasons:
> >>>
> >>> (1) It papers over the actual issue. The problem should be fixed where
> >>> it is, if possible.
> >>
> >> Agreed, but (as you have shown in
> >> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
> >> Windows code rather than in EDK2 code.  If the goal is to allow these
> >> buggy Windows builds to still be used with OVMF, then the only option is
> >> to paper over the issue.  We should do this only if it can be proven
> >> safe to do so, of course.
> >>
> >>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
> >>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
> >>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
> >>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
> >>> "InterruptedTPL".
> >>>
> >>> I believe that this in turn may invalidate at least one comment in
> >>> NestedInterruptRestoreTPL():
> >>>
> >>>      //
> >>>      // Call RestoreTPL() to allow event notifications to be
> >>>      // dispatched.  This will implicitly re-enable interrupts.
> >>>      //
> >>>      gBS->RestoreTPL (InterruptedTPL);
> >>>
> >>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally 
> >>> anyways.
> >>
> >> I agree that the comment is invalidated, but as far as I can tell the
> >> logic remains safe.
> >>
> >> I will put together a patch to update the comments in
> >> NestedInterruptTplLib to address the possibility of an interrupt
> >> occurring (illegally) at TPL_HIGH_LEVEL.
> >>
> >>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
> >>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
> >>> NestedInterruptTplLib is technically correct in all circumstances, but
> >>> in practice it happens to be too strict.)
> >>>
> >>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
> >>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
> >>> reverted. (We should keep 9bf473da4c1d.) This returns us to
> >>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
> >>> does not try to be smart about nested interrupts, therefore one that is
> >>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
> >>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
> >>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
> >>> launched on Xen.)
> >>
> >> I'm less keen on this because it reduces the runtime exposure of a very
> >> complex piece of code, and will effectively cause that code to become
> >> unmaintained.
> >>
> >> It's also satisfying (to me) that NestedInterruptTplLib provides a
> >> provable upper bound on stack consumption due to interrupts, which can't
> >> be guaranteed by the simpler pre-239b50a86370 scheme.
> >>
> >> Could we defer judgement until after I've fully reasoned through (and
> >> documented) how NestedInterruptTplLib will work in the presence of
> >> interrupts occurring at TPL_HIGH_LEVEL?
> >>
> >
> > Would it be feasible for our firmware implementation to disable the
> > timer interrupt at the timer end as well?
> >
> > E.g.,
> >
> > RaiseTPL(HIGH)::
> >
> > CLI
> > disarm timer
> >
> >
> > RestoreTPL::
> >
> > <complain if HIGH and interrupts enabled at CPU side>
> > re-arm timer
> > STI
> >
>
> I can be entirely wrong here, but:
>
> - we looked for a solution (or workaround) to the original problem that
> stays within the boundaries of OvmfPkg, so sinking tweaks into the core
> TPL manipulation functions isn't ideal
>
> - regarding the TimerInterruptHandler() function(s) that do live in
> OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I
> understand as sort of equivalent to your suggestion, as unless/until you
> signal EOI, no more interrupts will be *generated*), but those had not
> helped. The EOI was either too early and so we got the unbounded
> nesting, or it was too late, and no interrupts were generated while (for
> example) TPL_CALLBACK code would depend on timers with CheckEvent. See
> bug 4162 -- that was what prompted Michael to revert the EOI placement
> tweak and to implement NestedInterruptLib.
>
> Apologies if there are further interpretations of disarming the timer
> that I'm missing!
>

No, I think you've summed it up. And in any case, it is not OVMF's job
to police what the loader is doing to that extent - we should flag it
as a violation but beyond that, I don't think we should try to
interfere with this non-compliant behavior.


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#104358): https://edk2.groups.io/g/devel/message/104358
Mute This Topic: https://groups.io/mt/98656860/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to