On 29/02/2024 19:09, Michael D Kinney wrote:
"When the DXE Foundation is notified that the EFI_CPU_ARCH_PROTOCOL has
been installed, then the full version of the Boot Service RestoreTPL()
can be made available.  When an attempt is made to restore the TPL
level
to level below EFI_TPL_HIGH_LEVEL, then the DXE Foundation should use
the services of the EFI_CPU_ARCH_PROTOCOL to enable interrupts."

I would claim that this spec is perhaps incomplete in this area that
that incomplete description is what allows the window for interrupt
nesting to occur.  This language is correct for UEFI code that calls
Raise/Restore TPL once the CPU Arch Protocol is available.  It does
not cover the required behavior to prevent nesting when processing
a timer interrupt.  This could be considered a gap in the UEFI/PI
spec content.

I think it's important that we don't phrase it as preventing interrupt nesting. The UEFI design *requires* that nested interrupts be allowed to happen, since callbacks at TPL_CALLBACK are allowed to wait for events at TPL_NOTIFY, and this can't happen without the existence of nested interrupts.

The problem is not nested interrupts per se: the problem is the potential for unlimited stack consumption.

- How does the proposed patch react to an interrupt occurring
(illegally) at TPL_HIGH_LEVEL (as happens with some versions of
Windows)?  As far as I can tell, it will result in mInterruptedTplMask
having bit 31 set but never cleared.  What impact will this have?

This behavior could potentially break any UEFI code that sets TPL to
TPL_HIGH_LEVEL as a lock, which can then cause any number of
undefined behaviors.  I am curious of you have a way to reproduce
this failure for testing purposed.

I would agree that any proposed change needs to comprehend this
Scenario if it can be reproduced with shipping OS images.

https://bugzilla.redhat.com/show_bug.cgi?id=2189136 was the original bug report in which it was discovered that Windows 11 would call RaiseTPL(TPL_HIGH_LEVEL) and then enable interrupts using the STI instruction.

It would be interesting to hear from anyone at Microsoft as to why this happens!

- How does the proposed patch react to potentially mismatched
RaisedTPL()/RestoreTPL() calls (e.g. oldTpl = RaiseTPL(TPL_CALLBACK)
followed by RaiseTPL(TPL_NOTIFY) followed by a single
RestoreTPL(oldTpl))?

The proposed patch only changes behavior when processing a timer
interrupt.  I do not think there would be any changes in behavior
for UEFI code that makes that sequence of calls.

The patch affects all callers of RaiseTPL() and RestoreTPL(). Given that it creates a new piece of shared state (mInterruptedTplMask), I'd like to see some kind of proof that it can correctly handle an arbitrary sequence of calls from unknown third-party code.

For example: consider an interrupt at TPL_APPLICATION with a third-party timer interrupt handler that does something like:

  OldTpl = RaiseTPL (TPL_HIGH_LEVEL);

  ... send EOI, call timer tick function, etc ...

  if (OldTpl < TPL_NOTIFY) {
    RestoreTPL (TPL_NOTIFY);
    ... do some weird OEM-specific thing ...
  }

  RestoreTPL ( OldTpl );

This is arguably a valid sequence of calls to RaiseTPL()/RestoreTPL(). With the patch as-is, mInterruptedTplMask will have flagged the TPL_APPLICATION bit but not the TPL_NOTIFY bit, and so the call to RestoreTPL(TPL_NOTIFY) *will* re-enable interrupts, which is against the intention of the patch.

Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116187): https://edk2.groups.io/g/devel/message/116187
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to