> -----Original Message-----
> From: Michael Brown <mc...@ipxe.org>
> Sent: Thursday, February 29, 2024 9:39 AM
> To: Kinney, Michael D <michael.d.kin...@intel.com>;
> devel@edk2.groups.io; Ni, Ray <ray...@intel.com>
> Cc: Liming Gao <gaolim...@byosoft.com.cn>; Laszlo Ersek
> <ler...@redhat.com>; Paolo Bonzini <pbonz...@redhat.com>
> Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack
> overflow issue due to nested interrupts
> 
> On 29/02/2024 16:43, Kinney, Michael D wrote:
> > Hi Michael,
> >
> > Can you provide a pointer to the UEFI Spec statement this breaks?
> 
> II-9.7.1.3 RestoreTPL():
> 
> "When the DXE Foundation is notified that the EFI_CPU_ARCH_PROTOCOL has
> been installed, then the full version of the Boot Service RestoreTPL()
> can be made available.  When an attempt is made to restore the TPL
> level
> to level below EFI_TPL_HIGH_LEVEL, then the DXE Foundation should use
> the services of the EFI_CPU_ARCH_PROTOCOL to enable interrupts."

I would claim that this spec is perhaps incomplete in this area that
that incomplete description is what allows the window for interrupt
nesting to occur.  This language is correct for UEFI code that calls
Raise/Restore TPL once the CPU Arch Protocol is available.  It does
not cover the required behavior to prevent nesting when processing
a timer interrupt.  This could be considered a gap in the UEFI/PI
spec content.

> 
> I suspect this is sufficient to veto the proposed design, though we
> could argue that the loosely worded "should" is technically not "must".
> 
> 
> If we still want to proceed with this design, then I have several other
> questions:
> 
> - How does the proposed patch react to an interrupt occurring
> (illegally) at TPL_HIGH_LEVEL (as happens with some versions of
> Windows)?  As far as I can tell, it will result in mInterruptedTplMask
> having bit 31 set but never cleared.  What impact will this have?

This behavior could potentially break any UEFI code that sets TPL to
TPL_HIGH_LEVEL as a lock, which can then cause any number of 
undefined behaviors.  I am curious of you have a way to reproduce 
this failure for testing purposed.

I would agree that any proposed change needs to comprehend this
Scenario if it can be reproduced with shipping OS images.

> 
> - How does the proposed patch react to potentially mismatched
> RaisedTPL()/RestoreTPL() calls (e.g. oldTpl = RaiseTPL(TPL_CALLBACK)
> followed by RaiseTPL(TPL_NOTIFY) followed by a single
> RestoreTPL(oldTpl))?

The proposed patch only changes behavior when processing a timer
interrupt.  I do not think there would be any changes in behavior
for UEFI code that makes that sequence of calls.  

> 
> 
> I believe the proposed patch is attempting to establish a new invariant
> as follows:
> 
> Once an interrupt has occured at a given TPL, then that *TPL* is
> conceptually considered to be in an "interrupted" state.  The *only*
> thing that can clear this "interrupted" state from the TPL is to return
> from the interrupt handler.
> 
> Note that this conceptual definition does not perfectly align with the
> bit flags in mInterruptedTplMask, since those bits will necessarily be
> set only some time after the interrupt occurs, and will have to be
> cleared before returning from the interrupt.  However, it is the
> conceptual definition that is relevant to the invariant.
> 
> The new invariant is that no code may execute at an "interrupted" TPL
> with interrupts enabled.  It is legitimate for code to raise to a
> higher
> TPL and to enable interrupts while there, and it is legitimate for code
> to execute in an "interrupted" TPL with interrupts disabled, but it is
> not legitimate for any code to reenable interrupts while still at an
> "interrupted" TPL.
> 
> It would be good to call out this invariant explicitly, so that authors
> of interrupt handlers are aware of the restrictions.  It would also
> clarify some of the logic (e.g. it provides the reason why interrupts
> must be disabled before lowering gEfiCurrentTpl in CoreRestoreTpl()).
> 
> It's also generally easier to reason about a stated invariant than to
> extrapolate from a list of complicated examples.

I agree that the proposed code change should describe the change in 
this way, and that the examples currently included in comments would
be better in a BZ.

> 
> Thanks,
> 
> Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116183): https://edk2.groups.io/g/devel/message/116183
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to