On Fri, Apr 28, 2023 at 11:26:32AM +0000, Michael Brown wrote: > On 28/04/2023 10:10, Gerd Hoffmann wrote: > > OVMF can't guarantee that the ASSERT() doesn't happen. Misbehaving > > EFI applications can trigger this. So log a warning instead and try > > to continue. > > > > Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF. > > > > Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL > > and Interrupts /enabled/ while windows is booting. > > Do we know how the system ended up in a state of being at TPL_HIGH_LEVEL > with interrupts enabled? This ought not to be possible: the code in EDK2 > will (as far as I can tell) always maintain the invariant that interrupts > are disabled while at TPL_HIGH_LEVEL.
Not clear. Have not found anything in edk2 either. With a breakpoint at BootServices->Stall (CoreStall function) I can see it being called with IPL=TPL_HIGH_LEVEL, most likely from windows boot code, stack traces point to locations in low memory where no edk2 code is located. After running for a while (and probably loading files from cdrom to memory) these calls suddenly start happen with interrupts enabled. I suspect the windows boot loader does something fishy here, but I can't proof it, I have not yet pinned down the exact location where interrupts get enabled while running at IPL=TPL_HIGH_LEVEL (which is what I suspect is happening, but of course this is not the only possible theory how that ASSERT got triggered). Not fully sure how to best continue debugging this, I don't think gdb can set an watchpoint on eflags.if ... > > --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > > +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > > @@ -39,7 +39,15 @@ NestedInterruptRaiseTPL ( > > // > > ASSERT (GetInterruptState () == FALSE); > > InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); > > - ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); > > + if (InterruptedTPL >= TPL_HIGH_LEVEL) { > > + DEBUG (( > > + DEBUG_WARN, > > + "%a: Called at IPL %d, trying to fixup and continue...\n", > > + __func__, > > + InterruptedTPL > > + )); > > + InterruptedTPL = TPL_HIGH_LEVEL - 1; > > + } > > This workaround looks wrong to me: it will cause the subsequent call to > NestedInterruptRestoreTPL() to drop the TPL back down to TPL_HIGH_LEVEL-1, > which is lower than it would have been when the interrupt occurred. The > completed hardware interrupt would then result in an overall change of TPL, > which cannot be correct. The idea was to set InterruptedTPL to the highest level which is allowed to run with interrupts enabled and continue running with interrupts enabled, hoping that things get back into normal once the next timer interrupt arrives. Which -- now that I'm thinking about it again -- is clearly bogus, we are in the timer interrupt so IRQs have already been disabled at that point, so the fixup idea doesn't make much sense. So just leave InterruptedTPL alone and hope for the best? take care, Gerd -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#103778): https://edk2.groups.io/g/devel/message/103778 Mute This Topic: https://groups.io/mt/98554842/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-