The symptom is that just when the OS would be launched, the
multiprocessor guest hangs. This is how the log terminates:
FSOpen: Open '\370ac550dcaa48b88f1ca75ad903b0e7\4.16.7-100.fc26.i686\linux'
Success
[Security] 3rd party image[0] can be loaded after EndOfDxe:
PciRoot(0x0)/Pci(0x2,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x0)/HD(1,GPT,D9F1FBA5-E5D3-440A-B6A7-87B593E4FAB1,0x800,0x100000)/\370ac550dcaa48b88f1ca75ad903b0e7\4.16.7-100.fc26.i686\linux.
InstallProtocolInterface: [EfiLoadedImageProtocol] 853A03A8
Loading driver at 0x00083E72000 EntryPoint=0x00083E76680
InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 853A0510
ProtectUefiImageCommon - 0x853A03A8
- 0x0000000083E72000 - 0x0000000000E75000
FSOpen: Open '370ac550dcaa48b88f1ca75ad903b0e7\4.16.7-100.fc26.i686\initrd'
Success
PixelBlueGreenRedReserved8BitPerColor
ConvertPages: range 400000 - 1274FFF covers multiple entries
SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
CpuDxe: 5-Level Paging = 0
[HANG]
Meanwhile some guest CPUs are pegged.
Normally, when this series is not applied, the next log entry is (in
place of [HANG]):
MpInitChangeApLoopCallback() done!
I've identified this patch by bisection, after applying the series on
current master (137c2c6eff67, "Revert "BaseTools/PatchCheck.py: Add
LicenseCheck"", 2020-07-31).
Here's the bisection log:
git bisect start
# good: [137c2c6eff67f4750d77e8e40af6683c412d3ed0] Revert "BaseTools/PatchCheck.py:
Add LicenseCheck"
git bisect good 137c2c6eff67f4750d77e8e40af6683c412d3ed0
# bad: [d3f7971f4f70c9f39170b42af837e58e59435ad3] Maintainers.txt: Add
reviewers for the OvmfPkg SEV-related files
git bisect bad d3f7971f4f70c9f39170b42af837e58e59435ad3
# good: [9551e3fc61ba0c0ddf8e79b425a22aa7dd61cb8b] OvmfPkg/VmgExitLib: Add
support for RDTSCP NAE events
git bisect good 9551e3fc61ba0c0ddf8e79b425a22aa7dd61cb8b
# good: [10acf16b38522d8a1b538b3aa432daaa72c0e97b] OvmfPkg: Reserve a page in
memory for the SEV-ES usage
git bisect good 10acf16b38522d8a1b538b3aa432daaa72c0e97b
# good: [ccb4267e76b6474657c41bef7e76a980930c22ea] UefiCpuPkg: Add a 16-bit
protected mode code segment descriptor
git bisect good ccb4267e76b6474657c41bef7e76a980930c22ea
# good: [94e238ae37505cfb081f3b9b4632067e4a113cf9] OvmfPkg: Use the SEV-ES work
area for the SEV-ES AP reset vector
git bisect good 94e238ae37505cfb081f3b9b4632067e4a113cf9
# bad: [16c21b9d10b032d66d4105dd4693fd9dc6e6ec18] UefiCpuPkg/MpInitLib: Prepare
SEV-ES guest APs for OS use
git bisect bad 16c21b9d10b032d66d4105dd4693fd9dc6e6ec18
# good: [49855596e383ab2aa6410fa060e22d4817d8e64e] OvmfPkg: Move the GHCB
allocations into reserved memory
git bisect good 49855596e383ab2aa6410fa060e22d4817d8e64e
# first bad commit: [16c21b9d10b032d66d4105dd4693fd9dc6e6ec18]
UefiCpuPkg/MpInitLib: Prepare SEV-ES guest APs for OS use
So clearly we should be looking for an IA32-specific change, or
IA32-specific *omission*, in this patch, that could cause the problem.
The bug is the following:
On 07/30/20 20:43, Tom Lendacky wrote:
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h
b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index b1a9d99cb3eb..267aa5201c50 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -349,8 +350,11 @@ VOID
IN BOOLEAN MwaitSupport,
IN UINTN ApTargetCState,
IN UINTN PmCodeSegment,
+ IN UINTN Pm16CodeSegment,
IN UINTN TopOfApStack,
- IN UINTN NumberToFinish
+ IN UINTN NumberToFinish,
+ IN UINTN SevEsAPJumpTable,
+ IN UINTN WakeupBuffer
);
/**
(1) This hunk modifies the parameter list of functions pointed-to by
ASM_RELOCATE_AP_LOOP.
diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
index 9115ff9e3e30..7165bcf3124a 100644
--- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
@@ -330,17 +350,26 @@ RelocateApLoop (
BOOLEAN MwaitSupport;
ASM_RELOCATE_AP_LOOP AsmRelocateApLoopFunc;
UINTN ProcessorNumber;
+ UINTN StackStart;
MpInitLibWhoAmI (&ProcessorNumber);
CpuMpData = GetCpuMpData ();
MwaitSupport = IsMwaitSupport ();
+ if (CpuMpData->SevEsIsEnabled) {
+ StackStart = CpuMpData->SevEsAPResetStackStart;
+ } else {
+ StackStart = mReservedTopOfApStack;
+ }
AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) mReservedApLoopFunc;
AsmRelocateApLoopFunc (
MwaitSupport,
CpuMpData->ApTargetCState,
CpuMpData->PmCodeSegment,
- mReservedTopOfApStack - ProcessorNumber * AP_SAFE_STACK_SIZE,
- (UINTN) &mNumberToFinish
+ CpuMpData->Pm16CodeSegment,
+ StackStart - ProcessorNumber * AP_SAFE_STACK_SIZE,
+ (UINTN) &mNumberToFinish,
+ CpuMpData->SevEsAPBuffer,
+ CpuMpData->WakeupBuffer
);
//
// It should never reach here
(2) This hunk modifies the call site, in accordance with the prototype
change at (1).
diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
index 6956b408d004..3b8ec477b8b3 100644
--- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
@@ -465,6 +465,10 @@ BITS 16
; - IP for Real Mode (two bytes)
; - CS for Real Mode (two bytes)
;
+ ; This label is also used with AsmRelocateApLoop. During MP finalization,
+ ; the code from PM16Mode to SwitchToRealProcEnd is copied to the start of
+ ; the WakeupBuffer, allowing a parked AP to be booted by an OS.
+ ;
PM16Mode:
mov eax, cr0 ; Read CR0
btr eax, 0 ; Set PE=0
@@ -487,32 +491,95 @@ PM16Mode:
SwitchToRealProcEnd:
;-------------------------------------------------------------------------------------
-; AsmRelocateApLoop (MwaitSupport, ApTargetCState, PmCodeSegment,
TopOfApStack, CountTofinish);
+; AsmRelocateApLoop (MwaitSupport, ApTargetCState, PmCodeSegment,
Pm16CodeSegment, TopOfApStack, CountTofinish, SevEsAPJumpTable, WakeupBuffer);
;-------------------------------------------------------------------------------------
global ASM_PFX(AsmRelocateApLoop)
ASM_PFX(AsmRelocateApLoop):
AsmRelocateApLoopStart:
BITS 64
(3) Unfortunately, the patch only adapts the X64 implementation of the
AsmRelocateApLoopStart() function to the new prototype; the IA32
implementation no longer matches the call site.
(I'm not sure if the intent was for the IA32 version to simply ignore
the new parameters, but even in that case, the "Pm16CodeSegment"
parameter is inserted in the middle of the parameter list, likely
offsetting the rest.)
The problem is foreshadowed even by hunk (2). Namely, in hunk (2), the
s/mReservedTopOfApStack/StackStart/
replacement is *more difficult* to verify than necessary -- exactly
because "CpuMpData->Pm16CodeSegment" is inserted *before* it.