On 10/25/23 13:42, Yuanhao Xie wrote:
> Aim:
> - To fix the failing assertion that checks whether
> CpuMpData->FinishedCount equals (CpuMpData->CpuCount - 1). The failure
> arises from a timing gap between the BSP completing its startup signal
> checks and the APs incrementing FinishedCount.
> - This patch also ensures that "finished" reporting from the APs is as
> late as possible.
> 
> More specifically:
> 
> In the SwitchApContext() function, the BSP triggers
> the startup signal and checks whether the APs have received it. After
> completing this check, the BSP then verifies that FinishedCount is
> equal to CpuCount-1.
> 
> On the AP side, upon receiving the startup signal, the APs invoke
> SwitchContextPerAp() and increment FinishedCount to indicate their
> activation. However, even when all APs have received the startup signal,
> they might not have finished incrementing FinishedCount. This timing
> gap triggers the assertion.
> 
> Solution:
> Instead of asserting, use a while loop to wait until all the APs have
> incremented FinishedCount.
> 
> Fixes: 964a4f032dcd
> 
> Signed-off-by: Yuanhao Xie <yuanhao....@intel.com>
> Cc: Ray Ni <ray...@intel.com>
> Cc: Eric Dong <eric.d...@intel.com>
> Cc: Rahul Kumar <rahul1.ku...@intel.com>
> Cc: Tom Lendacky <thomas.lenda...@amd.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 6f1456cfe1..9a6ec5db5c 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -913,8 +913,8 @@ DxeApEntryPoint (
>    UINTN  ProcessorNumber;
>  
>    GetProcessorNumber (CpuMpData, &ProcessorNumber);
> -  InterlockedIncrement ((UINT32 *)&CpuMpData->FinishedCount);
>    RestoreVolatileRegisters (&CpuMpData->CpuData[0].VolatileRegisters, FALSE);
> +  InterlockedIncrement ((UINT32 *)&CpuMpData->FinishedCount);
>    PlaceAPInMwaitLoopOrRunLoop (
>      CpuMpData->ApLoopMode,
>      CpuMpData->CpuData[ProcessorNumber].StartupApSignal,
> @@ -2201,7 +2201,12 @@ MpInitLibInitialize (
>        // looping process there.
>        //
>        SwitchApContext (MpHandOff);
> -      ASSERT (CpuMpData->FinishedCount == (CpuMpData->CpuCount - 1));
> +      //
> +      // Wait for all APs finished initialization
> +      //
> +      while (CpuMpData->FinishedCount < (CpuMpData->CpuCount - 1)) {
> +        CpuPause ();
> +      }
>  
>        //
>        // Set Apstate as Idle, otherwise Aps cannot be waken-up again.

Reviewed-by: Laszlo Ersek <ler...@redhat.com>

The change is not testable using OVMF, because OVMF (intentionally) uses
ApLoopMode=ApInHltLoop, and in that case, neither hunk is reachable.
(Accordingly, the log message reports WaitLoopExecutionMode as zero.)
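
Because the affected path cannot be exercised under OVMF, the race from
the commit message can at least be illustrated in user space. The
following is a minimal sketch only, not EDK2 code: it assumes POSIX
threads stand in for the APs, C11 atomics stand in for
InterlockedIncrement(), and sched_yield() stands in for CpuPause(). The
names run_bsp, ap_entry, signal_received and AP_COUNT are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define AP_COUNT 3u  /* hypothetical: CpuCount - 1 */

static atomic_uint signal_received; /* APs that have seen the startup signal */
static atomic_uint finished_count;  /* stands in for CpuMpData->FinishedCount */

/* Stand-in for an AP: acknowledge the signal first, report "finished" last. */
static void *ap_entry(void *arg)
{
    (void)arg;
    atomic_fetch_add(&signal_received, 1);
    sched_yield();  /* work done between the two steps widens the window */
    atomic_fetch_add(&finished_count, 1);
    return NULL;
}

/* Stand-in for the BSP side; returns the FinishedCount it observed. */
unsigned run_bsp(void)
{
    pthread_t aps[AP_COUNT];
    unsigned i, seen;

    atomic_store(&signal_received, 0);
    atomic_store(&finished_count, 0);
    for (i = 0; i < AP_COUNT; i++) {
        pthread_create(&aps[i], NULL, ap_entry, NULL);
    }

    /* The BSP confirms that every AP has received the startup signal. */
    while (atomic_load(&signal_received) < AP_COUNT) {
        sched_yield();
    }

    /*
     * finished_count may still be below AP_COUNT at this point, so an
     * equality assertion placed here would be racy. Wait instead.
     */
    while (atomic_load(&finished_count) < AP_COUNT) {
        sched_yield();
    }

    /* Only after the wait loop is the equality guaranteed to hold. */
    seen = atomic_load(&finished_count);
    assert(seen == AP_COUNT);

    for (i = 0; i < AP_COUNT; i++) {
        pthread_join(aps[i], NULL);
    }
    return seen;
}
```

Calling run_bsp() from a small main() (built with -pthread) returns
AP_COUNT every time; moving the assert above the second while loop makes
it fail intermittently, which mirrors the original ASSERT.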

I've still regression-tested this change, with my usual configs:

- OVMF IA32 with SMM_REQUIRE, on q35
- OVMF IA32X64 with SMM_REQUIRE, on q35
- OVMF X64 without SMM_REQUIRE, on pc (i440fx)

The test procedure is:

- boot with 1 cold-plugged plus 2 more hot-pluggable VCPUs
- [*]
- hotplug 2 VCPUs
- [*]
- hot-unplug 2 VCPUs
- [*]
- poweroff

where [*] stands for:
- run efibootmgr, bound to each online VCPU in turn
- ACPI S3 suspend/resume
- run efibootmgr, bound to each online VCPU in turn

I used Fedora and RHEL guests.

So:

Regression-tested-by: Laszlo Ersek <ler...@redhat.com>

Thanks
Laszlo


