On 01/29/21 01:59, Ankur Arora wrote: > Add CpuEject(), which handles the CPU ejection, and provides a holding > area for said CPUs. It is called via SmmCpuFeaturesRendezvousExit(), > at the tail end of the SMI handling.
(1) The functions introduced thus far by this patch series are all named "Verb + Object", which is great; so please call this function EjectCpu() as well, rather than CpuEject(). Modify all three of: subject line, commit message, patch body; please. > > Also UnplugCpus() now stashes APIC IDs of CPUs which need to be > ejected in CPU_HOT_EJECT_DATA.ApicIdMap. These are used by CpuEject() > to identify such CPUs. > > Cc: Laszlo Ersek <ler...@redhat.com> > Cc: Jordan Justen <jordan.l.jus...@intel.com> > Cc: Ard Biesheuvel <ard.biesheu...@arm.com> > Cc: Igor Mammedov <imamm...@redhat.com> > Cc: Boris Ostrovsky <boris.ostrov...@oracle.com> > Cc: Aaron Young <aaron.yo...@oracle.com> > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132 > Signed-off-by: Ankur Arora <ankur.a.ar...@oracle.com> > --- > OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 109 > +++++++++++++++++++++++++++++++++++-- > 1 file changed, 105 insertions(+), 4 deletions(-) > > diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > index 70d69f6ed65b..526f51faf070 100644 > --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > @@ -14,6 +14,7 @@ > #include <Library/MmServicesTableLib.h> // gMmst > #include <Library/PcdLib.h> // PcdGetBool() > #include <Library/SafeIntLib.h> // SafeUintnSub() > +#include <Library/CpuHotEjectData.h> // CPU_HOT_EJECT_DATA > #include <Protocol/MmCpuIo.h> // EFI_MM_CPU_IO_PROTOCOL > #include <Protocol/SmmCpuService.h> // EFI_SMM_CPU_SERVICE_PROTOCOL > #include <Uefi/UefiBaseType.h> // EFI_STATUS (2) This will change due to the movement of the header file, but: please keep the #include directive list alphabetically sorted. > @@ -32,11 +33,12 @@ STATIC EFI_MM_CPU_IO_PROTOCOL *mMmCpuIo; > // > STATIC EFI_SMM_CPU_SERVICE_PROTOCOL *mMmCpuService; > // > -// This structure is a communication side-channel between the > +// These structures serve as communication side-channels between the > // EFI_SMM_CPU_SERVICE_PROTOCOL consumer (i.e., this driver) and provider > // (i.e., PiSmmCpuDxeSmm). > // > STATIC CPU_HOT_PLUG_DATA *mCpuHotPlugData; > +STATIC CPU_HOT_EJECT_DATA *mCpuHotEjectData; > // > // SMRAM arrays for fetching the APIC IDs of processors with pending events > (of > // known event types), for the time of just one MMI. > @@ -188,11 +190,53 @@ RevokeNewSlot: > } > > /** > + CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(), > + on each CPU at exit from SMM. > + > + If, the executing CPU is not being ejected, nothing to be done. > + If, the executing CPU is being ejected, wait in a CpuDeadLoop() > + until ejected. > + > + @param[in] ProcessorNum Index of executing CPU. > + > +**/ > +VOID > +EFIAPI > +CpuEject ( > + IN UINTN ProcessorNum > + ) > +{ > + // > + // APIC ID is UINT32, but mCpuHotEjectData->ApicIdMap[] is UINT64 > + // so use UINT64 throughout. > + // > + UINT64 ApicId; > + > + ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum]; > + if (ApicId == CPU_EJECT_INVALID) { > + return; > + } > + > + // > + // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit() > + // after having been cleared to exit the SMI by the monarch and thus have > + // no SMM processing remaining. > + // > + // Given that we cannot allow them to escape to the guest, we pen them > + // here until the SMM monarch tells the HW to unplug them. > + // > + CpuDeadLoop (); > +} (3a) We can make this less resource-hungry, by replacing CpuDeadLoop() with: for (;;) { DisableInterrupts (); CpuSleep (); } This basically translates to a { CLI; HLT; } loop. (Both functions come from BaseLib, which CpuHotplugSmm already consumes, thus there is no need to modify #include's or [LibraryClasses].) (3b) Please refresh the CpuDeadLoop() reference in the function's leading comment as well. > + > +/** > Process to be hot-unplugged CPUs, per QemuCpuhpCollectApicIds(). > > For each such CPU, report the CPU to PiSmmCpuDxeSmm via > - EFI_SMM_CPU_SERVICE_PROTOCOL. If the to be hot-unplugged CPU is > - unknown, skip it silently. > + EFI_SMM_CPU_SERVICE_PROTOCOL and stash the APIC ID for later ejection. > + If the to be hot-unplugged CPU is unknown, skip it silently. > + > + Additonally, if we do stash any APIC IDs, also install a CPU eject handler > + which would handle the ejection. > > @param[in] ToUnplugApicIds The APIC IDs of the CPUs that are about to be > hot-unplugged. > @@ -216,9 +260,11 @@ UnplugCpus ( > { > EFI_STATUS Status; > UINT32 ToUnplugIdx; > + UINT32 EjectCount; > UINTN ProcessorNum; > > ToUnplugIdx = 0; > + EjectCount = 0; > while (ToUnplugIdx < ToUnplugCount) { > APIC_ID RemoveApicId; > > @@ -255,13 +301,41 @@ UnplugCpus ( > DEBUG ((DEBUG_ERROR, "%a: RemoveProcessor(" FMT_APIC_ID "): %r\n", > __FUNCTION__, RemoveApicId, Status)); > goto Fatal; > + } else { (Under patch v6 4/9, I request that the "goto" be replaced with a "return" -- my point (4) below applies regardless:) (4) Please don't add an "else" branch, if the first branch of the "if" ends with a jump statement. Because, in that case, the code that follows the "if" statement is not reachable after the first branch anyway. So please just unnest the next part: > + // > + // Stash the APIC IDs so we can do the actual ejection later. > + // > + if (mCpuHotEjectData->ApicIdMap[ProcessorNum] != CPU_EJECT_INVALID) { > + // > + // Since ProcessorNum and APIC-ID map 1-1, so a valid > + // mCpuHotEjectData->ApicIdMap[ProcessorNum] means something > + // is horribly wrong. > + // (5) To be honest, I would replace this with: // // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is initialized to // CPU_EJECT_INVALID when mCpuHotEjectData->ApicIdMap is allocated, // // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is restored to // CPU_EJECT_INVALID when the subject processor is ejected, // // - mMmCpuService->RemoveProcessor(ProcessorNum) invalidates // mCpuHotPlugData->ApicId[ProcessorNum], so a given ProcessorNum can // never match more than one APIC ID in a single invocation of // UnplugCpus(). // > + DEBUG ((DEBUG_ERROR, "%a: ProcessorNum %u maps to %llx, cannot " > + "map to " FMT_APIC_ID "\n", __FUNCTION__, ProcessorNum, > + mCpuHotEjectData->ApicIdMap[ProcessorNum], RemoveApicId)); (6a) The indentation of the 2nd and 3rd lines is incorrect. (6b) For logging UINTN values (i.e., ProcessorNum) portably between IA32 and X64, %u is not correct. Instead: - cast the UINTN value to UINT64 explicitly, - use the %Lu or %Lx format specifier. (6c) There is no "%llx" format string in edk2's PrintLib (no "ll" length modifier, to be more precise). UINT64 values need to be printed with "%lu" or "%lx", or -- identically -- with "%Lu" or "%Lx". I prefer the latter, because standard C does not define the "L" size modifier for integers, and that makes it clear that we're using an edk2-specific feature. The "l" (ell) length modifier could be misunderstood as "long" (which is something we don't use in edk2). (6d) FMT_APIC_ID is defined as "0x%08x"; to remain consistent with that, I would print the ApicIdMap element not just with "%Lx", but with "0x%016Lx". > + > + Status = EFI_INVALID_PARAMETER; > + goto Fatal; (7a) Please just "return EFI_ALREADY_STARTED". (7b) Please also modify the leading comment on the function -- the new return value EFI_ALREADY_STARTED should be documented. I suggest: @retval EFI_ALREADY_STARTED For the ProcessorNumber that EFI_SMM_CPU_SERVICE_PROTOCOL had assigned to one of the APIC ID in ToUnplugApicIds, mCpuHotEjectData->ApicIdMap already has an APIC ID stashed. (This should never happen.) > + } > + > + mCpuHotEjectData->ApicIdMap[ProcessorNum] = (UINT64)RemoveApicId; > + EjectCount++; > } > > ToUnplugIdx++; > } > > + if (EjectCount != 0) { > + // > + // We have processors to be ejected; install the handler. > + // > + mCpuHotEjectData->Handler = CpuEject; > + } > + (8) I suggest removing the "EjectCount" local variable, and setting the "Handler" member where you currently increment "EjectCount". > // > - // We've removed this set of APIC IDs from SMM data structures. > + // We've removed this set of APIC IDs from SMM data structures and > + // have installed an ejection handler if needed. > // > return EFI_SUCCESS; > > @@ -458,7 +532,13 @@ CpuHotplugEntry ( > // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm > // has pointed PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM. > // > + // Additionally, CPU Hot-unplug is available only if CPU Hotplug is, so > + // the same DEPEX also guarantees that PcdCpuHotEjectDataAddress points > + // to CPU_HOT_EJECT_DATA in SMRAM. > + // (9) I don't see the relevance of "hot-unplug depends on hot-plug" here. I recommend the following comment instead: // // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm // has pointed: // - PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM, // - PcdCpuHotEjectDataAddress to CPU_HOT_EJECT_DATA in SMRAM, if the // possible CPU count is greater than 1. // > mCpuHotPlugData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotPlugDataAddress); > + mCpuHotEjectData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotEjectDataAddress); > + > if (mCpuHotPlugData == NULL) { > Status = EFI_NOT_FOUND; > DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_PLUG_DATA: %r\n", __FUNCTION__, > Status)); > @@ -470,6 +550,9 @@ CpuHotplugEntry ( > if (mCpuHotPlugData->ArrayLength == 1) { > return EFI_UNSUPPORTED; > } > + ASSERT (mCpuHotEjectData && > + (mCpuHotPlugData->ArrayLength == mCpuHotEjectData->ArrayLength)); > + > // > // Allocate the data structures that depend on the possible CPU count. > // (10) To remain consistent with the check performed on "mCpuHotPlugData", please do: if (mCpuHotEjectData == NULL) { Status = EFI_NOT_FOUND; } else if (mCpuHotPlugData->ArrayLength != mCpuHotEjectData->ArrayLength) { Status = EFI_INVALID_PARAMETER; } else { Status = EFI_SUCCESS; } if (EFI_ERROR (Status)) { DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_EJECT_DATA: %r\n", __FUNCTION__, Status)); goto Fatal; } ( As a digression, I'll make some comments on the ASSERT() too: - Given ASSERT ((C1) && (C2)), it is best to express the same as ASSERT (C1); ASSERT (C2); -- the effect is the same, but the error messages have finer granularity. - Checking a pointer against NULL must be explicit at all times, in edk2. IOW, ASSERT (mCpuHotEjectData) should be spelled ASSERT (mCpuHotEjectData != NULL). ) > @@ -552,6 +635,24 @@ CpuHotplugEntry ( > // > SmbaseInstallFirstSmiHandler (); > > + if (mCpuHotEjectData) { (11) This condition is guaranteed to evaluate to TRUE; see the ASSERT() above. Anyway, ignore this... > + UINT32 Idx; (12) Incorrect indentation, but ignore this too... > + // > + // For CPU ejection we need to map ProcessorNum -> APIC_ID. By the time > + // we need the mapping, however, the Processor's APIC ID has already been > + // removed from SMM data structures. So we will maintain a local map > + // in mCpuHotEjectData->ApicIdMap. > + // > + for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) { > + mCpuHotEjectData->ApicIdMap[Idx] = CPU_EJECT_INVALID; > + } (13) ... because this init loop should be moved to patch #6 (subject "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state"), as I mentioned there... > + > + // > + // Wait to init the handler until an ejection is warranted > + // > + mCpuHotEjectData->Handler = NULL; (14) ... and because this nulling is performed by patch #6 already (subject "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state"). Therefore, this whole conditional block should be removed please. Thanks! Laszlo > + } > + > return EFI_SUCCESS; > > ReleasePostSmmPen: > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#71023): https://edk2.groups.io/g/devel/message/71023 Mute This Topic: https://groups.io/mt/80199926/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-