On 02/24/21 04:44, Ankur Arora wrote:
> On 2021-02-23 1:39 p.m., Laszlo Ersek wrote:
>> On 02/22/21 08:19, Ankur Arora wrote:
>>> +  UINT32 Idx;
>>> +
>>> +  for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
>>> +    UINT64 QemuSelector;
>>> +
>>> +    QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
>>> +
>>> +    if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
>>> +      //
>>> +      // This to-be-ejected-CPU has already received the BSP's SMI exit
>>> +      // signal and, will execute SmmCpuFeaturesRendezvousExit()
>>> +      // followed by this callback or is already waiting in the
>>> +      // CpuSleep() loop below.
>>> +      //
>>> +      // Tell QEMU to context-switch it out.
>>> +      //
>>> +      QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
>>> +      QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
>>> +
>>> +      //
>>> +      // We need a compiler barrier here to ensure that the compiler
>>> +      // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores.
>>> +      //
>>> +      // A store fence is not strictly necessary on x86 which has
>>> +      // TSO; however, both of these stores are in different address spaces
>>> +      // so also add a Store Fence here.
>>> +      //
>>> +      MemoryFence ();
>>
>> (6) I wonder if this compiler barrier + comment block are helpful.
>> Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors
>> didn't contain built-in fences, all hell would break loose. We're using
>> EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe
>> ordering-wise, even without an explicit compiler barrier here.
>>
>> To me personally, this particular fence only muddies the picture --
>> where we already have an acquire memory fence and a store memory fence
>> to couple with each other.
>>
>> I'd recommend removing this. (If you disagree, I'm willing to listen to
>> arguments, of course!)
>
> You are right that we don't need a memory fence here -- given that there
> is an implicit fence due to the MMIO.
>
> As for the compiler fence, I'm just now re-looking at handlers in
> EFI_MM_CPU_IO_PROTOCOL and they do seem to include a compiler barrier.
>
> So I agree with you that we have all the fences that we need. However,
> I do think it's a good idea to document both of these here.

OK.

>>> diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> index 99988285b6a2..ddfef05ee6cf 100644
>>> --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
>>>    // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
>>>    // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.)
>>>    //
>>> +  // mCpuHotEjectData itself is stable once setup so it can be
>>> +  // dereferenced without needing any synchronization,
>>> +  // but, mCpuHotEjectData->Handler is updated on the BSP in the
>>> +  // ongoing SMI iteration at two places:
>>> +  //
>>> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
>>> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
>>> +  //   describes any such updates are guaranteed to be ordered-before the
>>> +  //   dereference below.
>>> +  //
>>> +  // - EjectCpu() (which is called via the Handler below), on the BSP
>>> +  //   updates mCpuHotEjectData->Handler once it is done with all ejections.
>>> +  //
>>> +  //   The CPU under ejection: might be executing anywhere between the
>>> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
>>> +  //   dereference the Handler field.
>>> +  //   Given that the BSP ensures that this store only happens after all
>>> +  //   CPUs under ejection have been ejected, this CPU would never see
>>> +  //   the after value.
>>> +  //   (Note that any CPU that is already executing the CpuSleep() loop
>>> +  //   below never raced any updates and always saw the before value.)
>>> +  //
>>> +  //   CPUs not-under ejection: might see either value of the Handler
>>> +  //   which is fine, because the Handler is a NOP for CPUs not-under
>>> +  //   ejection.
>>> +  //
>>> +  // Lastly, note that we are also guaranteed that any dereferencing
>>> +  // CPU only sees the before or after value and not an intermediate
>>> +  // value. This is because mCpuHotEjectData->Handler is aligned at a
>>> +  // natural boundary.
>>> +  //
>>>    if (mCpuHotEjectData != NULL) {
>>>      CPU_HOT_EJECT_HANDLER Handler;
>>>
>>
>> (8) I can't really put my finger on it, I just feel that repeating
>> (open-coding) this wall of text here is not really productive.
>
> Part of the reason I wanted to document this here was to get your
> opinion on it and figure out how much of it is useful and how
> much might be overkill.
>
>>
>> Do you think that, after you add the "acquire memory fence" comment in
>> patch #7, we could avoid most of the text here? I think we should only
>> point out (in patch #7) the "release fence" that the logic here pairs
>> with.
>>
>> If you really want to present it all from both perspectives, I guess I'm
>> OK with that, but then I believe we should drop the last paragraph at
>> least (see point (4)).
>
> Rereading it after a gap of a few days and given that most of this is
> just a repeat, I'm also tending towards overkill. I think a comment
> talking about acquire/release pairing is useful. Rest of it can probably
> be met with just a pointer towards the comment in EjectCpus(). Does that
> make sense?

Yes, absolutely. Short comment + pointer to the "other half" (which has
the large comment too) seem best.

Thanks
Laszlo
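For reference, the acquire/release pairing discussed above can be sketched in portable C11. This is only an illustration under stated assumptions, not the edk2 code: the names g_handler, g_selector_map, bsp_arm(), ap_rendezvous_exit() and SELECTOR_INVALID are hypothetical stand-ins, C11 atomics replace MemoryFence(), and the two "sides" are plain functions that real firmware would run on different CPUs. The BSP's later re-store of the Handler after all ejections is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS         4u
#define SELECTOR_INVALID UINT64_MAX  /* hypothetical sentinel, not the edk2 macro */

typedef int (*eject_handler_t)(unsigned cpu_index);

static _Atomic(eject_handler_t) g_handler;  /* null pointer means "not armed" */
static uint64_t g_selector_map[MAX_CPUS];
static unsigned g_ejected_mask;

/* Handler: does nothing (returns 0) for a CPU that is not under ejection. */
static int eject_cpu(unsigned cpu_index)
{
  if (cpu_index >= MAX_CPUS || g_selector_map[cpu_index] == SELECTOR_INVALID) {
    return 0;
  }
  g_ejected_mask |= 1u << cpu_index;
  g_selector_map[cpu_index] = SELECTOR_INVALID;
  return 1;
}

/* Writer side: clear all state and disarm the handler. */
static void bsp_reset(void)
{
  for (unsigned i = 0; i < MAX_CPUS; i++) {
    g_selector_map[i] = SELECTOR_INVALID;
  }
  g_ejected_mask = 0;
  atomic_store_explicit(&g_handler, (eject_handler_t)0, memory_order_relaxed);
}

/* Writer side: fill in the map first, then publish the handler. */
static void bsp_arm(unsigned cpu_index, uint64_t selector)
{
  assert(cpu_index < MAX_CPUS);
  g_selector_map[cpu_index] = selector;  /* plain store */
  /* Release: the map store above is ordered before the handler store. */
  atomic_store_explicit(&g_handler, &eject_cpu, memory_order_release);
}

/* Reader side: returns 1 if this CPU was ejected, 0 otherwise. */
static int ap_rendezvous_exit(unsigned cpu_index)
{
  /* Acquire: pairs with the release store in bsp_arm(). A reader that
     observes the armed handler also observes the map entry. */
  eject_handler_t handler =
    atomic_load_explicit(&g_handler, memory_order_acquire);

  if (handler == (eject_handler_t)0) {
    return 0;  /* not armed in this iteration */
  }
  return handler(cpu_index);
}
```

Calling bsp_reset(), then bsp_arm(1, 0x10), then ap_rendezvous_exit() for each CPU index exercises both halves. A short comment on each side naming its counterpart is all the pairing needs.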