On 2021-02-02 6:00 a.m., Laszlo Ersek wrote:
On 02/01/21 21:12, Ankur Arora wrote:
On 2021-02-01 11:08 a.m., Laszlo Ersek wrote:
apologies, I've got more comments here:
On 01/29/21 01:59, Ankur Arora wrote:
/**
+ CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
+ on each CPU at exit from SMM.
+
+ If, the executing CPU is not being ejected, nothing to be done.
+ If, the executing CPU is being ejected, wait in a CpuDeadLoop()
+ until ejected.
+
+ @param[in] ProcessorNum Index of executing CPU.
+
+**/
+VOID
+EFIAPI
+CpuEject (
+ IN UINTN ProcessorNum
+ )
+{
+ //
+ // APIC ID is UINT32, but mCpuHotEjectData->ApicIdMap[] is UINT64
+ // so use UINT64 throughout.
+ //
+ UINT64 ApicId;
+
+ ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum];
+ if (ApicId == CPU_EJECT_INVALID) {
+ return;
+ }
+
+ //
+ // CPU(s) being unplugged get here from
SmmCpuFeaturesSmiRendezvousExit()
+ // after having been cleared to exit the SMI by the monarch and
thus have
+ // no SMM processing remaining.
+ //
+ // Given that we cannot allow them to escape to the guest, we pen
them
+ // here until the SMM monarch tells the HW to unplug them.
+ //
+ CpuDeadLoop ();
+}
(15) There is no such function as SmmCpuFeaturesSmiRendezvousExit() --
it's SmmCpuFeaturesRendezvousExit().
(16) This function uses a data structure for communication between BSP
and APs -- mCpuHotEjectData->ApicIdMap is modified in UnplugCpus() on
the BSP, and checked above by the APs (too).
What guarantees the visibility of mCpuHotEjectData->ApicIdMap?
I was banking on SmiRendezvous() explicitly signalling that all
processing on the BSP was done before any AP will look at
mCpuHotEjectData in SmmCpuFeaturesRendezvousExit().
1716 //
1717 // Wait for BSP's signal to exit SMI
1718 //
1719 while (*mSmmMpSyncData->AllCpusInSync) {
1720 CpuPause ();
1721 }
1722 }
1723
1724 Exit:
1725 SmmCpuFeaturesRendezvousExit (CpuIndex);
Right; it's a general pattern in edk2: volatile UINT8 (aka BOOLEAN)
objects are considered atomic. (See
SMM_DISPATCHER_MP_SYNC_DATA.AllCpusInSync -- it's a pointer to a
volatile BOOLEAN.)
But our UINT64 values are neither volatile nor UINT8, and I got suddenly
doubtful about "AllCpusInSync" working as a multiprocessor barrier.
(I could be unjustifiedly worried, as a bunch of other fields in
SMM_DISPATCHER_MP_SYNC_DATA are volatile, wider than UINT8, and *not*
accessed with InterlockedCompareExchageXx().)
Thanks for pointing me to this code. There's a curious comment in
about making this structure uncache-able in the declaration here
(though I couldn't figure out how that is done):
418 typedef struct {
419 //
420 // Pointer to an array. The array should be located immediately after
this structure
421 // so that UC cache-ability can be set together.
422 //
423 SMM_CPU_DATA_BLOCK *CpuData;
424 volatile UINT32 *Counter;
425 volatile UINT32 BspIndex;
426 volatile BOOLEAN *InsideSmm;
427 volatile BOOLEAN *AllCpusInSync;
428 volatile SMM_CPU_SYNC_MODE EffectiveSyncMode;
429 volatile BOOLEAN SwitchBsp;
430 volatile BOOLEAN *CandidateBsp;
431 EFI_AP_PROCEDURE StartupProcedure;
432 VOID *StartupProcArgs;
433 } SMM_DISPATCHER_MP_SYNC_DATA;
Also, is there an expectation that these fields (at least some of
them) switch over when a new leader is chosen?
Otherwise I'm not sure why for instance, AllCpusInSync would be
a pointer.
I think we might want to use InterlockedCompareExchange64() in both
EjectCpu() and UnplugCpus() (and make "ApicIdMap" volatile, in
addition). InterlockedCompareExchange64() can be used just for
comparison as well, by passing ExchangeValue=CompareValue.
Speaking specifically about the ApicIdMap, I'm not sure I fully
agree (assuming my comment just above is correct.)
The only AP (reader) ApicIdMap deref is here:
CpuEject():
218 ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum];
For the to-be-ejected-AP, this value can only move from
valid-APIC-ID (=> wait in CpuDeadLoop()) -> CPU_EJECT_INVALID.
Given that, by the time the worker does the write on line 254, this
AP is guaranteed to be dead already, I don't think there's any
scenario where the to-be-ejected-AP can see anything other than
a valid-APIC-ID.
The scenario I had in mind was different: what guarantees that the
effect of
375 mCpuHotEjectData->ApicIdMap[ProcessorNum] = (UINT64)RemoveApicId;
which is performed by the BSP in UnplugCpus(), is visible by the AP on
line 218 (see your quote above)?
What if the AP gets to line 218 before the BSP's write on line 375
*propagates* sufficiently?
I understand. That does make sense. And, as you said elsewhere, a real
memory fence would come in useful here.
We could use AsmCpuid() as a poor man's mfence, but that seems overkill
given that x86 at least guarantees store-order.
Ankur
There's no question that the BSP writes before the AP reads, but I'm
uncertain if that suffices for the *effect* of the write to be visible
to the AP. My concern is not whether the AP sees a partial vs. a settled
update; my concern is if the AP could see an entirely *stale* value.
The consequence of that problem would be that an AP that the BSP were
about to eject would return from CpuEject() to
SmmCpuFeaturesRendezvousExit() to SmiRendezvous().
... I guess that volatile-qualifying both CPU_HOT_EJECT_DATA, and the
array pointed-to by CPU_HOT_EJECT_DATA.ApicIdMap, should suffice. In
combination with the sync-up point that you quoted. This seems to match
existing practice in PiSmmCpuDxeSmm -- there are no concurrent accesses,
so atomicity is not a concern, and serializing the instruction streams
coarsely, with the sync-up, in combination with volatile accesses,
should presumably guarantee visibility (on x86 anyway).
Thanks
Laszlo
241 QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
242 QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
243
244 //
245 // Compiler barrier to ensure the next store isn't reordered
246 //
247 MemoryFence ();
248
249 //
250 // Clear the eject status for CpuIndex to ensure that an
invalid
251 // SMI later does not end up trying to eject it or a newly
252 // hotplugged CpuIndex does not go into the dead loop.
253 //
254 mCpuHotEjectData->ApicIdMap[CpuIndex] = CPU_EJECT_INVALID;
For APs that are not being ejected, they will always see
CPU_EJECT_INVALID
since the writer never changes that.
The one scenario in which bad things could happen is if entries in the
ApicIdMap are unaligned (or if the compiler or cpu-arch tears aligned
writes).
(17) I think a similar observation applies to the "Handler" field too,
as APs call it, while the BSP keeps flipping it between NULL and a real
function address. We might have to turn that field into an
From a real function address, to NULL is the problem part right?
(Same argument as above for the transition in UnplugCpus() from
NULL -> function-address.)
EFI_PHYSICAL_ADDRESS (just a fancy name for UINT64), and use
InterlockedCompareExchange64() again.
AFAICS, these are the problematic derefs:
SmmCpuFeaturesRendezvousExit():
450 if (mCpuHotEjectData == NULL ||
451 mCpuHotEjectData->Handler == NULL) {
452 return;
and problematic assignments:
266 //
267 // We are done until the next hot-unplug; clear the handler.
268 //
269 mCpuHotEjectData->Handler = NULL;
270 return;
271 }
Here as well, I've been banking on aligned writes such that the APs would
only see the before or after value not an intermediate value.
Thanks
Ankur
Thanks
Laszlo
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71094): https://edk2.groups.io/g/devel/message/71094
Mute This Topic: https://groups.io/mt/80199926/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-