On 02/03/21 07:45, Ankur Arora wrote:
> On 2021-02-02 6:15 a.m., Laszlo Ersek wrote:
>> On 02/02/21 15:00, Laszlo Ersek wrote:
>>
>>> ... I guess that volatile-qualifying both CPU_HOT_EJECT_DATA, and the
>>> array pointed-to by CPU_HOT_EJECT_DATA.ApicIdMap, should suffice, in
>>> combination with the sync-up point that you quoted. This seems to match
>>> existing practice in PiSmmCpuDxeSmm -- there are no concurrent accesses,
>>> so atomicity is not a concern, and serializing the instruction streams
>>> coarsely, with the sync-up, in combination with volatile accesses,
>>> should presumably guarantee visibility (on x86 anyway).
>>
>> To summarize, this is what I would ask for:
>>
>> - make CPU_HOT_EJECT_DATA volatile
>>
>> - make (*CPU_HOT_EJECT_DATA.ApicIdMap) volatile
>>
>> - after storing something to CPU_HOT_EJECT_DATA or
>>   CPU_HOT_EJECT_DATA.ApicIdMap on the BSP, execute a MemoryFence()
>>
>> - before fetching something from CPU_HOT_EJECT_DATA or
>>   CPU_HOT_EJECT_DATA.ApicIdMap on an AP, execute a MemoryFence()
>>
>> Except: MemoryFence() isn't a *memory fence* in fact.
>>
>> See "MdePkg/Library/BaseLib/X64/GccInline.c".
>>
>> It's just a compiler barrier, which may not add anything beyond what
>> we'd already have from "volatile".
>>
>> Case in point: PiSmmCpuDxeSmm performs heavy multi-processing, but does
>> not contain a single invocation of MemoryFence(). It uses volatile
>> objects, and a handful of InterlockedCompareExchangeXx() calls, for
>> implementing semaphores. (NB: there is no 8-bit variant of
>> InterlockedCompareExchange(), as "volatile UINT8" is considered atomic
>> in itself, and a suitable basis for a semaphore too.) And given the
>> synchronization from those semaphores, PiSmmCpuDxeSmm trusts that
>> updates to the *other* volatile objects are both atomic and visible.
>>
>> I'm pretty sure this only works because x86 is in-order.
>> There are
>> instruction stream barriers in place, and compiler barriers too, but no
>> actual memory barriers.
>
> Right, and just to separate them explicitly, there are two problems:
>
> - compiler: where the compiler caches stuff in registers, or looks at
>   stale memory locations. Now, AFAICS, this should not happen, because
>   the ApicIdMap would never change once set, so the compiler should
>   reasonably be able to cache the address of ApicIdMap and dereference
>   it (thus obviating the need for volatile.)
(CPU_HOT_EJECT_DATA.Handler does change though.)

> The compiler could, however, cache any assignments to ApicIdMap[Idx]
> (especially given LTO), and we could use a MemoryFence() (as the
> compiler barrier that it is) to force the store.
>
> - CPU pipeline: as you said, right now we basically depend on x86 store
>   order semantics (and the CpuPause() loop around AllCpusInSync kinda
>   provides a barrier.)
>
> So the BSP writes in this order:
>
>   ApicIdMap[Idx] = x; ... ->AllCpusInSync = TRUE;
>
> And whenever the AP sees ->AllCpusInSync == TRUE, it has to now see
> ApicIdMap[Idx] == x.

Yes.

> Maybe the thing to do is to detail this barrier in a commit
> note/comment?

That would be nice.

> And add the MemoryFence() but not the volatile.

Yes, that should work.

Thanks,
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#71116): https://edk2.groups.io/g/devel/message/71116
-=-=-=-=-=-=-=-=-=-=-=-