On 02/04/21 03:49, Ankur Arora wrote:
> On 2021-02-03 12:58 p.m., Laszlo Ersek wrote:
>> On 02/03/21 07:45, Ankur Arora wrote:
>>> On 2021-02-02 6:15 a.m., Laszlo Ersek wrote:
>>>> On 02/02/21 15:00, Laszlo Ersek wrote:
>>>>
>>>>> ... I guess that volatile-qualifying both CPU_HOT_EJECT_DATA, and
>>>>> the array pointed-to by CPU_HOT_EJECT_DATA.ApicIdMap, should
>>>>> suffice. In combination with the sync-up point that you quoted.
>>>>> This seems to match existing practice in PiSmmCpuDxeSmm -- there
>>>>> are no concurrent accesses, so atomicity is not a concern, and
>>>>> serializing the instruction streams coarsely, with the sync-up, in
>>>>> combination with volatile accesses, should presumably guarantee
>>>>> visibility (on x86 anyway).
>>>>
>>>> To summarize, this is what I would ask for:
>>>>
>>>> - make CPU_HOT_EJECT_DATA volatile
>>>>
>>>> - make (*CPU_HOT_EJECT_DATA.ApicIdMap) volatile
>>>>
>>>> - after storing something to CPU_HOT_EJECT_DATA or
>>>>   CPU_HOT_EJECT_DATA.ApicIdMap on the BSP, execute a MemoryFence()
>>>>
>>>> - before fetching something from CPU_HOT_EJECT_DATA or
>>>>   CPU_HOT_EJECT_DATA.ApicIdMap on an AP, execute a MemoryFence()
>>>>
>>>> Except: MemoryFence() isn't a *memory fence* in fact.
>>>>
>>>> See "MdePkg/Library/BaseLib/X64/GccInline.c".
>>>>
>>>> It's just a compiler barrier, which may not add anything beyond
>>>> what we'd already have from "volatile".
>>>>
>>>> Case in point: PiSmmCpuDxeSmm performs heavy multi-processing, but
>>>> does not contain a single invocation of MemoryFence(). It uses
>>>> volatile objects, and a handful of InterlockedCompareExchangeXx()
>>>> calls, for implementing semaphores. (NB: there is no 8-bit variant
>>>> of InterlockedCompareExchange(), as "volatile UINT8" is considered
>>>> atomic in itself, and a suitable basis for a semaphore too.) And
>>>> given the synchronization from those semaphores, PiSmmCpuDxeSmm
>>>> trusts that updates to the *other* volatile objects are both atomic
>>>> and visible.
>>>>
>>>> I'm pretty sure this only works because x86 is in-order. There are
>>>> instruction stream barriers in place, and compiler barriers too,
>>>> but no actual memory barriers.
>>>
>>> Right, and just to separate them explicitly, there are two problems:
>>>
>>> - compiler: where the compiler caches stuff in or looks at stale
>>>   memory locations. Now, AFAICS, this should not happen because the
>>>   ApicIdMap would never change once set, so the compiler should
>>>   reasonably be able to cache the address of ApicIdMap and
>>>   dereference it (thus obviating the need for volatile.)
>>
>> (CPU_HOT_EJECT_DATA.Handler does change though.)
>
> Yeah, I did kinda elide over that. Let me think this through in v7
> and add more explicit comments and then we can see if it still looks
> fishy?
OK. (Clearly, I don't want to block progress on this with concerns that
are purely theoretical.)

Thanks,
Laszlo

> Thanks
> Ankur
>
>>
>>> The compiler could, however, cache any assignments to ApicIdMap[Idx]
>>> (especially given LTO) and we could use a MemoryFence() (as the
>>> compiler barrier that it is) to force the store.
>>>
>>> - CPU pipeline: as you said, right now we basically depend on x86
>>>   store order semantics (and the CpuPause() loop around
>>>   AllCpusInSync kinda provides a barrier.)
>>>
>>> So the BSP writes in this order:
>>> ApicIdMap[Idx] = x; ... ->AllCpusInSync = true
>>>
>>> And whenever the AP sees ->AllCpusInSync == True, it has to now see
>>> ApicIdMap[Idx] == x.
>>
>> Yes.
>>
>>> Maybe the thing to do is to detail this barrier in a commit
>>> note/comment?
>>
>> That would be nice.
>>
>>> And add the MemoryFence() but not the volatile.
>>
>> Yes, that should work.
>>
>> Thanks,
>> Laszlo
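For reference, the X64 GCC implementation of MemoryFence() in
"MdePkg/Library/BaseLib/X64/GccInline.c", cited above, amounts to a
pure compiler barrier. Paraphrased from memory (check the tree for the
exact code and comments):

  VOID
  EFIAPI
  MemoryFence (
    VOID
    )
  {
    //
    // An empty asm statement with a "memory" clobber: the compiler
    // must not cache memory contents across this point, nor reorder
    // accesses around it. No fence instruction (MFENCE/SFENCE/LFENCE)
    // is emitted, so CPU-level memory ordering is unaffected.
    //
    __asm__ __volatile__ ("" ::: "memory");
  }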
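The pattern the thread converges on -- MemoryFence() calls and a
volatile sync flag, but no volatile on the map itself -- could be
sketched as follows. This is an illustration only, not the patch:
mApicIdMap, mAllCpusInSync, BspPublish() and ApConsume() are made-up
names standing in for CPU_HOT_EJECT_DATA.ApicIdMap and
PiSmmCpuDxeSmm's AllCpusInSync flag.

  #include <Base.h>
  #include <Library/BaseLib.h>    // MemoryFence(), CpuPause()

  //
  // Stand-ins for the real objects (illustration only).
  //
  volatile BOOLEAN  mAllCpusInSync = FALSE; // volatile: the AP's poll
                                            // loop must re-read it
  UINT32            *mApicIdMap;            // deliberately NOT volatile

  //
  // BSP: publish a map entry, then release the APs.
  //
  VOID
  BspPublish (
    IN UINTN   Idx,
    IN UINT32  ApicId
    )
  {
    mApicIdMap[Idx] = ApicId;
    MemoryFence ();        // compiler barrier: the map store is emitted
                           // before the flag store; x86 then keeps the
                           // two stores in program order
    mAllCpusInSync = TRUE;
  }

  //
  // AP: wait for the release, then consume the published entry.
  //
  UINT32
  ApConsume (
    IN UINTN  Idx
    )
  {
    while (!mAllCpusInSync) {
      CpuPause ();
    }
    MemoryFence ();        // compiler barrier: the map load may not be
                           // hoisted, or satisfied from a stale cached
                           // value, ahead of the flag being observed
    return mApicIdMap[Idx];
  }

The x86 store->store and load->load ordering guarantees do the rest;
on a weakly ordered architecture, both MemoryFence() calls would have
to be real (release/acquire) fences.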
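Likewise, the PiSmmCpuDxeSmm-style semaphores mentioned above are
built on InterlockedCompareExchange32() over a volatile UINT32.
Roughly (a paraphrase of the driver's WaitForSemaphore(), not a
verbatim copy):

  #include <Base.h>
  #include <Library/BaseLib.h>             // CpuPause()
  #include <Library/SynchronizationLib.h>  // InterlockedCompareExchange32()

  //
  // Decrement a semaphore, spinning until it is non-zero.
  //
  UINT32
  WaitForSemaphore (
    IN OUT volatile UINT32  *Sem
    )
  {
    UINT32  Value;

    for (;;) {
      Value = *Sem;
      if ((Value != 0) &&
          (InterlockedCompareExchange32 (
             (UINT32 *)Sem,   // cast away volatile for the lib call
             Value,           // only exchange if *Sem still equals Value
             Value - 1
             ) == Value))
      {
        break;
      }
      CpuPause ();
    }
    return Value - 1;
  }

The compare-and-exchange provides the atomicity; the volatile
qualifier only keeps the compiler honest about re-reading *Sem on each
iteration of the loop.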