Hi, Thanks for the detailed feedback.
I did consider moving the CPUID check up a level so AsmLfence() would always do the requested instruction. This would have been a larger patch set that would introduce an IA32 and X64 specific version of SpeculationBarrier(). I agree that a build time PCD would is a better approach. It should have a default value of enabled, and only platforms that use CPUs that do not support LFENCE would avoid the call to AsmLfence() and use an alternate serializing instruction. Here is the proposed PCD: [PcdsFixedAtBuild] ## Indicates the type of instruction sequence to use # for a speculation barrier. The default instruction # sequence is LFENCE.<BR> # 0x00 - No operation.<BR> # 0x01 - LFENCE (IA32/X64).<BR> # 0x02 - CPUID (IA32/X64).<BR> # Other - reserved # @Prompt Speculation Barrier Type. gEfiMdePkgTokenSpaceGuid.PcdSpeculationBarrierType|0x01|UINT8|0x30001018 Best regards, Mike > -----Original Message----- > From: Gao, Liming > Sent: Monday, April 29, 2019 7:10 AM > To: devel@edk2.groups.io; ler...@redhat.com; > brian.john...@hpe.com; Kinney, Michael D > <michael.d.kin...@intel.com> > Subject: RE: [edk2-devel] [Patch 1/4] MdePkg/BaseLib: > Verify SSE2 support in IA32 AsmLfence() > > > -----Original Message----- > > From: devel@edk2.groups.io > [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek > > Sent: Monday, April 29, 2019 7:15 PM > > To: devel@edk2.groups.io; brian.john...@hpe.com; > Kinney, Michael D <michael.d.kin...@intel.com> > > Cc: Gao, Liming <liming....@intel.com> > > Subject: Re: [edk2-devel] [Patch 1/4] MdePkg/BaseLib: > Verify SSE2 support in IA32 AsmLfence() > > > > On 04/26/19 23:08, Brian J. Johnson wrote: > > > On 4/26/19 3:27 PM, Laszlo Ersek wrote: > > >> On 04/25/19 19:53, Michael D Kinney wrote: > > >>> Use CPUID in IA32 implementation of AsmLfence() to > verify > > >>> that SSE2 is supported before using the LFENCE > instruction. > > >>> > > >>> Cc: Liming Gao <liming....@intel.com> > > >>> Signed-off-by: Michael D Kinney > <michael.d.kin...@intel.com> > > >>> --- > > >>> MdePkg/Library/BaseLib/Ia32/Lfence.nasm | 14 > +++++++++++++- > > >>> 1 file changed, 13 insertions(+), 1 deletion(-) > > >>> > > >>> diff --git > a/MdePkg/Library/BaseLib/Ia32/Lfence.nasm > > >>> b/MdePkg/Library/BaseLib/Ia32/Lfence.nasm > > >>> index 44478be35f..0a60ae1d57 100644 > > >>> --- a/MdePkg/Library/BaseLib/Ia32/Lfence.nasm > > >>> +++ b/MdePkg/Library/BaseLib/Ia32/Lfence.nasm > > >>> @@ -1,5 +1,5 @@ > > >>> > > >>> ;------------------------------------------------- > ----------------------------- > > >>> ; > > >>> -; Copyright (c) 2018, Intel Corporation. All > rights reserved.<BR> > > >>> +; Copyright (c) 2018 - 2019, Intel Corporation. > All rights > > >>> reserved.<BR> > > >>> ; SPDX-License-Identifier: BSD-2-Clause-Patent > > >>> ; > > >>> ; Module Name: > > >>> @@ -26,5 +26,17 @@ > > >>> > > >>> ;------------------------------------------------- > ----------------------------- > > >>> > > >>> global ASM_PFX(AsmLfence) > > >>> ASM_PFX(AsmLfence): > > >>> + ; > > >>> + ; Use CPUID instruction > (CPUID.01H:EDX.SSE2[bit 26] = 1) to test > > >>> whether the > > >>> + ; processor supports SSE2 > instruction. Save/restore > > >>> non-volatile register > > >>> + ; EBX that is modified by CPUID > > >>> + ; > > >>> + push ebx > > >>> + mov eax, 1 > > >>> + cpuid > > >>> + bt edx, 26 > > >>> + jnc Done > > >>> lfence > > >>> +Done: > > >>> + pop ebx > > >>> ret > > >>> > > >> > > >> The SDM seems to confirm that lfence depends on > SSE2. > > >> > > >> However, that raises another question: > > >> > > >> AsmLfence() is used for implementing > SpeculationBarrier() on IA32/X64. > > >> And so I wonder: the plaforms where lfence is > unavailable, do they *not* > > >> need a speculation barrier at all, or do they need > a *different* > > >> implementation? > > >> > > >> Thanks, > > >> Laszlo > > > > > > Several issues: > > > > > > This patch introduces stronger fencing than is > required. The SDM says, > > > "Reads or writes cannot be reordered with I/O > instructions, locked > > > instructions, or serializing instructions." (vol 3a, > sec 8.2.2.) The > > > cpuid instruction is a "serializing instruction" > (sec 8.3). So the > > > cpuid is essentially a load fence plus a store fence > (plus other > > > functionality, such as draining the memory > queues.) That makes the > > > lfence following it redundant. > > > > > > Cpuid is a fairly heavyweight instruction due to its > serializing > > > behavior. It provides the load fencing semantics of > lfence, but can > > > introduce a significant performance hit if it's > called often. Lfence is > > > a lot lighter weight instruction. So using cpuid in > AsmLfence may make > > > it a lot slower than the caller expects. > > > > > > Also, skipping a fencing instruction if it's not > supported doesn't seem > > > like the right approach in any case. The caller > expects AsmLfence to > > > provide certain fencing semantics... the > implementation isn't free to > > > just do nothing (unless it's on an architecture > where load fencing is > > > not required, since memory is always > ordered.) Plus, the routine is > > > called "AsmLfence", so I'd expect it to always use > lfence, and cause an > > > exception if the instruction isn't implemented. I'd > think the callers > > > should be modified to call AsmLfence or routines > using other > > > instructions (such as cpuid) to provide fencing > according to the CPU > > > architecture they are on. > > > > That appears the right approach to me -- let AsmLfence > do what its name > > says, and let platforms pick the right implementation > for > > SpeculationBarrier -- possibly at build time, possibly > at boot time. > > > > > Is there a way to make this a compile-time > decision? I haven't tried to > > > track down all the callers of AsmLfence.... > > > > Right now AsmLfence is only called from > > "MdePkg/Library/BaseLib/X86SpeculationBarrier.c". > > > This is a good suggestion to decide it at build time. > One PCD can be introduced to control its logic. > > SpeculationBarrier() is called in SMM drivers / lib > instances: > > - FaultTolerantWriteSmm, > FaultTolerantWriteStandaloneMm > > - SmmLockBoxLib > > - VariableSmm, VariableStandaloneMm > > - PiSmmCpuDxeSmm > > > > Given that there's a mode switch anyway, the runtime > cost of a simple > > CPUID-based implementation might be tolerable. > > > > Thanks, > > Laszlo > > > > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#39796): https://edk2.groups.io/g/devel/message/39796 Mute This Topic: https://groups.io/mt/31345225/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-