On 16/10/2024 5:10 pm, Linus Torvalds wrote: > --- a/arch/x86/lib/getuser.S > +++ b/arch/x86/lib/getuser.S > @@ -37,11 +37,14 @@ > > #define ASM_BARRIER_NOSPEC ALTERNATIVE "", "lfence", > X86_FEATURE_LFENCE_RDTSC > > +#define X86_CANONICAL_MASK ALTERNATIVE \ > + "movq $0x80007fffffffffff,%rdx", \ > + "movq $0x80ffffffffffffff,%rdx", X86_FEATURE_LA57 > + > .macro check_range size:req > .if IS_ENABLED(CONFIG_X86_64) > - mov %rax, %rdx > - sar $63, %rdx > - or %rdx, %rax > + X86_CANONICAL_MASK /* mask into %rdx */ > + and %rdx,%rax
That doesn't have the same semantics, does it? Consider userspace passing an otherwise-good pointer with bit 60 set. Previously that would have resulted in a failure, whereas now it will succeed. If AMD think it's appropriate, then what you probably want is the real branch as per before (to maintain architectural user behaviour), and then use a trick such as this one in place of the LFENCE for speed in the common case. > But in (related) news, the address masking trick really does matter: > > https://lore.kernel.org/all/202410161557.5b87225e-oliver.s...@intel.com/ > > which reports that ridiculous 6.8% improvement just from avoiding the > barrier in user_access_begin(). > > So that barrier really *is* very expensive. Surprisingly so. 7% performance is what it costs to maintain the security barrier we were sold originally. Then it's a lot of time, arguing in public, and wild guesswork over explicitly undocumented properties that the vendors would prefer not to discuss, to try and claw some of that 7% back while keeping the security barrier intact for the benefit of users who want it both secure and fast. Forgive me if I think that we (the SW people) are getting the raw end of the deal here, while vendors keep selling more and more expensive chips that don't work safely. ~Andrew