On 09.08.2025 00:20, Andrew Cooper wrote: > Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> > --- > CC: Jan Beulich <jbeul...@suse.com> > CC: Roger Pau Monné <roger....@citrix.com> > > This is on top of the FRED series for the wrmsrns() cleanup, but otherwise > unrelated. > > The code generation isn't entirely ideal > > Function old new delta > init_fred 255 274 +19 > vmx_set_reg 248 256 +8 > enter_state_helper.cold 1014 1018 +4 > __start_xen 8893 8897 +4 > > but made worse by the the prior codegen for wrmsrns(MSR_STAR, ...) being mad: > > mov $0xc0000081,%ecx > mov $0xe023e008,%edx > movabs $0xe023e00800000000,%rax > cs wrmsr > > The two sources of code expansion come from the compiler not being able to > construct %eax and %edx separately, and not being able propagate constants. > > Loading 0 is possibly common enough to warrant another specialisation where we > can use "a" (0), "d" (0) and forgo the MOV+SHR. > > I'm probably overthinking things. The addition will be in the noise in > practice, and Intel are sure the advantage of MSR_IMM will not be.
It's not entirely clear to me what the overall effects are now with your 02/22 reply on the FRED series. Nevertheless a nit or two here. > --- a/xen/arch/x86/include/asm/msr.h > +++ b/xen/arch/x86/include/asm/msr.h > @@ -38,9 +38,46 @@ static inline void wrmsrl(unsigned int msr, uint64_t val) > wrmsr(msr, lo, hi); > } > > +/* > + * Non-serialising WRMSR with a compile-time constant index, when available. > + * Falls back to plain WRMSRNS, or to a serialising WRMSR. > + */ > +static always_inline void __wrmsrns_imm(uint32_t msr, uint64_t val) > +{ > + /* > + * For best performance, WRMSRNS %r64, $msr is recommended. For > + * compatibility, we need to fall back to plain WRMSRNS, or to WRMSR. > + * > + * The combined ABI is awkward, because WRMSRNS $imm takes a single r64, > + * whereas WRMSR{,NS} takes a split edx:eax pair. > + * > + * Always use WRMSRNS %rax, $imm, because it has the most in common with > + * the legacy forms. When MSR_IMM isn't available, emit setup logic for > + * %ecx and %edx too. > + */ > + alternative_input_2( > + "mov $%c[msr], %%ecx\n\t" Simply %[msr] here? And then, might it make sense to pull out this and ... > + "mov %%rax, %%rdx\n\t" > + "shr $32, %%rdx\n\t" > + ".byte 0x2e; wrmsr", > + > + /* WRMSRNS %rax, $msr */ > + ".byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]", X86_FEATURE_MSR_IMM, > + > + "mov $%c[msr], %%ecx\n\t" ... this, to ... > + "mov %%rax, %%rdx\n\t" > + "shr $32, %%rdx\n\t" > + ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS, > + > + [msr] "i" (msr), "a" (val) : "rcx", "rdx"); [msr] "i" (msr), "a" (val), "c" (msr) : "rdx"); allowing the compiler to actually know what's put in %ecx? That'll make original and 2nd replacement code 10 bytes, better balancing with the 9 bytes of the 1st replacement. And I'd guess that the potentially dead MOV to %ecx would be hidden in the noise as well. Then, seeing your use of a CS: prefix on WRMSR, why not also add one to the 1st replacement, thus not requiring any NOP padding? Jan