Hello Siva,

On Mon, Oct 9, 2023 at 4:06 PM Sivaprasad Tummala <sivaprasad.tumm...@amd.com> wrote:
>
> mwaitx allows EPYC processors to enter an implementation-dependent
> power/performance optimized state (C1 state) for a specific period
> or until a store to the monitored address range.
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tumm...@amd.com>
> Acked-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---
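Some background for the question below, summarized from the AMD64
Architecture Programmer's Manual (reviewer commentary, not part of the
patch): MWAITX takes its parameters in registers. EAX[7:4] selects the
target C-state (0 requests C1), ECX bit 1 enables a TSC-based timer,
and EBX then bounds the wait in TSC ticks. A minimal sketch of the
timer-enabled form, mirroring the inline-asm style of the patch (the
helper name is made up for illustration):

  /* Illustrative only: wait on a previously MONITORX'ed address,
   * waking after at most 'timeout' TSC ticks. */
  static void amd_mwaitx_with_timer(const uint64_t timeout)
  {
  	asm volatile(".byte 0x0f, 0x01, 0xfb;" /* MWAITX opcode */
  			: /* ignore rflags */
  			: "a"(0),	/* EAX[7:4]=0: request C1 */
  			  "c"(2),	/* ECX bit 1: enable timer */
  			  "b"(timeout)); /* max wait, in TSC ticks */
  }

With the timer disabled (ECX=0), the core waits until a store hits the
monitored range or an interrupt arrives, which is what makes the
"c"(2) -> "c"(0) change below worth confirming.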
Please include a changelog to make life easier for reviewers (and me).

I diffed with the previous version to check what had changed, and I see this:

 static void amd_mwaitx(const uint64_t timeout)
...
-			"c"(2), /* enable timer */
-			"b"(timeout));
+			"c"(0)); /* no time-out */

I will take this series as is, but please confirm why this change was needed.

>  lib/eal/x86/rte_power_intrinsics.c | 108 ++++++++++++++++++++++-------
>  1 file changed, 84 insertions(+), 24 deletions(-)
>
> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
> index 664cde01e9..0d2953f570 100644
> --- a/lib/eal/x86/rte_power_intrinsics.c
> +++ b/lib/eal/x86/rte_power_intrinsics.c
> @@ -17,6 +17,78 @@ static struct power_wait_status {
>  	volatile void *monitor_addr; /**< NULL if not currently sleeping */
>  } __rte_cache_aligned wait_status[RTE_MAX_LCORE];
>
> +/**
> + * This functions uses UMONITOR/UMWAIT instructions and will enter C0.2 state.

Fixed while applying, function*

> + * For more information about usage of these instructions, please refer to
> + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual.
> + */
> +static void intel_umonitor(volatile void *addr)
> +{
> +#if defined(RTE_TOOLCHAIN_MSVC) || defined(__WAITPKG__)
> +	/* cast away "volatile" when using the intrinsic */
> +	_umonitor((void *)(uintptr_t)addr);
> +#else
> +	/*
> +	 * we're using raw byte codes for compiler versions which
> +	 * don't support this instruction natively.
> +	 */
> +	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;"
> +			:
> +			: "D"(addr));
> +#endif
> +}
> +
> +static void intel_umwait(const uint64_t timeout)
> +{
> +	const uint32_t tsc_l = (uint32_t)timeout;
> +	const uint32_t tsc_h = (uint32_t)(timeout >> 32);
> +#if defined(RTE_TOOLCHAIN_MSVC) || defined(__WAITPKG__)
> +	_umwait(tsc_l, tsc_h);
> +#else
> +	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
> +			: /* ignore rflags */
> +			: "D"(0), /* enter C0.2 */
> +			  "a"(tsc_l), "d"(tsc_h));
> +#endif
> +}
> +
> +/**
> + * This functions uses MONITORX/MWAITX instructions and will enter C1 state.
> + * For more information about usage of these instructions, please refer to
> + * AMD64 Architecture Programmer's Manual.
> + */
> +static void amd_monitorx(volatile void *addr)
> +{
> +#if defined(__MWAITX__)

This part probably breaks the build with MSVC.
I could not determine whether MSVC supports this intrinsic.
I'll rely on Tyler to fix it later.

Series applied.


-- 
David Marchand
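For completeness, one way the MONITORX path could mirror the
UMONITOR/UMWAIT guards quoted above is sketched below. GCC and clang
provide _mm_monitorx()/_mm_mwaitx() when building with -mmwaitx;
whether MSVC ships an equivalent intrinsic is exactly the open question
above, so the first branch is speculative:

  /* Speculative sketch, not the applied code: mirrors the
   * intel_umonitor() guard. The RTE_TOOLCHAIN_MSVC branch assumes an
   * _mm_monitorx-style intrinsic exists there, which is unconfirmed. */
  static void amd_monitorx(volatile void *addr)
  {
  #if defined(RTE_TOOLCHAIN_MSVC) || defined(__MWAITX__)
  	/* cast away "volatile" when using the intrinsic */
  	_mm_monitorx((void *)(uintptr_t)addr, 0, 0);
  #else
  	/* raw byte code for MONITORX; the address goes in rAX */
  	asm volatile(".byte 0x0f, 0x01, 0xfa;"
  			:
  			: "a"(addr));
  #endif
  }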