On Fri, Apr 05, 2024 at 02:46:28PM +0200, Morten Brørup wrote: > When the rte_memcpy() size is 16, the same 16 bytes are copied twice. > In the case where the size is known to be 16 at build tine, omit the > duplicate copy. > > Reduced the amount of effectively copy-pasted code by using #ifdef > inside functions instead of outside functions. > > Suggested-by: Stephen Hemminger <step...@networkplumber.org> > Signed-off-by: Morten Brørup <m...@smartsharesystems.com> > Acked-by: Bruce Richardson <bruce.richard...@intel.com> > --- > v3: > * AVX2 is a superset of AVX; > for a block of AVX code, testing for AVX suffices. (Bruce Richardson) > * Define RTE_MEMCPY_AVX if AVX is available, to avoid copy-pasting the > check for older GCC version. (Bruce Richardson) > v2: > * For GCC, version 11 is required for proper AVX handling; > if older GCC version, treat AVX as SSE. > Clang does not have this issue. > Note: Original code always treated AVX as SSE, regardless of compiler. > * Do not add copyright. (Stephen Hemminger) > --- > lib/eal/x86/include/rte_memcpy.h | 234 ++++++++----------------------- > 1 file changed, 59 insertions(+), 175 deletions(-) > > diff --git a/lib/eal/x86/include/rte_memcpy.h > b/lib/eal/x86/include/rte_memcpy.h > index 72a92290e0..b56bc46713 100644 > --- a/lib/eal/x86/include/rte_memcpy.h > +++ b/lib/eal/x86/include/rte_memcpy.h > @@ -27,6 +27,11 @@ extern "C" { > #pragma GCC diagnostic ignored "-Wstringop-overflow" > #endif > > +/* GCC prior to version 11 doesn't compile AVX properly, so use SSE instead. > */ > +#if defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < > 110000)) > +#define RTE_MEMCPY_AVX > +#endif > +
Strictly speaking, to have the same behaviour as before, you need to check for AVX2 also, since the issue with GCC < 11 is for (AVX && !AVX2), i.e. if AVX2 is supported, all compilers are fine. My suggestion: #ifdef __AVX2__ #define RTE_MEMCPY_AVX #elif defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 110000)) #define RTE_MEMCPY_AVX #endif You can obviously merge the two branches if you want, but I find the split slightly easier to follow, than a mix of && and || with brackets for precedence. Final alternative I see, you can change defined(RTE_MEMCPY_AVX) to "defined(__AVX2__) || defined(RTE_MEMCPY_AVX)" each place it's used. /Bruce