> -----Original Message-----
> From: Wodkowski, PawelX
> Sent: Monday, January 26, 2015 10:43 PM
> To: Wang, Zhihong; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in
> arch/x86/rte_memcpy.h for both SSE and AVX platforms
>
> Hi,
>
> I must say: greate work.
>
> I have some small comments:
>
> > +/**
> > + * Macro for copying unaligned block from one location to another,
> > + * 47 bytes leftover maximum,
> > + * locations should not overlap.
> > + * Requirements:
> > + * - Store is aligned
> > + * - Load offset is <offset>, which must be immediate value within [1, 15]
> > + * - For <src>, make sure <offset> bit backwards & <16 - offset> bit
> forwards
> > are available for loading
> > + * - <dst>, <src>, <len> must be variables
> > + * - __m128i <xmm0> ~ <xmm8> must be pre-defined
> > + */
> > +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset)
> > \
> > +{
> > \
> ...
> > +}
>
> Why not do { ... } while(0) or ({ ... }) ? This could have unpredictable side
> effects.
>
> Second:
> Why you completely substitute
> #define rte_memcpy(dst, src, n) \
> ({ (__builtin_constant_p(n)) ? \
> memcpy((dst), (src), (n)) : \
> rte_memcpy_func((dst), (src), (n)); })
>
> with inline rte_memcpy()? This construction can help compiler to deduce
> which version to use (static?) inline implementation or call external
> function.
>
> Did you try 'extern inline' type? It could help reducing compilation time.
Hi Pawel,
Good call on "MOVEUNALIGNED_LEFT47". Thanks!
I removed the conditional __builtin_constant_p(n) because it calls glibc memcpy
when the parameter is constant, while rte_memcpy has better performance there.
Current long compile time is caused by too many function calls, I'll fix that
in the next version.
Zhihong (John)