On 2024-10-09 23:57, Stephen Hemminger wrote:
On Fri, 20 Sep 2024 12:27:16 +0200
Mattias Rönnblom <mattias.ronnb...@ericsson.com> wrote:
+#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
+static __rte_always_inline void
+pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
+{
+ void *dst = __builtin_assume_aligned(in_dst, 16);
+ const void *src = __builtin_assume_aligned(in_src, 16);
Not sure if the buffer is really aligned that way, but x86 doesn't care.
I think it might care, actually. That's why this makes a difference.
With 16-byte alignment assumed, the compiler may use MOVDQA; otherwise,
it can't, and must use MOVDQU. In my experience, these things generally
don't matter from a performance point of view, but in this case it did
(in my benchmark, on my CPU, with my compiler, etc.).
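A minimal sketch of the pattern being discussed, in C. The helper names aligned_copy16() and is_aligned16(), and the use of memcpy() as the copy body, are assumptions for illustration; the full pktcpy() body from the patch is not quoted here. __builtin_assume_aligned() is a GCC/Clang builtin that lets the compiler assume 16-byte alignment, which makes aligned moves such as MOVDQA legal to emit. Whether the compiler actually emits them depends on the compiler, the flags and whether len is a compile-time constant.

```c
#include <assert.h>
#include <stdalign.h> /* alignas, for callers declaring aligned buffers */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if p is 16-byte aligned. */
static inline int
is_aligned16(const void *p)
{
	return ((uintptr_t)p & 15u) == 0;
}

/*
 * Hypothetical helper modeled on pktcpy() in the quoted patch.
 * Both pointers MUST be 16-byte aligned. The assert catches misuse in
 * debug builds; with NDEBUG, a misaligned pointer is undefined behavior,
 * since the compiler may emit aligned loads/stores (e.g., MOVDQA on
 * x86-64) that fault on unaligned addresses.
 */
static inline void
aligned_copy16(void *restrict in_dst, const void *restrict in_src, size_t len)
{
	assert(is_aligned16(in_dst) && is_aligned16(in_src));

	void *dst = __builtin_assume_aligned(in_dst, 16);
	const void *src = __builtin_assume_aligned(in_src, 16);

	memcpy(dst, src, len);
}
```

Declaring the buffers with alignas(16) is one way for a caller to satisfy the precondition. To see whether aligned moves appear, compile with -O2 -S and inspect the assembly; with a non-constant len, the compiler will typically just call memcpy() regardless of the alignment hint.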
Since src and dst can be pointers into an mbuf, at an offset.
The offset will be a multiple of the buffer len.
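The alignment argument reduces to simple arithmetic. If the data area starts 16-byte aligned and each offset is k * len, then slot k is 16-byte aligned exactly when k * len is a multiple of 16, which holds for every k only if len itself is a multiple of 16. Both the aligned base and len being a multiple of 16 are assumptions not stated in the thread; slot_is_aligned() and all_slots_aligned() are hypothetical helpers that check this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment assumed via __builtin_assume_aligned() in the quoted patch. */
#define SLOT_ALIGN 16u

/* True if slot k (offset k * len from base) is SLOT_ALIGN-byte aligned. */
static bool
slot_is_aligned(uintptr_t base, size_t len, size_t k)
{
	return ((base + (uintptr_t)(k * len)) % SLOT_ALIGN) == 0;
}

/* True if the first n slots are all SLOT_ALIGN-byte aligned. */
static bool
all_slots_aligned(uintptr_t base, size_t len, size_t n)
{
	for (size_t k = 0; k < n; k++)
		if (!slot_is_aligned(base, len, k))
			return false;
	return true;
}
```

For example, with a base of 0x1000 and len = 60, slot 1 lands 12 bytes past a 16-byte boundary, so the alignment assumption would be violated even though the offset is a multiple of len.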