> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> Sent: Wednesday, 24 July 2024 09.54

Which packet mix was used for your tests? Synthetic IMIX, or some live data?

> +/* The code generated by GCC (and to a lesser extent, clang) with just
> + * a straight memcpy() to copy packets is less than optimal on Intel
> + * P-cores, for small packets. Thus the need of this specialized
> + * memcpy() in builds where use_cc_memcpy is set to true.
> + */
> +#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
> +static __rte_always_inline void
> +pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
> +{
> +     void *dst = __builtin_assume_aligned(in_dst, 16);
> +     const void *src = __builtin_assume_aligned(in_src, 16);
> +
> +     if (len <= 256) {
> +             size_t left;
> +
> +             for (left = len; left >= 32; left -= 32) {
> +                     memcpy(dst, src, 32);
> +                     dst = RTE_PTR_ADD(dst, 32);
> +                     src = RTE_PTR_ADD(src, 32);
> +             }
> +
> +             memcpy(dst, src, left);
> +     } else

Although the packets within a burst often have similar size, I'm not sure you 
can rely on the dynamic branch predictor here.

Looking at the ethdev packet size counters at an ISP (at the core of their 
Layer 3 network), 71 % of packets are 256 bytes or larger [1].

For static branch prediction, I would consider len > 256 the more likely case 
and swap the two branches, i.e. compare (len > 256) instead of (len <= 256).

But again: I don't know how the dynamic branch predictor behaves here. Perhaps 
my suggested change makes no difference.
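For illustration, the swap I have in mind would look roughly like this (a 
sketch only, not a drop-in replacement: RTE_PTR_ADD is replaced with plain 
char* arithmetic so it compiles outside DPDK, and the function name 
pktcpy_swapped is my own):

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the suggested branch swap: test (len > 256) first, so the
 * statically-predicted-unlikely branch is the small-packet loop.
 * Logic otherwise matches the quoted patch; RTE_PTR_ADD is replaced
 * with pointer arithmetic so this builds standalone.
 */
static inline void
pktcpy_swapped(void *restrict in_dst, const void *restrict in_src, size_t len)
{
	char *dst = __builtin_assume_aligned(in_dst, 16);
	const char *src = __builtin_assume_aligned(in_src, 16);

	if (len > 256) {
		memcpy(dst, src, len);
	} else {
		size_t left;

		/* Copy 32-byte chunks, then the remaining tail. */
		for (left = len; left >= 32; left -= 32) {
			memcpy(dst, src, 32);
			dst += 32;
			src += 32;
		}

		memcpy(dst, src, left);
	}
}
```

The behaviour is identical either way; only the static fall-through ordering 
changes, which may or may not matter once the dynamic predictor has warmed up.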

> +             memcpy(dst, src, len);
> +}

With or without suggested change,
Acked-by: Morten Brørup <m...@smartsharesystems.com>


[1]: Details (incl. one VLAN tag)
tx_size_64_packets            1.1 %
tx_size_65_to_127_packets    25.7 %
tx_size_128_to_255_packets    2.6 %
tx_size_256_to_511_packets    1.4 %
tx_size_512_to_1023_packets   1.7 %
tx_size_1024_to_1522_packets 67.6 %
