On Sat, 2 Mar 2024 21:40:03 -0800 Stephen Hemminger <step...@networkplumber.org> wrote:
> On Sun, 3 Mar 2024 00:48:12 +0100
> Morten Brørup <m...@smartsharesystems.com> wrote:
>
> > When the rte_memcpy() size is 16, the same 16 bytes are copied twice.
> > In the case where the size is known to be 16 at build time, omit the
> > duplicate copy.
> >
> > Reduced the amount of effectively copy-pasted code by using #ifdef
> > inside functions instead of outside functions.
> >
> > Suggested-by: Stephen Hemminger <step...@networkplumber.org>
> > Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> > ---
>
> Looks good, let me see how it looks in godbolt vs gcc.
>
> One other issue is that for the non-constant case, rte_memcpy has an
> excessively large inline code footprint. That is one of the reasons
> gcc doesn't always inline. For > 128 bytes, it really should be a
> function.

For sizes of 4, 6, 8, 16, 32, 64, up to 128, gcc's inline memcpy and
rte_memcpy match. For size 128, it looks like gcc's version is simpler:

rte_copy_addr:
	vmovdqu	ymm0, YMMWORD PTR [rsi]
	vextracti128	XMMWORD PTR [rdi+16], ymm0, 0x1
	vmovdqu	XMMWORD PTR [rdi], xmm0
	vmovdqu	ymm0, YMMWORD PTR [rsi+32]
	vextracti128	XMMWORD PTR [rdi+48], ymm0, 0x1
	vmovdqu	XMMWORD PTR [rdi+32], xmm0
	vmovdqu	ymm0, YMMWORD PTR [rsi+64]
	vextracti128	XMMWORD PTR [rdi+80], ymm0, 0x1
	vmovdqu	XMMWORD PTR [rdi+64], xmm0
	vmovdqu	ymm0, YMMWORD PTR [rsi+96]
	vextracti128	XMMWORD PTR [rdi+112], ymm0, 0x1
	vmovdqu	XMMWORD PTR [rdi+96], xmm0
	vzeroupper
	ret

copy_addr:
	vmovdqu	ymm0, YMMWORD PTR [rsi]
	vmovdqu	YMMWORD PTR [rdi], ymm0
	vmovdqu	ymm1, YMMWORD PTR [rsi+32]
	vmovdqu	YMMWORD PTR [rdi+32], ymm1
	vmovdqu	ymm2, YMMWORD PTR [rsi+64]
	vmovdqu	YMMWORD PTR [rdi+64], ymm2
	vmovdqu	ymm3, YMMWORD PTR [rsi+96]
	vmovdqu	YMMWORD PTR [rdi+96], ymm3
	vzeroupper
	ret