On Fri, Jun 26, 2020 at 6:05 PM Morten Brørup <m...@smartsharesystems.com> wrote: > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit > > Sent: Friday, June 26, 2020 2:08 PM > > > > On 6/25/2020 4:45 PM, Morten Brørup wrote: > > > The function rte_ether_addr_copy() checks for __INTEL_COMPILER and > > has a comment about "a strange gcc warning". It says: > > > > > > static inline void rte_ether_addr_copy(const struct rte_ether_addr > > *ea_from, > > > struct rte_ether_addr *ea_to) > > > { > > > #ifdef __INTEL_COMPILER > > > uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes); > > > uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes); > > > > > > to_words[0] = from_words[0]; > > > to_words[1] = from_words[1]; > > > to_words[2] = from_words[2]; > > > #else > > > /* > > > * Use the common way, because of a strange gcc warning. > > > */ > > > *ea_to = *ea_from; > > > #endif > > > } > > > > > > I can see that from_words discards the const qualifier. Is that the > > "strange" gcc warning the comment is referring to? > > > > > > This goes back to before the first public release of DPDK in 2013, > > ref. https://git.dpdk.org/dpdk/log/lib/librte_ether/rte_ether.h > > > > > > > > > It should be fixed as follows: > > > > > > - uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes); > > > - uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes); > > > + const uint16_t *from_words = (const uint16_t *)ea_from; > > > + uint16_t *to_words = (uint16_t *)ea_to; > > > > > > And the consequential question: Is copying the three shorts faster > > than copying the struct? In other words: Should we get rid of the > > #ifdef and use the first method only? > > > > > > I tried to investigate this in godbolt: https://godbolt.org/z/YSmvDn > > > > First I don't see the "strange" gcc warning with various gcc versions > > there. > > > > Related to the struct copy vs word copy, struct copy seems with less > > instruction > > [1],[2], > > my vote to remove ifdef and keep struct copy. > > > > > > [1] copy as individual function > > [1a] gcc 10.1, struct copy: > > copy: > > movdqa (%rsi), %xmm0 > > movaps %xmm0, (%rdi) > > ret > > > > [1b] gcc 10.1, word copy: > > copy: > > movzwl (%rsi), %eax > > movw %ax, (%rdi) > > movzwl 2(%rsi), %eax > > movw %ax, 2(%rdi) > > movzwl 4(%rsi), %eax > > movw %ax, 4(%rdi) > > ret > > > > [1c] icc 19.0.1, struct copy > > copy: > > movups (%rsi), %xmm0 #19.13 > > movups %xmm0, (%rdi) #19.13 > > ret > > > > > > [2] gcc 10.1, copy as inline function that knows the data, both seems > > similar > > // .addr = {1, 1, 1, 1, 1, 1}, > > [2a] struct copy: > > ... > > movl $257, %eax > > movw %ax, 4(%rsp) > > leaq 16(%rsp), %rdi > > movl $16843009, (%rsp) > > movdqa (%rsp), %xmm0 > > movaps %xmm0, 16(%rsp) > > ... > > > > [2b] word copy: > > movl $257, %eax > > movq %rsp, %rdi > > movw %ax, 4(%rsp) > > movl $16843009, (%rsp) > > > > Thank you for the detailed response, Ferruh. > > I didn't know about godbolt, so thank you for that reference too. > > The address struct is 2 byte aligned, not 16 byte aligned. Modifying your > test in godbolt to use 2 byte alignment gives a similar result, i.e. fewer > instructions on both icc and gcc. > > [1c-modified] icc 19.0.1, struct copy > > copy: > movl (%rsi), %eax #19.13 > movl %eax, (%rdi) #19.13 > movzwl 4(%rsi), %edx #19.13 > movw %dx, 4(%rdi) #19.13 > ret #28.1 > > [1d-modified] icc 19.0.1, word copy > copy: > movzwl (%rsi), %eax #24.12 > movw %ax, (%rdi) #24.5 > movzwl 2(%rsi), %edx #25.12 > movw %dx, 2(%rdi) #25.5 > movzwl 4(%rsi), %ecx #26.12 > movw %cx, 4(%rdi) #26.5 > ret #28.1 > > Testing for ARM64 on godbolt gives a similar result: more instructions using > word copy than struct copy. > > In conclusion, I will proceed with the struct copy.
Since you are up to changing the code, Could you add __restrict keyword for the further hint to the compiler for struct copy case? http://mails.dpdk.org/archives/dev/2020-June/169876.html