> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit > Sent: Friday, June 26, 2020 2:08 PM > > On 6/25/2020 4:45 PM, Morten Brørup wrote: > > The function rte_ether_addr_copy() checks for __INTEL_COMPILER and > has a comment about "a strange gcc warning". It says: > > > > static inline void rte_ether_addr_copy(const struct rte_ether_addr > *ea_from, > > struct rte_ether_addr *ea_to) > > { > > #ifdef __INTEL_COMPILER > > uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes); > > uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes); > > > > to_words[0] = from_words[0]; > > to_words[1] = from_words[1]; > > to_words[2] = from_words[2]; > > #else > > /* > > * Use the common way, because of a strange gcc warning. > > */ > > *ea_to = *ea_from; > > #endif > > } > > > > I can see that from_words discards the const qualifier. Is that the > "strange" gcc warning the comment is referring to? > > > > This goes back to before the first public release of DPDK in 2013, > ref. https://git.dpdk.org/dpdk/log/lib/librte_ether/rte_ether.h > > > > > > It should be fixed as follows: > > > > - uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes); > > - uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes); > > + const uint16_t *from_words = (const uint16_t *)ea_from; > > + uint16_t *to_words = (uint16_t *)ea_to; > > > > And the consequential question: Is copying the three shorts faster > than copying the struct? In other words: Should we get rid of the > #ifdef and use the first method only? > > > I tried to investigate this in godbolt: https://godbolt.org/z/YSmvDn > > First I don't see the "strange" gcc warning with various gcc versions > there. > > Related to the struct copy vs word copy, struct copy seems with less > instruction > [1],[2], > my vote to remove ifdef and keep struct copy. > > > [1] copy as individual function > [1a] gcc 10.1, struct copy: > copy: > movdqa (%rsi), %xmm0 > movaps %xmm0, (%rdi) > ret > > [1b] gcc 10.1, word copy: > copy: > movzwl (%rsi), %eax > movw %ax, (%rdi) > movzwl 2(%rsi), %eax > movw %ax, 2(%rdi) > movzwl 4(%rsi), %eax > movw %ax, 4(%rdi) > ret > > [1c] icc 19.0.1, struct copy > copy: > movups (%rsi), %xmm0 #19.13 > movups %xmm0, (%rdi) #19.13 > ret > > > [2] gcc 10.1, copy as inline function that knows the data, both seems > similar > // .addr = {1, 1, 1, 1, 1, 1}, > [2a] struct copy: > ... > movl $257, %eax > movw %ax, 4(%rsp) > leaq 16(%rsp), %rdi > movl $16843009, (%rsp) > movdqa (%rsp), %xmm0 > movaps %xmm0, 16(%rsp) > ... > > [2b] word copy: > movl $257, %eax > movq %rsp, %rdi > movw %ax, 4(%rsp) > movl $16843009, (%rsp) >
Thank you for the detailed response, Ferruh. I didn't know about godbolt, so thank you for that reference too. The address struct is 2 byte aligned, not 16 byte aligned. Modifying your test in godbolt to use 2 byte alignment gives a similar result, i.e. fewer instructions on both icc and gcc. [1c-modified] icc 19.0.1, struct copy copy: movl (%rsi), %eax #19.13 movl %eax, (%rdi) #19.13 movzwl 4(%rsi), %edx #19.13 movw %dx, 4(%rdi) #19.13 ret #28.1 [1d-modified] icc 19.0.1, word copy copy: movzwl (%rsi), %eax #24.12 movw %ax, (%rdi) #24.5 movzwl 2(%rsi), %edx #25.12 movw %dx, 2(%rdi) #25.5 movzwl 4(%rsi), %ecx #26.12 movw %cx, 4(%rdi) #26.5 ret #28.1 Testing for ARM64 on godbolt gives a similar result: more instructions using word copy than struct copy. In conclusion, I will proceed with the struct copy.