> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yi...@intel.com>
> Sent: Friday, June 26, 2020 4:54 PM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>; Morten Brørup
> <m...@smartsharesystems.com>; dev@dpdk.org
> Cc: Olivier Matz <olivier.m...@6wind.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>
> Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
>
> On 6/26/2020 1:41 PM, Van Haaren, Harry wrote:
> >> -----Original Message-----
<snip serious conversation>

> > PS: For extra bonus points, here's a SIMD version that only uses one store
> > https://godbolt.org/z/VAR2La. Unless you intend on copying billions of
> > L1 resident eth addrs, this may or may not be a useful optimization.
> > Note that it requires the 10 bytes after the ether addr to be valid to read.
> > It loads 16B across both SRC and DST, blends 48 bits of SRC into DST and
> > writes the result back to DST.
> >     movdqu (%rsi), %xmm0
> >     movdqu (%rdi), %xmm1
> >     pblendw $7, %xmm1, %xmm0
> >     movups %xmm0, (%rdi)
> >     ret
> >
> > Actually, it's possible to do this using a uint64_t (8 byte scalar)
> > load/store too,
> > with some masking and bitwise OR... left as an exercise to the reader? :)
> >
>
> Does below work? (not for real life usage, just to experiment single store
> solutions :) [https://godbolt.org/z/TmqwQh]
>
>     movzwl 6(%rdi), %eax
>     salq $48, %rax
>     orq (%rsi), %rax
>     movq %rax, (%rdi)
>     ret
>
> ----
>
> void copy(struct mac *dst, const struct mac *src) {
>     uint64_t *s = (uint64_t *) &src->addr;
>     uint64_t *d = (uint64_t *) &dst->addr;
>     uint16_t dd = ((uint16_t *)d)[3];
>     *d = (*s & ~(0xffffUL << 48)) | ((uint64_t)dd << 48);
> }

My code-golf reviewing skills are probably not 100% at end-of-day on a Friday..
so I wrote a unit test ;) Seems to check out, yet readers beware: this solution
still overwrites 2 bytes past the dst mac data itself.
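
For anyone wanting to repeat the check, here is a minimal sketch of what such a
unit test could look like. It is not the test that was actually run: the 2-byte
`tail` field and the names `copy_single_store` and `check_copy` are made up for
illustration, and it assumes a little-endian machine with 8 readable and
writable bytes at both pointers. It uses memcpy rather than pointer casts to
stay clear of aliasing trouble. One thing worth testing for: the mask has to be
a shift (`0xffffULL << 48`). Written as a comparison (`< 48`) it collapses to
all ones, the compiler drops it, and what is left is a bare `orq` as in the
listing quoted above, which ORs the 2 bytes after the src address into dst.
Hence the non-zero trailing bytes in src below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 6-byte MAC followed by 2 bytes that must survive. */
struct mac {
    uint8_t addr[6];
    uint8_t tail[2]; /* stands in for whatever follows the address */
};

/* One 8-byte store: low 48 bits come from src, top 16 bits are kept from dst.
 * Assumes little-endian byte order, so addr[0..5] are the low 48 bits. */
void copy_single_store(struct mac *dst, const struct mac *src)
{
    uint64_t s, d;

    memcpy(&s, src, sizeof(s));
    memcpy(&d, dst, sizeof(d));
    d = (s & ~(0xffffULL << 48)) | (d & (0xffffULL << 48));
    memcpy(dst, &d, sizeof(d));
}

/* Returns 1 if the address was copied and dst's 2 trailing bytes are intact. */
int check_copy(void)
{
    /* Non-zero src tail, so a missing mask would corrupt dst.tail. */
    struct mac src = { {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}, {0xaa, 0xbb} };
    struct mac dst = { {0xff, 0xff, 0xff, 0xff, 0xff, 0xff}, {0xcc, 0xdd} };

    copy_single_store(&dst, &src);

    return memcmp(dst.addr, src.addr, sizeof(dst.addr)) == 0 &&
           dst.tail[0] == 0xcc && dst.tail[1] == 0xdd;
}
```

Calling `check_copy()` from a test harness and asserting that it returns 1 is
enough. The store still touches the 2 bytes past the address; it just writes
back the values that were already there.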