On Fri, 26 Jun 2020 17:28:49 +0000 "Van Haaren, Harry" <harry.van.haa...@intel.com> wrote:
> > -----Original Message-----
> > From: Yigit, Ferruh <ferruh.yi...@intel.com>
> > Sent: Friday, June 26, 2020 4:54 PM
> > To: Van Haaren, Harry <harry.van.haa...@intel.com>; Morten Brørup
> > <m...@smartsharesystems.com>; dev@dpdk.org
> > Cc: Olivier Matz <olivier.m...@6wind.com>; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>
> > Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
> >
> > On 6/26/2020 1:41 PM, Van Haaren, Harry wrote:
> > >> -----Original Message-----
> > <snip serious conversation>
>
> > > PS: For extra bonus points, here's a SIMD version that only uses one store
> > > https://godbolt.org/z/VAR2La. Unless you intend on copying billions of
> > > L1 resident eth addrs, this may or may not be a useful optimization.
> > > Note that it requires the 10 bytes after the ether addr to be valid to
> > > read.
> > > It loads 16B across both SRC and DST, blends 48 bits of SRC into DST and
> > > writes the result back to DST.
> > >     movdqu (%rsi), %xmm0
> > >     movdqu (%rdi), %xmm1
> > >     pblendw $7, %xmm1, %xmm0
> > >     movups %xmm0, (%rdi)
> > >     ret
> > >
> > > Actually, it's possible to do this using a uint64_t (8 byte scalar)
> > > load/store too,
> > > with some masking and bitwise OR... left as an exercise to the reader? :)
> > >
> > Does below work? (not for real life usage, just to experiment single store
> > solutions :) [https://godbolt.org/z/TmqwQh]
> >
> >     movzwl 6(%rdi), %eax
> >     salq $48, %rax
> >     orq (%rsi), %rax
> >     movq %rax, (%rdi)
> >     ret
> >
> > ----
> >
> > void copy(struct mac *dst, const struct mac *src) {
> >     uint64_t *s = (uint64_t *) &src->addr;
> >     uint64_t *d = (uint64_t *) &dst->addr;
> >     uint16_t dd = ((uint16_t *)d)[3];
> >     *d = (*s & ~(0xffffUL << 48)) | ((uint64_t)dd << 48);
> > }
> >
> My code-golf reviewing skills are probably not 100% at end-of-day on a
> Friday.. so I wrote a unit test ;)
> Seems to check out yet - readers beware - this solution still overwrites 2
> bytes past the dst mac data itself.
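
For anyone who wants to experiment with the masked single-store idea quoted
above without the pointer casts, here is a minimal sketch. It is not the
code from either godbolt link: struct mac_slot and mac_copy_single_store()
are hypothetical names, and the 2-byte tail field is an assumption standing
in for whatever follows the address in memory. It goes through memcpy() to
sidestep alignment and strict-aliasing problems, and it builds the mask from
bytes so it does not depend on host endianness. It still reads 8 bytes from
src and rewrites 8 bytes at dst, so both must be valid for that length.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 6 address bytes plus 2 trailing bytes that the
 * copy must leave unchanged. */
struct mac_slot {
    uint8_t addr[6];
    uint8_t tail[2];
};

/* Copy the 6 address bytes with one 8-byte store to dst.
 * Assumption: 8 bytes are readable at src and read/writable at dst. */
static void mac_copy_single_store(struct mac_slot *dst,
                                  const struct mac_slot *src)
{
    /* 0xff over the address bytes, 0x00 over the two trailing bytes. */
    static const uint8_t mask_bytes[8] = {
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00
    };
    uint64_t s, d, m;

    memcpy(&m, mask_bytes, sizeof(m));
    memcpy(&s, src, sizeof(s));   /* 8-byte load from src */
    memcpy(&d, dst, sizeof(d));   /* 8-byte load from dst */

    /* Take the address from src, keep the trailing bytes of dst. */
    d = (s & m) | (d & ~m);

    memcpy(dst, &d, sizeof(d));   /* the single 8-byte store */
}
```

The two trailing bytes of dst are written back with their old values, which
is harmless single-threaded but can race with another writer that owns
those bytes.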
>

The Linux kernel equivalent is:

static inline void ether_addr_copy(u8 *dst, const u8 *src)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
	*(u32 *)dst = *(const u32 *)src;
	*(u16 *)(dst + 4) = *(const u16 *)(src + 4);
#else
	u16 *a = (u16 *)dst;
	const u16 *b = (const u16 *)src;

	a[0] = b[0];
	a[1] = b[1];
	a[2] = b[2];
#endif
}
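
The kernel version relies on kernel build settings and on both pointers
being at least 2-byte aligned. As a rough userspace counterpart, the sketch
below uses a fixed-size memcpy(), which compilers typically lower to a
4-byte plus a 2-byte move on targets where that is cheap. The name
ether_addr_copy_portable() is hypothetical and is not part of the kernel or
of DPDK.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_BYTES 6  /* length of an Ethernet MAC address */

/* Hypothetical userspace counterpart: copies exactly 6 bytes, touches
 * nothing beyond them, and has no alignment requirement. */
static inline void ether_addr_copy_portable(uint8_t *dst, const uint8_t *src)
{
    /* A constant length lets the compiler inline and pick the moves. */
    memcpy(dst, src, ETH_ADDR_BYTES);
}
```

Unlike the single-store variants discussed earlier in the thread, this
never reads or writes past the 6 address bytes.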