> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yi...@intel.com>
> Sent: Friday, June 26, 2020 4:54 PM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>; Morten Brørup
> <m...@smartsharesystems.com>; dev@dpdk.org
> Cc: Olivier Matz <olivier.m...@6wind.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>
> Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
> 
> On 6/26/2020 1:41 PM, Van Haaren, Harry wrote:
> >> -----Original Message-----

<snip serious conversation>

> > PS: For extra bonus points, here's a SIMD version that only uses one store
> > https://godbolt.org/z/VAR2La. Unless you intend on copying billions of
> > L1 resident eth addrs, this may or may not be a useful optimization.
> > Note that it requires the 10 bytes after the ether addr to be valid to read.
> > It loads 16B across both SRC and DST, blends 48 bits of SRC into DST and
> > writes the result back to DST.
> >         movdqu  (%rsi), %xmm0
> >         movdqu  (%rdi), %xmm1
> >         pblendw $7, %xmm1, %xmm0
> >         movups  %xmm0, (%rdi)
> >         ret
> >
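
For anyone who wants to try the blend idea without opening the link, here is a rough sketch in C intrinsics. It follows the prose description above (16B load from each side, keep 48 bits of SRC, one 16B store to DST) and is not a copy of the godbolt source. Assumptions: `struct mac_padded` and `copy_simd` are made-up names, the 10 padding bytes are added so the 16B accesses stay in bounds, and the select uses SSE2 and/andnot/or instead of pblendw so it builds without -msse4.1. On non-SSE2 targets it falls back to a plain 6-byte memcpy.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical stand-in for the struct in the thread: a 6-byte MAC plus
 * 10 bytes of padding so that 16-byte loads/stores stay inside the object. */
struct mac_padded {
    uint8_t addr[6];
    uint8_t tail[10];
};

static void copy_simd(struct mac_padded *dst, const struct mac_padded *src)
{
#if defined(__SSE2__)
    /* Mask with the low 6 bytes set: those bytes come from src, the rest from dst. */
    const __m128i mask = _mm_set_epi32(0, 0, 0x0000ffff, (int)0xffffffff);
    __m128i s = _mm_loadu_si128((const __m128i *)src);
    __m128i d = _mm_loadu_si128((const __m128i *)dst);
    /* (mask & s) | (~mask & d): an SSE2 equivalent of a 3-word blend. */
    __m128i r = _mm_or_si128(_mm_and_si128(mask, s), _mm_andnot_si128(mask, d));
    /* Single 16-byte store; the 10 tail bytes are written back unchanged. */
    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Portable fallback (assumption: no SSE2 available), copies only the MAC. */
    memcpy(dst->addr, src->addr, sizeof(dst->addr));
#endif
}
```

As with the original, the store covers the 10 bytes after the address, so this is only safe when those bytes belong to the same object and nobody else modifies them concurrently.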
> > Actually, it's possible to do this using a uint64_t (8 byte scalar) load/store
> > too, with some masking and bitwise OR... left as an exercise to the reader? :)
> >
> Does the below work? (Not for real-life use, just to experiment with single-store
> solutions :) [https://godbolt.org/z/TmqwQh]
> 
>         movzwl  6(%rdi), %eax
>         salq    $48, %rax
>         orq     (%rsi), %rax
>         movq    %rax, (%rdi)
>         ret
> 
> ----
> 
> void copy(struct mac *dst, const struct mac *src) {
>     uint64_t *s = (uint64_t *) &src->addr;
>     uint64_t *d = (uint64_t *) &dst->addr;
>     uint16_t dd = ((uint16_t *)d)[3];
>     *d = (*s & ~(0xffffUL << 48)) | ((uint64_t)dd << 48);
> }

My code-golf reviewing skills are probably not 100% at end-of-day on a Friday,
so I wrote a unit test ;)
It seems to check out, but readers beware: this solution still writes to the 2
bytes past the dst mac data itself (it stores their old values back).
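
In case it helps anyone experimenting, below is a sketch of the same masking idea written with memcpy instead of pointer casts, which sidesteps strict-aliasing and alignment questions and usually still compiles down to plain 8-byte loads and stores. Assumptions: `struct mac8` and `copy_scalar` are hypothetical names, and the 2 spare bytes are declared explicitly because the layout of `struct mac` is not shown in the thread. The mask is built from bytes so it does not depend on endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 6-byte MAC plus 2 explicit spare bytes, so the
 * 8-byte accesses below never leave the object. */
struct mac8 {
    uint8_t addr[6];
    uint8_t pad[2];
};

static void copy_scalar(struct mac8 *dst, const struct mac8 *src)
{
    /* Bytes 6..7 in memory order are the ones to preserve in dst. */
    static const uint8_t keep_bytes[8] = {0, 0, 0, 0, 0, 0, 0xff, 0xff};
    uint64_t s, d, keep;

    memcpy(&s, src, sizeof(s));              /* one 8-byte load from src */
    memcpy(&d, dst, sizeof(d));              /* one 8-byte load from dst */
    memcpy(&keep, keep_bytes, sizeof(keep)); /* 0xffff << 48 on little-endian */

    /* Take the 6 MAC bytes from src and the 2 trailing bytes from dst. */
    d = (s & ~keep) | (d & keep);

    memcpy(dst, &d, sizeof(d));              /* single 8-byte store to dst */
}
```

A test worth running on any variant of this: put non-zero bytes after the src address as well as after the dst address. If the src side is not masked, its trailing bytes get ORed into dst, and a test with zeroed src padding will not catch that.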
