On Fri, 26 Jun 2020 17:28:49 +0000
"Van Haaren, Harry" <harry.van.haa...@intel.com> wrote:

> > -----Original Message-----
> > From: Yigit, Ferruh <ferruh.yi...@intel.com>
> > Sent: Friday, June 26, 2020 4:54 PM
> > To: Van Haaren, Harry <harry.van.haa...@intel.com>; Morten Brørup
> > <m...@smartsharesystems.com>; dev@dpdk.org
> > Cc: Olivier Matz <olivier.m...@6wind.com>; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>
> > Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
> > 
> > On 6/26/2020 1:41 PM, Van Haaren, Harry wrote:  
> > >> -----Original Message-----  
> 
> <snip serious conversation>
> 
> > > PS: For extra bonus points, here's a SIMD version that only uses one store
> > > https://godbolt.org/z/VAR2La. Unless you intend on copying billions of
> > > L1 resident eth addrs, this may or may not be a useful optimization.
> > > Note that it requires the 10 bytes after the ether addr to be valid to read.
> > > It loads 16B across both SRC and DST, blends 48 bits of SRC into DST and
> > > writes the result back to DST.
> > >         movdqu  (%rsi), %xmm0
> > >         movdqu  (%rdi), %xmm1
> > >         pblendw $7, %xmm1, %xmm0
> > >         movups  %xmm0, (%rdi)
> > >         ret
> > >
> > > Actually, it's possible to do this using a uint64_t (8 byte scalar) load/store
> > > too, with some masking and bitwise OR... left as an exercise to the reader? :)
> > >  
> > Does below work? (not for real life usage, just to experiment single store
> > solutions :) [https://godbolt.org/z/TmqwQh]
> > 
> >         movzwl  6(%rdi), %eax
> >         salq    $48, %rax
> >         orq     (%rsi), %rax
> >         movq    %rax, (%rdi)
> >         ret
> > 
> > ----
> > 
> > void copy(struct mac *dst, const struct mac *src) {
> >     uint64_t *s = (uint64_t *) &src->addr;
> >     uint64_t *d = (uint64_t *) &dst->addr;
> >     uint16_t dd = ((uint16_t *)d)[3];
> >     *d = (*s & ~(0xffffUL << 48)) | ((uint64_t)dd << 48);
> > }  
> 
> My code-golf reviewing skills are probably not 100% at end-of-day on a Friday,
> so I wrote a unit test ;)
> Seems to check out, but readers beware: this solution still overwrites 2 bytes
> past the dst mac data itself.
> 
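
For anyone who wants to experiment with the single-store idea without the
pointer casts, here is a minimal sketch using memcpy() for the 8-byte
accesses. The struct layout and the name copy_one_store() are my own
assumptions for illustration, not anything in DPDK. The upper 16 bits of the
source word have to be masked off before the OR, otherwise whatever follows
src leaks into the 2 bytes after dst.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 6 address bytes followed by 2 bytes that the
 * copy must leave unchanged. */
struct mac {
    uint8_t addr[6];
    uint8_t pad[2];
};

static_assert(sizeof(struct mac) == 8, "sketch assumes an 8-byte struct");

/* Copy the 6 address bytes using one 8-byte load per side and a single
 * 8-byte store to dst. */
static void copy_one_store(struct mac *dst, const struct mac *src)
{
    /* Build the mask from bytes so it selects the first 6 bytes in
     * memory on either endianness. */
    static const uint8_t mask_bytes[8] = {
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00
    };
    uint64_t mask, s, d;

    memcpy(&mask, mask_bytes, sizeof(mask));
    memcpy(&s, src, sizeof(s));      /* reads 2 bytes past src->addr */
    memcpy(&d, dst, sizeof(d));      /* reads 2 bytes past dst->addr */

    d = (s & mask) | (d & ~mask);    /* src address, dst trailing bytes */

    memcpy(dst, &d, sizeof(d));      /* rewrites 2 bytes past dst->addr */
}
```

The 2 trailing bytes of dst are written back with the value that was read,
so this is only safe if nothing else modifies them between the load and the
store.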

The Linux kernel equivalent is:

static inline void ether_addr_copy(u8 *dst, const u8 *src)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
        *(u32 *)dst = *(const u32 *)src;
        *(u16 *)(dst + 4) = *(const u16 *)(src + 4);
#else
        u16 *a = (u16 *)dst;
        const u16 *b = (const u16 *)src;

        a[0] = b[0];
        a[1] = b[1];
        a[2] = b[2];
#endif
}
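
Note that the kernel version expects both dst and src to be at least 2-byte
aligned. A rough user-space sketch of the same 4+2 split, using memcpy() so
it makes no alignment or aliasing assumptions, could look like the following.
The name ether_addr_copy_portable() is made up for this example; it is not a
kernel or DPDK function. Compilers will usually turn these fixed-size
memcpy() calls into plain loads and stores, but that is worth checking on
your own toolchain.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a 6-byte Ethernet address as one 4-byte
 * and one 2-byte move, touching nothing outside the 6 bytes. */
static inline void ether_addr_copy_portable(uint8_t *dst, const uint8_t *src)
{
    uint32_t lo;
    uint16_t hi;

    memcpy(&lo, src, sizeof(lo));       /* bytes 0..3 */
    memcpy(&hi, src + 4, sizeof(hi));   /* bytes 4..5 */
    memcpy(dst, &lo, sizeof(lo));
    memcpy(dst + 4, &hi, sizeof(hi));
}
```

Unlike the single-store variants earlier in the thread, this never reads or
writes past the 6 address bytes.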
