RE: [PATCH] net: fix checksum with unaligned buffer

Emil Berg Tue, 21 Jun 2022 23:01:56 -0700

> -----Original Message-----
> From: Morten Brørup <[email protected]>
> Sent: den 17 juni 2022 11:07
> To: Emil Berg <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Morten Brørup [mailto:[email protected]]
> > Sent: Friday, 17 June 2022 10.45
> >
> > With this patch, the checksum can be calculated on an unligned part of
> > a packet buffer.
> > I.e. the buf parameter is no longer required to be 16 bit aligned.
> >
> > The DPDK invariant that packet buffers must be 16 bit aligned remains
> > unchanged.
> > This invariant also defines how to calculate the 16 bit checksum on an
> > unaligned part of a packet buffer.
> >
> > Bugzilla ID: 1035
> > Cc: [email protected]
> >
> > Signed-off-by: Morten Brørup <[email protected]>
> > ---
> >  lib/net/rte_ip.h | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > b502481670..8e301d9c26 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t len,
> > uint32_t sum)  {
> >     /* extend strict-aliasing rules */
> >     typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > -   const u16_p *u16_buf = (const u16_p *)buf;
> > -   const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > +   const u16_p *u16_buf;
> > +   const u16_p *end;
> > +
> > +   /* if buffer is unaligned, keeping it byte order independent */
> > +   if (unlikely((uintptr_t)buf & 1)) {
> > +           uint16_t first = 0;
> > +           if (unlikely(len == 0))
> > +                   return 0;
> > +           ((unsigned char *)&first)[1] = *(const unsigned
> char *)buf;
> > +           sum += first;
> > +           buf = (const void *)((uintptr_t)buf + 1);
> > +           len--;
> > +   }
> >
> > +   u16_buf = (const u16_p *)buf;
> > +   end = u16_buf + len / sizeof(*u16_buf);
> >     for (; u16_buf != end; ++u16_buf)
> >             sum += *u16_buf;
> >
> > --
> > 2.17.1
> 
> @Emil, can you please test this patch with an unaligned buffer on your
> application to confirm that it produces the expected result.


Hi!

I tested the patch. It doesn't seem to produce the same results. I think the 
problem is that it always starts summing from an even address, the sum should 
always start from the first byte according to the checksum specification. Can I 
instead propose something Mattias Rönnblom sent me?
--------------------------------------------------------------------------------------------------------------
const void *end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * sizeof(uint16_t));

for (; buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
    uint16_t v;
    memcpy(&v, buf, sizeof(uint16_t));
    sum += v;
}

/* if length is odd, keeping it byte order independent */
if (unlikely(len % 2)) {
    uint16_t left = 0;
    *(unsigned char *)&left = *(const unsigned char *)end;
    sum += left;
}
--------------------------------------------------------------------------------------------------------------
Note that the last block is the same as before. Amazingly I see no measurable 
performance hit from this compared to the previous one (-O3, march=native). 
Looking at the previous the loop body may compile to (x86):
--------------------------------------------------------------------------------------------------------------
vmovdqa (%rdx),%xmm1
vpmovzxwd %xmm1,%xmm0
vpsrldq $0x8,%xmm1,%xmm1
vpmovzxwd %xmm1,%xmm1
vpaddd %xmm1,%xmm0,%xmm0
cmp    $0xf,%rax
jbe    0x7ff7a0dfb1a9
--------------------------------------------------------------------------------------------------------------
while Mattias' memcpy solution:
--------------------------------------------------------------------------------------------------------------
vmovdqu (%rcx),%ymm0
add    $0x20,%rcx
vpmovzxwd %xmm0,%ymm1
vextracti128 $0x1,%ymm0,%xmm0
vpmovzxwd %xmm0,%ymm0
vpaddd %ymm0,%ymm1,%ymm0
vpaddd %ymm0,%ymm2,%ymm2
cmp    %r9,%rcx
jne    0x555555556380
--------------------------------------------------------------------------------------------------------------
Thus two extra instructions in the loop, but I suspect it may be memory bound, 
leading to no measurable performance difference. 

Any comments?

/Emil

RE: [PATCH] net: fix checksum with unaligned buffer

Reply via email to