> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Thursday, 14 October 2021 19.20
> 
> On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > That means a superfluous cast is removed and aliasing through a uint8_t
> > pointer is eliminated. Note that uint8_t doesn't have the same
> > strict-aliasing properties as unsigned char.
> >
> > Also simplified the loop since a modern C compiler can speed up (i.e.
> > auto-vectorize) it in a similar way. For example, GCC auto-vectorizes it
> > for Haswell using AVX registers while halving the number of instructions
> > in the generated code.
> >
> > Signed-off-by: Georg Sauthoff <m...@gms.tf>
> 
> + Morten. (Because of past reviews on cksum code)

Thanks, Ferruh.

I have not verified the claimed benefits of the patch, but I have reviewed the 
code thoroughly, and it looks perfectly good to me.

Reviewed-by: Morten Brørup <m...@smartsharesystems.com>

BTW: It makes me wonder if other parts of DPDK could benefit from the same 
treatment. Especially some of the older DPDK code, where we were trying to 
optimize by hand what a modern compiler can optimize for us today.
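
For anyone who wants to compare the two loop shapes outside of DPDK, here is a
minimal standalone sketch. It is not the DPDK code: the sketch_* names and the
u16_alias typedef are hypothetical, the hand-unrolled variant is only modelled
on the lines the patch removes, and it assumes GCC or Clang (for the
__may_alias__ attribute) and a buffer that is suitably aligned for uint16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a uint16_t that is allowed to alias any object type. */
typedef uint16_t __attribute__((__may_alias__)) u16_alias;

/* Hypothetical helper: hand-unrolled loop, modelled on the removed lines. */
static inline uint32_t
sketch_cksum_unrolled(const void *buf, size_t len, uint32_t sum)
{
	const u16_alias *p = (const u16_alias *)buf;

	while (len >= sizeof(*p) * 4) {
		sum += p[0];
		sum += p[1];
		sum += p[2];
		sum += p[3];
		len -= sizeof(*p) * 4;
		p += 4;
	}
	while (len >= sizeof(*p)) {
		sum += *p;
		len -= sizeof(*p);
		p += 1;
	}
	/* trailing odd byte, stored into the first byte of a zeroed word */
	if (len == 1) {
		uint16_t left = 0;
		*(unsigned char *)&left = *(const unsigned char *)p;
		sum += left;
	}
	return sum;
}

/* Hypothetical helper: plain loop, left to the compiler to vectorize. */
static inline uint32_t
sketch_cksum_simple(const void *buf, size_t len, uint32_t sum)
{
	const u16_alias *p = (const u16_alias *)buf;
	const u16_alias *end = p + len / sizeof(*p);

	for (; p != end; ++p)
		sum += *p;

	/* trailing odd byte, handled the same way as above */
	if (len % 2) {
		uint16_t left = 0;
		*(unsigned char *)&left = *(const unsigned char *)end;
		sum += left;
	}
	return sum;
}

/* Hypothetical self-check: both variants must agree for a given buffer. */
static inline void
sketch_check_equivalent(const void *buf, size_t len)
{
	assert(sketch_cksum_simple(buf, len, 0) ==
	       sketch_cksum_unrolled(buf, len, 0));
}
```

Calling sketch_check_equivalent() on a uint16_t array with both even and odd
lengths is enough to confirm the two variants agree. Comparing the generated
code (e.g. with -O3 -march=haswell) would be the way to check the
auto-vectorization claim, which I have not done here.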

> 
> > ---
> >   lib/net/rte_ip.h | 27 ++++++++-------------------
> >   1 file changed, 8 insertions(+), 19 deletions(-)
> >
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > index 05948b69b7..386db94c85 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
> >   static inline uint32_t
> >   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> >   {
> > -   /* workaround gcc strict-aliasing warning */
> > -   uintptr_t ptr = (uintptr_t)buf;
> > +   /* extend strict-aliasing rules */
> >     typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > -   const u16_p *u16_buf = (const u16_p *)ptr;
> > -
> > -   while (len >= (sizeof(*u16_buf) * 4)) {
> > -           sum += u16_buf[0];
> > -           sum += u16_buf[1];
> > -           sum += u16_buf[2];
> > -           sum += u16_buf[3];
> > -           len -= sizeof(*u16_buf) * 4;
> > -           u16_buf += 4;
> > -   }
> > -   while (len >= sizeof(*u16_buf)) {
> > +   const u16_p *u16_buf = (const u16_p *)buf;
> > +   const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > +
> > +   for (; u16_buf != end; ++u16_buf)

Personally, I would prefer post-incrementing here. Since the result of the
increment is not used, it makes no functional difference, so I don't see any
need to revise the patch.

> >             sum += *u16_buf;
> > -           len -= sizeof(*u16_buf);
> > -           u16_buf += 1;
> > -   }
> >
> > -   /* if length is in odd bytes */
> > -   if (len == 1) {
> > +   /* if length is odd, keeping it byte order independent */
> > +   if (len % 2) {
> >             uint16_t left = 0;
> > -           *(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > +           *(unsigned char*)&left = *(const unsigned char *)end;
> >             sum += left;
> >     }
> >
> >
> 
