> On 6 Jan 2017, at 10:08 AM, Ladi Prosek <lpro...@redhat.com> wrote:
>
> Very simple loop optimization with a significant performance impact.
>
> Microbenchmark results, modern x86-64:
>
> buffer size | speed up
> ------------+---------
> 1500        | 1.7x
> 64          | 1.5x
> 8           | 1.15x
>
> Microbenchmark results, POWER7:
>
> buffer size | speed up
> ------------+---------
> 1500        | 5x
> 64          | 3.3x
> 8           | 1.13x
>
> There is a lot of room for further improvement at the expense of
> code complexity - aligned multibyte reads, LE/BE considerations,
> architecture-specific optimizations, etc. This patch still keeps
> things simple and readable.

Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>

>
> Signed-off-by: Ladi Prosek <lpro...@redhat.com>
> ---
>  net/checksum.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/net/checksum.c b/net/checksum.c
> index 23323b0..4da72a6 100644
> --- a/net/checksum.c
> +++ b/net/checksum.c
> @@ -22,17 +22,22 @@
>
>  uint32_t net_checksum_add_cont(int len, uint8_t *buf, int seq)
>  {
> -    uint32_t sum = 0;
> +    uint32_t sum1 = 0, sum2 = 0;
>      int i;
>
> -    for (i = seq; i < seq + len; i++) {
> -        if (i & 1) {
> -            sum += (uint32_t)buf[i - seq];
> -        } else {
> -            sum += (uint32_t)buf[i - seq] << 8;
> -        }
> +    for (i = 0; i < len - 1; i += 2) {
> +        sum1 += (uint32_t)buf[i];
> +        sum2 += (uint32_t)buf[i + 1];
> +    }
> +    if (i < len) {
> +        sum1 += (uint32_t)buf[i];
> +    }
> +
> +    if (seq & 1) {
> +        return sum1 + (sum2 << 8);
> +    } else {
> +        return sum2 + (sum1 << 8);
>      }
> -    return sum;
>  }
>
>  uint16_t net_checksum_finish(uint32_t sum)
> --
> 2.7.4
>
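
For anyone who wants to sanity-check the rewrite outside QEMU, a standalone comparison of the two loops could look roughly like the sketch below. It is not part of the patch: checksum_old, checksum_new and check_equivalence are hypothetical names, the loop bodies are copied from the diff above, and the 1500-byte buffer and its fill pattern are arbitrary assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Loop as it was before the patch: one parity test per byte. */
uint32_t checksum_old(int len, const uint8_t *buf, int seq)
{
    uint32_t sum = 0;
    int i;

    for (i = seq; i < seq + len; i++) {
        if (i & 1) {
            sum += (uint32_t)buf[i - seq];
        } else {
            sum += (uint32_t)buf[i - seq] << 8;
        }
    }
    return sum;
}

/* Loop as proposed: two accumulators, parity of seq resolved once at the end. */
uint32_t checksum_new(int len, const uint8_t *buf, int seq)
{
    uint32_t sum1 = 0, sum2 = 0;
    int i;

    for (i = 0; i < len - 1; i += 2) {
        sum1 += (uint32_t)buf[i];      /* bytes at even offsets */
        sum2 += (uint32_t)buf[i + 1];  /* bytes at odd offsets */
    }
    if (i < len) {
        sum1 += (uint32_t)buf[i];      /* trailing byte of an odd-length buffer */
    }

    if (seq & 1) {
        return sum1 + (sum2 << 8);
    } else {
        return sum2 + (sum1 << 8);
    }
}

/* Compare both versions for every length 0..1500 and for seq = 0..3. */
void check_equivalence(void)
{
    static uint8_t buf[1500];
    unsigned int n;
    int len, seq;

    for (n = 0; n < sizeof(buf); n++) {
        buf[n] = (uint8_t)(n * 31u + 7u);  /* arbitrary fill pattern */
    }
    for (len = 0; len <= (int)sizeof(buf); len++) {
        for (seq = 0; seq < 4; seq++) {
            assert(checksum_new(len, buf, seq) == checksum_old(len, buf, seq));
        }
    }
}
```

Calling check_equivalence() from a small main(), built without -DNDEBUG, exercises every length up to 1500 with both parities of seq. The two versions should agree because the parity of seq + i depends only on seq & 1 and the parity of the offset i, and shifting the accumulated sum by 8 gives the same result modulo 2^32 as shifting each byte before adding.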