On Tue, 2 Jul 2019 11:26:15 +0200 Olivier Matz <olivier.m...@6wind.com> wrote:
> On Tue, Jul 02, 2019 at 09:53:14AM +0200, Olivier Matz wrote: > > Hi, > > > > On Mon, Jun 24, 2019 at 01:44:31PM -0700, Stephen Hemminger wrote: > > > Using bit operations like or and xor is faster than a loop > > > on all architectures. Really just explicit unrolling. > > > > > > Similar cast to uint16 unaligned is already done in > > > other functions here. > > > > > > Signed-off-by: Stephen Hemminger <step...@networkplumber.org> > > > Reviewed-by: Andrew Rybchenko <arybche...@solarflare.com> > > > --- > > > lib/librte_net/rte_ether.h | 17 +++++++---------- > > > 1 file changed, 7 insertions(+), 10 deletions(-) > > > > > > diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h > > > index 8edc7e217b25..feb35a33c94b 100644 > > > --- a/lib/librte_net/rte_ether.h > > > +++ b/lib/librte_net/rte_ether.h > > > @@ -81,11 +81,10 @@ struct rte_ether_addr { > > > static inline int rte_is_same_ether_addr(const struct rte_ether_addr > > > *ea1, > > > const struct rte_ether_addr *ea2) > > > { > > > - int i; > > > - for (i = 0; i < RTE_ETHER_ADDR_LEN; i++) > > > - if (ea1->addr_bytes[i] != ea2->addr_bytes[i]) > > > - return 0; > > > - return 1; > > > + const unaligned_uint16_t *w1 = (const uint16_t *)ea1; > > > + const unaligned_uint16_t *w2 = (const uint16_t *)ea2; > > > + > > > + return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0; > > > } > > > > > > /** > > > @@ -100,11 +99,9 @@ static inline int rte_is_same_ether_addr(const struct > > > rte_ether_addr *ea1, > > > */ > > > static inline int rte_is_zero_ether_addr(const struct rte_ether_addr *ea) > > > { > > > - int i; > > > - for (i = 0; i < RTE_ETHER_ADDR_LEN; i++) > > > - if (ea->addr_bytes[i] != 0x00) > > > - return 0; > > > - return 1; > > > + const unaligned_uint16_t *w = (const uint16_t *)ea; > > > + > > > + return (w[0] | w[1] | w[2]) == 0; > > > } > > > > > > /** > > > > I wonder if using memcmp() isn't faster with recent compilers (gcc >= 7). > > I tried it quickly, and it seems the generated code is good (no call): > > https://godbolt.org/z/9MOL7g > > > > It would avoid the use of unaligned_uint16_t, and the next patch that > > adds the alignment constraint. > > As pointed out by Konstantin privately (I guess he wanted to do a > reply-all), the size of addr_bytes is wrong in my previous link (8 > instead of 6). Thanks for catching it. > > With 6, the gcc code is not as good: there is still no call to memcmp(), > but there are some jumps. With the latest clang, the generated code is > nice: https://godbolt.org/z/nfptnY gcc matters most for current DPDK users.