> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, April 16, 2015 3:01 PM
> To: De Lara Guarch, Pablo
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] hash: update jhash function with the latest
> available
>
> On Thu, Apr 16, 2015 at 02:26:59PM +0100, Pablo de Lara wrote:
> > Jenkins hash function was developed originally in 1996,
> > and was integrated in the first versions of DPDK.
> > The function was improved in 2006,
> > achieving up to 60% better performance, compared to the original one.
> >
> > Check out: http://burtleburtle.net/bob/c/lookup3.c
> >
> > This patch integrates that code in the rte_jhash library,
> > also adding a new function, rte_jhash_word2,
> > that returns two different hash values for a single key.
> >
>
> Should the addition of the new functionality not be a separate patch from the
> update to the existing code?

True. Actually, I missed one extra function (2 in total).

> Also, do the new functions return the exact same values as the previous
> versions, just faster?

The new functions return different values from the previous versions AND are
faster (in some cases, MUCH faster).
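For reference, lookup3.c's two-result word hash (hashword2) is presumably what
rte_jhash_word2 builds on; below is a sketch of it from memory (the exact DPDK
prototype and naming may differ):

#include <stdint.h>
#include <stddef.h>

#define rot(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

#define mix(a, b, c) \
{ \
	a -= c;  a ^= rot(c, 4);  c += b; \
	b -= a;  b ^= rot(a, 6);  a += c; \
	c -= b;  c ^= rot(b, 8);  b += a; \
	a -= c;  a ^= rot(c, 16); c += b; \
	b -= a;  b ^= rot(a, 19); a += c; \
	c -= b;  c ^= rot(b, 4);  b += a; \
}

#define final(a, b, c) \
{ \
	c ^= b; c -= rot(b, 14); \
	a ^= c; a -= rot(c, 11); \
	b ^= a; b -= rot(a, 25); \
	c ^= b; c -= rot(b, 16); \
	a ^= c; a -= rot(c, 4);  \
	b ^= a; b -= rot(a, 14); \
	c ^= b; c -= rot(b, 24); \
}

/*
 * Sketch of lookup3.c's hashword2(): hash an array of 32-bit words and
 * return two hash values through *pc (primary) and *pb (secondary).
 * On input, *pc and *pb act as two seed values.
 */
void
hashword2_sketch(const uint32_t *k, size_t length, uint32_t *pc, uint32_t *pb)
{
	uint32_t a, b, c;

	/* set up the internal state */
	a = b = c = 0xdeadbeef + ((uint32_t)(length << 2)) + *pc;
	c += *pb;

	/* handle most of the key, three words at a time */
	while (length > 3) {
		a += k[0];
		b += k[1];
		c += k[2];
		mix(a, b, c);
		length -= 3;
		k += 3;
	}

	/* handle the last 1-3 words; all cases fall through */
	switch (length) {
	case 3:
		c += k[2];
	case 2:
		b += k[1];
	case 1:
		a += k[0];
		final(a, b, c);
	case 0:	/* nothing left to add */
		break;
	}

	*pc = c;	/* primary hash value */
	*pb = b;	/* secondary hash value */
}

Presumably the new DPDK functions are thin wrappers around something like
this, passing the key, its length in 32-bit words and two seeds, and reading
back both results.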
[...]

> > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > +	case 12:
> > +		c += k[2]; b += k[1]; a += k[0]; break;
> > +	case 11:
> > +		c += k[2]&0xffffff; b += k[1]; a += k[0]; break;
> > +	case 10:
> > +		c += k[2]&0xffff; b += k[1]; a += k[0]; break;
> > +	case 9:
> > +		c += k[2]&0xff; b += k[1]; a += k[0]; break;
> > +	case 8:
> > +		b += k[1]; a += k[0]; break;
> > +	case 7:
> > +		b += k[1]&0xffffff; a += k[0]; break;
> > +	case 6:
> > +		b += k[1]&0xffff; a += k[0]; break;
> > +	case 5:
> > +		b += k[1]&0xff; a += k[0]; break;
> > +	case 4:
> > +		a += k[0]; break;
> > +	case 3:
> > +		a += k[0]&0xffffff; break;
> > +	case 2:
> > +		a += k[0]&0xffff; break;
> > +	case 1:
> > +		a += k[0]&0xff; break;
> > +#else
> > +	case 12:
> > +		c += k[2]; b += k[1]; a += k[0]; break;
> > +	case 11:
> > +		c += k[2]&0xffffff00; b += k[1]; a += k[0]; break;
> > +	case 10:
> > +		c += k[2]&0xffff0000; b += k[1]; a += k[0]; break;
> > +	case 9:
> > +		c += k[2]&0xff000000; b += k[1]; a += k[0]; break;
> > +	case 8:
> > +		b += k[1]; a += k[0]; break;
> > +	case 7:
> > +		b += k[1]&0xffffff00; a += k[0]; break;
> > +	case 6:
> > +		b += k[1]&0xffff0000; a += k[0]; break;
> > +	case 5:
> > +		b += k[1]&0xff000000; a += k[0]; break;
> > +	case 4:
> > +		a += k[0]; break;
> > +	case 3:
> > +		a += k[0]&0xffffff00; break;
> > +	case 2:
> > +		a += k[0]&0xffff0000; break;
> > +	case 1:
> > +		a += k[0]&0xff000000; break;
> > +#endif

> Only the constants seem different in this block. Can we get rid of the
> #ifdefs using rte_XX_to_cpu() calls instead?

Will add that in the next version.
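For illustration, one way the #ifdef could collapse (a sketch only, not the v2
code; the helper name is made up): if every 32-bit word the hash reads is
first normalised with rte_le_to_cpu_32(), the little-endian masks become valid
on both byte orders. Note this would also require normalising the loads in the
main loop, so the big-endian hash values would differ from the block above.

#include <stdint.h>
#include <rte_byteorder.h>	/* rte_le_to_cpu_32() */

/*
 * Sketch: add the last 1-12 bytes of a 4-byte-aligned key into a, b and c
 * without an endianness #ifdef. Each word is normalised to little-endian
 * layout first, so only the little-endian masks are needed.
 */
static inline void
__jhash_tail_sketch(const uint32_t *k, uint32_t len,
		    uint32_t *a, uint32_t *b, uint32_t *c)
{
	/* mask keeping the first (len % 4) bytes of a word; index 0 = all */
	static const uint32_t mask[4] = {
		0xffffffff, 0xff, 0xffff, 0xffffff
	};

	if (len > 8)	/* word 2 is the partial one (full only if len == 12) */
		*c += rte_le_to_cpu_32(k[2]) & mask[len % 4];
	if (len > 4)	/* word 1 is full unless 4 < len < 8 */
		*b += rte_le_to_cpu_32(k[1]) &
				(len >= 8 ? 0xffffffff : mask[len % 4]);
	/* word 0 is full unless len < 4 */
	*a += rte_le_to_cpu_32(k[0]) &
			(len >= 4 ? 0xffffffff : mask[len % 4]);
}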
> > +		/* zero length strings require no mixing */
> > +	case 0:
> > +		return c;
> > +	};
> > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > +	} else if ((u.i & 0x1) == 0) {
> > +		/* read 16-bit chunks */
> > +		const uint16_t *k = (const uint16_t *)key;
> > +		const uint8_t *k8;
> > +
> > +		/* all but last block: aligned reads and different mixing */
> > +		while (length > 12) {
> > +			a += k[0] + (((uint32_t)k[1])<<16);
> > +			b += k[2] + (((uint32_t)k[3])<<16);
> > +			c += k[4] + (((uint32_t)k[5])<<16);
> > +
> > +			__rte_jhash_mix(a, b, c);
> > +
> > +			k += 6;
> > +			length -= 12;
> > +		}
> > +
> > +		/* handle the last (probably partial) block */
> > +		k8 = (const uint8_t *)k;
> > +		switch (length) {
> > +		case 12:
> > +			c += k[4]+(((uint32_t)k[5])<<16);
> > +			b += k[2]+(((uint32_t)k[3])<<16);
> > +			a += k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 11:
> > +			c += ((uint32_t)k8[10])<<16;
> > +			/* fall through */
> > +		case 10:
> > +			c += k[4];
> > +			b += k[2]+(((uint32_t)k[3])<<16);
> > +			a += k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 9:
> > +			c += k8[8];
> > +			/* fall through */
> > +		case 8:
> > +			b += k[2]+(((uint32_t)k[3])<<16);
> > +			a += k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 7:
> > +			b += ((uint32_t)k8[6])<<16;
> > +			/* fall through */
> > +		case 6:
> > +			b += k[2];
> > +			a += k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 5:
> > +			b += k8[4];
> > +			/* fall through */
> > +		case 4:
> > +			a += k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 3:
> > +			a += ((uint32_t)k8[2])<<16;
> > +			/* fall through */
> > +		case 2:
> > +			a += k[0];
> > +			break;
> > +		case 1:
> > +			a += k8[0];
> > +			break;
> > +		case 0:
> > +			/* zero length requires no mixing */
> > +			return c;
> > +		}
> > +#endif

> No else block for this ifdef?

According to the code, for big endian it only covers the 4-byte-aligned case
and the generic byte-by-byte case (the 2-byte-aligned path is little endian
only).

> > +	} else {
> > +		const uint8_t *k = (const uint8_t *)key;
> > +
> > +		/* all but the last block: affect some 32 bits of (a, b, c) */
> > +		while (length > 12) {
> > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > +			a += k[0];
> > +			a += ((uint32_t)k[1])<<8;
> > +			a += ((uint32_t)k[2])<<16;
> > +			a += ((uint32_t)k[3])<<24;
> > +			b += k[4];
> > +			b += ((uint32_t)k[5])<<8;
> > +			b += ((uint32_t)k[6])<<16;
> > +			b += ((uint32_t)k[7])<<24;
> > +			c += k[8];
> > +			c += ((uint32_t)k[9])<<8;
> > +			c += ((uint32_t)k[10])<<16;
> > +			c += ((uint32_t)k[11])<<24;
> > +#else
> > +			a += ((uint32_t)k[0])<<24;
> > +			a += ((uint32_t)k[1])<<16;
> > +			a += ((uint32_t)k[2])<<8;
> > +			a += ((uint32_t)k[3]);
> > +			b += ((uint32_t)k[4])<<24;
> > +			b += ((uint32_t)k[5])<<16;
> > +			b += ((uint32_t)k[6])<<8;
> > +			b += ((uint32_t)k[7]);
> > +			c += ((uint32_t)k[8])<<24;
> > +			c += ((uint32_t)k[9])<<16;
> > +			c += ((uint32_t)k[10])<<8;
> > +			c += ((uint32_t)k[11]);
> > +#endif

> Maybe find a better way to shorten/remove this #ifdef also. E.g. a shorter
> ifdef defining macros for the different shift amounts: 0, 8, 16, 24.

Agreed. Will change that in v2.

[...]

Thanks for the comments. I will send a v2 soon.
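P.S. For the byte-by-byte block above, the macro approach could look roughly
like this (a sketch only; the macro and helper names below are made up and not
necessarily what v2 will use):

#include <stdint.h>
#include <rte_byteorder.h>	/* RTE_BYTE_ORDER, RTE_LITTLE_ENDIAN */

/*
 * Only the per-byte shift amount differs between the two byte orders,
 * so define it once and keep a single copy of the 12-byte block load.
 */
#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
#define JHASH_BYTE_SHIFT(i)	(8 * (i))	/* byte 0 -> bits 0..7 */
#else
#define JHASH_BYTE_SHIFT(i)	(8 * (3 - (i)))	/* byte 0 -> bits 24..31 */
#endif

/* sketch: fold one unaligned 12-byte block of the key into a, b and c */
static inline void
__jhash_block_u8_sketch(const uint8_t *k,
			uint32_t *a, uint32_t *b, uint32_t *c)
{
	unsigned int i;

	/* each word takes 4 consecutive key bytes, shifted into place */
	for (i = 0; i < 4; i++) {
		*a += (uint32_t)k[i]     << JHASH_BYTE_SHIFT(i);
		*b += (uint32_t)k[i + 4] << JHASH_BYTE_SHIFT(i);
		*c += (uint32_t)k[i + 8] << JHASH_BYTE_SHIFT(i);
	}
}

Whether the compiler unrolls this as well as the hand-written version does
would of course need checking.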