(This time, using e-mail instead of the web form...)
Hello again!
Based on the suggestions I saw (all on networking-discuss...), I put together a
multiple-choice question.
Consider the _LITTLE_ENDIAN section in this code fragment (the fragment as a
whole is known to be an improvement on SPARC):
===================== (Cut up to and including here.) =====================
#ifdef _LITTLE_ENDIAN
/* For little-endian, we really need to think about this. */
#if 1
/*
 * Clever math - thanks Nico!  The uint32_t cast keeps the partial-byte
 * shift (up to 24 bits) from overflowing a signed int.
 */
#define PREFIX_LOW32(pfxlen) \
	(((uint32_t)(uint8_t)(0xFF00 >> ((pfxlen) & 0x7)) << ((pfxlen) & ~0x7)) | \
	(0xFFFFFF >> ((31 - (pfxlen)) & ~0x7)))
#endif
#if 0
/* ntohl() the big-endian solution */
#define PREFIX_LOW32(pfxlen) ntohl(0xFFFFFFFF << (32 - (pfxlen)))
#endif
#if 0
/* or use a table lookup */
static uint32_t masks[] = {
	0x00000000, 0x00000080, 0x000000C0, 0x000000E0,
	0x000000F0, 0x000000F8, 0x000000FC, 0x000000FE,
	0x000000FF, 0x000080FF, 0x0000C0FF, 0x0000E0FF,
	0x0000F0FF, 0x0000F8FF, 0x0000FCFF, 0x0000FEFF,
	0x0000FFFF, 0x0080FFFF, 0x00C0FFFF, 0x00E0FFFF,
	0x00F0FFFF, 0x00F8FFFF, 0x00FCFFFF, 0x00FEFFFF,
	0x00FFFFFF, 0x80FFFFFF, 0xC0FFFFFF, 0xE0FFFFFF,
	0xF0FFFFFF, 0xF8FFFFFF, 0xFCFFFFFF, 0xFEFFFFFF
};
#define PREFIX_LOW32(pfxlen) (masks[pfxlen])
#endif
#else	/* big-endian: the mask is simply the top pfxlen bits */
#define PREFIX_LOW32(pfxlen) (0xFFFFFFFF << (32 - (pfxlen)))
#endif	/* _LITTLE_ENDIAN */
/*
 * sleazy prefix-length-based compare.
 * another inlining candidate..
 */
boolean_t
ip_addr_match(uint32_t *addr1, int pfxlen, uint32_t *addr2)
{
	while (pfxlen >= 32) {
		if (*addr1 != *addr2)
			return (B_FALSE);
		addr1++;
		addr2++;
		pfxlen -= 32;
	}
	return (pfxlen == 0 ||
	    ((*addr1 ^ *addr2) & PREFIX_LOW32(pfxlen)) == 0);
}
===================== (Cut up to and including here.) =====================
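A quick way to convince yourself that the three little-endian PREFIX_LOW32 choices are interchangeable is to check each one against a mask built byte-by-byte in network order and then loaded with memcpy(). The sketch below is user-space test scaffolding, not kernel code. It assumes a little-endian host (e.g. x86/Opteron); the names MASK_MATH, MASK_NTOHL, ref_mask and count_mask_mismatches are made up for illustration; and the clever-math macro carries a (uint32_t) cast so that shifting the partial byte left by 24 cannot overflow a signed int. Prefix length 0 is skipped because ip_addr_match() returns before using the mask, and the ntohl() variant would shift by 32 there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl() */

/* Hypothetical copies of the three little-endian variants. */
#define MASK_MATH(pfxlen) \
	(((uint32_t)(uint8_t)(0xFF00 >> ((pfxlen) & 0x7)) << ((pfxlen) & ~0x7)) | \
	(0xFFFFFF >> ((31 - (pfxlen)) & ~0x7)))
#define MASK_NTOHL(pfxlen) ntohl(0xFFFFFFFFU << (32 - (pfxlen)))

static const uint32_t masks[] = {
	0x00000000, 0x00000080, 0x000000C0, 0x000000E0,
	0x000000F0, 0x000000F8, 0x000000FC, 0x000000FE,
	0x000000FF, 0x000080FF, 0x0000C0FF, 0x0000E0FF,
	0x0000F0FF, 0x0000F8FF, 0x0000FCFF, 0x0000FEFF,
	0x0000FFFF, 0x0080FFFF, 0x00C0FFFF, 0x00E0FFFF,
	0x00F0FFFF, 0x00F8FFFF, 0x00FCFFFF, 0x00FEFFFF,
	0x00FFFFFF, 0x80FFFFFF, 0xC0FFFFFF, 0xE0FFFFFF,
	0xF0FFFFFF, 0xF8FFFFFF, 0xFCFFFFFF, 0xFEFFFFFF
};

/*
 * Reference mask: set the first pfxlen bits of a 4-byte field in network
 * (most-significant-bit-first) order, then load it as a host uint32_t.
 * This is correct on either byte order.
 */
static uint32_t
ref_mask(int pfxlen)
{
	uint8_t bytes[4] = { 0, 0, 0, 0 };
	uint32_t m;
	int i;

	assert(pfxlen >= 0 && pfxlen < 32);
	for (i = 0; i < pfxlen; i++)
		bytes[i / 8] |= (uint8_t)(0x80 >> (i % 8));
	memcpy(&m, bytes, sizeof (m));
	return (m);
}

/* Returns how many prefix lengths in 1..31 have at least one wrong variant. */
static int
count_mask_mismatches(void)
{
	int pfxlen, bad = 0;

	for (pfxlen = 1; pfxlen < 32; pfxlen++) {
		uint32_t ref = ref_mask(pfxlen);

		if (MASK_MATH(pfxlen) != ref || MASK_NTOHL(pfxlen) != ref ||
		    masks[pfxlen] != ref) {
			printf("pfxlen %2d: ref %08x math %08x ntohl %08x "
			    "table %08x\n", pfxlen, (unsigned)ref,
			    (unsigned)MASK_MATH(pfxlen),
			    (unsigned)MASK_NTOHL(pfxlen),
			    (unsigned)masks[pfxlen]);
			bad++;
		}
	}
	return (bad);
}
```

On a little-endian box, count_mask_mismatches() should return 0; any disagreement is printed with all four masks side by side.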
And here's the original code:
===================== (Cut up to and including here.) =====================
/*
 * sleazy prefix-length-based compare.
 * another inlining candidate..
 */
boolean_t
ip_addr_match(uint8_t *addr1, int pfxlen, in6_addr_t *addr2p)
{
	int offset = pfxlen>>3;
	int bitsleft = pfxlen & 7;
	uint8_t *addr2 = (uint8_t *)addr2p;

	/*
	 * and there was much evil..
	 * XXX should inline-expand the bcmp here and do this 32 bits
	 * or 64 bits at a time..
	 */
	return ((bcmp(addr1, addr2, offset) == 0) &&
	    ((bitsleft == 0) ||
	    (((addr1[offset] ^ addr2[offset]) &
	    (0xff<<(8-bitsleft))) == 0)));
}
===================== (Cut up to and including here.) =====================
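The bytewise original and the word-at-a-time rewrite should agree for every prefix length from 0 to 128. The user-space sketch below compares them on random address pairs. It is hypothetical scaffolding: uint32_t[4] arrays stand in for in6_addr_t, memcmp() stands in for bcmp(), and the ntohl() mask is used because it happens to be correct on either byte order (ntohl() is a no-op on big-endian). Note the explicit == 0 on the final masked compare, mirroring the original's test; dropping it inverts the result of the partial-word check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl() */

/* Mask for the first pfxlen (1..31) bits of a word loaded from memory. */
#define PREFIX_LOW32(pfxlen) ntohl(0xFFFFFFFFU << (32 - (pfxlen)))

/* Word-at-a-time compare, modelled on the rewritten ip_addr_match(). */
static int
match_words(const uint32_t *addr1, int pfxlen, const uint32_t *addr2)
{
	assert(pfxlen >= 0 && pfxlen <= 128);
	while (pfxlen >= 32) {
		if (*addr1 != *addr2)
			return (0);
		addr1++;
		addr2++;
		pfxlen -= 32;
	}
	return (pfxlen == 0 ||
	    ((*addr1 ^ *addr2) & PREFIX_LOW32(pfxlen)) == 0);
}

/* Byte-at-a-time compare, modelled on the original (memcmp() for bcmp()). */
static int
match_bytes(const uint8_t *addr1, int pfxlen, const uint8_t *addr2)
{
	int offset = pfxlen >> 3;
	int bitsleft = pfxlen & 7;

	return (memcmp(addr1, addr2, offset) == 0 &&
	    (bitsleft == 0 ||
	    ((addr1[offset] ^ addr2[offset]) & (0xff << (8 - bitsleft))) == 0));
}

/* Returns how many of `trials` random address pairs the two disagree on. */
static int
count_disagreements(int trials, unsigned int seed)
{
	int t, bad = 0;

	srand(seed);
	for (t = 0; t < trials; t++) {
		uint32_t a[4], b[4];
		int pfxlen = rand() % 129;	/* 0..128 inclusive */
		int bit = rand() % 128;		/* network-order bit to flip */
		size_t i;

		for (i = 0; i < sizeof (a); i++)
			((uint8_t *)a)[i] = (uint8_t)rand();
		memcpy(b, a, sizeof (b));
		/* About half the pairs differ in exactly one bit. */
		if (rand() & 1)
			((uint8_t *)b)[bit / 8] ^= (uint8_t)(0x80 >> (bit % 8));

		if (match_words(a, pfxlen, b) !=
		    match_bytes((const uint8_t *)a, pfxlen, (const uint8_t *)b))
			bad++;
	}
	return (bad);
}
```

Because a single flipped bit lands on either side of the prefix boundary, this exercises both the "match" and "no match" paths for every prefix length.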
Experiments run on an Opteron box, using an IPsec and IKE test suite and
DTrace's FBT (function boundary tracing) provider, indicate that the original
function outperforms both the table lookup and the "ntohl() the big-endian
solution". I have my suspicions about how well we inline ntohl(), but that's
another topic.
I haven't run Nico's "clever math" solution yet, but would like to know what
the peanut gallery thinks.
BTW, here are the two big buckets on an Opteron box with the original bcmp():
Number of calls == 410221.
128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 308602
256 |@@@@@@@@@@ 99140
and with table-lookup:
Number of calls == 412346.
128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 292893
256 |@@@@@@@@@@@ 117017
The ntohl() versions were worse than the table lookup. Those two buckets
account for the vast majority (>99%) of the sampled calls to ip_addr_match().
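To put numbers on the Opteron histograms, here is the arithmetic as a small C sketch: it computes what share of all calls land in the 128 bucket and in both quoted buckets. The counts are copied from the DTrace output in this message; the array, macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Opteron counts copied from the histograms above: 128 bucket, 256 bucket. */
static const long opt_bcmp[] = { 308602, 99140 };
static const long opt_table[] = { 292893, 117017 };
#define OPT_BCMP_TOTAL	410221L		/* "Number of calls" for bcmp() */
#define OPT_TABLE_TOTAL	412346L		/* "Number of calls" for the table */

/* Percentage of `total` calls that fall in the first n buckets of counts[]. */
static double
pct_of_total(const long *counts, int n, long total)
{
	long sum = 0;
	int i;

	assert(total > 0);
	for (i = 0; i < n; i++)
		sum += counts[i];
	return (100.0 * (double)sum / (double)total);
}

/* Prints the 128-bucket share and the two-bucket coverage for one run. */
static void
print_shares(const char *name, const long *counts, long total)
{
	printf("%-6s 128 bucket: %5.1f%%   both buckets: %5.1f%%\n", name,
	    pct_of_total(counts, 1, total), pct_of_total(counts, 2, total));
}
```

Calling print_shares("bcmp", opt_bcmp, OPT_BCMP_TOTAL) and print_shares("table", opt_table, OPT_TABLE_TOTAL) works out, by hand, to about 75.2% versus 71.0% in the 128 bucket, with both runs keeping about 99.4% of calls in the two buckets. In other words, the table lookup moves roughly four percentage points of calls from the 128 bucket into the 256 bucket.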
The SPARC gains are bigger than the Opteron losses. Here's bcmp() on SPARC:
Number of calls == 398975.
128 |@@@@@@@@@@@@@@@@@@ 177120
256 |@@@@@@@@@@@@@@@@ 159558
512 |@@@@@@ 58487
and using the simple math-only solution:
Number of calls == 397934.
128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 319443
256 |@@@@@@ 60991
512 |@ 5972
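One rough way to compare the two SPARC runs is a weighted sum over the quoted buckets, charging every call its bucket's label (the lower bound of that power-of-two range). This is only a sketch: it ignores the calls outside the quoted buckets (about 1.0% for bcmp() and 2.9% for the math-only version), and it says nothing about units, which the histograms don't state. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One quantize() row: bucket label (lower bound) and its count. */
struct bucket {
	long	lower;
	long	count;
};

/*
 * Lower-bound mean over the given buckets: each call is charged its
 * bucket's lower bound, so the true mean of these calls is at least this.
 */
static double
lower_bound_mean(const struct bucket *b, size_t n)
{
	double weighted = 0.0;
	long calls = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		weighted += (double)b[i].lower * (double)b[i].count;
		calls += b[i].count;
	}
	assert(calls > 0);
	return (weighted / (double)calls);
}

/* SPARC counts copied from the histograms above. */
static const struct bucket sparc_bcmp[] = {
	{ 128, 177120 }, { 256, 159558 }, { 512, 58487 }
};
static const struct bucket sparc_math[] = {
	{ 128, 319443 }, { 256, 60991 }, { 512, 5972 }
};
```

Worked by hand, lower_bound_mean(sparc_bcmp, 3) comes to about 236.5 and lower_bound_mean(sparc_math, 3) to about 154.1, consistent with the math-only histogram piling up in the 128 bucket.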
So I'm not sure what to do.
Any clues are, as always, welcome!
Thanks,
Dan
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org