Kris Kennaway wrote:
I have recently started looking at database performance over gigabit
ethernet, and there seems to be a bottleneck coming from the way route
reference counting is implemented. On an 8-core system it looks like
we spend a lot of time waiting for the rtentry mutex:
   max     total  wait_total    count  avg  wait_avg  cnt_hold  cnt_lock  name
[...]
   408    950496     1135994   301418    3         3     24876     55936  net/if_ethersubr.c:397 (sleep mutex:bge1)
   974    968617     1515169   253772    3         5     14741     60581  dev/bge/if_bge.c:2949 (sleep mutex:bge1)
  2415  18255976     1607511   253841   71         6    125174      3131  netinet/tcp_input.c:770 (sleep mutex:inp)
   233   1850252     2080506   141817   13        14         0    126897  netinet/tcp_usrreq.c:756 (sleep mutex:inp)
   384   6895050     2737492   299002   23         9     92100     73942  dev/bge/if_bge.c:3506 (sleep mutex:bge1)
   626   5342286     2760193   301477   17         9     47616     54158  net/route.c:147 (sleep mutex:radix node head)
   326   3562050     3381510   301477   11        11    133968    110104  net/route.c:197 (sleep mutex:rtentry)
   146    947173     5173813   301477    3        17     44578    120961  net/route.c:1290 (sleep mutex:rtentry)
   146    953718     5501119   301476    3        18     63285    121819  netinet/ip_output.c:610 (sleep mutex:rtentry)
    50   4530645     7885304  1423098    3         5    642391    788230  kern/subr_turnstile.c:489 (spin mutex:turnstile chain)
i.e. during a 30 second sample we spend a total of more than 14 seconds
(summed across all CPUs) waiting to acquire the rtentry mutex: the three
rtentry rows alone sum to 3381510 + 5173813 + 5501119 = 14056442 in the
wait_total column, just over 14 seconds.
This appears to be because (among other things) we increment and then
decrement the route's refcount for each packet we send, and each
adjustment requires acquiring the rtentry mutex for that route. So
multiplexing traffic for many connections over a single route is being
partly rate-limited by those mutex operations.
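
To make the cost concrete, here is a minimal user-space model of that
per-packet pattern. This is a sketch only: the pthread mutex and the
rt_refcnt field below are simplified stand-ins for the real rtentry lock
and refcount, and NTHREADS/NPACKETS are made-up parameters.

/*
 * Minimal user-space model of the per-packet refcount pattern, NOT
 * the actual kernel code: every "packet" takes the shared mutex twice,
 * once to bump the refcount and once to drop it, so all senders
 * multiplexed over one route serialize on this lock.
 */
#include <pthread.h>
#include <stdio.h>

#define	NTHREADS	8		/* one sender per core */
#define	NPACKETS	1000000		/* packets per sender */

static struct {
	pthread_mutex_t	rt_mtx;		/* stand-in for the rtentry mutex */
	long		rt_refcnt;	/* stand-in for rt_refcnt */
} rt = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void *
sender(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < NPACKETS; i++) {
		pthread_mutex_lock(&rt.rt_mtx);		/* per-packet lock #1 */
		rt.rt_refcnt++;				/* take a reference */
		pthread_mutex_unlock(&rt.rt_mtx);

		/* ... packet is handed to the driver here ... */

		pthread_mutex_lock(&rt.rt_mtx);		/* per-packet lock #2 */
		rt.rt_refcnt--;				/* drop the reference */
		pthread_mutex_unlock(&rt.rt_mtx);
	}
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, sender, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	printf("final refcnt: %ld\n", rt.rt_refcnt);
	return (0);
}

Timing this with one thread and then with eight should show the effect:
the work per packet is trivial, but the cache-line ping-pong on the
shared mutex is not.
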
The rtentry locking actually isn't that much of a problem in itself,
and rtalloc1() in net/route.c only gets the blame because it acquires
the lock for the routing table entry and returns the entry locked.
It is the callers' job to unlock it again as soon as possible (a sketch
of that discipline follows below). Here arpresolve() in
netinet/if_ether.c is the offending function: it keeps the lock held
over an extended period, causing the contention and the long wait
times. ARP is a horrible mess and I don't have a quick fix for this.
Work to replace the current ARP code with something more adequate has
been in progress for quite some time, but it isn't finished yet.
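
As a sketch of the unlock-early discipline rtalloc1() expects from its
callers: the types and macros below are simplified stand-ins for the
net/route.h definitions, and rtalloc1_model() is a made-up placeholder
with the same contract, not the real function.

/*
 * Sketch of the unlock-early discipline; simplified stand-ins for the
 * net/route.h types and macros, not the real kernel definitions.
 */
#include <pthread.h>
#include <stddef.h>

struct rtentry {
	pthread_mutex_t	rt_mtx;		/* stand-in for the rtentry mutex */
	long		rt_refcnt;
	void		*rt_gateway;
};

#define	RT_LOCK(rt)	pthread_mutex_lock(&(rt)->rt_mtx)
#define	RT_UNLOCK(rt)	pthread_mutex_unlock(&(rt)->rt_mtx)

static struct rtentry rt0 = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };

/* Placeholder with rtalloc1()'s contract: returns the entry locked. */
static struct rtentry *
rtalloc1_model(void)
{
	RT_LOCK(&rt0);
	rt0.rt_refcnt++;
	return (&rt0);
}

static void
well_behaved_caller(void)
{
	struct rtentry *rt;
	void *gw;

	rt = rtalloc1_model();		/* comes back locked */
	gw = rt->rt_gateway;		/* copy out what we need... */
	RT_UNLOCK(rt);			/* ...and unlock right away */

	/*
	 * Any long-running work (an ARP lookup, say) belongs here,
	 * after the unlock.  Keeping rt locked across it instead is
	 * exactly what makes every sender on this route queue up.
	 */
	(void)gw;
}

int
main(void)
{
	well_behaved_caller();
	return (0);
}
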
This is not the end of the story, though: the bge driver is a serious
bottleneck on its own (e.g. I nulled out the route locking, since it is
not relevant in my environment, at least for the purposes of this test,
and that exposed bge as the next problem -- but other drivers may not
be so bad).
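
For what it's worth, one way such an experiment might be done (purely
hypothetical -- the post doesn't say how the locking was actually
nulled out) is to build a test kernel with the rtentry lock macros
defined away in a local copy of net/route.h:

/*
 * Hypothetical test hack only, never for production kernels: turn the
 * rtentry lock operations into no-ops to take them out of the picture
 * and see which bottleneck surfaces next.
 */
#ifdef ROUTE_LOCK_TEST_HACK		/* made-up option name */
#define	RT_LOCK(rt)	do { } while (0)
#define	RT_UNLOCK(rt)	do { } while (0)
#endif
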
--
Andre