Kris Kennaway wrote:
I have recently started looking at database performance over gigabit
ethernet, and there seems to be a bottleneck coming from the way route
reference counting is implemented. On an 8-core system it looks like
we spend a lot of time waiting for the rtentry mutex:
   max     total  wait_total    count  avg  wait_avg  cnt_hold  cnt_lock  name
[...]
   408    950496     1135994   301418    3         3     24876     55936  net/if_ethersubr.c:397 (sleep mutex:bge1)
   974    968617     1515169   253772    3         5     14741     60581  dev/bge/if_bge.c:2949 (sleep mutex:bge1)
  2415  18255976     1607511   253841   71         6    125174      3131  netinet/tcp_input.c:770 (sleep mutex:inp)
   233   1850252     2080506   141817   13        14         0    126897  netinet/tcp_usrreq.c:756 (sleep mutex:inp)
   384   6895050     2737492   299002   23         9     92100     73942  dev/bge/if_bge.c:3506 (sleep mutex:bge1)
   626   5342286     2760193   301477   17         9     47616     54158  net/route.c:147 (sleep mutex:radix node head)
   326   3562050     3381510   301477   11        11    133968    110104  net/route.c:197 (sleep mutex:rtentry)
   146    947173     5173813   301477    3        17     44578    120961  net/route.c:1290 (sleep mutex:rtentry)
   146    953718     5501119   301476    3        18     63285    121819  netinet/ip_output.c:610 (sleep mutex:rtentry)
    50   4530645     7885304  1423098    3         5    642391    788230  kern/subr_turnstile.c:489 (spin mutex:turnstile chain)
i.e. during a 30 second sample we spend a total of more than 14 seconds
(summed across all CPUs) waiting to acquire the rtentry mutex: the three
rtentry rows alone sum to 3381510 + 5173813 + 5501119 = 14056442 in the
wait_total column, just over 14 seconds.
This appears to be because (among other things) we increment and then
decrement the route's refcount for each packet we send, and each
adjustment requires acquiring the rtentry mutex for that route. So
multiplexing traffic for many connections over a single route is being
partly rate-limited by those mutex operations.
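
To make the cost concrete, here is a minimal user-space model of that
per-packet pattern. This is a sketch only: the pthread mutex and the
rt_refcnt field below are simplified stand-ins for the real rtentry lock
and refcount, and NTHREADS/NPACKETS are made-up parameters.

/*
 * Minimal user-space model of the per-packet refcount pattern, NOT
 * the actual kernel code: every "packet" takes the shared mutex twice,
 * once to bump the refcount and once to drop it, so all senders
 * multiplexed over one route serialize on this lock.
 */
#include <pthread.h>
#include <stdio.h>

#define	NTHREADS	8		/* one sender per core */
#define	NPACKETS	1000000		/* packets per sender */

static struct {
	pthread_mutex_t	rt_mtx;		/* stand-in for the rtentry mutex */
	long		rt_refcnt;	/* stand-in for rt_refcnt */
} rt = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void *
sender(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < NPACKETS; i++) {
		pthread_mutex_lock(&rt.rt_mtx);		/* per-packet lock #1 */
		rt.rt_refcnt++;				/* take a reference */
		pthread_mutex_unlock(&rt.rt_mtx);

		/* ... packet is handed to the driver here ... */

		pthread_mutex_lock(&rt.rt_mtx);		/* per-packet lock #2 */
		rt.rt_refcnt--;				/* drop the reference */
		pthread_mutex_unlock(&rt.rt_mtx);
	}
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, sender, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	printf("final refcnt: %ld\n", rt.rt_refcnt);
	return (0);
}

Timing this with one thread and then with eight should show the effect:
the work per packet is trivial, but the cache-line ping-pong on the
shared mutex is not.
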
The rtentry locking actually isn't that much of a problem in itself,
and rtalloc1() in net/route.c only gets the blame because it acquires
the lock for the routing table entry and returns the entry locked.
It is the callers' job to unlock it again as soon as possible (a sketch
of that discipline follows below). Here arpresolve() in
netinet/if_ether.c is the offending function: it keeps the lock held
over an extended period, causing the contention and the long wait
times. ARP is a horrible mess and I don't have a quick fix for this.
Work to replace the current ARP code with something more adequate has
been in progress for quite some time, but it isn't finished yet.
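
As a sketch of the unlock-early discipline rtalloc1() expects from its
callers: the types and macros below are simplified stand-ins for the
net/route.h definitions, and rtalloc1_model() is a made-up placeholder
with the same contract, not the real function.

/*
 * Sketch of the unlock-early discipline; simplified stand-ins for the
 * net/route.h types and macros, not the real kernel definitions.
 */
#include <pthread.h>
#include <stddef.h>

struct rtentry {
	pthread_mutex_t	rt_mtx;		/* stand-in for the rtentry mutex */
	long		rt_refcnt;
	void		*rt_gateway;
};

#define	RT_LOCK(rt)	pthread_mutex_lock(&(rt)->rt_mtx)
#define	RT_UNLOCK(rt)	pthread_mutex_unlock(&(rt)->rt_mtx)

static struct rtentry rt0 = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };

/* Placeholder with rtalloc1()'s contract: returns the entry locked. */
static struct rtentry *
rtalloc1_model(void)
{
	RT_LOCK(&rt0);
	rt0.rt_refcnt++;
	return (&rt0);
}

static void
well_behaved_caller(void)
{
	struct rtentry *rt;
	void *gw;

	rt = rtalloc1_model();		/* comes back locked */
	gw = rt->rt_gateway;		/* copy out what we need... */
	RT_UNLOCK(rt);			/* ...and unlock right away */

	/*
	 * Any long-running work (an ARP lookup, say) belongs here,
	 * after the unlock.  Keeping rt locked across it instead is
	 * exactly what makes every sender on this route queue up.
	 */
	(void)gw;
}

int
main(void)
{
	well_behaved_caller();
	return (0);
}
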
This is not the end of the story, though: the bge driver is a serious
bottleneck on its own (e.g. I nulled out the route locking, since it is
not relevant in my environment, at least for the purposes of this test,
and that exposed bge as the next problem -- but other drivers may not
be so bad).
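
For what it's worth, one way such an experiment might be done (purely
hypothetical -- the post doesn't say how the locking was actually
nulled out) is to build a test kernel with the rtentry lock macros
defined away in a local copy of net/route.h:

/*
 * Hypothetical test hack only, never for production kernels: turn the
 * rtentry lock operations into no-ops to take them out of the picture
 * and see which bottleneck surfaces next.
 */
#ifdef ROUTE_LOCK_TEST_HACK		/* made-up option name */
#define	RT_LOCK(rt)	do { } while (0)
#define	RT_UNLOCK(rt)	do { } while (0)
#endif
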
--
Andre