Evgeniy Polyakov wrote:
On Sun, Feb 18, 2007 at 07:46:22PM +0100, Eric Dumazet ([EMAIL PROTECTED])
wrote:
Why doesn't anyone want to use a trie? For socket-like loads it has
exactly constant search/insert/delete time and scales like hell.
Because we want to be *very* fast. You cannot beat a hash table.
Say you have 1,000,000 TCP connections, with 50,000 incoming packets per
second to *random* streams...
What is really good about a trie is that you may have up to 2^32 connections
without _any_ difference in lookup performance for random streams.
So are you talking about one memory cache miss per lookup?
If not, you lose.
With a 2^20 hash table, a lookup uses one cache line (the hash head pointer)
plus one cache line to get the socket (you need it anyway to access its refcount).
Several attempts were made in the past to add RCU to the ehash table (the last
one by Benjamin LaHaise last March). I believe this was delayed a bit because
David would like to be able to resize the hash table...
That is theory.
Not theory, but actual practice, on a real machine.
# cat /proc/net/sockstat
sockets: used 918944
TCP: inuse 925413 orphan 7401 tw 4906 alloc 926292 mem 304759
UDP: inuse 9
RAW: inuse 0
FRAG: inuse 9 memory 18360
Practice includes the cost of hashing, locking and list traversal
(each pointer is in its own cache line, btw, which must be fetched), plus
the same for time-wait sockets (if we are unlucky).
No need to talk about the price of a cache miss when there may be more
serious problems, for example the length of the linked list to traverse each
time a new packet is received.
For example, lookup time in a trie with 1.6 million random 3-dimensional
32-bit (saddr/daddr/ports) entries is about 1 microsecond on an AMD Athlon 64
3500 CPU (the test was run in a userspace emulator, though).
1 microsecond? Are you kidding? We want no more than 50 ns.
You can check: on this dual-CPU machine, tcp_v4_rcv() uses 2.29% of CPU time.
CPU: AMD64 processors, speed 1992.67 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples % symbol name
2009510 4.6863 memcpy_c
1668842 3.8918 tg3_start_xmit_dma_bug
1485844 3.4651 tg3_poll
1293558 3.0167 kmem_cache_free
1232862 2.8751 kfree
1131012 2.6376 free_block
1000671 2.3336 ip_route_input
982655 2.2916 tcp_v4_rcv
955554 2.2284 __alloc_skb
863753 2.0143 tcp_ack
863222 2.0131 tcp_recvmsg
834680 1.9465 fget_light
801445 1.8690 lock_sock_nested
793699 1.8510 tcp_sendmsg
764689 1.7833 copy_user_generic_string
743515 1.7339 ip_queue_xmit
712314 1.6612 sock_wfree
650486 1.5170 tcp_rcv_established