On 11/13/12 12:06 AM, Andre Oppermann wrote:
On 13.11.2012 07:45, Alfred Perlstein wrote:
On 11/12/12 10:23 PM, Peter Wemm wrote:
On Mon, Nov 12, 2012 at 10:11 PM, Alfred Perlstein <bri...@mu.org>
wrote:
On 11/12/12 10:04 PM, Alfred Perlstein wrote:
On 11/12/12 10:48 AM, Alfred Perlstein wrote:
On 11/12/12 10:01 AM, Andre Oppermann wrote:
I've already added the tunable "kern.maxmbufmem" which is in pages.
That's probably not very convenient to work with. I can change it
to a percentage of phymem/kva. Would that make you happy?
It really makes sense to have the hash table size bear some relation
to sockets rather than buffers.
If you are hashing "foo-objects", you want the hash size to bear some
relation to the maximum number of "foo-objects" you'll see, not to be
derived backwards from the number of "bar-objects" that "foo-objects"
contain, right?
Because we are hashing the sockets, right? not clusters.
Maybe I'm wrong? I'm open to ideas.
Hey Andre, the following patch is what I was thinking
(uncompiled/untested); it basically rounds maxsockets up to a power
of 2 and uses that in place of the default 512 tcb hashsize.
It might make sense to make the auto-tuning default to a minimum
of 512.
There are a number of other hashes with static sizes that could
make use
of this logic provided it's not upside-down.
Any thoughts on this?
Tune the tcp pcb hash based on maxsockets.
Be more forgiving of poorly chosen tunables by finding a closer power
of two rather than clamping down to 512.
Index: tcp_subr.c
===================================================================
Sorry, GUI mangled the patch... attaching a plain text version.
Wait, you want to replace a hash with a flat array? Why even bother
to call it a hash at that point?
If you are concerned about the space/time tradeoff, I'm pretty happy
with making it 1/2, 1/4, or 1/8 the size of maxsockets (smaller?).
Would that work better?
I'd go for 1/8 or even 1/16 with a lower bound of 512. More than
that is excessive.
I'm OK with 1/8. All I'm really going for is trying to make it somewhat
better than 512 when un-tuned.
The reason I chose to make it equal to maxsockets was a space/time
tradeoff: ideally a hash should have zero collisions, and if a user
has enough memory for 250,000 sockets, then surely they have enough
memory for 256,000 pointers.
I agree in general. Though not all large-memory servers serve a large
number of connections. We have to find a tradeoff here.
Having a perfect hash would certainly be laudable. As long as the
average hash chain doesn't go beyond a few entries it's not a problem.
If you strongly disagree then I am fine with a more conservative
setting; just note that, when we max out the number of sockets, the
hash table will effectively cost additional traversals equal to half
the factor by which we shrink it. Meaning if the table is 1/4 the size
of maxsockets, when we hit that many TCP connections I think we'll see
an average of about 2 linked-list traversals to find a node. At 1/8,
that number becomes 4.
I'm fine with that, and claim that if you expect N sockets you would
also increase maxfiles/sockets to N*2 to have some headroom.
That is a good point.
I recall back in 2001, on a PII-400 running a custom webserver I
wrote, getting a huge benefit from upping this to 2^14 or maybe even
2^16 (I forget which): CPU usage suddenly dropped a huge amount and I
didn't have to worry about a load balancer or other tricks.
I can certainly believe that. A hash size of 512 is no good if
you have more than 4K connections.
PS: Please note that my patch for mbuf and maxfiles tuning is not yet
in HEAD, it's still sitting in my tcp_workqueue branch. I still have
to search for derived values that may get totally out of whack with
the new scaling scheme.
This is cool! Thank you for the feedback.
Would you like me to put this on a user branch somewhere for you to
merge into your perf branch?
-Alfred
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"