Hi Gophers,

My problem domain requires making a large number of TCP connections from a 
small set of hosts to many other hosts (targets) on a local network. The 
connections are short-lived: they usually last <200ms and transfer <100 
bytes in each direction. I need to do about 100k connections/second per 
source host. 

The numbers below are all from a 24-core Intel machine, running Linux with 
Go 1.7.3, cross-compiled from OS X.  The machine has a multi-queue NIC with 
RSS enabled.  The targets are multiple machines running Go servers 
listening on 200 ports each (to avoid 5-tuple exhaustion).

My Go code [1] spawns N goroutines, each of which calls net.Dial(), 
performs the transaction and then sleeps for 1s.

With this approach, GOMAXPROCS=1 can't sustain 10k conns/second without 
triggering connection timeouts at a 400ms deadline. Similarly, 
GOMAXPROCS=24 can't sustain 100k conns/second.  Removing the context 
timeout passed to Dial() improves performance to the point where 
GOMAXPROCS=1 can do 10k conns/second at a 1% timeout rate with a 200ms 
deadline. 

I've written a C++ solution that uses N threads, each running its own epoll 
loop. Targets are assigned to threads, and the sockets then stay local to 
their thread for the duration of the transaction. On the same host a single 
thread can do 20k conns/second with a 0.12% timeout rate at a 200ms 
deadline. Six threads at 10k conns/second each produce <2% timeouts at 
200ms. With 16 threads at 10k each, <2% of requests exceed 200ms and <0.5% 
exceed 300ms.

I believe the Go solution suffers from at least two issues:

i) net.Dial() is fairly expensive, both in terms of allocations & syscalls 
[2].
ii) Syscalls cause the goroutine to be rescheduled, bouncing the work for 
a single socket across CPU cores and hurting locality. Correct me if I'm 
wrong here, but from my reading that's what is occurring.


I've tried a number of workarounds:

- Using net.DialTCP() with GOMAXPROCS=4 at 40k conns/second: all requests 
complete in <200ms. That's an improvement, but it doesn't allow me to 
provide a timeout.
- Exposing net.dialTCP() directly: 5% timeouts @ 200ms with GOMAXPROCS=4 at 
40k conns/second. Setting GOMAXPROCS=24 produces a 0% timeout rate and 
scales up to 80k conns/second before timeouts start appearing (1% @ 100k 
conns/second). This is the best option I've found so far, but it requires 
use of an internal API.
- Using syscall.Socket() directly. The problem here is receiving 
notification when the socket is writable (connected). There doesn't appear 
to be a way to hook into the netpoller. I wrote a solution using the 
syscall.Epoll*() functions directly, but that had even worse performance 
than the native Go solution.

Does anyone have suggestions on speeding this up? I'd prefer to keep this 
component written in Go, but I'm running out of options to meet the 
performance & efficiency targets.

Thanks,


Simon N

[1] https://gist.github.com/nomis52/7b8405644132a09d2e8f9b8f769297cb 
[2] Results from 
 https://github.com/prashantv/go-bench/blob/master/dial_test.go 

BenchmarkDial/dialer.DialContext-8 1000          1344 B/op     28 allocs/op
BenchmarkDial/net.Dial-8           3000           863 B/op     20 allocs/op
BenchmarkDial/net.DialTCP-8        2000           638 B/op     15 allocs/op
BenchmarkDial/net.DialTimeout-8    2000          1344 B/op     28 allocs/op
BenchmarkDial/net.dialTCP-8        1000          1120 B/op     23 allocs/op
