Missing SYN/ACK answers

2010-04-28 Thread Renaud Chaput
Hi,

I am using a DL360 G6 server with an additional Intel network card, running
FreeBSD 8.0-RELEASE-p2, as a load balancer. I use nginx as an SSL endpoint
and haproxy as an HTTP load balancer.
One port of the intel card (em0) is on the internal LAN, and another
(em1) on the public LAN.

An external monitoring tool reported that some HTTP requests were taking
3, 6, or even 9 seconds, instead of the 10-20 ms we usually see.
I ran some tests and saw that sometimes no SYN/ACK is sent by the load
balancer; the client sends another SYN after 3 seconds, and then a SYN/ACK
is sent. Sometimes the client needs to send the SYN 2 or 3 times to get an
answer.

Here is a tcpdump example:
13:57:52.978784 IP mas91-4-88-189-56-133.fbx.proxad.net.58484 >
www-1.reverse.fotolia.net.http: Flags [S], seq 842845757, win 5840,
options [mss 1460,sackOK,TS val 24878682 ecr 0,nop,wscale 7], length 0
13:57:55.978314 IP mas91-4-88-189-56-133.fbx.proxad.net.58484 >
www-1.reverse.fotolia.net.http: Flags [S], seq 842845757, win 5840,
options [mss 1460,sackOK,TS val 24879432 ecr 0,nop,wscale 7], length 0
13:57:55.978335 IP www-1.reverse.fotolia.net.http >
mas91-4-88-189-56-133.fbx.proxad.net.58484: Flags [S.], seq
3988398305, ack 842845758, win 65535, options [mss 1460,nop,wscale
3,sackOK,TS val 2223023194 ecr 24879432], length 0
...

This is an HTTP request done using curl. It seems that I can reproduce
it more easily when there is more traffic on the server.
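To quantify how often the handshake stalls, a sketch like the following can time repeated TCP connects from a client machine and flag any attempt that crosses the ~3 s SYN retransmission timeout (the hostname and port are placeholders, not names from this setup):

```python
import socket
import time

def connect_times(host, port, attempts=100, timeout=15.0):
    """Time repeated TCP connect() calls. A connect that stalls for
    roughly 3, 6, or 9 s absorbed one or more SYN retransmissions."""
    times = []
    for _ in range(attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        t0 = time.time()
        try:
            s.connect((host, port))
        finally:
            s.close()
        times.append(time.time() - t0)
    return times

if __name__ == "__main__":
    ts = connect_times("server", 80)   # placeholder host/port
    stalled = [t for t in ts if t >= 2.5]
    print("max %.3f s, %d/%d connects stalled" % (max(ts), len(stalled), len(ts)))
```

Running this in a loop while the server is under load should reproduce the monitoring tool's 3/6/9 s outliers if the SYN/ACK really is being dropped.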

When I run ab (Apache Bench) on this server, I see things like this:

# ab2 -n 1000 -c 20 http://server/images/flags/zoneFlagSprite.png
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking server (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/0.6.32
Server Hostname:        server
Server Port:            80

Document Path:          /images/flags/zoneFlagSprite.png
Document Length:        1979 bytes

Concurrency Level:      20
Time taken for tests:   8.252 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      2261000 bytes
HTML transferred:       1979000 bytes
Requests per second:    121.18 [#/sec] (mean)
Time per request:       165.047 [ms] (mean)
Time per request:       8.252 [ms] (mean, across all concurrent requests)
Transfer rate:          267.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        8   90  418.2     26    3027
Processing:    16   74   39.0     62     580
Waiting:        9   43   36.0     31     579
Total:         27  164  416.5     91    3115

Percentage of the requests served within a certain time (ms)
  50%     91
  66%    105
  75%    120
  80%    135
  90%    171
  95%    210
  98%   3021
  99%   3057
 100%   3115 (longest request)

All requests are pretty fast, but 2% take more than 3 s. The result is the
same whether I request nginx or a URL handled directly by haproxy.
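The shape of that tail is telling: the 95th percentile sits near 210 ms, then the 98th jumps straight to ~3000 ms, which looks like a normal request plus one 3 s SYN retransmission. A small helper (a sketch, using the percentile values from the ab run above as sample data) makes the split explicit:

```python
def split_by_retransmit(times_ms, rto_ms=3000):
    """Partition request times into 'fast' and 'retransmit-affected'
    (those that absorbed at least one ~3 s SYN retransmission)."""
    fast = [t for t in times_ms if t < rto_ms]
    slow = [t for t in times_ms if t >= rto_ms]
    return fast, slow

# Percentile values from the ab report: ~2% of requests land past 3000 ms.
sample = [91, 105, 120, 135, 171, 210, 3021, 3057, 3115]
fast, slow = split_by_retransmit(sample)
print("%d fast, %d delayed by a retransmit" % (len(fast), len(slow)))
# prints "6 fast, 3 delayed by a retransmit"
```

Applied to the full per-request timing log (ab's -g output, for example), this would show whether every slow request is slow by almost exactly 3 s, 6 s, or 9 s.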

I tried some sysctl tuning, with no visible results:

security.bsd.unprivileged_read_msgbuf=0
security.bsd.see_other_uids=0
net.inet.ip.portrange.hilast=5
net.inet.ip.portrange.hifirst=4
net.inet.ip.portrange.last=5
net.inet.ip.portrange.first=4
net.inet.icmp.icmplim=3000
net.inet.icmp.drop_redirect=1
net.inet.tcp.slowstart_flightsize=4
net.inet.tcp.inflight.enable=1

net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.maxdgram=65536
net.inet.udp.recvspace=65536

net.inet.tcp.rfc1323=1

net.inet.tcp.blackhole=2
net.inet.tcp.msl=1
net.inet.udp.blackhole=1

I also have some packet loss on this server, on the internal LAN. The
losses occur on the last hop, so they are not due to the network. I don't
know if this can be related. I have the same server in another datacenter,
with an independent network, and I see the same problem there. I don't
understand how this could be related.

I tried with pf disabled, and this does not solve the issue.

Any ideas on how to debug and solve this?

Thanks,
Renaud Chaput
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


FreeBSD + carp on VMWare ESX

2010-04-28 Thread Chris Anders
Hi All,

I appear to be suffering the same problem as many others on the internet with
CARP and VMware vSphere (ESX4) when running with redundant NICs in the default
failover mode, connected to two separate switches.

I found this old thread -
http://www.mail-archive.com/freebsd-net@freebsd.org/msg30562.html - which
discusses the exact same issue, with a patch submitted to the list by Matthew
Grooms, available here -
http://www.shrew.net/static/patches/esx-carp.diff

After applying the patch, recompiling, and setting net.inet.carp.drop_echoed=1
in /etc/sysctl.conf, my problem was solved!

Is there any way this patch could be applied upstream? This vSphere setup is
rather common where Cross-Stack EtherChannel, the known workaround for this
issue, cannot be deployed.

I also tried the patch submitted by Ermal Luçi, which didn't work...

I did all my testing with FreeBSD 8-CURRENT.



Re: kern/145728: [lagg] Stops working lagg between two servers.

2010-04-28 Thread sl...@kraslan.ru
The following reply was made to PR kern/145728; it has been noted by GNATS.

From: "sl...@kraslan.ru" 
To: bug-follo...@freebsd.org, te...@kgs.ru
Cc:  
Subject: Re: kern/145728: [lagg] Stops working lagg between two servers.
Date: Thu, 29 Apr 2010 12:41:36 +0800

 Three days ago I updated one of the servers to 8.0-STABLE (from *default
 date=2010.04.05.00.00.00), and the situation is now somewhat different.
 The watchdog timeout is gone, but one of the lagg member interfaces is in
 this state:

 lagg1: flags=8943 metric 0 mtu 1500
         options=9b
         ether 00:1b:21:1b:19:5d
         media: Ethernet autoselect
         status: active
         laggproto lacp
         laggport: em4 flags=18
         laggport: em1 flags=1c

 em4: flags=8943 metric 0 mtu 1500
         options=9b
         ether 00:1b:21:1b:19:5d
         media: Ethernet 1000baseT (1000baseT )
         status: active

 I tried
 ifconfig lagg1 -laggport em4
 and then
 ifconfig lagg1 laggport em4
 but it did not help.