i apologise for not being very clear. (it's been a long day.) we have an 8-node cluster. each node is a modest dell 910 or somesuch: 128GB of memory and 32 cores. each node also has eight 1Gbps NICs; most are rarely used, but two are used heavily. technically, those two occupy four interfaces, since they are two bonded pairs (active-passive). each bonded pair sits on one of two VPNs: one VPN goes out to the general intranet, the other is a VPN local to the cluster.
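(an aside on the bonding, in case it's relevant: this is roughly how i verify which NIC is actually carrying the traffic on each bonded pair -- a small python sketch that reads /proc/net/bonding; the exact field labels come from the bonding driver and may differ a bit between kernel versions, so treat it as a sketch:)

    #!/usr/bin/env python
    # show the bonding mode and currently active slave for each bonded
    # pair by reading /proc/net/bonding/*.  the field labels are whatever
    # the bonding driver exposes and may vary slightly between kernels.
    import glob

    for path in sorted(glob.glob('/proc/net/bonding/*')):
        mode = active = 'unknown'
        with open(path) as f:
            for line in f:
                if line.startswith('Bonding Mode:'):
                    mode = line.split(':', 1)[1].strip()
                elif line.startswith('Currently Active Slave:'):
                    active = line.split(':', 1)[1].strip()
        print('%s: mode = %s, active slave = %s'
              % (path.split('/')[-1], mode, active))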
the local VPN is pounded on hard; i estimate 800 Mbps during peak hours. i have noticed no performance issues with the traffic on this VPN. it's all zeromq message traffic, and i monitor it carefully for latency. the messages i send are typically 100+ bytes, and zeromq normally bundles several together for transmission.

on the external VPN, we have 80 inbound feeds (10 per node), typically around 23 Mbps each. what we notice is that these socket connections occasionally go dry, that is, data stops coming. using tcpdump and sniffers, we determined that this is because the server starts advertising a window size of 0 back to the source systems. in fact, a sniffer showed the window size starting around 26K and then quite quickly dropping all the way down to 10, 8, 1 and then zero. at this point, the processes receiving data over those sockets exit and get restarted a couple of minutes later. by then the condition has cleared, the window size goes back up to 26K, and all is well for 6-10 minutes, until some other group of sockets fails. strangely, not every socket on a node fails; sometimes most, sometimes just a few, rarely all.

i take a window size of zero as definitive evidence of tcp/ip stack congestion, but i freely admit to not knowing (nor wanting to know) much about networking, which is why i ask for advice on this group. (i've put a sketch of what i plan to poll in a p.s. below my sig.)

thanks
	andrew

On Oct 17, 2012, at 8:16 PM, Nathan Hruby wrote:

> On Wed, Oct 17, 2012 at 5:22 PM, Andrew Hume <and...@research.att.com> wrote:
>> screwed by linux again. sigh.
>>
>> so apparently i am overloading my pathetic linux system with too much tcp/ip
>> traffic.
>> is there any way to detect this while (or before or after) it is happening?
>> of course no error messages are emitted.
>> but might there be some other thing buried away somewhere, like /proc?
>
> If there are no messages emitted, how do you know it's overloaded with
> [network] traffic?
>
> -n
>
> --
> -------------------------------------------
> nathan hruby <nhr...@gmail.com>
> metaphysically wrinkle-free
> -------------------------------------------

-----------------------
Andrew Hume
623-551-2845 (VO and best)
973-236-2014 (NJ)
and...@research.att.com
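p.s. here is roughly what i plan to poll instead of running a sniffer all day. as far as i understand it, a zero window just means that socket's receive buffer has filled because the reading process isn't draining it fast enough, and the kernel does keep score of that sort of thing in /proc/net/netstat (netstat -s prints the same counters in words, and ss -tmi shows per-socket Recv-Q and buffer sizes). on the wire, filtering on tcp[14:2] = 0 in tcpdump picks out the zero-window advertisements directly (rst packets also carry a zero window, so ignore those). below is a minimal python sketch that watches the prune/backlog counters; the counter names vary a little by kernel version, so treat the list as a guess and drop any that aren't present:

    #!/usr/bin/env python
    # poll /proc/net/netstat and report when counters that indicate
    # receive-buffer pressure are climbing.  the WATCH list is a guess;
    # counters not present on this kernel are silently skipped.
    import time

    WATCH = ['PruneCalled', 'RcvPruned', 'OfoPruned', 'TCPRcvCollapsed',
             'TCPBacklogDrop', 'ListenOverflows', 'ListenDrops']

    def snapshot():
        vals = {}
        with open('/proc/net/netstat') as f:
            lines = f.readlines()
        # lines come in pairs: a header line of counter names followed
        # by a line of values, for each of TcpExt and IpExt.
        for i in range(0, len(lines) - 1, 2):
            names = lines[i].split()[1:]
            nums = lines[i + 1].split()[1:]
            for name, num in zip(names, nums):
                vals[name] = int(num)
        return vals

    prev = snapshot()
    while True:
        time.sleep(10)
        cur = snapshot()
        for name in WATCH:
            if name in cur and cur[name] > prev.get(name, 0):
                print('%s %s +%d (total %d)'
                      % (time.strftime('%H:%M:%S'), name,
                         cur[name] - prev.get(name, 0), cur[name]))
        prev = cur

if those counters climb in step with the zero-window episodes, that points at the per-socket receive buffers (SO_RCVBUF in the feed readers, net.ipv4.tcp_rmem) rather than the NICs, but that's a separate mail.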