> -----Original Message-----
> From: James Relph [mailto:ja...@themacplace.co.uk]
> Sent: Sunday, August 11, 2013 10:59 AM
> 
> although would we lose pings
> with that (had pings running to test for a network issue and never had packet
> loss)?  It's a bit of a puzzler!

Hold on now ...


> From: James Relph [mailto:ja...@themacplace.co.uk]
> Sent: Sunday, August 11, 2013 12:59 PM
> 
>dedicated physical 10Gb network for iSCSI/NFS traffic, with 4x 10Gb
> links (in an LACP bond) per device.  Should be pretty solid really.

I think we found your smoking gun.  You're getting ping loss on a local 
network, and you're using 4x 10Gb LACP bonded network.  And for some reason you 
say "should be pretty solid."  What you've described is basically the 
definition of unstable, if you ask me.

Before anything else, know this:  In LACP, only one network interface can be 
used per data stream.  So if you have a server with LACP, a single client can 
go up to 10Gb, and if you have 4 clients simultaneously, they can each go up to 
10Gb.  You cannot push 40Gb to a single client.
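
To make that concrete, here is a rough sketch of how a bond picks a link for 
each flow.  The CRC32 hash, the addresses and the port numbers are all made up 
for illustration; real switches and NICs use their own L2/L3/L4 hash policies.  
The point is only that the choice depends on the flow, not on the packet.

```python
import zlib

LINKS = 4  # hypothetical 4-port bond


def pick_link(src_ip, dst_ip, src_port, dst_port, n_links=LINKS):
    """Map a flow to one bond member, the way a hash policy does.

    CRC32 is a stand-in here, not what any particular switch uses.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links


# One hypothetical iSCSI flow: every packet carries the same addresses and ports.
flow = ("10.0.0.11", "10.0.0.1", 51515, 3260)

# Hash the same flow for 1000 packets and collect the links that were chosen.
links_used = {pick_link(*flow) for _ in range(1000)}
print(links_used)  # a set with exactly one link index in it
```

However many packets the flow sends, they all land on one member link, so that 
flow tops out at one link's worth of bandwidth.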

Also, your hard disks are each good for roughly 1Gbit/sec.  So every 10 disks 
you have in the server add up to a single 10Gb network interface.  It is 
absolutely pointless to use LACP in this situation unless you have a huge 
honking server.  (Meaning >40 disks).
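
The arithmetic behind that, as a quick sketch.  The 1Gbit-per-disk figure is 
the rule of thumb from above, not a measurement of your disks.

```python
import math

DISK_GBIT = 1.0   # assumed sustained throughput per disk (rule of thumb)
LINK_GBIT = 10.0  # one 10Gb interface
LINKS = 4         # members in the bond


def disks_to_fill(links, link_gbit=LINK_GBIT, disk_gbit=DISK_GBIT):
    """Number of disks whose combined throughput matches `links` interfaces."""
    return math.ceil(links * link_gbit / disk_gbit)


print(disks_to_fill(1))      # 10
print(disks_to_fill(LINKS))  # 40
```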

In my experience, LACP is usually unstable, unless you buy a really expensive 
switch and QA test the hell out of your configuration before using it.  I hear 
lots of people say their LACP is stable and reliable where they are - but it's 
only because they have never tested it and haven't noticed the problems.  The 
problems are specifically as you've described.  Occasional packet loss, which 
people tend to think is ok, but in reality, the only acceptable level of packet 
loss is 0%.

Here's what you need to do:

Figure out how to observe & clear the error counters on all the network 
interfaces.  Log in to the switch to measure them there ...  Log in to the server 
to measure them there ...  Log in to each client to measure them there.  Reset 
them all to 0.  And then start hammering the shit out of the whole system.  Get 
all the clients to drive the network hard, both transmit and receive.  If you 
see error counters increasing, you have a problem.
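
If you can't clear the counters on some device, comparing a snapshot taken 
before the test with one taken after works just as well.  A minimal sketch, 
assuming you have already collected the counters into dictionaries somehow; the 
host names, interface names and counter names below are hypothetical, and how 
you read the real ones depends on your switch and OS.

```python
def counter_increases(before, after):
    """Return {(host, iface, counter): delta} for every counter that went up."""
    increases = {}
    for key, new_value in after.items():
        delta = new_value - before.get(key, 0)
        if delta > 0:
            increases[key] = delta
    return increases


# Hypothetical snapshots, keyed by (host, interface, counter name).
before = {
    ("server", "ixgbe0", "ierrors"): 0,
    ("server", "ixgbe0", "oerrors"): 0,
    ("client1", "eth0", "rx_dropped"): 0,
}
after = {
    ("server", "ixgbe0", "ierrors"): 17,
    ("server", "ixgbe0", "oerrors"): 0,
    ("client1", "eth0", "rx_dropped"): 0,
}

# Anything printed here means you have a problem.
for key, delta in sorted(counter_increases(before, after).items()):
    print(key, "+%d" % delta)
```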

Based on what you've said so far, I guarantee you're going to see error 
counters increasing.  Unless you ignore my advice and don't do these tests ... 
because these tests are difficult to do, and like I said, several times I've 
seen sysadmins swear their system was reliable, only to be proven wrong when 
*actually* put to the test.

Something else I encounter a lot:  On mailing lists exactly like this one, I 
say something just like the above, and other people come back and argue about 
it, *insisting* that it's ok to have occasional packet loss, on a LAN or WAN.  
I swear to you, as an IT consultant, this provides a lot of my sustenance - I 
get called into places with either storage problems or internet problems, and 
if there is packet loss >0%, that is ultimately the root cause of their 
problem.  Never seen an exception.

Because this advice invariably leads to argument, I won't respond to any of 
it.  I've simply grown tired of arguing about it with other people elsewhere.  
It's definitely a trending pattern.  The way I see it, I provide you free 
advice on a mailing list, if you don't take it, so be it.  I continue to get 
paid.

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
