On Monday, March 28, 2011 08:29:58 pm mcclnx mcc wrote:
> To answer your questions:
> 
> 1. network is Intranet not internet.
> 
> 2. server CPU and I/O load is very light.  We did use "sar -u" and "sar -b" to
> check.
> 
> 3. it is NOT only one server that has this network slowness problem.  At least 4 to
> 5 servers on that rack all report being slow.  It is NOT possible that all the
> servers on that rack are heavily loaded.

I have seen issues in the past with certain Broadcom gigabit ethernet NICs and 
the tg3 Linux kernel driver.  Occasionally the NIC would just go into 
'molasses' mode and get really slow.  I haven't seen the problem in quite a 
while, though, so I don't know whether that issue has ever been fixed; of course, not 
seeing the problem myself doesn't mean it isn't still out there.  I never saw multiple 
servers with that NIC go slow at the same time, however, which doesn't quite match 
your symptoms.
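
If you want to rule the NIC out, a quick sanity check on one of the affected boxes 
would look something like the following (just a sketch; eth0 is an example, 
substitute whatever interface the box actually uses):

    # Which driver/firmware is behind this NIC?  (tg3 = Broadcom)
    ethtool -i eth0

    # Negotiated speed and duplex -- a gigabit port that has fallen back
    # to 10/half will certainly feel like molasses.
    ethtool eth0

    # Driver-level error and drop counters.
    ethtool -S eth0

    # Anything the kernel has complained about lately?
    dmesg | grep -i -e tg3 -e eth0

If the error counters are climbing or the link keeps renegotiating, that points at 
the NIC, the cable, or the switch port rather than at the servers themselves.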

The next thing I would check is the network switch these servers are attached to.  
Many switches, especially Cisco Catalyst switches with hardware-assisted layer 2/3 
forwarding, serve several physical ports from a single ASIC.  The networking people 
can check from the Cisco IOS command line whether that ASIC is throwing errors and 
the like; the particular commands vary by ASIC, by switch model, and by operating 
system.
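
I can't tell you the exact commands for your hardware, but on the Catalysts I've 
touched, the starting point would be something along these lines (interface names 
are only examples, and not every model supports every command):

    ! Per-port error counters (CRC errors, runts, collisions, ...)
    show interfaces GigabitEthernet0/1 counters errors

    ! Drops, interface resets, and load on the port itself
    show interfaces GigabitEthernet0/1

    ! On some models this exposes lower-level (ASIC) counters for the port
    show controllers ethernet-controller GigabitEthernet0/1

    ! Anything the switch itself has logged
    show logging

If several of the slow servers turn out to sit on ports served by the same ASIC, 
that's a pretty strong hint.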

I had an older Catalyst 2900XL (I did say 'older' after all) where a certain set of 
ports would hang and go slow for minutes at a time; I plugged the devices into ports 
served by a different ASIC, and things got better.  I then put a home-made permaplug 
into each of the bad ports.  (A permaplug, something I make a few dozen of every so 
often, is an RJ45 plug with no contacts, with the latch release cut off and the back 
end filled with red silicone; it goes in, but it takes some work to get it back out.  
I have been known to epoxy them into bad ports to keep people from trying to use 
them....)  It was the ASIC; on that switch each ASIC serves eight ports.

And the last problem I had was related to a new IP security camera with multicast 
features.  Note to self: always check that multicast is set to OFF if the subnet the 
camera sits on is not carried entirely by multicast-aware switches.  I had lots of 
devices just give up under the sustained 5 Mb/s multicast load.  Multicast traffic 
also doesn't necessarily show up in the usual places you check network traffic; most 
of the time you need Wireshark running on a SPAN port to catch it.  Since I wasn't 
aware that multicast was on by default, it took an inordinate amount of time to find 
the issue; I had switches giving up, losing BPDUs, causing spanning-tree loops, and 
so on.  It was not a pleasant day.  My console terminal servers (SitePlayer Telnets) 
all stopped responding after an hour of that sort of traffic.  Like I say, it was not 
a pleasant day.
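
If you suspect something similar, the way I'd catch it is to mirror the uplink (or a 
suspect port) to a spare port and capture there; roughly like this, with made-up port 
numbers (the SPAN commands go in global configuration mode on the switch):

    ! On the switch: mirror traffic from the suspect port to the capture port
    monitor session 1 source interface GigabitEthernet0/1 both
    monitor session 1 destination interface GigabitEthernet0/24

    # On the capture box plugged into Gi0/24: count IPv4 multicast traffic
    # in 10-second buckets with tshark (Wireshark's command-line twin)
    tshark -i eth0 -f "ip multicast" -q -z io,stat,10

A steady few Mb/s of multicast where you expected none is exactly the kind of thing 
that never shows up in the per-server numbers.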

I have revisited the multicast filtering features of many of my switches in the 
days since that issue.
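
For what it's worth, on managed Cisco gear the knobs worth looking at are IGMP 
snooping (so multicast only goes to ports that actually joined the group) and 
per-port storm control; on the models that support it, the configuration is roughly 
as follows (the 5% threshold is just an example):

    ! Global: make sure IGMP snooping is on (usually the default)
    ip igmp snooping

    ! Per port: cap multicast and broadcast at a small fraction of line rate
    interface GigabitEthernet0/1
     storm-control multicast level 5.00
     storm-control broadcast level 5.00

That won't fix a camera that shouldn't be multicasting in the first place, but it 
keeps one chatty device from taking the rest of the subnet down with it.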
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
