On Saturday 06 May 2006 13:05, James He wrote:
> Hi, all
>
> My boss wants me to test a bunch of gigabit ethernet cards of a
> cluster. He kept getting time-out problems when running some MPI jobs
> on the cluster. The problem only happens when the network traffic is
> very high (~100MB/s). Therefore, he wants me to determine which
> ethernet card(s) is/are having the problem when the traffic is high.
> (We don't get any useful information from syslog or the log files of
> MPI jobs.)
>
> I've seen people testing the ethernet card using nc (netcat) -- just
> transfer some files using nc and then compare them. Is there any
> better way to do this, or any suggestions about some existing
> softwares which can automate this, for I have to test a bunch of them?
> Thanks a lot.
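The nc-based test described in the question can be scripted so that it runs the same way on every node. The sketch below is minimal and rests on a few assumptions:

- It needs Python 3.8+ on the nodes.
- The names receive_and_hash, send_and_hash and loopback_check are hypothetical. They are not an existing tool.

The script streams random data over TCP, hashes it on both ends, and times the sender. TCP checksums and retransmits lost segments, so a flaky card is more likely to show up as stalls or low throughput than as a digest mismatch. The timing therefore matters as much as the comparison.

```python
import hashlib
import os
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def receive_and_hash(server_sock, result):
    """Accept one connection, read to EOF, record the SHA-256 digest and byte count."""
    conn, _ = server_sock.accept()
    digest = hashlib.sha256()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # sender closed the connection
                break
            digest.update(data)
            total += len(data)
    result["digest"] = digest.hexdigest()
    result["bytes"] = total


def send_and_hash(host, port, nbytes):
    """Stream nbytes of random data to host:port.

    Returns (sha256 hex digest, bytes sent, sender-side elapsed seconds)."""
    block = os.urandom(CHUNK)  # generated once so its cost stays out of the timing
    digest = hashlib.sha256()
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=30) as sock:
        while sent < nbytes:
            piece = block[: min(CHUNK, nbytes - sent)]
            sock.sendall(piece)
            digest.update(piece)
            sent += len(piece)
    return digest.hexdigest(), sent, time.monotonic() - start


def loopback_check(nbytes=8 * 1024 * 1024):
    """Run receiver and sender on 127.0.0.1; True if the data arrived intact."""
    with socket.create_server(("127.0.0.1", 0)) as server:  # port 0: OS picks a free port
        port = server.getsockname()[1]
        result = {}
        receiver = threading.Thread(target=receive_and_hash, args=(server, result))
        receiver.start()
        sent_digest, sent, _seconds = send_and_hash("127.0.0.1", port, nbytes)
        receiver.join()
    return sent_digest == result["digest"] and sent == result["bytes"]
```

loopback_check only demonstrates the mechanics on one machine. To exercise a real card, do the following:

1. On one node, bind a server socket to that node's cluster address, e.g. socket.create_server(("0.0.0.0", 5001)). The port is an arbitrary choice.
2. Hand that socket to receive_and_hash.
3. Call send_and_hash from another node.
4. Compare the digests and the sent bytes divided by the elapsed seconds across node pairs.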
Use SNMP to monitor dropped packets, bandwidth utilization, etc. for each of the machines. (If you haven't used SNMP before: you can set up the monitor on one machine to watch all your machines, provided you set up the SNMP daemon on each machine you want to monitor.) If you have a managed switch, set up SNMP on the switch and monitor it as well.

This is not going to help you test the cards directly, since I can't tell what the problem is yet. But it might give you enough clues about a usage pattern that triggers it, which you can then use to start developing a testing strategy.

'mrtg' is good for tracking bandwidth usage but not much else. I think there is a package called 'cacti' that aims to be more complete SNMP monitoring software. The daemon you need used to be called 'net-snmp'; I think it's called 'snmpd' these days. I typically write my own client in Python with rrd, depending on which parts I want to monitor and graph.

You might want to pay special attention to the dropped-packet-related OIDs in either the udp or tcp sections. Someone is dropping packets for the job to time out. If you find out who (sender/receiver/switch), you might also find out why.

Also, tcpdump any ICMP packets. You just might get lucky ;)

Hope that helps.

> --
> Best regards,
>
> James He

--
anoop aryal
[EMAIL PROTECTED]
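The reply suggests watching dropped-packet counters with a home-grown Python client. What follows is a minimal sketch of that idea, not the poster's actual client. It makes several assumptions:

- The net-snmp command-line tools (snmpget) are installed on the monitoring machine.
- snmpd on every node answers SNMPv2c for a community string, hypothetically "public".
- The cluster NIC has the same ifIndex on every node. IF_INDEX = 2 is a hypothetical value; check it with snmpwalk on ifDescr.

Numeric OIDs from IF-MIB and the MIB-II tcp/udp groups are used so that no MIB files are needed.

Polling every node twice and ranking the nodes by how much a counter grew is one way to spot the odd card out. Counter32 values wrap at 2**32, so deltas are taken modulo that. ifInOctets would wrap in well under a minute at ~100MB/s, which is one reason the sketch sticks to error, discard and retransmit counters.

```python
import subprocess

COUNTER32_MOD = 2 ** 32  # Counter32 wraps back to 0 after 2**32 - 1

# Hypothetical: ifIndex of the cluster NIC, assumed identical on every node.
IF_INDEX = 2

# Numeric OIDs so snmpget needs no MIB files; keys are the standard object names.
OIDS = {
    "ifInDiscards": f"1.3.6.1.2.1.2.2.1.13.{IF_INDEX}",
    "ifInErrors": f"1.3.6.1.2.1.2.2.1.14.{IF_INDEX}",
    "ifOutDiscards": f"1.3.6.1.2.1.2.2.1.19.{IF_INDEX}",
    "ifOutErrors": f"1.3.6.1.2.1.2.2.1.20.{IF_INDEX}",
    "tcpRetransSegs": "1.3.6.1.2.1.6.12.0",
    "udpInErrors": "1.3.6.1.2.1.7.3.0",
}


def counter_delta(old, new, modulus=COUNTER32_MOD):
    """Growth between two Counter32 samples, allowing for a single wrap."""
    return (new - old) % modulus


def poll(host, community="public"):
    """Fetch every OID in OIDS from host via net-snmp's snmpget; return {name: int}.

    -Oqv prints only the values, one per line, in request order.
    Raises ValueError if the agent reports a missing OID instead of a number."""
    names = list(OIDS)
    proc = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host] + [OIDS[n] for n in names],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return dict(zip(names, (int(v) for v in proc.stdout.split())))


def suspicious(before, after, threshold=0):
    """Names of counters on one host that grew by more than threshold."""
    return sorted(n for n in before if counter_delta(before[n], after[n]) > threshold)


def rank_hosts(samples_before, samples_after, counter):
    """Hosts sorted by how much `counter` grew between two polling rounds."""
    deltas = {
        host: counter_delta(samples_before[host][counter], samples_after[host][counter])
        for host in samples_before
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

To use it:

1. Poll every node with {h: poll(h) for h in hosts}.
2. Wait a minute while the MPI job runs.
3. Poll again.
4. Call rank_hosts(before, after, "ifInErrors"). Repeat with the discard counters.

A node whose errors or discards climb while its neighbours stay flat is a good candidate. tcpRetransSegs normally grows under load, so compare it across nodes rather than against zero. A counter that seems to jump by nearly 2**32 usually means the agent restarted between polls.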