On May 8, 2009, at 9:01 PM, Allan Menezes wrote:

  Does Open MPI version 1.3.2 for Fedora Core 10 x86_64 work stably with 4
gigabit PCI-Express Ethernet cards per node?


It should. I routinely test over 3 or 4 IP interfaces (including IPoIB and 1Gb/10Gb NICs).
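As a minimal sketch (the hostfile name, process count, and HPL binary are placeholders, not your actual setup), the TCP BTL can be told exactly which interfaces to stripe MPI traffic across with the btl_tcp_if_include MCA parameter:

    # use all four NICs for MPI point-to-point traffic
    mpirun -np 24 --hostfile myhosts \
        --mca btl tcp,sm,self \
        --mca btl_tcp_if_include eth0,eth1,eth2,eth3 \
        ./xhpl

Open MPI will then open connections over each listed interface and stripe large messages across them.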

I tried it on six Asus P5Q-VM motherboards, each with 4 cards, 8 GB of RAM, and
an Intel quad-core CPU, as follows:
eth0 - Intel PRO/1000 PT PCI-Express gigabit card
eth1 - TP-Link TG-3468 (Realtek R8111B chipset) PCI-Express gigabit Ethernet
eth2 - Realtek 8111C chipset PCI-Express gigabit Ethernet, built into the
motherboard
eth3 - TP-Link TG-3468 (Realtek R8111B chipset) PCI-Express gigabit Ethernet
All interfaces use an MTU of 3000 and the latest Intel and Realtek drivers
from their respective websites, on a hand-configured and compiled 2.6.28.4 kernel.
I used HPL 2.0 with GotoBLAS to check the cluster: I get approximately 220
GFlops, stably, if I use only eth0, eth1, eth3 or eth0, eth2, eth3,
but I get 203 GFlops with eth0, eth1, eth2, eth3 and the HPL tests fail
after about the third test.


Are you saying that after three consecutive tests, the IP devices start failing? If so, it sounds like a kernel / driver problem. If you reboot, do the problems go away?

What happens if you restrict OMPI to just 1 device and run HPL 5-10 times on each device? Do you see the same degradation? That is, can you localize which device is causing problems?
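A minimal sketch of that kind of per-device test (hostfile name, process count, and HPL binary are placeholder assumptions):

    # run HPL 5 times over each NIC in isolation
    for nic in eth0 eth1 eth2 eth3; do
        for run in 1 2 3 4 5; do
            mpirun -np 24 --hostfile myhosts \
                --mca btl tcp,sm,self \
                --mca btl_tcp_if_include $nic \
                ./xhpl
        done
    done

If one interface (or its driver) is the culprit, the failures should follow that NIC.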

Any help would be very much appreciated, as I would like to use 4 Ethernet
cards per node.
Note: the measured performance of each card is approximately 922 Mbit/s with
jumbo frames of 3000, using NetPIPE's NPtcp. With four cards between two nodes
I measure approximately 3400 Mbit/s with NPmpi compiled against Open MPI, which
is good - it scales roughly linearly, about 4 x 900 Mbit/s.
Thank you,
Allan Menezes


--
Jeff Squyres
Cisco Systems
