Just to add: my whole cluster is Intel EM64T/x86_64. With Open MPI v1.2.4, using two PCI Express Intel gigabit cards and one PCI Express SysKonnect gigabit ethernet card, individually measured at 888, 892 and 892 Mbps with NPtcp, I was getting a total bandwidth of 1950 Mbps between two identical systems connected by three gigabit switches. But after changing to the beta version of Open MPI, the nightly v1.3a1r16973, and recompiling NPtcp (which does not matter, since it uses gcc) and NPmpi (which uses the newer mpicc), I get 2583 Mbps for the same settings between two separate identical nodes, which is close to three times the bandwidth of a single card, i.e. a nearly linear increase across the three cards! The MTU was left at the default of 1500 for all eth cards in both trials. I am using Fedora Core 8, x86_64, for the operating system.
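For anyone reproducing this, the inter-node NPmpi run looks roughly like the following (a sketch only: the install prefix is a placeholder for whichever Open MPI build is being tested, and the host names a1/a2 are the ones used in the commands quoted further down this thread):

mpirun --prefix <openmpi-install-prefix> --host a1,a2 -mca btl tcp,self -np 2 ./NPmpi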
Allan Menezes

Hi,
I found the problem. I think it's a bug in Open MPI v1.2.4. As the tests below confirm (and a big THANKS to George!), I compiled Open MPI v1.3a1r16973 and ran the same tests with the same mca-params.conf file, and for three PCI Express gigabit ethernet cards I got a total bandwidth of 2583 Mbps, which is close to 892+892+888 = 2672 Mbps, i.e. a linear increase in b/w, with everything else the same except for a recompilation of NPmpi and NPtcp from NetPIPE. NPmpi is compiled with mpicc, whereas NPtcp is compiled with gcc. I am now going to do some benchmarking of my basement cluster with HPL under Open MPI v1.3a1r16973 to check for increases in performance and stability. V1.2.4 is stable and completes all 18 HPL tests without errors! With Open MPI v1.2.4 and NPmpi compiled with its mpicc, using the shared memory commands below in --(a), ./NPmpi -u 100000000 gives negative numbers for performance above approximately 200 MBytes. Some sort of overflow in v1.2.4.
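For context, a minimal sketch of what such a $HOME/.openmpi/mca-params.conf might contain is below; the interface names eth0/eth1/eth2 and the tuning values are placeholders and an assumption, not the exact file used here:

# sketch only: interface names and values are assumptions
btl = tcp,sm,self
btl_tcp_if_include = eth0,eth1,eth2
# per-link scheduling hints (latency in usec, bandwidth in Mbps), tuned from NetPIPE runs
btl_tcp_latency = 10
btl_tcp_bandwidth = 890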
Thank you,
Regards,
Allan Menezes

Hi George,
The following test peaks at 8392 Mbps on a1:

mpirun --prefix /opt/opnmpi124b --host a1,a1 -mca btl tcp,sm,self -np 2 ./NPmpi

and on a2:

mpirun --prefix /opt/opnmpi124b --host a2,a2 -mca btl tcp,sm,self -np 2 ./NPmpi
gives 8565 Mbps --(a)
on a1:
mpirun --prefix /opt/opnmpi124b --host a1,a1  -np 2 ./NPmpi

gives 8424 Mbps. On a2:

mpirun --prefix /opt/opnmpi124b --host a2,a2 -np 2 ./NPmpi

gives 8372 Mbps. So there's enough memory and processor b/w to give 2.7 Gbps for 3 PCI Express eth cards, especially judging from --(a), between a1 and a2? Thank you for your help. Any assistance would be greatly appreciated!
Regards,
Allan Menezes

You should run a shared memory test, to see what's the max memory bandwidth you can get.

Thanks,
george.

On Dec 17, 2007, at 7:14 AM, Gleb Natapov wrote:

On Sun, Dec 16, 2007 at 06:49:30PM -0500, Allan Menezes wrote:
Hi,
How many PCI Express gigabit ethernet cards does Open MPI version 1.2.4
support with a corresponding linear increase in bandwidth, measured with
NetPIPE's NPmpi and Open MPI's mpirun?
With two PCI Express cards I get a B/W of 1.75 Gbps, at 892 Mbps each, and
for three PCI Express cards (one built into the motherboard) I get
1.95 Gbps. They each measure around 890 Mbps individually with NetPIPE
(NPtcp, and NPmpi with Open MPI). For two cards there seems to be a linear
increase in b/w, but not for three PCI Express gigabit eth cards.
I have tuned the cards using NetPIPE and the $HOME/.openmpi/mca-params.conf
file for latency and percentage b/w.
Please advise.
What is in your $HOME/.openmpi/mca-params.conf? Maybe you are hitting your
chipset limit here. What is your HW configuration? Can you try to run
NPtcp on each interface simultaneously and see what BW you get?
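One way to drive all three interfaces at the same time, if running several NPtcp listeners on one host is awkward, is to substitute iperf, which can be bound to each NIC's address on its own port. A sketch, where the per-NIC addresses 192.168.1.2/192.168.2.2/192.168.3.2 and the ports are assumptions:

on a2:
iperf -s -B 192.168.1.2 -p 5001 &
iperf -s -B 192.168.2.2 -p 5002 &
iperf -s -B 192.168.3.2 -p 5003 &

on a1:
iperf -c 192.168.1.2 -p 5001 -t 30 &
iperf -c 192.168.2.2 -p 5002 -t 30 &
iperf -c 192.168.3.2 -p 5003 -t 30 &

The three client streams run in parallel, so summing their reported rates shows whether the chipset/bus can feed all three NICs at once.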

--
                        Gleb.


