Just to add: my whole cluster is Intel EM64T/x86_64. With OpenMPI v1.2.4, using
two PCI Express Intel gigabit cards and one PCI Express SysKonnect gigabit
Ethernet card (measured individually at 888, 892 and 892 Mbps with NPtcp), I was
getting a total aggregate bandwidth of 1950 Mbps between two separate but
identical systems connected by three gigabit switches. But after changing to the
beta version of OpenMPI, nightly build 1.3a1r16973, and recompiling NPtcp (which
does not matter, since it uses gcc) and NPmpi (which uses the newer mpicc), I get
2583 Mbps for the same setup between two separate identical nodes, which is close
to a three-fold increase over a single card's bandwidth! The MTU was the default
of 1500 for all Ethernet cards in both trials. I am using Fedora Core 8, x86_64,
as the operating system.
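For reference, the per-card figures quoted above come from plain NPtcp runs
between the two nodes, with the sender pointed at the IP address of the NIC
under test. A rough sketch (the address and output file name are placeholders,
not the actual values used):
  # on the receiving node (e.g. a2)
  ./NPtcp
  # on the sending node (e.g. a1), targeting the address of the card under test
  ./NPtcp -h 192.168.1.2 -o np_eth0.out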
Allan Menezes
Hi,
I found the problem. It's a bug in OpenMPI v1.2.4, I think. As the tests below
confirm (and a big THANKS to George!), I compiled OpenMPI v1.3a1r16973 and ran
the same tests with the same mca-params.conf file, and for three PCI Express
gigabit Ethernet cards I got a total bandwidth of 2583 Mbps, which is close to
892 + 892 + 888 = 2672 Mbps, i.e. a linear increase in bandwidth. Everything
else was the same except for a recompilation of NetPIPE's NPmpi and NPtcp.
NPmpi is compiled with mpicc, whereas NPtcp is compiled with gcc!
I am now going to do some HPL benchmarking of my basement cluster with OpenMPI
v1.3a1r16973 to check for improvements in performance and stability. V1.2.4 is
stable and completes all 18 HPL tests without errors!
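For reference, an HPL run here is just the xhpl binary launched under mpirun,
with the problem sizes and process grid taken from HPL.dat in the working
directory; a sketch, assuming xhpl has been rebuilt against v1.3a1r16973 (host
names and process count are placeholders):
  mpirun --host a1,a2 -np 4 ./xhpl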
With OpenMPI v1.2.4 and NPmpi compiled with its mpicc, using the shared memory
commands below in --(a), I get negative performance numbers from
./NPmpi -u 100000000 above approximately 200 MBytes.
Some sort of overflow in v1.2.4.
Thank you,
Regards,
Allan Menezes
Hi George,
The following test peaks at 8392 Mbps on a1:
mpirun --prefix /opt/opnmpi124b --host a1,a1 -mca btl tcp,sm,self -np 2 ./NPmpi
and the same command on a2:
mpirun --prefix /opt/opnmpi124b --host a2,a2 -mca btl tcp,sm,self -np 2 ./NPmpi
gives 8565 Mbps.
--(a)
On a1:
mpirun --prefix /opt/opnmpi124b --host a1,a1 -np 2 ./NPmpi
gives 8424 Mbps. On a2:
mpirun --prefix /opt/opnmpi124b --host a2,a2 -np 2 ./NPmpi
gives 8372 Mbps.
So there is enough memory and processor bandwidth to reach 2.7 Gbps over three
PCI Express Ethernet cards between a1 and a2, especially judging from --(a)?
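As a cross-check, NPmpi can also be pinned to the shared-memory BTL alone, so
that no TCP path is involved at all; a sketch in the same layout as the runs
above:
  mpirun --prefix /opt/opnmpi124b --host a1,a1 -mca btl sm,self -np 2 ./NPmpi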
Thank you for your help. Any assistance would be greatly appreciated!
Regards,
Allan Menezes
You should run a shared memory test, to see what's the max memory bandwidth
you can get.
Thanks,
george.
On Dec 17, 2007, at 7:14 AM, Gleb Natapov wrote:
On Sun, Dec 16, 2007 at 06:49:30PM -0500, Allan Menezes wrote:
Hi,
How many PCI Express gigabit Ethernet cards does OpenMPI version 1.2.4
support with a corresponding linear increase in bandwidth, measured with
NetPIPE's NPmpi under OpenMPI's mpirun?
With two PCI Express cards I get a bandwidth of 1.75 Gbps, at 892 Mbps each,
and with three PCI Express cards (one built into the motherboard) I get
1.95 Gbps. They are all around 890 Mbps individually, measured with NetPIPE
NPtcp and with NPmpi under OpenMPI. For two cards there seems to be a linear
increase in bandwidth, but not for three PCI Express gigabit Ethernet cards.
I have tuned the cards using NetPIPE and the $HOME/.openmpi/mca-params.conf
file for latency and percentage bandwidth (a sketch of such a file is below).
Please advise.
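For illustration, a file of this kind typically just selects the BTLs, lists
the TCP interfaces to stripe across, and sets the scheduling hints; a minimal
sketch, assuming the three cards show up as eth0, eth1 and eth2 (interface
names and values are placeholders, not the actual file):
  # $HOME/.openmpi/mca-params.conf -- illustrative only
  btl = tcp,sm,self
  btl_tcp_if_include = eth0,eth1,eth2
  # hints the TCP BTL uses when scheduling traffic across links
  btl_tcp_bandwidth = 890
  btl_tcp_latency = 30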
What is in your $HOME/.openmpi/mca-params.conf? Maybe you are hitting your
chipset limit here. What is your HW configuration? Can you try to run NPtcp
on each interface simultaneously and see what BW you get?
--
Gleb.