We are performing a comparison of HP-MPI versus Open MPI using InfiniBand and seeing a performance hit in the vicinity of 60% (Open MPI is slower) on controlled benchmarks. Since everything else is similar, we suspect a problem with the way we are using or have installed Open MPI.
Please find attached the following info as requested from http://www.open-mpi.org/community/help/
Application: an in-house CFD solver using both point-to-point and collective operations. For historical reasons, it also makes extensive use of BSEND. We recognize that BSENDs can be inefficient, but it is not practical to change them at this time. We are trying to understand why the performance is so significantly different from HP-MPI. The application is mixed Fortran 90 and C, built with Portland Group compilers.
HP-MPI version info:
mpirun: HP MPI 02.02.05.00 Linux x86-64
major version 202 minor version 5
Open MPI version info:
mpirun (Open MPI) 1.2.4
Report bugs to http://www.open-mpi.org/community/help/
Configuration info:
The benchmark was a 4-process job run on a single dual-socket, dual-core HP DL140 G3 (3.0 GHz Woodcrest) with 4 GB of memory. Each rank requires approximately 250 MB of memory.
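For reference, here is a minimal sketch of the kind of launch line we could use to make the transport selection explicit under Open MPI 1.2 (the binary name ./cfd_solver and its argument are placeholders, not our actual command); since the job runs on a single node, the shared-memory (sm) BTL would normally carry the traffic rather than InfiniBand:

    # Hypothetical launch line, not our production command: forces the BTL list
    # and turns on processor affinity, which Open MPI 1.2 leaves off by default.
    mpirun -np 4 \
        --mca btl self,sm,openib \
        --mca mpi_paffinity_alone 1 \
        ./cfd_solver input.dat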
1) Output from ompi_info --all
See attached file ompi_info_output.txt
<< File: ompi_info_output.txt >>
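If it is useful, the MCA parameter values in effect for the InfiniBand and shared-memory transports can also be listed on their own; a sketch of the query (standard ompi_info usage, shown here for reference rather than as additional attached output):

    ompi_info --param btl openib
    ompi_info --param btl sm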
Below is the output requested in the FAQ section:
In order for us to help you, it is most helpful if you can run a few steps before sending an e-mail to both perform some basic troubleshooting and provide us with enough information about your environment to help you. Please include answers to the following questions in your e-mail:
1. Which OpenFabrics version are you running? Please specify where you got the software from (e.g., from the OpenFabrics community web site, from a vendor, or it was already included in your Linux distribution).
We obtained the software from www.openfabrics.org
Output from ofed_info command:
OFED-1.1
openib-1.1 (REV=9905)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git:
ref: refs/heads/ofed_1_1
commit a083ec1174cb4b5a5052ef5de9a8175df82e864a
# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm
2. What distro and version of Linux are you running? What is your kernel version?
Linux xxxxxxxx 2.6.9-64.EL.IT133935.jbtest.1smp #1 SMP Fri Oct 19 11:28:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
3. Which subnet manager are you running? (e.g., OpenSM, a vendor-specific subnet manager, etc.)
We believe this to be HP or Voltaire, but we are not certain how to determine this.
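As a possible way to pin this down (a suggestion based on the standard OFED diagnostic tools, not something we have run for this report), the subnet manager can be queried from any node; given the sm_lid of 1 in the ibv_devinfo output below, the query should report the master SM at that LID:

    sminfo
    # If OpenSM is installed locally, its service status may also tell us:
    /etc/init.d/opensmd status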
4. What is the output of the ibv_devinfo command on a known "good" node and a known "bad" node? (NOTE: there must be at least one port listed as "PORT_ACTIVE" for Open MPI to work. If there is not at least one PORT_ACTIVE port, something is wrong with your OpenFabrics environment and Open MPI will not be able to run.)
hca_id: mthca0
fw_ver: 1.2.0
node_guid: 001a:4bff:ff0b:5f9c
sys_image_guid: 001a:4bff:ff0b:5f9f
vendor_id: 0x08f1
vendor_part_id: 25204
hw_ver: 0xA0
board_id: VLT0030010001
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 161
port_lmc: 0x00
5. What is the output of the ifconfig command on a known "good" node and a known "bad" node? (mainly relevant for IPoIB installations) Note that some Linux distributions do not put ifconfig in the default path for normal users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.
eth0 Link encap:Ethernet HWaddr 00:XX:XX:XX:XX:XX
inet addr:X.Y.Z.Q Bcast:X.Y.Z.255 Mask:255.255.255.0
inet6 addr: X::X:X:X:X/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1021733054 errors:0 dropped:10717 overruns:0 frame:0
TX packets:1047320834 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1035986839096 (964.8 GiB) TX bytes:1068055599116 (994.7 GiB)
Interrupt:169
ib0 Link encap:UNSPEC HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:A.B.C.D Bcast:A.B.C.255 Mask:255.255.255.0
inet6 addr: X::X:X:X:X/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:137021 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:12570947 (11.9 MiB) TX bytes:1504 (1.4 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1498664 errors:0 dropped:0 overruns:0 frame:0
TX packets:1498664 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1190810468 (1.1 GiB) TX bytes:1190810468 (1.1 GiB)
6. If running under Bourne shells, what is the output of the "ulimit -l" command? If running under C shells, what is the output of the "limit | grep memorylocked" command?
(NOTE: If the value is not "unlimited", see this FAQ entry: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages and this FAQ entry: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more.)
memorylocked 3500000 kbytes
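Our locked-memory limit is therefore finite rather than "unlimited". If this turns out to matter, the two FAQ entries above describe raising it; a sketch of the change we would expect to make (assuming PAM-based limits via /etc/security/limits.conf, which we have not yet confirmed applies to our nodes):

    # /etc/security/limits.conf -- raise the locked-memory limit for all users
    *  soft  memlock  unlimited
    *  hard  memlock  unlimited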
Gather up this information and see this page (http://www.open-mpi.org/community/help/) about how to submit a help request to the user's mailing list.