One thing to look for is the process distribution. Depending on the
application's communication pattern, the process distribution can have a
tremendous impact on the execution time. Imagine that the application
splits the processes into two equal groups based on rank and only
communicates within each group. If such a group ends up on the same node,
it will use sm (shared memory) for its communications. If, on the contrary,
the groups end up spread across the nodes, they will use TCP (which
obviously has a higher latency and a lower bandwidth) and the overall
performance will be greatly impacted.
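As a minimal sketch (hypothetical code, not how GROMACS itself is
written), such a pattern could look like this:

    /* Hypothetical example: split the ranks into two halves and
     * communicate only inside each half. Whether a half lands on a
     * single node or is spread over both decides sm vs. TCP. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, color;
        double in = 1.0, out = 0.0;
        MPI_Comm half;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* lower half of the ranks vs. upper half */
        color = (rank < size / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &half);

        /* all traffic stays inside the half-group */
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, half);

        MPI_Comm_free(&half);
        MPI_Finalize();
        return 0;
    }

With the default placement on your 2x4 cluster, each half-group stays on
one node and every message goes over sm; with a round-robin placement,
each half-group spans both nodes and part of the traffic goes over TCP.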
By default, Open MPI uses the following strategy to distribute
processes: if a node has several processors, then consecutive ranks
will be started on the same node. As an example, in your case (2 nodes
with 4 processors each), ranks 0-3 will be started on the first
host, and ranks 4-7 on the second one. I don't know what the
default distribution for MPICH2 is ...
Anyway, there is an easy way to check whether the process distribution is
the root of your problem. Please execute your application twice, once
passing mpirun the --bynode argument, and once with --byslot.
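For example (the hostfile and executable names below are just
placeholders):

    mpirun --byslot -np 8 -hostfile myhosts ./my_app   # ranks 0-3 on node1, ranks 4-7 on node2
    mpirun --bynode -np 8 -hostfile myhosts ./my_app   # ranks 0,2,4,6 on node1, ranks 1,3,5,7 on node2

If the two runs differ noticeably in execution time, the placement of
the ranks is indeed part of the story.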
george.
On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:
Hi All,
I wanted to switch from MPICH2/MVAPICH2 to OpenMPI, as
OpenMPI supports both Ethernet and InfiniBand. Before doing that, I
tested an application, GROMACS, to compare the performance of MPICH2
and OpenMPI. Both were compiled with GNU compilers.
After this benchmark, I found that OpenMPI is slower than MPICH2.
This benchmark was run on AMD dual-core, dual-Opteron nodes.
Both were built with their default configurations.
The job was run on 2 nodes (8 cores).
OpenMPI - 25 m 39 s.
MPICH2 - 15 m 53 s.
Any comments?
Thanks,
Sangamesh