Hi,

When all processes run on the same node they communicate via shared memory,
which delivers both high bandwidth and low latency. InfiniBand has lower
bandwidth and higher latency than shared memory. Your parallel algorithm
might simply be very latency sensitive; profile it with something like mpiP
or Vampir/VampirTrace to find out why it slows down, and only then try to
further tune Open MPI.
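
For example, mpiP can usually be attached to a dynamically linked binary
without relinking, via LD_PRELOAD. A minimal sketch, assuming libmpiP.so is
installed under /opt/mpiP/lib and your executable is ./your_app (both are
placeholders; adjust the paths for your system):

  # Export LD_PRELOAD to every rank so mpiP intercepts all MPI calls;
  # Open MPI's mpirun forwards environment variables with -x.
  mpirun -np 16 -x LD_PRELOAD=/opt/mpiP/lib/libmpiP.so ./your_app

  # On exit, mpiP writes a plain-text report (*.mpiP) into the
  # working directory, including per-call aggregate timings.

Comparing the single-node report with the two-node one should show whether
the extra time goes into a few latency-bound MPI calls.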

Hope that helps,
Hristo

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of San B
Sent: Monday, October 07, 2013 8:46 AM
To: OpenMPI ML
Subject: [OMPI users] (no subject)

Hi,
I'm facing a performance issue with a scientific application (Fortran). The
issue is that it runs fast on a single node but very slowly on multiple
nodes. For example, a 16-core job on a single node finishes in 1 hr 2 mins,
but the same job on two nodes (i.e. 8 cores per node, with the remaining 8
cores kept free) takes 3 hrs 20 mins. The code is compiled with ifort-13.1.1,
openmpi-1.4.5 and the Intel MKL libraries - LAPACK, BLAS, ScaLAPACK, BLACS &
FFTW. What could be the problem here?
Is it possible to do any tuning in Open MPI? FYI, more info: the cluster has
Intel Sandy Bridge processors (E5-2670) and InfiniBand, and Hyper-Threading
is enabled. Jobs are submitted through the LSF scheduler.
Could Hyper-Threading be causing any problem here?

Thanks

--
Hristo Iliev, PhD – High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Phone: +49 241 80 24367 – Fax/UMS: +49 241 80 624367
