Hi,

When all processes run on the same node, they communicate via shared memory, which delivers both high bandwidth and low latency. InfiniBand has lower bandwidth and higher latency than shared memory. Your parallel algorithm might simply be very latency sensitive, so you should profile it with something like mpiP or Vampir/VampirTrace to find out why it slows down, and only then try to further tune Open MPI.
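If you want a quick sanity check before doing a full profiling run, a small ping-pong test run once with both ranks on one node and once with the ranks on two different nodes will show you the latency gap directly. Below is a minimal sketch (plain MPI in C, not taken from your application); the message size and iteration count are arbitrary choices:

/* pingpong.c - minimal MPI ping-pong latency sketch (illustrative only).
 * Run with 2 ranks: once on one node, once spread over two nodes,
 * and compare the reported round-trip times. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;      /* arbitrary number of repetitions */
    char buf[8] = {0};            /* small message -> latency dominated */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("average round-trip time: %g us\n",
               (t1 - t0) / iters * 1e6);

    MPI_Finalize();
    return 0;
}

Compile it with mpicc and run it with two processes on one node, then with one process on each of two nodes; the difference between the two averages gives a rough idea of the extra latency your application pays per message when it has to go over InfiniBand.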
Hope that helps,
Hristo

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of San B
Sent: Monday, October 07, 2013 8:46 AM
To: OpenMPI ML
Subject: [OMPI users] (no subject)

Hi,

I'm facing a performance issue with a scientific application (Fortran). The issue is that it runs fast on a single node but very slowly on multiple nodes. For example, a 16-core job on a single node finishes in 1 hr 2 min, but the same job on two nodes (i.e. 8 cores per node, with the remaining 8 cores kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1, openmpi-1.4.5 and the Intel MKL libraries - lapack, blas, scalapack, blacs & fftw. What could be the problem here? Is it possible to do any tuning in OpenMPI?

FYI, more info: the cluster has Intel Sandy Bridge processors (E5-2670) and InfiniBand, and Hyper-Threading is enabled. Jobs are submitted through the LSF scheduler. Is Hyper-Threading causing any problem here?

Thanks

--
Hristo Iliev, PhD
High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Phone: +49 241 80 24367
Fax/UMS: +49 241 80 624367