Hi, I'm facing a performance issue with a scientific application(Fortran). The issue is, it runs faster on single node but runs very slow on multiple nodes. For example, a 16 core job on single node finishes in 1hr 2mins, but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1, openmpi-1.4.5 and intel MKL libraries - lapack, blas, scalapack, blacs & fftw. What could be the problem here with? Is it possible to do any tuning in OpenMPI? FY More info: The cluster has Intel Sandybridge processor (E5-2670), infiniband and Hyperthreading is Enabled. Jobs are submitted thru LSF scheduler.
Does HyperThreading causing any problem here? Thanks