Hi,

As suggested, I profiled the application with mpiP. Here is the difference
between the two runs:
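
For reference, mpiP can be enabled either by linking the application with
-lmpiP at build time or by preloading the library at run time; a minimal
sketch of the preload route (the install path is a placeholder):

    # Enable mpiP for a dynamically linked binary without relinking;
    # /path/to/mpiP is a placeholder for the actual install prefix.
    export LD_PRELOAD=/path/to/mpiP/lib/libmpiP.so
    mpirun -np 16 ./application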

Run 1: 16 MPI processes on a single node

@--- MPI Time (seconds) ---------------------------------------------------
---------------------------------------------------------------------------
Task    AppTime    MPITime     MPI%
   0   3.61e+03        661    18.32
   1   3.61e+03        627    17.37
   2   3.61e+03        700    19.39
   3   3.61e+03        665    18.41
   4   3.61e+03        702    19.45
   5   3.61e+03        703    19.48
   6   3.61e+03        740    20.50
   7   3.61e+03        763    21.14
...
...

Run 2: 16 MPI processes across two nodes (8 MPI processes per node)

@--- MPI Time (seconds) ---------------------------------------------------
---------------------------------------------------------------------------
Task    AppTime    MPITime     MPI%
   0   1.27e+04   1.06e+04    84.14
   1   1.27e+04   1.07e+04    84.34
   2   1.27e+04   1.07e+04    84.20
   3   1.27e+04   1.07e+04    84.20
   4   1.27e+04   1.07e+04    84.22
   5   1.27e+04   1.07e+04    84.25
   6   1.27e+04   1.06e+04    84.02
   7   1.27e+04   1.07e+04    84.35
   8   1.27e+04   1.07e+04    84.29


The time spent in MPI functions is less than 20% in run 1, whereas it is
more than 80% in run 2 (the total AppTime also grows from ~3.6e+03 s to
~1.27e+04 s, consistent with the wall-clock times in my earlier mail below).
For more details, I've attached both output files. Please go through them
and suggest what tuning we can do with OpenMPI or Intel MKL.
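
Two things I plan to verify on the OpenMPI side, in case they explain the
gap (a sketch; -npernode, the binding flags and the btl MCA parameter are
standard mpirun options in the 1.4 series, not necessarily what our LSF
jobs currently pass):

    # Confirm placement and binding: 8 ranks per node, one rank per core
    mpirun -np 16 -npernode 8 --bind-to-core --report-bindings ./application

    # Pin the BTLs to InfiniBand + shared memory + self, so inter-node
    # traffic cannot silently fall back to TCP
    mpirun -np 16 -npernode 8 --mca btl openib,sm,self ./application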

Thanks


On Mon, Oct 7, 2013 at 12:15 PM, San B <forum....@gmail.com> wrote:

> Hi,
>
> I'm facing a performance issue with a scientific application (Fortran).
> It runs fast on a single node but very slowly across multiple nodes. For
> example, a 16-core job on a single node finishes in 1hr 2mins, but the
> same job on two nodes (i.e. 8 cores per node, with the remaining 8 cores
> kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1,
> openmpi-1.4.5 and the Intel MKL libraries - LAPACK, BLAS, ScaLAPACK,
> BLACS & FFTW. What could be the problem here?
> Is it possible to do any tuning in OpenMPI? FYI, more info: the cluster
> has Intel Sandy Bridge processors (E5-2670) and InfiniBand, and
> Hyper-Threading is enabled. Jobs are submitted through the LSF scheduler.
>
> Could Hyper-Threading be causing a problem here?
>
>
> Thanks
>
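
PS: regarding the Hyper-Threading question above - assuming dual-socket
E5-2670 nodes (16 physical cores), HT makes each node advertise 32 logical
CPUs, so it is worth confirming that the ranks are bound to distinct
physical cores rather than to HT siblings. A quick way to inspect a node
(a sketch):

    # Logical CPU count (32 with HT on) vs. socket/core/thread layout
    grep -c '^processor' /proc/cpuinfo
    lscpu | egrep 'Socket|Core|Thread'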

Attachment: mpi-App-profile-1node-16perNode.mpiP
Attachment: mpi-App-profile-2Nodes-8perNode.mpiP
