I don't think it's a bug in OMPI; the difference more likely reflects improvements 
in the default collective algorithms between the two releases. If you want to 
improve performance further, bind your processes to cores (if your application 
isn't threaded) or to sockets (if it is threaded).
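For example, with the 1.6-series mpirun the options look roughly like this (the 
rank count and binary name are just placeholders for your own job):

  mpirun -np 16 --bind-to-core --report-bindings ./your_app
  mpirun -np 16 --bysocket --bind-to-socket --report-bindings ./your_app

The first form gives each rank its own core (for an unthreaded app), the second 
binds each rank to a socket (for a threaded app), and --report-bindings prints 
where every rank actually landed so you can verify the binding took effect.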

As someone previously noted, apps will always run slower spread across multiple 
nodes than packed onto a single node, because intra-node messages go over shared 
memory while inter-node messages have to cross the InfiniBand fabric. Nothing 
you can do about that one.


On Oct 28, 2013, at 10:36 PM, San B <forum....@gmail.com> wrote:

>       As discussed earlier, the executable compiled with OpenMPI-1.4.5 gave 
> very low performance, 12338.809 seconds, when the job was run on two nodes 
> (8 cores per node). The same job run on a single node (all 16 cores) finished 
> in just 3692.403 seconds. I have now compiled the application with 
> OpenMPI-1.6.5, and it finished in 5527.320 seconds on two nodes. 
> 
>      Is this a performance gain with OMPI-1.6.5 over OMPI-1.4.5, or an issue 
> with OpenMPI itself?
> 
> 
> On Tue, Oct 15, 2013 at 5:32 PM, San B <forum....@gmail.com> wrote:
> Hi,
> 
>      As per your instruction, I profiled the application with mpiP. The 
> difference between the two runs is as follows:
> 
> Run 1: 16 MPI processes on a single node
> 
> @--- MPI Time (seconds) ---------------------------------------------------
> ---------------------------------------------------------------------------
> Task    AppTime    MPITime     MPI%
>    0   3.61e+03        661    18.32
>    1   3.61e+03        627    17.37
>    2   3.61e+03        700    19.39
>    3   3.61e+03        665    18.41
>    4   3.61e+03        702    19.45
>    5   3.61e+03        703    19.48
>    6   3.61e+03        740    20.50
>    7   3.61e+03        763    21.14
> ...
> ...
> 
> Run 2: 16 MPI processes on two nodes (8 MPI processes per node)
> 
> @--- MPI Time (seconds) ---------------------------------------------------
> ---------------------------------------------------------------------------
> Task    AppTime    MPITime     MPI%
>    0   1.27e+04   1.06e+04    84.14
>    1   1.27e+04   1.07e+04    84.34
>    2   1.27e+04   1.07e+04    84.20
>    3   1.27e+04   1.07e+04    84.20
>    4   1.27e+04   1.07e+04    84.22
>    5   1.27e+04   1.07e+04    84.25
>    6   1.27e+04   1.06e+04    84.02
>    7   1.27e+04   1.07e+04    84.35
>    8   1.27e+04   1.07e+04    84.29
> 
> 
>           The time spent in MPI functions is less than 20% in run 1, whereas 
> it is more than 80% in run 2. For more details, I've attached both output 
> files. Please go through them and suggest what optimization we can do with 
> OpenMPI or Intel MKL.
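Side note on the mpiP numbers above, in case anyone wants to reproduce them: 
mpiP is normally attached either at link time or via a preload, roughly along 
these lines (the install path is just a placeholder):

  mpif90 app.o -o app -L/opt/mpiP/lib -lmpiP -lbfd -liberty -lunwind -lm
  mpirun -np 16 -x LD_PRELOAD=/opt/mpiP/lib/libmpiP.so ./app

Either way, the per-rank summary ends up in a *.mpiP report file in the working 
directory.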
> 
> Thanks
> 
> 
> On Mon, Oct 7, 2013 at 12:15 PM, San B <forum....@gmail.com> wrote:
> Hi,
> 
> I'm facing a performance issue with a scientific application (Fortran). The 
> issue is that it runs faster on a single node but very slowly across multiple 
> nodes. For example, a 16-core job on a single node finishes in 1 hr 2 min, but 
> the same job on two nodes (i.e. 8 cores per node, with the remaining 8 cores 
> kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1, 
> openmpi-1.4.5 and the Intel MKL libraries (LAPACK, BLAS, ScaLAPACK, BLACS & 
> FFTW). What could be the problem here?
> 
> Is it possible to do any tuning in OpenMPI? FYI, more info: the cluster has 
> Intel Sandy Bridge processors (E5-2670) and InfiniBand, and HyperThreading is 
> enabled. Jobs are submitted through the LSF scheduler.
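On the tuning question above: a quick sanity check is to confirm that the 
InfiniBand (openib) BTL is actually being used and to look at its parameters. 
With the 1.4/1.6 series that would be roughly:

  mpirun -np 16 --mca btl openib,sm,self ./app
  ompi_info --param btl openib

The first line forces IB for inter-node traffic plus shared memory within a 
node; the second lists the openib BTL settings, which also confirms the 
component is present. If forcing the transports changes nothing, the slowdown 
is more likely in the communication pattern itself than in transport selection.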
> 
> Could HyperThreading be causing any problem here?
> 
> 
> Thanks
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
