Michele Marena wrote:
I've run my app with mpiP, both with the two worker processes on different nodes and with them on the same node.
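For reference, a typical way to collect such a profile with mpiP is to link the library in at build time or to preload it at run time; the paths and extra libraries below are illustrative and depend on how your mpiP was built:

# link-time instrumentation (your mpiP build may also need BFD/unwind libraries)
mpicc -g -o app app.c -L$MPIP_HOME/lib -lmpiP -lm

# or, if libmpiP was built as a shared library, preload it at run time
LD_PRELOAD=$MPIP_HOME/lib/libmpiP.so mpirun -np 3 ./app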

Process 0 is the manager (it only gathers the results); processes 1 and 2 are workers (they do the computation).
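A minimal sketch of that pattern follows (an illustration, not the actual application; the buffers use the 4096-double message size mentioned further down):

#include <mpi.h>

#define BLOCK 4096   /* 4096 doubles per message, as in the discussion below */
#define ITERS 1000   /* arbitrary number of work iterations */

int main(int argc, char **argv)
{
    int rank, size, iter, i, w;
    double buf[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (iter = 0; iter < ITERS; iter++) {
        if (rank == 0) {
            /* Manager: only collects one block from each worker. */
            for (w = 1; w < size; w++)
                MPI_Recv(buf, BLOCK, MPI_DOUBLE, w, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            /* Worker: memory-intensive compute, then send the block. */
            for (i = 0; i < BLOCK; i++)
                buf[i] = (double)(iter + i) * rank;  /* placeholder work */
            MPI_Send(buf, BLOCK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}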

This is the case where processes 1 and 2 are on different nodes (the run takes 162s).
@--- MPI Time (seconds) ---------------------------------------------------
Task    AppTime    MPITime     MPI%
   0        162        162    99.99
   1        162       30.2    18.66
   2        162       14.7     9.04
   *        486        207    42.56

This is the case where processes 1 and 2 are on the same node (the run takes 260s).
@--- MPI Time (seconds) ---------------------------------------------------
Task    AppTime    MPITime     MPI%
   0        260        260    99.99
   1        260       39.7    15.29
   2        260       26.4    10.17
   *        779        326    41.82

I think there's a contention problem on the memory bus.
Right. Process 0 spends nearly all of its time in MPI, presumably waiting on the workers. The workers spend roughly the same amount of time in MPI whether or not they are placed together. The big difference is that the workers are much slower in their non-MPI (compute) time when they are on the same node. So the issue has little to do with MPI: the workers are competing for local resources (e.g., memory bandwidth) and run faster when placed on different nodes.
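If the goal is to keep the two workers on separate nodes, the placement can be requested directly from mpirun. For example, with Open MPI's mapping options of that era (check your version's mpirun man page, since option names vary between releases):

# round-robin ranks across the nodes in the hostfile:
# rank 0 and rank 2 land on the first node, rank 1 on the second,
# so the two workers end up on different machines
mpirun -np 3 --bynode --hostfile hosts ./app

# fill the first node's slots before moving to the next
mpirun -np 3 --byslot --hostfile hosts ./app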
However, the message size is 4096 * sizeof(double). Maybe I am wrong on this point. Is the message size too large for shared memory?
No.  That's not very large at all.
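(For scale: 4096 * sizeof(double) is 4096 * 8 = 32768 bytes, i.e. about 32 KiB assuming the usual 8-byte double, which is a small message by MPI standards.)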
>>> On Mar 27, 2011, at 10:33 AM, Ralph Castain wrote:
>>>
>>> >http://www.open-mpi.org/faq/?category=perftools
