Michele Marena wrote:
I've launched my app with mpiP, both when the two worker
processes are on different nodes and when they are on the same
node. Process 0 is the manager (it only gathers the results),
while processes 1 and 2 are the workers (they do the
computation).
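For reference, here is a minimal sketch of that layout. The
actual application code was not posted, so the point-to-point
MPI_Send/MPI_Recv exchange, the placeholder work loop, and the
buffer of 4096 doubles (the message size mentioned below) are
assumptions for illustration only.

/* Minimal manager/worker sketch: rank 0 only gathers results,
 * ranks 1..N-1 compute and send their chunk back.
 * Assumed pattern -- not the poster's actual code. */
#include <mpi.h>

#define CHUNK 4096            /* 4096 doubles = 32 KiB per message */

int main(int argc, char **argv)
{
    int rank, size;
    double buf[CHUNK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* manager: receive one result buffer from each worker */
        for (int src = 1; src < size; src++)
            MPI_Recv(buf, CHUNK, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        /* worker: do the (memory-intensive) computation, then
         * send the results back to the manager */
        for (int i = 0; i < CHUNK; i++)
            buf[i] = (double)(rank * CHUNK + i);  /* placeholder compute */
        MPI_Send(buf, CHUNK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}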
This is the case where processes 1 and 2 are on different nodes
(the run takes 162 s):
@--- MPI Time (seconds) ----------------------------------------
Task    AppTime    MPITime     MPI%
   0        162        162    99.99
   1        162       30.2    18.66
   2        162       14.7     9.04
   *        486        207    42.56
This is the case where processes 1 and 2 are on the same node
(the run takes 260 s):
@--- MPI Time (seconds) ----------------------------------------
Task    AppTime    MPITime     MPI%
   0        260        260    99.99
   1        260       39.7    15.29
   2        260       26.4    10.17
   *        779        326    41.82
I think there's a contention problem on the memory bus.
Right. Process 0 spends all its time in MPI, presumably waiting on
workers. The workers spend about the same amount of time on MPI
regardless of whether they're placed together or not. The big
difference is that the workers are much slower in non-MPI tasks when
they're located on the same node. The issue has little to do
with MPI itself: the workers are contending for the node's local
resources and run faster when placed on different nodes.
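To make that concrete: subtracting MPITime from AppTime gives
each worker's non-MPI (compute) time. On different nodes that is
roughly 162 - 30.2 ≈ 132 s and 162 - 14.7 ≈ 147 s; on the same
node it is roughly 260 - 39.7 ≈ 220 s and 260 - 26.4 ≈ 234 s.
The MPI portion barely changes, while the compute portion grows
by about 60%, which is consistent with contention for shared
node resources such as memory bandwidth.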
However, the message size is 4096 * sizeof(double). Maybe I am
wrong on this point. Is that message size too large for shared
memory?
No. That's not very large at all.
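For scale: 4096 * sizeof(double) = 4096 * 8 bytes = 32 KiB per
message, which is small by MPI standards and not the kind of
size that would overwhelm a shared-memory transport.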