Hello,

While profiling an application, I noticed that the MPI_Init() function takes a considerable amount of time. There is a big difference between running 32 processes on 32 machines and running 32 processes on 8 machines (each machine has 8 cores).
These are the results of the profiling:


Results for 32 cores (8 machines)

           Group.1 percent        usec
38            SSOR 79.1125 2557445.625
7       EXCHANGE_1 31.8125      33.250
24      MPI_Recv() 26.0750      33.375
2             BLTS 24.7500     103.125
3             BUTS 22.2375      92.500
12       INIT_COMM 19.8500 1311003.375
*22      MPI_Init() 19.8500 1310925.750*
33             RHS 18.4000    4690.500
8       EXCHANGE_3  9.2750    1179.000
26      MPI_Wait()  7.2250     565.125
13           JACLD  6.4875      27.000
25      MPI_Send()  6.3500       8.000
14            JACU  6.2500      26.000
37           SETIV  0.6625   20908.500
6            EXACT  0.2188       0.000
4             ERHS  0.2000   11499.000

Results for 32 cores (32 machines)

           Group.1  percent         usec
38            SSOR 97.28889 2573471.0000
7       EXCHANGE_1 39.25556      33.3333
2             BLTS 29.11111      98.7778
3             BUTS 27.96667      95.0000
24      MPI_Recv() 27.48889      28.7778
33             RHS 23.98889    5018.6667
25      MPI_Send() 13.51111      14.0000
8       EXCHANGE_3 13.06667    1361.1111
26      MPI_Wait()  9.37778     599.0000
13           JACLD  7.72222      26.0000
14            JACU  7.37778      25.0000
12       INIT_COMM  1.46667   76713.6667
*22      MPI_Init()  1.45556   76253.4444*
37           SETIV  0.80000   20914.0000
6            EXACT  0.25000       0.0000
4             ERHS  0.21111   10458.3333

In the first case (4 processes per machine), MPI_Init() was about 17 times slower than in the second case (1 process per machine). Is this behaviour normal?
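
As a quick check of that ratio, here is a small Python sketch that uses the two MPI_Init() values from the tables above. It assumes the usec column is a time in microseconds; the variable names are only illustrative.

```python
# MPI_Init() values from the "usec" column of the two profiles above
# (assumed to be times in microseconds).
init_usec_8_machines = 1310925.750   # 32 processes on 8 machines
init_usec_32_machines = 76253.4444   # 32 processes on 32 machines

# How many times slower MPI_Init() is in the 8-machine run.
ratio = init_usec_8_machines / init_usec_32_machines

# Absolute difference between the two runs, converted to seconds.
diff_seconds = (init_usec_8_machines - init_usec_32_machines) / 1e6

print(f"MPI_Init() slowdown: {ratio:.1f}x")
print(f"Absolute difference: {diff_seconds:.2f} s")
```

So the gap is a little over one second of wall time per run, which is large next to the 32-machine value but small in absolute terms.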
The commands I used to run the application were:

First case:

mpirun --machinefile machine_file -npernode 4 --mca btl self,sm,tcp lu.A.32

Second case:

mpirun  --machinefile machine_file  --mca btl self,sm,tcp  lu.A.32

The Open MPI version I used is:

mpirun --V
mpirun (Open MPI) 1.4.5

and the system I used is the following:

Linux kameleon-debian 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux

I would appreciate any feedback. Thank you.

