Dear friends,

I am running a hybrid MPI+OpenMP code.
As you can see below, I varied the number of threads in several cases while
running my main code with OpenMPI, MVAPICH, and MVAPICH2.
With OpenMPI and MVAPICH it seems that OpenMP did not work at all, since the
total time did not change considerably. With MVAPICH2, however, the total
computational time increased as I increased the number of threads. This could
be because virtual (hardware) threads are being used instead of physical
cores, i.e. the over-subscription you mentioned.
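
For the next runs I will also print the actual placement at run time, along
these lines (a sketch assuming Open MPI 1.8.x syntax; the PE=8 value is just
an example matching 8 threads per rank, and OMP_DISPLAY_ENV/OMP_PLACES need a
reasonably recent gfortran/libgomp):

mpirun -np 4 --report-bindings --map-by socket:PE=8 --bind-to core \
    -x OMP_NUM_THREADS=8 -x OMP_PLACES=cores -x OMP_PROC_BIND=close \
    -x OMP_DISPLAY_ENV=true ./pjet.gfortran > output.txt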


Hybrid code results (MPI + OpenMP):
With your suggested commands:
MVAPICH
mpirun -np 4 -genv OMP_NUM_THREADS 1 --bind-to hwthread:1 ./pjet.gfortran > output.txt
Total time = 7.290E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 ./pjet.gfortran > output.txt
Total time = 4.940E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 -map-by hwthread:8 ./pjet.gfortran > output.txt
Total time = 4.960E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 --bind-to hwthread:16 ./pjet.gfortran > output.txt
Total time = 4.502E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to core:16 -map-by core:16 ./pjet.gfortran > output.txt
Total time = 4.628E+02

Previous commands:
OpenMPI 1.8.1
mpirun -np 4 -x OMP_NUM_THREADS=1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.475E+02
mpirun -np 4 -x OMP_NUM_THREADS=8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.525E+02
mpirun -np 4 -x OMP_NUM_THREADS=16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.611E+02

MVAPICH
mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.441E+02
mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.535E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.552E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.591E+02

MVAPICH2
mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 4.935E+02
mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 5.562E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 6.392E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
Total time = 8.170E+02
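
Since the MVAPICH2 slowdown points toward over-subscription, I will also
check whether 4 tasks x 16 threads exceeds the number of physical cores on
the node. The layout can be read off with, for example:

lscpu | grep -E '^(Socket|Core|Thread|CPU\(s\))'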

Then I ran a simple "hybrid.f90" test code to check whether the machine
recognizes the correct number of MPI tasks and OpenMP threads. It reported
the correct values with all three MPI implementations. Here is its output
(a minimal sketch of the test itself follows it):

 Starting omp_dotprod_hybrid. Using           4  Cores...
 Core           3  using          16  threads
 Core           0  using          16  threads
 Core           2  using          16  threads
 Core           1  using          16  threads
 Core  1 thread  0  partial sum =   0.0000000000000000
 Core  3 thread  0  partial sum =   0.0000000000000000
 Core  1 thread  4  partial sum =   0.0000000000000000
 Core  3 thread  7  partial sum =   200.00000000000000
 Core  1 thread  8  partial sum =   200.00000000000000
 Core  3 thread  9  partial sum =   200.00000000000000
 Core  1 thread  11 partial sum =   200.00000000000000
 Core  3 thread  3  partial sum =   200.00000000000000
 Core  1 thread  2  partial sum =   0.0000000000000000
 Core  3 thread  5  partial sum =   0.0000000000000000
 Core  1 thread  3  partial sum =   200.00000000000000
 Core  3 thread  2  partial sum =   0.0000000000000000
 Core  1 thread  13 partial sum =   200.00000000000000
 Core  3 thread  12 partial sum =   200.00000000000000
 Core  1 thread  1  partial sum =   200.00000000000000
 Core  3 thread  1  partial sum =   200.00000000000000
 Core  3 thread  8  partial sum =   0.0000000000000000
 Core  1 thread  7  partial sum =   0.0000000000000000
 Core  3 thread  11 partial sum =   200.00000000000000
 Core  1 thread  15 partial sum =   0.0000000000000000
 Core  3 thread  15 partial sum =   0.0000000000000000
 Core  1 thread  10 partial sum =   200.00000000000000
 Core  1 thread  9  partial sum =   200.00000000000000
 Core  3 thread  13 partial sum =   0.0000000000000000
 Core  1 thread  5  partial sum =   200.00000000000000
 Core  3 thread  6  partial sum =   0.0000000000000000
 Core  3 thread  4  partial sum =   0.0000000000000000
 Core  1 thread  6  partial sum =   0.0000000000000000
 Core  3 thread  14 partial sum =   200.00000000000000
 Core  1 thread  12 partial sum =   0.0000000000000000
 Core  3 thread  10 partial sum =   200.00000000000000
 Core  0 thread  0  partial sum =   0.0000000000000000
 Core  0 thread  14 partial sum =   200.00000000000000
 Core  0 thread  8  partial sum =   200.00000000000000
 Core  0 thread  7  partial sum =   0.0000000000000000
 Core  0 thread  15 partial sum =   200.00000000000000
 Core  0 thread  5  partial sum =   200.00000000000000
 Core  0 thread  9  partial sum =   200.00000000000000
 Core  0 thread  11 partial sum =   0.0000000000000000
 Core  0 thread  10 partial sum =   200.00000000000000
 Core  0 thread  6  partial sum =   200.00000000000000
 Core  0 thread  3  partial sum =   200.00000000000000
 Core  0 thread  4  partial sum =   0.0000000000000000
 Core  0 thread  2  partial sum =   0.0000000000000000
 Core  0 thread  13 partial sum =   0.0000000000000000
 Core  0 thread  12 partial sum =   0.0000000000000000
 Core  0 thread  1  partial sum =   0.0000000000000000
 Core  0 partial sum =   1600.0000000000000
 Core  2 thread  3  partial sum =   0.0000000000000000
 Core  2 thread  15 partial sum =   0.0000000000000000
 Core  2 thread  0  partial sum =   0.0000000000000000
 Core  2 thread  2  partial sum =   200.00000000000000
 Core  2 thread  4  partial sum =   0.0000000000000000
 Core  2 thread  5  partial sum =   0.0000000000000000
 Core  2 thread  9  partial sum =   200.00000000000000
 Core  2 thread  7  partial sum =   0.0000000000000000
 Core  2 thread  14 partial sum =   200.00000000000000
 Core  2 thread  8  partial sum =   200.00000000000000
 Core  2 thread  12 partial sum =   200.00000000000000
 Core  2 thread  10 partial sum =   200.00000000000000
 Core  2 thread  6  partial sum =   200.00000000000000
 Core  2 thread  1  partial sum =   0.0000000000000000
 Core  2 thread  13 partial sum =   0.0000000000000000
 Core  2 thread  11 partial sum =   200.00000000000000
 Core  2 partial sum =   1600.0000000000000
 Core  3 partial sum =   1600.0000000000000
 Core  1 thread  14  partial sum =   0.0000000000000000
 Core  1 partial sum =   1600.0000000000000
 Done. Hybrid version: global sum  = 6400.0000000000000
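
For reference, here is a minimal sketch of what this hybrid test does (not
the exact hybrid.f90 I ran; the array size and output wording are my own
choices, but the structure is the same: every MPI task spawns OpenMP threads
that compute partial dot products, which are then reduced across tasks):

program hybrid_dotprod
  use mpi
  use omp_lib
  implicit none
  integer, parameter :: n = 800          ! vector length per task (example size)
  integer :: ierr, rank, nranks, tid, i
  real(8) :: a(n), b(n), psum, tasksum, globalsum

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)
  if (rank == 0) print *, 'Starting hybrid dot product. Using', nranks, 'tasks'

  a = 1.0d0
  b = 2.0d0
  tasksum = 0.0d0

  !$omp parallel private(tid, psum, i)
  tid = omp_get_thread_num()
  if (tid == 0) print *, 'Task', rank, 'using', omp_get_num_threads(), 'threads'
  psum = 0.0d0
  !$omp do
  do i = 1, n                            ! iterations are split over the threads
     psum = psum + a(i) * b(i)
  end do
  !$omp end do
  print *, 'Task', rank, 'thread', tid, 'partial sum =', psum
  !$omp atomic
  tasksum = tasksum + psum
  !$omp end parallel

  print *, 'Task', rank, 'partial sum =', tasksum
  call MPI_Reduce(tasksum, globalsum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)
  if (rank == 0) print *, 'Done. Hybrid version: global sum =', globalsum
  call MPI_Finalize(ierr)
end program hybrid_dotprod

It compiles with the usual wrapper, e.g. mpif90 -fopenmp hybrid.f90 -o hybrid.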



Please tell me if there is anything else I should check. I am still getting nowhere.
Best regards

Pasha Pashaei
