Dear friends,

I am trying to run a hybrid MPI+OpenMP code. As you can see below, I varied the number of OpenMP threads in several cases while running my main code under Open MPI, MVAPICH, and MVAPICH2. With Open MPI and MVAPICH it seems that OpenMP did not help at all, since the total time did not change considerably; with MVAPICH2 the total computational time actually increased as I increased the number of threads. It could be that the extra threads end up on virtual (hardware) threads instead of physical cores, i.e. the oversubscription you mentioned.
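Before anything else I want to rule out oversubscription inside each rank. Here is a minimal check I could run (my own sketch, not part of pjet; it assumes gfortran with -fopenmp, where omp_get_num_procs() reports the logical processors that the binding mask leaves visible to the process):

  program check_oversub
    use omp_lib
    implicit none
    integer :: nprocs, nthreads

    ! Logical processors visible to this process; with libgomp this
    ! reflects the affinity mask the MPI launcher applied to the rank.
    nprocs = omp_get_num_procs()

    !$omp parallel
    !$omp master
    nthreads = omp_get_num_threads()
    print *, 'threads =', nthreads, '  logical processors =', nprocs
    if (nthreads > nprocs) then
       print *, 'WARNING: more threads than processors -> oversubscribed'
    end if
    !$omp end master
    !$omp end parallel
  end program check_oversub

If this warns under the same mpirun flags as the timings below, the slowdown at high thread counts would simply be time-slicing.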
Hybrid code results (MPI + OpenMP):

Your suggestions (MVAPICH):

mpirun -np 4 -genv OMP_NUM_THREADS 1 --bind-to hwthread:1 ./pjet.gfortran > output.txt
    Total time = 7.290E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 ./pjet.gfortran > output.txt
    Total time = 4.940E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 -map-by hwthread:8 ./pjet.gfortran > output.txt
    Total time = 4.960E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 --bind-to hwthread:16 ./pjet.gfortran > output.txt
    Total time = 4.502E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to core:16 -map-by core:16 ./pjet.gfortran > output.txt
    Total time = 4.628E+02

Previous commands:

Open MPI 1.8.1:

mpirun -np 4 -x OMP_NUM_THREADS=1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.475E+02
mpirun -np 4 -x OMP_NUM_THREADS=8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.525E+02
mpirun -np 4 -x OMP_NUM_THREADS=16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.611E+02

MVAPICH:

mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.441E+02
mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.535E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.552E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.591E+02

MVAPICH2:

mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 4.935E+02
mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 5.562E+02
mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 6.392E+02
mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
    Total time = 8.170E+02
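Since the socket-bound timings barely change with the thread count, I also want to see where the threads actually land under each set of -bind-to/-map-by flags. A rough Linux-only sketch for that (again my own diagnostic, not part of pjet; sched_getcpu() is a glibc call reached through ISO_C_BINDING):

  program where_threads_run
    use mpi
    use omp_lib
    use iso_c_binding, only: c_int
    implicit none

    interface
       ! glibc routine returning the CPU the calling thread is on
       function sched_getcpu() bind(c, name='sched_getcpu') result(cpu)
         import :: c_int
         integer(c_int) :: cpu
       end function sched_getcpu
    end interface

    integer :: rank, ierr, namelen
    character(len=MPI_MAX_PROCESSOR_NAME) :: host

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Get_processor_name(host, namelen, ierr)

    ! Each thread reports its placement; the lines may interleave.
    !$omp parallel
    print *, 'rank', rank, ' thread', omp_get_thread_num(), &
             ' cpu', sched_getcpu(), ' host ', host(1:namelen)
    !$omp end parallel

    call MPI_Finalize(ierr)
  end program where_threads_run

If several threads of one rank keep reporting the same CPU number, they are sharing a core, which would explain the flat or worsening total times.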
Then I used a simple hybrid.f90 test code to check whether the machine recognizes the correct numbers of cores and threads. It showed the correct values with all three implementations in the different cases. Here is its result:

Starting omp_dotprod_hybrid. Using 4 Cores...
Core 3 using 16 threads
Core 0 using 16 threads
Core 2 using 16 threads
Core 1 using 16 threads
Core 1 thread 0 partial sum = 0.0000000000000000
Core 3 thread 0 partial sum = 0.0000000000000000
Core 1 thread 4 partial sum = 0.0000000000000000
Core 3 thread 7 partial sum = 200.00000000000000
Core 1 thread 8 partial sum = 200.00000000000000
Core 3 thread 9 partial sum = 200.00000000000000
Core 1 thread 11 partial sum = 200.00000000000000
Core 3 thread 3 partial sum = 200.00000000000000
Core 1 thread 2 partial sum = 0.0000000000000000
Core 3 thread 5 partial sum = 0.0000000000000000
Core 1 thread 3 partial sum = 200.00000000000000
Core 3 thread 2 partial sum = 0.0000000000000000
Core 1 thread 13 partial sum = 200.00000000000000
Core 3 thread 12 partial sum = 200.00000000000000
Core 1 thread 1 partial sum = 200.00000000000000
Core 3 thread 1 partial sum = 200.00000000000000
Core 3 thread 8 partial sum = 0.0000000000000000
Core 1 thread 7 partial sum = 0.0000000000000000
Core 3 thread 11 partial sum = 200.00000000000000
Core 1 thread 15 partial sum = 0.0000000000000000
Core 3 thread 15 partial sum = 0.0000000000000000
Core 1 thread 10 partial sum = 200.00000000000000
Core 1 thread 9 partial sum = 200.00000000000000
Core 3 thread 13 partial sum = 0.0000000000000000
Core 1 thread 5 partial sum = 200.00000000000000
Core 3 thread 6 partial sum = 0.0000000000000000
Core 3 thread 4 partial sum = 0.0000000000000000
Core 1 thread 6 partial sum = 0.0000000000000000
Core 3 thread 14 partial sum = 200.00000000000000
Core 1 thread 12 partial sum = 0.0000000000000000
Core 3 thread 10 partial sum = 200.00000000000000
Core 0 thread 0 partial sum = 0.0000000000000000
Core 0 thread 14 partial sum = 200.00000000000000
Core 0 thread 8 partial sum = 200.00000000000000
Core 0 thread 7 partial sum = 0.0000000000000000
Core 0 thread 15 partial sum = 200.00000000000000
Core 0 thread 5 partial sum = 200.00000000000000
Core 0 thread 9 partial sum = 200.00000000000000
Core 0 thread 11 partial sum = 0.0000000000000000
Core 0 thread 10 partial sum = 200.00000000000000
Core 0 thread 6 partial sum = 200.00000000000000
Core 0 thread 3 partial sum = 200.00000000000000
Core 0 thread 4 partial sum = 0.0000000000000000
Core 0 thread 2 partial sum = 0.0000000000000000
Core 0 thread 13 partial sum = 0.0000000000000000
Core 0 thread 12 partial sum = 0.0000000000000000
Core 0 thread 1 partial sum = 0.0000000000000000
Core 0 partial sum = 1600.0000000000000
Core 2 thread 3 partial sum = 0.0000000000000000
Core 2 thread 15 partial sum = 0.0000000000000000
Core 2 thread 0 partial sum = 0.0000000000000000
Core 2 thread 2 partial sum = 200.00000000000000
Core 2 thread 4 partial sum = 0.0000000000000000
Core 2 thread 5 partial sum = 0.0000000000000000
Core 2 thread 9 partial sum = 200.00000000000000
Core 2 thread 7 partial sum = 0.0000000000000000
Core 2 thread 14 partial sum = 200.00000000000000
Core 2 thread 8 partial sum = 200.00000000000000
Core 2 thread 12 partial sum = 200.00000000000000
Core 2 thread 10 partial sum = 200.00000000000000
Core 2 thread 6 partial sum = 200.00000000000000
Core 2 thread 1 partial sum = 0.0000000000000000
Core 2 thread 13 partial sum = 0.0000000000000000
Core 2 thread 11 partial sum = 200.00000000000000
Core 2 partial sum = 1600.0000000000000
Core 3 partial sum = 1600.0000000000000
Core 1 thread 14 partial sum = 0.0000000000000000
Core 1 partial sum = 1600.0000000000000
Done. Hybrid version: global sum = 6400.0000000000000

Please tell me if there is something else I should check. I am still getting nowhere.
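For reference, hybrid.f90 is essentially the classic omp_dotprod_hybrid example. A minimal sketch of what it does (the array length and the names here are my guesses, chosen so that four ranks reproduce the global sum of 6400 above; it is not the exact code that produced that output):

  program omp_dotprod_hybrid
    use mpi
    use omp_lib
    implicit none
    ! veclen is an assumption: with a = b = 1 each rank sums to 1600,
    ! and 4 ranks give the global sum of 6400 seen above.
    integer, parameter :: veclen = 1600
    integer :: rank, provided, ierr, i
    real(8) :: a(veclen), b(veclen), mysum, globalsum

    ! FUNNELED is enough here: only the initial thread calls MPI.
    call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    a = 1.0d0
    b = 1.0d0
    mysum = 0.0d0

    ! Every OpenMP thread accumulates part of the local dot product.
    !$omp parallel do reduction(+:mysum)
    do i = 1, veclen
       mysum = mysum + a(i) * b(i)
    end do
    !$omp end parallel do

    print *, 'Core', rank, 'partial sum =', mysum

    ! Combine the per-rank partial sums into the global result.
    call MPI_Reduce(mysum, globalsum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                    0, MPI_COMM_WORLD, ierr)
    if (rank == 0) print *, 'Done. Hybrid version: global sum =', globalsum

    call MPI_Finalize(ierr)
  end program omp_dotprod_hybrid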
Best regards,
Pasha Pashaei