Hi,

I have been trying to understand how to correctly launch hybrid MPI/OpenMP jobs (i.e., multi-threaded MPI jobs) with mpirun, but I am quite puzzled about which command-line options are the right ones to use. The description in the mpirun man page is very confusing, and I could not get what I wanted from it.
Some background: the cluster uses SGE, and I am using Open MPI 1.10.2, compiled with and for gcc 4.9.3. The MPI library was configured with SGE support. The compute nodes have 32 cores each: two sockets of Xeon E5-2698 v3 (16-core Haswell).

A colleague told me to do the following:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE

I could see the executable using 200% CPU per process--that's good. But there is one catch in the general case: "-map-by node" assigns the MPI processes to nodes in a round-robin fashion (MPI rank 0 goes to node 0, rank 1 to node 1, and so on until every node has one process, then placement wraps back to node 0, 1, ...). Instead, I want the MPI processes packed side by side (more like the "fill_up" allocation rule in the SGE scheduler): fill node 0 first, then node 1, and so on. (With 16 two-thread processes, i.e. 32 cores in total, the whole job would then fit on a single node.) How do I do this properly?

I tried a few things that failed:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

or

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE

Both failed with this error message:

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive was also
give to map to an object level that cannot support that directive.

Please specify a mapping level that has more than one cpu, or else let
us define a default mapping that will allow multiple cpus-per-proc.
--------------------------------------------------------------------------

Another attempt was:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE

Here's the error message:

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a conflicting
binding policy was specified:

  #cpus-per-proc:        2
  type of cpus:          cores as cpus
  binding policy given:  SOCKET

The correct binding policy for the given type of cpu is:

  correct binding policy:  bind-to core

This is the binding policy we would apply by default for this situation,
so no binding need be specified. Please correct the situation and try
again.
--------------------------------------------------------------------------

Clearly I am not understanding how this map-by option works. Could somebody help me? There is a partially written wiki article, https://github.com/open-mpi/ompi/wiki/ProcessPlacement , but unfortunately it is not clear to me either.

--
Wirawan Purwanto
Computational Scientist, HPC Group
Information Technology Services
Old Dominion University
Norfolk, VA 23529
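P.S. One more variant I have not yet tried: from my reading of the man page, mapping by "slot" should place ranks consecutively, filling each node's slots before moving on to the next, while PE=2 would still reserve two cores per rank. This is only my guess from the documentation, not something I have verified:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by slot:PE=2 ./EXECUTABLE

Does that sound like the right approach?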