Sorry for the delay. Running mpirun with a wrong OMPI_MCA_plm_rsh_agent doesn't give any explicit error message in OpenMPI-1.10.0.

Here is how I can reproduce the problem:

I request 2 nodes, 1 CPU on each node, 4 cores on each CPU (so 8 cores available with cpusets). The node file is:

[begou@frog7 MPI_TESTS]$ cat $OAR_NODEFILE
frog7
frog7
frog7
frog7
frog8
frog8
frog8
frog8
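
As a quick sanity check of the allocation, the number of slots per host can be counted from the node file. This is only a small sketch using standard tools; the function name count_slots is made up for illustration and is not part of OAR or OpenMPI:

```shell
# Hypothetical helper: print "<host>: <number of slots>" for each host
# listed in a node file (one line per slot, as in $OAR_NODEFILE).
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example call (commented out so the snippet has no side effects):
# count_slots "$OAR_NODEFILE"
```

With the node file above, this should report 4 slots for frog7 and 4 for frog8.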

I launch the application (I've added a grep here to limit the output on stdout and just check the process locations):

[begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile $OAR_NODEFILE --bind-to core ./location.exe |grep 'thread is now running on PU'
(process 2) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
(process 3) thread is now running on PU logical index 3 (OS/physical index 7) on system frog7
(process 0) thread is now running on PU logical index 0 (OS/physical index 0) on system frog7
(process 1) thread is now running on PU logical index 1 (OS/physical index 5) on system frog7
(process 6) thread is now running on PU logical index 2 (OS/physical index 2) on system frog8
(process 7) thread is now running on PU logical index 3 (OS/physical index 3) on system frog8
(process 4) thread is now running on PU logical index 0 (OS/physical index 0) on system frog8
(process 5) thread is now running on PU logical index 1 (OS/physical index 1) on system frog8

So there is one process on each core, and no oversubscription is allowed with the patch applied to OpenMPI.

Now I set OMPI_MCA_plm_rsh_agent to something wrong and launch the job again (without the final grep, to get all the information):

[begou@frog7 MPI_TESTS]$ export OMPI_MCA_plm_rsh_agent=do-not-exist
[begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile $OAR_NODEFILE --bind-to core ./location.exe
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        frog7
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

The message only shows that OpenMPI tries to allocate all processes on the local node. It says nothing about the launch agent that cannot be found.

Of course:
[begou@frog7 MPI_TESTS]$ which do-not-exist
/usr/bin/which: no do-not-exist in (/home/PROJECTS/...............
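
Until mpirun reports this itself, a pre-flight check in the job script could catch the bad agent before launching. The following is only a sketch: check_rsh_agent is a hypothetical helper, not something OpenMPI provides, and it assumes the variable holds a single command name, possibly followed by arguments (it does not handle a list of several agents):

```shell
# Hypothetical helper: succeed if the command named in the agent string
# can be found, fail otherwise. Only the first word is checked.
check_rsh_agent() {
    agent=${1%% *}    # keep the command name, drop any arguments
    if [ -z "$agent" ]; then
        # Nothing set: OpenMPI falls back to its own default agent
        echo "rsh agent not set"
        return 0
    fi
    if command -v "$agent" >/dev/null 2>&1; then
        echo "rsh agent found: $agent"
        return 0
    fi
    echo "rsh agent not found in PATH: $agent"
    return 1
}

# Example use in a job script (commented out here):
# check_rsh_agent "${OMPI_MCA_plm_rsh_agent:-}" && mpirun -np 8 --hostfile "$OAR_NODEFILE" --bind-to core ./location.exe
```

With OMPI_MCA_plm_rsh_agent=do-not-exist as above, the check fails and mpirun is never started, so the misleading binding message does not appear.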


Patrick

--
===================================================================
|  Equipe M.O.S.T.         |                                      |
|  Patrick BEGOU           | mailto:patrick.be...@grenoble-inp.fr |
|  LEGI                    |                                      |
|  BP 53 X                 | Tel 04 76 82 51 35                   |
|  38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                   |
===================================================================
