Thanks Patrick, could you please try again with the --hetero-nodes mpirun
option? (I am AFK and not 100% sure about the syntax.)
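Something along these lines, based on your earlier invocation (untested,
and the option name is from memory; --hetero-nodes tells mpirun not to
assume that all nodes in the allocation have the same topology):

    mpirun --hetero-nodes --bind-to core --hostfile $OAR_NODEFILE ./location.exe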
Could you also submit a job with 2 nodes and 4 cores on each node, and run:

    cat /proc/self/status
    oarshmost <remote host> cat /proc/self/status

btw, is there any reason why you use a machine file (frog.txt) instead of
using $OAR_NODEFILE directly? /* not to mention I am surprised a French
supercomputer is called "frog" ;-) */

Cheers,

Gilles

On Friday, September 18, 2015, Patrick Begou
<patrick.be...@legi.grenoble-inp.fr> wrote:

> Gilles Gouaillardet wrote:
>
> Patrick,
>
> by the way, this will work when running on a single node.
> I do not know what will happen when you run on multiple nodes...
> Since there is no OAR integration in Open MPI, I guess you are using ssh
> to start orted on the remote nodes (unless you instructed OMPI to use an
> OARified version of ssh).
>
> Yes, OMPI_MCA_plm_rsh_agent=oarshmost
> This also exports the needed environment instead of multiple -x options,
> to be as similar as possible to the environments on the French national
> supercomputers.
>
> My concern is that the remote orted might not run within the cpuset that
> was created by OAR for this job, so you might end up using all the cores
> on the remote nodes.
>
> The OAR environment does this. With older Open MPI versions all is
> working fine.
>
> Please let us know how that works for you.
>
> Cheers,
>
> Gilles
>
>
> On 9/18/2015 5:02 PM, Gilles Gouaillardet wrote:
>
> Patrick,
>
> I just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586
> for the v1.10 series.
>
> This is only a three-line patch.
> Could you please give it a try?
>
>
> This patch solves the problem when Open MPI uses one node, but now I'm
> unable to use more than one node.
> On one node, with 4 cores in the cpuset:
>
> mpirun --bind-to core --hostfile $OAR_NODEFILE ./location.exe | grep 'thread is now running on PU' | sort
> (process 0) thread is now running on PU logical index 0 (OS/physical index 12) on system frog26
> (process 1) thread is now running on PU logical index 1 (OS/physical index 13) on system frog26
> (process 2) thread is now running on PU logical index 2 (OS/physical index 14) on system frog26
> (process 3) thread is now running on PU logical index 3 (OS/physical index 15) on system frog26
>
> [begou@frog26 MPI_TESTS]$ mpirun -np 5 --bind-to core --hostfile $OAR_NODEFILE ./location.exe
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        frog26
>    #processes:  2
>    #cpus:       1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
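> (For reference, the override the message refers to is a qualifier on the
> binding directive; the exact spelling below is an assumption on my part,
> and overloading is exactly what I want to avoid here:
>
>     mpirun -np 5 --bind-to core:overload-allowed --hostfile $OAR_NODEFILE ./location.exe )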
>
> But if I request two nodes (4 cores on each), only 4 processes can start
> on the local cores, none on the second host:
>
> [begou@frog5 MPI_TESTS]$ cat $OAR_NODEFILE
> frog5
> frog5
> frog5
> frog5
> frog6
> frog6
> frog6
> frog6
>
> [begou@frog5 MPI_TESTS]$ cat ./frog.txt
> frog5 slots=4
> frog6 slots=4
>
> But only 4 processes are launched:
>
> [begou@frog5 MPI_TESTS]$ mpirun --hostfile frog.txt --bind-to core ./location.exe | grep 'thread is now running on PU'
> (process 0) thread is now running on PU logical index 0 (OS/physical index 12) on system frog5
> (process 1) thread is now running on PU logical index 1 (OS/physical index 13) on system frog5
> (process 2) thread is now running on PU logical index 2 (OS/physical index 14) on system frog5
> (process 3) thread is now running on PU logical index 3 (OS/physical index 15) on system frog5
>
> If I explicitly ask for 8 processes (one for each of the 4 cores on the
> 2 nodes):
>
> [begou@frog5 MPI_TESTS]$ mpirun --hostfile frog.txt -np 8 --bind-to core ./location.exe
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        frog5
>    #processes:  2
>    #cpus:       1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
>
> Cheers,
>
> Gilles
>
> On 9/18/2015 4:54 PM, Patrick Begou wrote:
>
> Ralph Castain wrote:
>
> As I said, if you don't provide an explicit slot count in your hostfile,
> we default to allowing oversubscription. We don't have OAR integration
> in OMPI, and so mpirun isn't recognizing that you are running under a
> resource manager - it thinks this is just being controlled by a hostfile.
>
> What looks strange to me is that in this case the (default)
> oversubscription is allowed up to the number of cores of one CPU (8),
> not the number of cores available in the node (16), nor unlimited...
>
> If you want us to error out on oversubscription, you can either add the
> flag you identified, or simply change your hostfile to:
>
> frog53 slots=4
>
> Either will work.
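> (Aside: a hostfile in that form can be generated from the OAR nodefile
> itself; a sketch, assuming each host's entries are contiguous as in the
> listing above, with the output redirected to frog.txt:
>
>     uniq -c $OAR_NODEFILE | awk '{print $2, "slots=" $1}' )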
> This syntax in the host file doesn't change anything about the
> oversubscription problem. It is still allowed, with the same maximum
> number of processes, for this test case:
>
> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile frog7.txt --bind-to core ./location.exe | grep 'thread is now running on PU' | sort
> (process 0) thread is now running on PU logical index 0 (OS/physical index 0) on system frog7
> (process 1) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
> (process 2) thread is now running on PU logical index 0 (OS/physical index 0) on system frog7
> (process 3) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
> (process 4) thread is now running on PU logical index 3 (OS/physical index 7) on system frog7
> (process 5) thread is now running on PU logical index 1 (OS/physical index 5) on system frog7
> (process 6) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
> (process 7) thread is now running on PU logical index 3 (OS/physical index 7) on system frog7
>
> [begou@frog7 MPI_TESTS]$ cat frog7.txt
> frog7 slots=4
>
> Patrick
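> (In the spirit of the /proc/self/status check suggested above, a quick
> way to cross-check which CPUs each rank is actually allowed to use would
> presumably be to launch grep through mpirun, e.g.:
>
>     mpirun --hostfile frog7.txt --bind-to core grep Cpus_allowed_list /proc/self/status )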
>
> On Sep 16, 2015, at 1:00 AM, Patrick Begou
> <patrick.be...@legi.grenoble-inp.fr> wrote:
>
> Thanks all for your answers, I've added some details about the tests I
> have run. See below.
>
> Ralph Castain wrote:
>
> Not precisely correct. It depends on the environment.
>
> If there is a resource manager allocating nodes, or you provide a
> hostfile that specifies the number of slots on the nodes, or you use
> -host, then we default to no-oversubscribe.
>
> I'm using a batch scheduler (OAR).
>
> # cat /dev/cpuset/oar/begou_7955553/cpuset.cpus
> 4-7
>
> So 4 cores allowed. The nodes have two eight-core CPUs.
>
> The node file contains:
>
> # cat $OAR_NODEFILE
> frog53
> frog53
> frog53
> frog53
>
> # mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
> is okay (my test code shows one process on each core):
> (process 3) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>
> # mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
> oversubscribes with:
> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
> (process 1) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (*process 3*) thread is now running on PU logical index *2 (OS/physical index 6)* on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (*process 2*) thread is now running on PU logical index *2 (OS/physical index 6)* on system frog53
> This is not allowed with OpenMPI 1.7.3.
>
> I can increase this up to the maximum core count of the first processor
> (8 cores):
> # mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe | grep 'thread is now running on PU'
> (process 5) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (process 7) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (process 6) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
> (process 2) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (process 0) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (process 3) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>
> But I cannot overload beyond the 8 cores (the maximum core count of one
> CPU):
> # mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        frog53
>    #processes:  2
>    #cpus:       1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
>
> Now if I add --nooversubscribe the problem doesn't exist anymore (no
> more than 4 processes, one on each core). So it looks as if the default
> behavior were no-oversubscription bounded by the core count of the
> socket???
>
> Again, with 1.7.3 this problem doesn't occur at all.
>
> Patrick
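> (A workaround consistent with the above, presumably: always pass the
> flag explicitly, so that mpirun errors out instead of overloading the
> cpuset, e.g.:
>
>     mpirun -np 8 --nooversubscribe --bind-to core --hostfile $OAR_NODEFILE location.exe )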
>
> If you provide a hostfile that doesn't specify slots, then we use the
> number of cores we find on each node, and we allow oversubscription.
>
> What is being described sounds like more of a bug than an intended
> feature. I'd need to know more about it, though, to be sure. Can you
> tell me how you are specifying this cpuset?
>
> On Sep 15, 2015, at 4:44 PM, Matt Thompson <fort...@gmail.com> wrote:
>
> Looking at the Open MPI 1.10.0 man page:
>
> https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php
>
> it looks like perhaps -oversubscribe (which was an option) is now the
> default behavior. Instead we have:
>
> -nooversubscribe, --nooversubscribe
>     Do not oversubscribe any nodes; error (without starting any
>     processes) if the requested number of processes would cause
>     oversubscription. This option implicitly sets "max_slots" equal
>     to the "slots" value for each node.
>
> It also looks like -map-by has a way to implement it as well (see the
> man page).
>
> Thanks for letting me/us know about this. On a system of mine I sort of
> depend on the -nooversubscribe behavior!
>
> Matt
>
> On Tue, Sep 15, 2015 at 11:17 AM, Patrick Begou
> <patrick.be...@legi.grenoble-inp.fr> wrote:
>
>> Hi,
>>
>> I'm running OpenMPI 1.10.0 built with Intel 2015 compilers on a Bullx
>> system. I have some trouble with the bind-to core option when using a
>> cpuset. If the cpuset contains fewer than all the cores of a CPU (e.g.
>> 4 cores allowed on an 8-core CPU), OpenMPI 1.10.0 allows these cores
>> to be overloaded, up to the total core count of the CPU. With this
>> config, and because the cpuset only allows 4 cores, I can reach 2
>> processes per core if I use:
>>
>> mpirun -np 8 --bind-to core my_application
>>
>> OpenMPI 1.7.3 doesn't show the problem in the same situation:
>>
>> mpirun -np 8 --bind-to-core my_application
>>
>> returns:
>>
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource
>>
>> and that's okay of course.
>>
>> Is there a way to avoid this overloading with OpenMPI 1.10.0?
>>
>> Thanks
>>
>> Patrick
>>
>> --
>> ===================================================================
>> | Equipe M.O.S.T.      |                                          |
>> | Patrick BEGOU        | mailto:patrick.be...@grenoble-inp.fr     |
>> | LEGI                 |                                          |
>> | BP 53 X              | Tel 04 76 82 51 35                       |
>> | 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71                       |
>> ===================================================================