I’m really puzzled by that one - we definitely report an error and exit if the user specifies that MCA param and we don’t find the given agent.
Could you please send us the actual command line plus the hostfile you gave, and verify that the MCA param was set?

> On Sep 21, 2015, at 8:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Patrick,
>
> thanks for the report.
>
> can you confirm what happened was
> - you defined OMPI_MCA_plm_rsh_agent=oarshmost
> - oarshmost was not in the $PATH
> - mpirun silently ignored the remote nodes
>
> if that is correct, then I think mpirun should have reported an error
> (oarshmost not found, or cannot remotely start orted)
> instead of this silent behaviour.
>
> Cheers,
>
> Gilles
>
>
> On Mon, Sep 21, 2015 at 11:43 PM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>> Hi Gilles,
>>
>> I've made a big mistake! While compiling the patched version of OpenMPI and
>> creating a new module, I forgot to add the path to the oarshmost command
>> while OMPI_MCA_plm_rsh_agent=oarshmost was set....
>> OpenMPI was silently ignoring the oarshmost command, as it was unable to find
>> it, and so only one node was available!
>>
>> The good thing is that with your patch, oversubscribing does not occur
>> anymore on the nodes; it seems to solve the problem we had efficiently.
>> I'll keep this patched version in production for the users, as the previous
>> one was allowing 2 processes on some cores from time to time, with randomly
>> bad code performance in these cases.
>>
>> Yes, this computer is the biggest one of the CIMENT mesocenter; it is
>> called... froggy, and all the nodes are little frogs :-)
>> https://ciment.ujf-grenoble.fr/wiki-pub/index.php/Hardware:Froggy
>>
>> I was using $OAR_NODEFILE and frog.txt to check different syntaxes, one with
>> a list of nodes (one line with a node name for each available core) and the
>> second with one line per node and the "slots" information for the number of
>> cores. E.g.:
>>
>> [begou@frog7 MPI_TESTS]$ cat $OAR_NODEFILE
>> frog7
>> frog7
>> frog7
>> frog7
>> frog8
>> frog8
>> frog8
>> frog8
>>
>> [begou@frog7 MPI_TESTS]$ cat frog.txt
>> frog7 slots=4
>> frog8 slots=4
>>
>> Thanks again for the patch and your help.
>>
>> Patrick
>>
>>
>> Gilles Gouaillardet wrote:
>>
>> Thanks Patrick,
>>
>> could you please try again with the --hetero-nodes mpirun option?
>> (I am afk, and not 100% sure about the syntax)
>>
>> could you also submit a job with 2 nodes and 4 cores on each node, that does
>> cat /proc/self/status
>> oarshmost <remote host> cat /proc/self/status
>>
>> btw, is there any reason why you use a machine file (frog.txt) instead of
>> using $OAR_NODEFILE directly?
>> /* not to mention I am surprised a French supercomputer is called "frog" ;-) */
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, September 18, 2015, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>>>
>>> Gilles Gouaillardet wrote:
>>>
>>> Patrick,
>>>
>>> by the way, this will work when running on a single node.
>>>
>>> I do not know what will happen when you run on multiple nodes...
>>> since there is no OAR integration in OpenMPI, I guess you are using ssh to
>>> start orted on the remote nodes
>>> (unless you instructed OMPI to use an OARified version of ssh)
>>>
>>> Yes, OMPI_MCA_plm_rsh_agent=oarshmost
>>> This also exports the needed environment instead of multiple -x options, to
>>> be as similar as possible to the environments on French national
>>> supercomputers.
>>>
>>> my concern is that the remote orted might not run within the cpuset that was
>>> created by OAR for this job,
>>> so you might end up using all the cores on the remote nodes.
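
(A quick way to check that concern is a variation of the commands Gilles suggested above: compare the cpuset seen locally with the one seen through the launch agent. This is only a sketch; it assumes oarshmost takes a host name followed by the command to run remotely, as in Gilles's example, and grep just narrows /proc/self/status to the relevant line:

$ grep Cpus_allowed_list /proc/self/status                    # cores the local job shell may use
$ oarshmost frog8 grep Cpus_allowed_list /proc/self/status    # cores a command started through the agent may use

If the second command lists all the cores of the remote node instead of only the ones OAR assigned to the job, the remote orted would indeed be running outside the job's cpuset.)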
>>>
>>> The OAR environment does this. With older OpenMPI versions all is working
>>> fine.
>>>
>>> please let us know how that works for you
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 9/18/2015 5:02 PM, Gilles Gouaillardet wrote:
>>>
>>> Patrick,
>>>
>>> I just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586 for
>>> the v1.10 series
>>>
>>> this is only a three-line patch.
>>> could you please give it a try?
>>>
>>>
>>> This patch solves the problem when OpenMPI uses one node, but now I'm unable
>>> to use more than one node.
>>> On one node, with 4 cores in the cpuset:
>>>
>>> mpirun --bind-to core --hostfile $OAR_NODEFILE ./location.exe |grep 'thread is now running on PU' |sort
>>> (process 0) thread is now running on PU logical index 0 (OS/physical index 12) on system frog26
>>> (process 1) thread is now running on PU logical index 1 (OS/physical index 13) on system frog26
>>> (process 2) thread is now running on PU logical index 2 (OS/physical index 14) on system frog26
>>> (process 3) thread is now running on PU logical index 3 (OS/physical index 15) on system frog26
>>>
>>> [begou@frog26 MPI_TESTS]$ mpirun -np 5 --bind-to core --hostfile $OAR_NODEFILE ./location.exe
>>> --------------------------------------------------------------------------
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>>
>>> Bind to: CORE
>>> Node: frog26
>>> #processes: 2
>>> #cpus: 1
>>>
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>>
>>>
>>> But if I request two nodes (4 cores on each), only 4 processes can start
>>> on the local cores, none on the second host:
>>> [begou@frog5 MPI_TESTS]$ cat $OAR_NODEFILE
>>> frog5
>>> frog5
>>> frog5
>>> frog5
>>> frog6
>>> frog6
>>> frog6
>>> frog6
>>>
>>> [begou@frog5 MPI_TESTS]$ cat ./frog.txt
>>> frog5 slots=4
>>> frog6 slots=4
>>>
>>> But only 4 processes are launched:
>>> [begou@frog5 MPI_TESTS]$ mpirun --hostfile frog.txt --bind-to core ./location.exe |grep 'thread is now running on PU'
>>> (process 0) thread is now running on PU logical index 0 (OS/physical index 12) on system frog5
>>> (process 1) thread is now running on PU logical index 1 (OS/physical index 13) on system frog5
>>> (process 2) thread is now running on PU logical index 2 (OS/physical index 14) on system frog5
>>> (process 3) thread is now running on PU logical index 3 (OS/physical index 15) on system frog5
>>>
>>> If I explicitly ask for 8 processes (one for each of the 4 cores on the 2 nodes):
>>> [begou@frog5 MPI_TESTS]$ mpirun --hostfile frog.txt -np 8 --bind-to core ./location.exe
>>> --------------------------------------------------------------------------
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>>
>>> Bind to: CORE
>>> Node: frog5
>>> #processes: 2
>>> #cpus: 1
>>>
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 9/18/2015 4:54 PM, Patrick Begou wrote:
>>>
>>> Ralph Castain wrote:
>>>
>>> As I said, if you don’t provide an explicit slot count in your hostfile,
>>> we default to allowing oversubscription. We don’t have OAR integration in
>>> OMPI, and so mpirun isn’t recognizing that you are running under a resource
>>> manager - it thinks this is just being controlled by a hostfile.
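
(For reference, adding --report-bindings to the commands above makes mpirun print, for every rank, the node and the cores it was actually bound to, which makes any oversubscription visible without a test program. A sketch reusing the frog.txt hostfile shown above:

$ mpirun --hostfile frog.txt -np 8 --bind-to core --report-bindings ./location.exe

The binding report goes to stderr, one line per rank, before the application output.)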
>>>
>>>
>>> What looks strange to me is that in this case (the default), oversubscription
>>> is allowed up to the number of cores of one CPU (8), not the number of cores
>>> available in the node (16), nor unlimited...
>>>
>>> If you want us to error out on oversubscription, you can either add the
>>> flag you identified, or simply change your hostfile to:
>>>
>>> frog53 slots=4
>>>
>>> Either will work.
>>>
>>> This syntax in the host file doesn't change anything about the oversubscribing
>>> problem. It is still allowed, with the same maximum number of processes for
>>> this test case:
>>>
>>> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile frog7.txt --bind-to core ./location.exe|grep 'thread is now running on PU' |sort
>>> (process 0) thread is now running on PU logical index 0 (OS/physical index 0) on system frog7
>>> (process 1) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
>>> (process 2) thread is now running on PU logical index 0 (OS/physical index 0) on system frog7
>>> (process 3) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
>>> (process 4) thread is now running on PU logical index 3 (OS/physical index 7) on system frog7
>>> (process 5) thread is now running on PU logical index 1 (OS/physical index 5) on system frog7
>>> (process 6) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
>>> (process 7) thread is now running on PU logical index 3 (OS/physical index 7) on system frog7
>>>
>>> [begou@frog7 MPI_TESTS]$ cat frog7.txt
>>> frog7 slots=4
>>>
>>> Patrick
>>>
>>>
>>> On Sep 16, 2015, at 1:00 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>>>
>>> Thanks all for your answers; I've added some details about the tests I
>>> have run. See below.
>>>
>>>
>>> Ralph Castain wrote:
>>>
>>> Not precisely correct. It depends on the environment.
>>>
>>> If there is a resource manager allocating nodes, or you provide a hostfile
>>> that specifies the number of slots on the nodes, or you use -host, then we
>>> default to no-oversubscribe.
>>>
>>> I'm using a batch scheduler (OAR).
>>> # cat /dev/cpuset/oar/begou_7955553/cpuset.cpus
>>> 4-7
>>>
>>> So 4 cores allowed. Nodes have two eight-core CPUs.
>>>
>>> Node file contains:
>>> # cat $OAR_NODEFILE
>>> frog53
>>> frog53
>>> frog53
>>> frog53
>>>
>>> # mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
>>> is okay (my test code shows one process on each core):
>>> (process 3) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
>>> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
>>> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>>> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>>>
>>> # mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
>>> oversubscribes with:
>>> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
>>> (process 1) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
>>> (process 3) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>>> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>>> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>>> This is not allowed with OpenMPI 1.7.3.
>>>
>>> I can increase up to the maximum core count of this first processor (8 cores):
>>> # mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe |grep 'thread is now running on PU'
>>> (process 5) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
>>> (process 7) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
>>> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>>> (process 6) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>>> (process 2) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
>>> (process 0) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>>> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>>> (process 3) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>>>
>>> But I cannot overload more than the 8 cores (the max core count of one CPU).
>>> # mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>>
>>> Bind to: CORE
>>> Node: frog53
>>> #processes: 2
>>> #cpus: 1
>>>
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>>
>>> Now if I add --nooversubscribe the problem doesn't exist anymore (no more
>>> than 4 processes, one on each core). So it looks as if the default behavior
>>> were no-oversubscribe limited to the core count of the socket???
>>>
>>> Again, with 1.7.3 this problem doesn't occur at all.
>>>
>>> Patrick
>>>
>>>
>>>
>>> If you provide a hostfile that doesn’t specify slots, then we use the
>>> number of cores we find on each node, and we allow oversubscription.
>>>
>>> What is being described sounds like more of a bug than an intended
>>> feature. I’d need to know more about it, though, to be sure. Can you tell me
>>> how you are specifying this cpuset?
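
(The cpuset Ralph asks about here is the one OAR creates for the job, shown earlier in the thread as /dev/cpuset/oar/begou_7955553/cpuset.cpus; the job id in that path differs from job to job. A few commands that report it from inside a job, as a sketch; the taskset line assumes util-linux is installed:

$ cat /dev/cpuset/oar/begou_7955553/cpuset.cpus    # cores OAR granted to the job, e.g. 4-7
$ grep Cpus_allowed_list /proc/self/status         # the same restriction, as seen by the current process
$ taskset -cp $$                                   # affinity list of the current shell

All three should show the same 4 cores inside the job.)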
>>>
>>> On Sep 15, 2015, at 4:44 PM, Matt Thompson <fort...@gmail.com> wrote:
>>>
>>> Looking at the Open MPI 1.10.0 man page:
>>>
>>> https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php
>>>
>>> it looks like perhaps -oversubscribe (which was an option) is now the
>>> default behavior. Instead we have:
>>>
>>> -nooversubscribe, --nooversubscribe
>>> Do not oversubscribe any nodes; error (without starting any processes) if
>>> the requested number of processes would cause oversubscription. This option
>>> implicitly sets "max_slots" equal to the "slots" value for each node.
>>>
>>> It also looks like -map-by has a way to implement it as well (see man
>>> page).
>>>
>>> Thanks for letting me/us know about this. On a system of mine I sort of
>>> depend on the -nooversubscribe behavior!
>>>
>>> Matt
>>>
>>>
>>>
>>> On Tue, Sep 15, 2015 at 11:17 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm running OpenMPI 1.10.0 built with the Intel 2015 compilers on a Bullx
>>>> system.
>>>> I have some trouble with the bind-to core option when using cpusets.
>>>> If the cpuset contains fewer than all the cores of a CPU (e.g. 4 cores
>>>> allowed on an 8-core CPU), OpenMPI 1.10.0 allows these cores to be
>>>> overloaded up to the maximum number of cores of the CPU.
>>>> With this config, and because the cpuset only allows 4 cores, I can reach
>>>> 2 processes/core if I use:
>>>>
>>>> mpirun -np 8 --bind-to core my_application
>>>>
>>>> OpenMPI 1.7.3 doesn't show the problem in the same situation:
>>>> mpirun -np 8 --bind-to-core my_application
>>>> returns:
>>>> A request was made to bind to that would result in binding more
>>>> processes than cpus on a resource
>>>> and that's okay of course.
>>>>
>>>>
>>>> Is there a way to avoid this overloading with OpenMPI 1.10.0?
>>>>
>>>> Thanks
>>>>
>>>> Patrick
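
(For readers who land on this thread with the same symptom: the two documented knobs touched on above are --nooversubscribe, which makes mpirun error out rather than start more ranks than slots, and the "overload-allowed" binding qualifier named in the error message, which explicitly permits binding several ranks per core. A sketch using the same command as in the original report:

$ mpirun -np 8 --bind-to core --nooversubscribe my_application        # refuse to start more ranks than slots
$ mpirun -np 8 --bind-to core:overload-allowed my_application         # explicitly allow several ranks per core

Whether the first form also respects an OAR cpuset on 1.10.0 is exactly what the rest of this thread, and the patch in PR 586, deal with.)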