Patrick,

Thanks for the report.

Can you confirm that what happened was:
- you defined OMPI_MCA_plm_rsh_agent=oarshmost
- oarshmost was not in your $PATH
- mpirun silently ignored the remote nodes

If that is correct, then I think mpirun should have reported an error
(oarshmost not found, or unable to start orted on the remote nodes)
instead of this silent behaviour.
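
As a quick sanity check (just a sketch, assuming the launch agent is the
oarshmost wrapper mentioned in this thread), something like this run before
mpirun would have caught the problem:

    # make sure the rsh/ssh agent Open MPI will use can actually be resolved
    export OMPI_MCA_plm_rsh_agent=oarshmost
    type oarshmost >/dev/null 2>&1 || echo "oarshmost not found in PATH"

    # raising the PLM verbosity should also show which agent mpirun tries to
    # use when starting orted on the remote nodes
    mpirun --mca plm_base_verbose 10 -np 2 --hostfile $OAR_NODEFILE hostname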

Cheers,

Gilles


On Mon, Sep 21, 2015 at 11:43 PM, Patrick Begou
<patrick.be...@legi.grenoble-inp.fr> wrote:
> Hi Gilles,
>
> I made a big mistake! When compiling the patched version of Open MPI and
> creating a new module, I forgot to add the path to the oarshmost command
> while OMPI_MCA_plm_rsh_agent=oarshmost was set....
> Open MPI was silently ignoring the oarshmost command, as it was unable to find
> it, and so only one node was available!
>
> The good thing is that with your patch, oversubscribing no longer occurs on
> the nodes; it seems to solve the problem we had efficiently.
> I'll keep this patched version in production for the users, as the previous
> one was allowing 2 processes on some cores from time to time, and randomly
> bad code performance in those cases.
>
> Yes, this computer is the biggest one of the CIMENT mesocenter; it is called...
> froggy, and all the nodes are little frogs :-)
> https://ciment.ujf-grenoble.fr/wiki-pub/index.php/Hardware:Froggy
>
> I was using $OAR_NODEFILE and frog.txt to check different syntaxes, one with a
> list of nodes (one line with a node name for each available core) and the
> second with one line per node and the "slots" information for the number of
> cores, e.g.:
>
> [begou@frog7 MPI_TESTS]$ cat $OAR_NODEFILE
> frog7
> frog7
> frog7
> frog7
> frog8
> frog8
> frog8
> frog8
>
> [begou@frog7 MPI_TESTS]$ cat frog.txt
> frog7 slots=4
> frog8 slots=4
>
> Thanks again for the patch and your help.
>
> Patrick
>
>
> Gilles Gouaillardet wrote:
>
> Thanks Patrick,
>
> Could you please try again with the --hetero-nodes mpirun option?
> (I am afk, and not 100% sure about the syntax.)
>
> Could you also submit a job with 2 nodes and 4 cores on each node, and run:
> cat /proc/self/status
> oarshmost <remote host> cat /proc/self/status
>
> By the way, is there any reason why you use a machine file (frog.txt) instead
> of using $OAR_NODEFILE directly?
> /* not to mention I am surprised a French supercomputer is called "frog" ;-)
> */
>
> Cheers,
>
> Gilles
>
> On Friday, September 18, 2015, Patrick Begou
> <patrick.be...@legi.grenoble-inp.fr> wrote:
>>
>> Gilles Gouaillardet wrote:
>>
>> Patrick,
>>
>> By the way, this will work when running on a single node.
>>
>> I do not know what will happen when you run on multiple nodes...
>> Since there is no OAR integration in Open MPI, I guess you are using ssh to
>> start orted on the remote nodes
>> (unless you instructed OMPI to use an OAR-ified version of ssh).
>>
>> Yes, OMPI_MCA_plm_rsh_agent=oarshmost
>> This also exports the needed environment instead of multiple -x options, so as
>> to be as similar as possible to the environments on the French national
>> supercomputers.
>>
>> My concern is that the remote orted might not run within the cpuset that was
>> created by OAR for this job,
>> so you might end up using all the cores on the remote nodes.
>>
>> The OAR environment takes care of this. With older Open MPI versions
>> everything works fine.
>>
>> Please let us know how that works for you.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 9/18/2015 5:02 PM, Gilles Gouaillardet wrote:
>>
>> Patrick,
>>
>> I just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586 for
>> the v1.10 series.
>>
>> It is only a three-line patch.
>> Could you please give it a try?
>>
>>
>> This patch solves the problem when Open MPI uses one node, but now I'm unable
>> to use more than one node.
>> On one node, with 4 cores in the cpuset:
>>
>> mpirun --bind-to core --hostfile $OAR_NODEFILE ./location.exe |grep
>> 'thread is now running on PU'  |sort
>> (process 0) thread is now running on PU logical index 0 (OS/physical index
>> 12) on system frog26
>> (process 1) thread is now running on PU logical index 1 (OS/physical index
>> 13) on system frog26
>> (process 2) thread is now running on PU logical index 2 (OS/physical index
>> 14) on system frog26
>> (process 3) thread is now running on PU logical index 3 (OS/physical index
>> 15) on system frog26
>>
>> [begou@frog26 MPI_TESTS]$ mpirun -np 5 --bind-to core --hostfile
>> $OAR_NODEFILE ./location.exe
>> --------------------------------------------------------------------------
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>>
>>    Bind to:     CORE
>>    Node:        frog26
>>    #processes:  2
>>    #cpus:       1
>>
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>>
>>
>>
>> But if I request two nodes (4 cores on each), only 4 processes can start
>> on the local cores and none on the second host:
>> [begou@frog5 MPI_TESTS]$ cat $OAR_NODEFILE
>> frog5
>> frog5
>> frog5
>> frog5
>> frog6
>> frog6
>> frog6
>> frog6
>>
>> [begou@frog5 MPI_TESTS]$ cat ./frog.txt
>> frog5 slots=4
>> frog6 slots=4
>>
>> But only 4 processes are launched:
>> [begou@frog5 MPI_TESTS]$ mpirun --hostfile frog.txt --bind-to core
>> ./location.exe |grep 'thread is now running on PU'
>> (process 0) thread is now running on PU logical index 0 (OS/physical index
>> 12) on system frog5
>> (process 1) thread is now running on PU logical index 1 (OS/physical index
>> 13) on system frog5
>> (process 2) thread is now running on PU logical index 2 (OS/physical index
>> 14) on system frog5
>> (process 3) thread is now running on PU logical index 3 (OS/physical index
>> 15) on system frog5
>>
>> If I explicitly ask for 8 processes (one for each of the 4 cores on the 2 nodes):
>> [begou@frog5 MPI_TESTS]$ mpirun --hostfile frog.txt -np 8 --bind-to core
>> ./location.exe
>> --------------------------------------------------------------------------
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>>
>>    Bind to:     CORE
>>    Node:        frog5
>>    #processes:  2
>>    #cpus:       1
>>
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>>
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 9/18/2015 4:54 PM, Patrick Begou wrote:
>>
>> Ralph Castain wrote:
>>
>> As I said, if you don’t provide an explicit slot count in your hostfile,
>> we default to allowing oversubscription. We don’t have OAR integration in
>> OMPI, and so mpirun isn’t recognizing that you are running under a resource
>> manager - it thinks this is just being controlled by a hostfile.
>>
>>
>> What looks strange to me is that in this case (the default), oversubscription
>> is allowed up to the number of cores of one CPU (8), not the number of cores
>> available in the node (16), nor unlimited...
>>
>> If you want us to error out on oversubscription, you can either add the
>> flag you identified, or simply change your hostfile to:
>>
>> frog53 slots=4
>>
>> Either will work.
>>
>> This syntax in the host file doesn't change anything about the oversubscribing
>> problem. It is still allowed, with the same maximum number of processes for
>> this test case:
>>
>> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile frog7.txt --bind-to core
>> ./location.exe|grep 'thread is now running on PU'  |sort
>> (process 0) thread is now running on PU logical index 0 (OS/physical index
>> 0) on system frog7
>> (process 1) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog7
>> (process 2) thread is now running on PU logical index 0 (OS/physical index
>> 0) on system frog7
>> (process 3) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog7
>> (process 4) thread is now running on PU logical index 3 (OS/physical index
>> 7) on system frog7
>> (process 5) thread is now running on PU logical index 1 (OS/physical index
>> 5) on system frog7
>> (process 6) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog7
>> (process 7) thread is now running on PU logical index 3 (OS/physical index
>> 7) on system frog7
>>
>> [begou@frog7 MPI_TESTS]$ cat frog7.txt
>> frog7 slots=4
>>
>> Patrick
>>
>>
>> On Sep 16, 2015, at 1:00 AM, Patrick Begou
>> <patrick.be...@legi.grenoble-inp.fr> wrote:
>>
>> Thanks all for your answers; I've added some details about the tests I
>> have run. See below.
>>
>>
>> Ralph Castain wrote:
>>
>> Not precisely correct. It depends on the environment.
>>
>> If there is a resource manager allocating nodes, or you provide a hostfile
>> that specifies the number of slots on the nodes, or you use -host, then we
>> default to no-oversubscribe.
>>
>> I'm using a batch scheduler (OAR).
>> # cat /dev/cpuset/oar/begou_7955553/cpuset.cpus
>> 4-7
>>
>> So 4 cores are allowed. The nodes have two eight-core CPUs.
>>
>> Node file contains:
>> # cat $OAR_NODEFILE
>> frog53
>> frog53
>> frog53
>> frog53
>>
>> # mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
>> is okay (my test code shows one process on each core):
>> (process 3) thread is now running on PU logical index 1 (OS/physical index
>> 5) on system frog53
>> (process 0) thread is now running on PU logical index 3 (OS/physical index
>> 7) on system frog53
>> (process 1) thread is now running on PU logical index 0 (OS/physical index
>> 4) on system frog53
>> (process 2) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog53
>>
>> # mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
>> oversubscribes with:
>> (process 0) thread is now running on PU logical index 3 (OS/physical index
>> 7) on system frog53
>> (process 1) thread is now running on PU logical index 1 (OS/physical index
>> 5) on system frog53
>> (process 3) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog53
>> (process 4) thread is now running on PU logical index 0 (OS/physical index
>> 4) on system frog53
>> (process 2) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog53
>> This is not allowed with Open MPI 1.7.3.
>>
>> I can increase up to the maximum core count of this first processor (8
>> cores):
>> # mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe |grep
>> 'thread is now running on PU'
>> (process 5) thread is now running on PU logical index 1 (OS/physical index
>> 5) on system frog53
>> (process 7) thread is now running on PU logical index 3 (OS/physical index
>> 7) on system frog53
>> (process 4) thread is now running on PU logical index 0 (OS/physical index
>> 4) on system frog53
>> (process 6) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog53
>> (process 2) thread is now running on PU logical index 1 (OS/physical index
>> 5) on system frog53
>> (process 0) thread is now running on PU logical index 2 (OS/physical index
>> 6) on system frog53
>> (process 1) thread is now running on PU logical index 0 (OS/physical index
>> 4) on system frog53
>> (process 3) thread is now running on PU logical index 0 (OS/physical index
>> 4) on system frog53
>>
>> But I cannot overload beyond the 8 cores (the maximum core count of one CPU).
>> # mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>>
>>    Bind to:     CORE
>>    Node:        frog53
>>    #processes:  2
>>    #cpus:       1
>>
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>>
>> Now if I add --nooversubscribe the problem doesn't exist anymore (no more
>> than 4 processes, one on each core). So it looks as if the default behavior
>> were a no-oversubscribe limit set at the number of cores of one socket???
>>
>> Again, with 1.7.3 this problem doesn't occur at all.
>>
>> Patrick
>>
>>
>>
>> If you provide a hostfile that doesn’t specify slots, then we use the
>> number of cores we find on each node, and we allow oversubscription.
>>
>> What is being described sounds like more of a bug than an intended
>> feature. I’d need to know more about it, though, to be sure. Can you tell me
>> how you are specifying this cpuset?
>>
>>
>> On Sep 15, 2015, at 4:44 PM, Matt Thompson <fort...@gmail.com> wrote:
>>
>> Looking at the Open MPI 1.10.0 man page:
>>
>>   https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php
>>
>> it looks like perhaps -oversubscribe (which was an option) is now the
>> default behavior. Instead we have:
>>
>> -nooversubscribe, --nooversubscribe Do not oversubscribe any nodes; error
>> (without starting any processes) if the requested number of processes would
>> cause oversubscription. This option implicitly sets "max_slots" equal to the
>> "slots" value for each node.
>>
>> It also looks like -map-by has a way to implement it as well (see man
>> page).
>>
>> Thanks for letting me/us know about this. On a system of mine I sort of
>> depend on the -nooversubscribe behavior!
>>
>> Matt
>>
>>
>>
>> On Tue, Sep 15, 2015 at 11:17 AM, Patrick Begou
>> <patrick.be...@legi.grenoble-inp.fr> wrote:
>>>
>>> Hi,
>>>
>>> I'm running Open MPI 1.10.0 built with Intel 2015 compilers on a Bullx
>>> system.
>>> I have some trouble with the bind-to core option when using a cpuset.
>>> If the cpuset contains fewer than all the cores of a CPU (e.g. 4 cores allowed
>>> on an 8-core CPU), Open MPI 1.10.0 allows these cores to be overloaded up to
>>> the maximum number of cores of the CPU.
>>> With this config, and because the cpuset only allows 4 cores, I can reach
>>> 2 processes/core if I use:
>>>
>>> mpirun -np 8 --bind-to core my_application
>>>
>>> Open MPI 1.7.3 doesn't show the problem in the same situation:
>>> mpirun -np 8 --bind-to-core my_application
>>> returns:
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource
>>> and that's okay of course.
>>>
>>>
>>> Is there a way to avoid this overloading with Open MPI 1.10.0?
>>>
>>> Thanks
>>>
>>> Patrick
>>>
>>
>>
>>
>>
>> --
>> Matt Thompson
>>
>> Man Among Men
>> Fulcrum of History
>>
>
>
>
> --
> ===================================================================
> |  Equipe M.O.S.T.         |                                      |
> |  Patrick BEGOU           | mailto:patrick.be...@grenoble-inp.fr |
> |  LEGI                    |                                      |
> |  BP 53 X                 | Tel 04 76 82 51 35                   |
> |  38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                   |
> ===================================================================
>
>
