As I said, we removed that warning starting in 1.8.3.

On Sep 24, 2014, at 1:23 PM, Brock Palen <bro...@umich.edu> wrote:

> So very hetero.  I did some testing and I couldn't make it happen below 32 
> cores.  Not sure if this is the real issue or if it requires a specific layout:
> 
> [brockp@nyx5512 ~]$ cat $PBS_NODEFILE | sort | uniq -c
>      1 nyx5512
>      1 nyx5515
>      1 nyx5518
>      1 nyx5523
>      1 nyx5527
>      2 nyx5537
>      1 nyx5542
>      1 nyx5560
>      2 nyx5561
>      2 nyx5562
>      3 nyx5589
>      1 nyx5591
>      1 nyx5593
>      1 nyx5617
>      2 nyx5620
>      1 nyx5622
>      5 nyx5629
>      1 nyx5630
>      1 nyx5770
>      1 nyx5771
>      2 nyx5772
>      1 nyx5780
>      3 nyx5784
>      2 nyx5820
>     10 nyx5844
>      2 nyx5847
>      1 nyx5849
>      1 nyx5852
>      2 nyx5856
>      1 nyx5870
>      8 nyx5872
>      1 nyx5894
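[Editor's note: as a sanity check on the listing above, the per-node counts do add up to the 64 requested slots. A small sketch (not part of the original thread) that tallies `sort | uniq -c` style output; the sample lines are copied from the listing:

```python
def total_slots(uniq_c_output: str) -> int:
    """Sum the leading counts of `sort | uniq -c` output."""
    return sum(int(line.split()[0])
               for line in uniq_c_output.strip().splitlines())

# A few lines excerpted from the $PBS_NODEFILE listing above:
sample = """\
     3 nyx5589
    10 nyx5844
     8 nyx5872
"""
print(total_slots(sample))  # 21 for this excerpt; the full 32-node listing sums to 64
```

]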
> 
> This sort of layout gives me that warning if I leave -np 64 in:
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>   Bind to:     CORE
>   Node:        nyx5589
>   #processes:  2
>   #cpus:       1
> 
> If I omit the -np ##, it works, and nyx5589 does get 3 processes started.
> 
> If I look at the bindings of the three ranks on nyx5589 that it complains 
> about, they appear correct:
> [root@nyx5589 ~]# hwloc-bind --get --pid 24826
> 0x00000080  ->  7
> [root@nyx5589 ~]# hwloc-bind --get --pid 24827
> 0x00000400 -> 10
> [root@nyx5589 ~]# hwloc-bind --get --pid 24828
> 0x00001000 -> 12
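[Editor's note: the `->` values hwloc prints are just the positions of the set bits in the CPU mask. A quick sketch (not from the thread) of that decoding, using the three masks reported above:

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Decode an hwloc-style CPU bitmask into the set bit (core) indices."""
    cpus = []
    bit = 0
    while mask:
        if mask & 1:
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus

# The three masks reported for the ranks on nyx5589:
for mask in (0x00000080, 0x00000400, 0x00001000):
    print(hex(mask), "->", cpus_in_mask(mask))
# 0x80 -> [7], 0x400 -> [10], 0x1000 -> [12]
```

]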
> 
> I think I found the problem though, and it's on the Torque side: while the 
> cpuset sets up cores 7, 10, and 12, the PBS server thinks it gave out 6, 7, 
> and 10.  That's where the "2 processes" in the warning comes from.
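[Editor's note: if that diagnosis is right, the mismatch is easy to see as set arithmetic. A toy illustration only, not anything Open MPI actually runs:

```python
# The cpuset on nyx5589 actually contains cores 7, 10, 12, but the
# PBS server believes it handed out cores 6, 7, 10.
cpuset_cores = {7, 10, 12}
pbs_cores = {6, 7, 10}

agreed = cpuset_cores & pbs_cores        # cores both sides agree on
print(sorted(agreed))                    # [7, 10] -- only 2 of the 3 ranks "fit"
print(sorted(pbs_cores - cpuset_cores))  # [6] -- a core PBS claims but the cpuset lacks
```

]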
> 
> I checked some of the other jobs, and for those the cpusets and the PBS 
> server CPU lists are the same.
> 
> More investigation required.  Still, it's strange: why would it give that 
> message at all?  Why would Open MPI care, and why only when -np ## is given?
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Sep 23, 2014, at 3:27 PM, Maxime Boissonneault 
> <maxime.boissonnea...@calculquebec.ca> wrote:
> 
>> Do you know the topology of the cores allocated by Torque (i.e. were they 
>> all on the same node, or 8 per node, or a heterogeneous distribution, for 
>> example)?
>> 
>> 
>> Le 2014-09-23 15:05, Brock Palen a écrit :
>>> Yes the request to torque was procs=64,
>>> 
>>> We are using cpusets.
>>> 
>>> mpirun without -np 64 creates 64 spawned copies of hostname.
>>> 
>>> 
>>> 
>>> 
>>> On Sep 23, 2014, at 3:02 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> FWIW: that warning has been removed from the upcoming 1.8.3 release
>>>> 
>>>> 
>>>> On Sep 23, 2014, at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>> 
>>>>> Am 23.09.2014 um 19:53 schrieb Brock Palen:
>>>>> 
>>>>>> I found a fun head scratcher: with Open MPI 1.8.2 and Torque 5 built 
>>>>>> with TM support, on hetero core layouts I get the fun thing:
>>>>>> mpirun -report-bindings hostname        <-------- Works
>>>>> And you get 64 lines of output?
>>>>> 
>>>>> 
>>>>>> mpirun -report-bindings -np 64 hostname   <--------- Wat?
>>>>>> --------------------------------------------------------------------------
>>>>>> A request was made to bind to that would result in binding more
>>>>>> processes than cpus on a resource:
>>>>>> 
>>>>>> Bind to:     CORE
>>>>>> Node:        nyx5518
>>>>>> #processes:  2
>>>>>> #cpus:       1
>>>>>> 
>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>> option to your binding directive.
>>>>>> --------------------------------------------------------------------------
>>>>> How many cores are physically installed on this machine - two as 
>>>>> mentioned above?
>>>>> 
>>>>> - -- Reuti
>>>>> 
>>>>> 
>>>>>> I ran with --oversubscribe and got the expected host list, which 
>>>>>> matched $PBS_NODEFILE and was 64 entries long:
>>>>>> 
>>>>>> mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname
>>>>>> 
>>>>>> What did I do wrong?  I'm stumped why one works and one doesn't, but 
>>>>>> the one that doesn't, if you force it, appears to bind correctly.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/users/2014/09/25375.php
>>> 
>>> 
>> 
>> 
>> -- 
>> ---------------------------------
>> Maxime Boissonneault
>> Analyste de calcul - Calcul Québec, Université Laval
>> Ph. D. en physique
>> 
> 
