As I said, we removed the warning starting in 1.8.3.

On Sep 24, 2014, at 1:23 PM, Brock Palen <bro...@umich.edu> wrote:
> So, very hetero. I did some testing and I couldn't make it happen below 32
> cores. Not sure if this is the real issue or if it requires a specific
> layout:
>
> [brockp@nyx5512 ~]$ cat $PBS_NODEFILE | sort | uniq -c
>    1 nyx5512
>    1 nyx5515
>    1 nyx5518
>    1 nyx5523
>    1 nyx5527
>    2 nyx5537
>    1 nyx5542
>    1 nyx5560
>    2 nyx5561
>    2 nyx5562
>    3 nyx5589
>    1 nyx5591
>    1 nyx5593
>    1 nyx5617
>    2 nyx5620
>    1 nyx5622
>    5 nyx5629
>    1 nyx5630
>    1 nyx5770
>    1 nyx5771
>    2 nyx5772
>    1 nyx5780
>    3 nyx5784
>    2 nyx5820
>   10 nyx5844
>    2 nyx5847
>    1 nyx5849
>    1 nyx5852
>    2 nyx5856
>    1 nyx5870
>    8 nyx5872
>    1 nyx5894
>
> This sort of layout gives me that warning if I leave -np 64 in:
>
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        nyx5589
>    #processes:  2
>    #cpus:       1
>
> If I omit the -np ## it works, and nyx5589 does get 3 processes started.
>
> If I look at the bindings of the three ranks on nyx5589 that it complains
> about, they appear correct:
>
> [root@nyx5589 ~]# hwloc-bind --get --pid 24826
> 0x00000080 -> 7
> [root@nyx5589 ~]# hwloc-bind --get --pid 24827
> 0x00000400 -> 10
> [root@nyx5589 ~]# hwloc-bind --get --pid 24828
> 0x00001000 -> 12
>
> I think I found the problem, though, and it's on the Torque side: while the
> cpuset sets up CPUs 7, 10, and 12, the PBS server thinks it gave out 6, 7,
> and 10. That mismatch is where the "2 processes" in the warning comes from.
>
> For the other jobs I checked, the cpusets and the PBS server CPU lists are
> the same.
>
> More investigation required. Still, it's strange: why would it give that
> message at all? Why would Open MPI care, and why only when -np ## is given?
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
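(A minimal sketch of the cpuset-versus-server cross-check Brock describes above, run on the compute node from inside the job. It assumes Torque's common cpuset mount at /dev/cpuset/torque/<jobid>; on some kernels the file is cpuset.cpus rather than cpus, and qstat field layout varies by version, so treat the paths and parsing as illustrative:)

  # CPU IDs the kernel cpuset actually grants the job on this node:
  cat /dev/cpuset/torque/$PBS_JOBID/cpus

  # CPU slots pbs_server believes it handed out (exec_host may wrap lines):
  qstat -f "$PBS_JOBID" | grep -A 2 exec_host

  # Actual binding of every task confined to the job's cpuset:
  for pid in $(cat /dev/cpuset/torque/$PBS_JOBID/tasks); do
      hwloc-bind --get --pid "$pid"
  done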
> On Sep 23, 2014, at 3:27 PM, Maxime Boissonneault
> <maxime.boissonnea...@calculquebec.ca> wrote:
>
>> Do you know the topology of the cores allocated by Torque (i.e. were they
>> all on the same nodes, 8 per node, or a heterogeneous distribution, for
>> example)?
>>
>> On 2014-09-23 15:05, Brock Palen wrote:
>>> Yes, the request to Torque was procs=64.
>>>
>>> We are using cpusets.
>>>
>>> mpirun without -np 64 spawns the expected 64 copies of hostname.
>>>
>>> On Sep 23, 2014, at 3:02 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> FWIW: that warning has been removed from the upcoming 1.8.3 release.
>>>>
>>>> On Sep 23, 2014, at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>
>>>>> On 23.09.2014 at 19:53, Brock Palen wrote:
>>>>>
>>>>>> I found a fun head scratcher: with Open MPI 1.8.2 and Torque 5 built
>>>>>> with TM support, on hetero core layouts I get the fun thing:
>>>>>>
>>>>>> mpirun -report-bindings hostname <-------- Works
>>>>>
>>>>> And you get 64 lines of output?
>>>>>
>>>>>> mpirun -report-bindings -np 64 hostname <--------- Wat?
>>>>>> --------------------------------------------------------------------------
>>>>>> A request was made to bind to that would result in binding more
>>>>>> processes than cpus on a resource:
>>>>>>
>>>>>>    Bind to:     CORE
>>>>>>    Node:        nyx5518
>>>>>>    #processes:  2
>>>>>>    #cpus:       1
>>>>>>
>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>> option to your binding directive.
>>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> How many cores are physically installed on this machine - two as
>>>>> mentioned above?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> I ran with --oversubscribe and got the expected host list, which
>>>>>> matched $PBS_NODEFILE and was 64 entries long:
>>>>>>
>>>>>> mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname
>>>>>>
>>>>>> What did I do wrong? I'm stumped why one works and one doesn't, but the
>>>>>> one that doesn't, if you force it, appears correct.
>>
>> --
>> Maxime Boissonneault
>> Computational analyst - Calcul Québec, Université Laval
>> Ph.D. in physics
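(For reference, a condensed reproduction recipe distilled from the thread, run from inside the job script; the flag spellings follow Open MPI 1.8 and should be verified against mpirun --help on your own install:)

  # Slots TM reports to Open MPI; prints 64 for this allocation:
  wc -l < "$PBS_NODEFILE"

  # Works: the process count is inferred from the TM allocation.
  mpirun -report-bindings hostname | sort | uniq -c

  # Trips the overload warning on this heterogeneous layout:
  mpirun -report-bindings -np 64 hostname

  # Forces it through by relaxing the one-process-per-core protection:
  mpirun --bind-to core:overload-allowed --oversubscribe -np 64 hostname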