Hi Gilles,

Wow, thanks - that was quick. I'm rebuilding now.
Cheers,
Ben

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Friday, 29 January 2016 1:54 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Any changes to rmaps in 1.10.2?

Ben,

here is a patch that does fix that. Sorry for the inconvenience, and thanks for your help in understanding this issue.

Cheers,

Gilles

diff --git a/opal/mca/hwloc/base/hwloc_base_util.c b/opal/mca/hwloc/base/hwloc_base_util.c
index 237c6b0..a4fa193 100644
--- a/opal/mca/hwloc/base/hwloc_base_util.c
+++ b/opal/mca/hwloc/base/hwloc_base_util.c
@@ -492,8 +492,11 @@ static void df_search_cores(hwloc_obj_t obj, unsigned int *cnt)
             obj->userdata = (void*)data;
         }
         if (NULL == opal_hwloc_base_cpu_set) {
-            if (!hwloc_bitmap_isincluded(obj->cpuset, obj->allowed_cpuset)) {
-                /* do not count not allowed cores */
+            if (!hwloc_bitmap_intersects(obj->cpuset, obj->allowed_cpuset)) {
+                /*
+                 * do not count not allowed cores (e.g. cores with zero allowed PU)
+                 * if SMT is enabled, do count cores with at least one allowed hwthread
+                 */
                 return;
             }
             data->npus = 1;
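For anyone following along, the whole fix is the switch from hwloc_bitmap_isincluded() to hwloc_bitmap_intersects(). Below is a minimal standalone sketch of the difference (an illustration only, assuming hwloc 1.x as embedded in the 1.10 series, and a hypothetical SMT core whose two hwthreads are PUs 0 and 16 while the job's cpuset allows only PU 0):

/* bitmap_check.c - compile with: cc bitmap_check.c -lhwloc */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical core on an SMT node: its hwthreads are PUs 0 and 16,
     * but the cpuset cgroup only allows PU 0 (the "0-15" case in this thread). */
    hwloc_bitmap_t core_cpuset  = hwloc_bitmap_alloc();
    hwloc_bitmap_t core_allowed = hwloc_bitmap_alloc();

    hwloc_bitmap_set(core_cpuset, 0);
    hwloc_bitmap_set(core_cpuset, 16);
    hwloc_bitmap_set(core_allowed, 0);

    /* Old check: the core only counts if *every* hwthread is allowed -> 0 here */
    printf("isincluded: %d\n", hwloc_bitmap_isincluded(core_cpuset, core_allowed));
    /* Patched check: the core counts if *any* hwthread is allowed -> 1 here */
    printf("intersects: %d\n", hwloc_bitmap_intersects(core_cpuset, core_allowed));

    hwloc_bitmap_free(core_cpuset);
    hwloc_bitmap_free(core_allowed);
    return 0;
}

With only one hwthread per core in the cgroup, the old test discards every core, which is presumably where the "number of cpus: 0" in the error further down the thread comes from; the patched test keeps any core that still has at least one allowed hwthread.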
On 1/29/2016 11:43 AM, Ben Menadue wrote:
> Yes, I'm able to reproduce it on a single node as well.
>
> Actually, even on just a single CPU (and -np 1) - it won't let me launch unless both threads of that core are in the cgroup.
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
> Sent: Friday, 29 January 2016 1:33 PM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] Any changes to rmaps in 1.10.2?
>
> I was able to reproduce the issue on one node with a cpuset manually set.
>
> fwiw, i cannot reproduce the issue using taskset instead of cpuset (!)
>
> Cheers,
>
> Gilles
>
> On 1/29/2016 11:08 AM, Ben Menadue wrote:
>> Hi Gilles, Ralph,
>>
>> Okay, it definitely seems to be due to the cpuset having only one of
>> the hyperthreads of each physical core:
>>
>> [13:02:13 root@r60:4363542.r-man2] # echo 0-15 > cpuset.cpus
>>
>> 13:03 bjm900@r60 ~ > cat /cgroup/cpuset/pbspro/4363542.r-man2/cpuset.cpus
>> 0-15
>>
>> 13:03 bjm900@r60 ~ > /apps/openmpi/1.10.2/bin/mpirun hostname
>> --------------------------------------------------------------------------
>> A request for multiple cpus-per-proc was given, but a directive
>> was also give to map to an object level that has less cpus than
>> requested ones:
>>
>>    #cpus-per-proc:  1
>>    number of cpus:  0
>>    map-by:          BYCORE:NOOVERSUBSCRIBE
>>
>> Please specify a mapping level that has more cpus, or else let us
>> define a default mapping that will allow multiple cpus-per-proc.
>> --------------------------------------------------------------------------
>>
>> [13:03:43 root@r60:4363542.r-man2] # echo 0-31 > cpuset.cpus
>>
>> 13:03 bjm900@r60 ~ > cat /cgroup/cpuset/pbspro/4363542.r-man2/cpuset.cpus
>> 0-31
>>
>> 13:04 bjm900@r60 ~ > /apps/openmpi/1.10.2/bin/mpirun hostname
>> <...hostnames...>
>>
>> Cheers,
>> Ben
>>
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ben Menadue
>> Sent: Friday, 29 January 2016 1:01 PM
>> To: 'Open MPI Users' <us...@open-mpi.org>
>> Subject: Re: [OMPI users] Any changes to rmaps in 1.10.2?
>>
>> Hi Gilles,
>>
>>> with respect to PBS, are both OpenMPI built the same way ?
>>> e.g. configure --with-tm=/opt/pbs/default or something similar
>>
>> Both are built against TM explicitly using the --with-tm option.
>>
>>> you can run
>>> mpirun --mca plm_base_verbose 100 --mca ess_base_verbose 100 --mca ras_base_verbose 100 hostname
>>> and you should see the "tm" module in the logs.
>>
>> Yes, it appears to use TM from what I can see. Outputs from 1.10.0 and
>> 1.10.2 are attached from inside the same job - they look identical
>> (apart from the pids), except at the very end where 1.10.2 errors out
>> while 1.10.0 continues.
>>
>>> i noticed you run
>>> mpirun -np 2 ...
>>> is there any reason why you explicitly request 2 tasks?
>>
>> The "-np 2" is because that's what I was using to benchmark the install
>> (osu_bibw) and I just copied it over from when I realised it wasn't
>> even starting. But it does the same regardless of whether I specify
>> the number of processes or not (without it, it gets the number of tasks from PBS).
>>
>>> by any chance, is hyperthreading enabled on your compute node ?
>>> /* if yes, that means all cores are in the cpuset, but with only one
>>> thread per core */
>>
>> The nodes are 2 x 8-core sockets with hyper-threading on, and you can
>> choose whether to use the extra hardware threads when submitting the
>> job. If you want them, your cgroup includes both threads on each core
>> (e.g. 0-31), otherwise only one thread (e.g. 0-15) (cores 16-31 are
>> the thread siblings of cores 0-15).
>>
>> For reference, the PBS job I was using above had ncpus=32,mem=16G,
>> which becomes
>>
>> select=2:ncpus=16:mpiprocs=16:mem=8589934592b
>>
>> under the hood, with a cpuset containing cores 0-15 on each of the two nodes.
>> Interestingly, if I use a cpuset containing both threads of each
>> physical core (i.e. ask for hyperthreading on job submission), then it
>> runs fine under 1.10.2.
>>
>> Cheers,
>> Ben
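For what it's worth, a small hwloc walk like the one below shows the same thing from inside a job (a sketch only, assuming hwloc 1.x as embedded in the 1.10 series, run under the job's cpuset cgroup); it prints how many hwthreads of each core are actually allowed:

/* core_report.c - compile with: cc core_report.c -lhwloc (hwloc 1.x) */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t core = NULL;

    hwloc_topology_init(&topo);
    /* WHOLE_SYSTEM keeps disallowed PUs visible, so cpuset and allowed_cpuset
     * can differ - the same pair that df_search_cores() compares.
     * (allowed_cpuset is an hwloc 1.x object field; hwloc 2.x moved it to
     * hwloc_topology_get_allowed_cpuset().) */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);

    while ((core = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_CORE, core)) != NULL) {
        hwloc_bitmap_t allowed = hwloc_bitmap_alloc();
        char *s;
        /* allowed hwthreads of this core = core cpuset AND core allowed cpuset */
        hwloc_bitmap_and(allowed, core->cpuset, core->allowed_cpuset);
        hwloc_bitmap_asprintf(&s, allowed);
        printf("core L#%u: %d of %d PUs allowed (%s)\n",
               core->logical_index,
               hwloc_bitmap_weight(allowed),
               hwloc_bitmap_weight(core->cpuset), s);
        free(s);
        hwloc_bitmap_free(allowed);
    }

    hwloc_topology_destroy(topo);
    return 0;
}

Under the 0-31 cgroup every core should report 2 of 2 PUs allowed; under the 0-15 cgroup each core reports 1 of 2, which is exactly the case the unpatched isincluded() test rejected.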
>>
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
>> Sent: Friday, 29 January 2016 11:07 AM
>> To: Open MPI Users <us...@open-mpi.org>
>> Subject: Re: [OMPI users] Any changes to rmaps in 1.10.2?
>>
>> Ben,
>>
>> that is not needed if you submit with qsub -l nodes=1:ppn=2. Do you
>> observe the same behavior without -np 2?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 1/28/2016 7:57 AM, Ben Menadue wrote:
>>> Hi,
>>>
>>> Were there any changes to rmaps in going to 1.10.2? An
>>> otherwise-identical setup that worked in 1.10.0 fails to launch in
>>> 1.10.2, complaining that there are no CPUs available in a socket...
>>>
>>> With 1.10.0:
>>>
>>> $ /apps/openmpi/1.10.0/bin/mpirun -np 2 -mca rmaps_base_verbose 1000 hostname
>>> [r47:18709] mca: base: components_register: registering rmaps components
>>> [r47:18709] mca: base: components_register: found loaded component resilient
>>> [r47:18709] mca: base: components_register: component resilient register function successful
>>> [r47:18709] mca: base: components_register: found loaded component rank_file
>>> [r47:18709] mca: base: components_register: component rank_file register function successful
>>> [r47:18709] mca: base: components_register: found loaded component staged
>>> [r47:18709] mca: base: components_register: component staged has no register or open function
>>> [r47:18709] mca: base: components_register: found loaded component ppr
>>> [r47:18709] mca: base: components_register: component ppr register function successful
>>> [r47:18709] mca: base: components_register: found loaded component seq
>>> [r47:18709] mca: base: components_register: component seq register function successful
>>> [r47:18709] mca: base: components_register: found loaded component round_robin
>>> [r47:18709] mca: base: components_register: component round_robin register function successful
>>> [r47:18709] mca: base: components_register: found loaded component mindist
>>> [r47:18709] mca: base: components_register: component mindist register function successful
>>> [r47:18709] [[63529,0],0] rmaps:base set policy with core
>>> [r47:18709] mca: base: components_open: opening rmaps components
>>> [r47:18709] mca: base: components_open: found loaded component resilient
>>> [r47:18709] mca: base: components_open: component resilient open function successful
>>> [r47:18709] mca: base: components_open: found loaded component rank_file
>>> [r47:18709] mca: base: components_open: component rank_file open function successful
>>> [r47:18709] mca: base: components_open: found loaded component staged
>>> [r47:18709] mca: base: components_open: component staged open function successful
>>> [r47:18709] mca: base: components_open: found loaded component ppr
>>> [r47:18709] mca: base: components_open: component ppr open function successful
>>> [r47:18709] mca: base: components_open: found loaded component seq
>>> [r47:18709] mca: base: components_open: component seq open function successful
>>> [r47:18709] mca: base: components_open: found loaded component round_robin
>>> [r47:18709] mca: base: components_open: component round_robin open function successful
>>> [r47:18709] mca: base: components_open: found loaded component mindist
>>> [r47:18709] mca: base: components_open: component mindist open function successful
>>> [r47:18709] mca:rmaps:select: checking available component resilient
>>> [r47:18709] mca:rmaps:select: Querying component [resilient]
>>> [r47:18709] mca:rmaps:select: checking available component rank_file
>>> [r47:18709] mca:rmaps:select: Querying component [rank_file]
>>> [r47:18709] mca:rmaps:select: checking available component staged
>>> [r47:18709] mca:rmaps:select: Querying component [staged]
>>> [r47:18709] mca:rmaps:select: checking available component ppr
>>> [r47:18709] mca:rmaps:select: Querying component [ppr]
>>> [r47:18709] mca:rmaps:select: checking available component seq
>>> [r47:18709] mca:rmaps:select: Querying component [seq]
>>> [r47:18709] mca:rmaps:select: checking available component round_robin
>>> [r47:18709] mca:rmaps:select: Querying component [round_robin]
>>> [r47:18709] mca:rmaps:select: checking available component mindist
>>> [r47:18709] mca:rmaps:select: Querying component [mindist]
>>> [r47:18709] [[63529,0],0]: Final mapper priorities
>>> [r47:18709]  Mapper: ppr Priority: 90
>>> [r47:18709]  Mapper: seq Priority: 60
>>> [r47:18709]  Mapper: resilient Priority: 40
>>> [r47:18709]  Mapper: mindist Priority: 20
>>> [r47:18709]  Mapper: round_robin Priority: 10
>>> [r47:18709]  Mapper: staged Priority: 5
>>> [r47:18709]  Mapper: rank_file Priority: 0
>>> [r47:18709] mca:rmaps: mapping job [63529,1]
>>> [r47:18709] mca:rmaps: creating new map for job [63529,1]
>>> [r47:18709] mca:rmaps: nprocs 2
>>> [r47:18709] mca:rmaps mapping given - using default
>>> [r47:18709] mca:rmaps:ppr: job [63529,1] not using ppr mapper
>>> [r47:18709] mca:rmaps:seq: job [63529,1] not using seq mapper
>>> [r47:18709] mca:rmaps:resilient: cannot perform initial map of job [63529,1] - no fault groups
>>> [r47:18709] mca:rmaps:mindist: job [63529,1] not using mindist mapper
>>> [r47:18709] mca:rmaps:rr: mapping job [63529,1]
>>> [r47:18709] AVAILABLE NODES FOR MAPPING:
>>> [r47:18709]     node: r47 daemon: 0
>>> [r47:18709]     node: r57 daemon: 1
>>> [r47:18709]     node: r58 daemon: 2
>>> [r47:18709]     node: r59 daemon: 3
>>> [r47:18709] mca:rmaps:rr: mapping no-span by Core for job [63529,1] slots 64 num_procs 2
>>> [r47:18709] mca:rmaps:rr: found 16 Core objects on node r47
>>> [r47:18709] mca:rmaps:rr: assigning proc to object 0
>>> [r47:18709] mca:rmaps:rr: assigning proc to object 1
>>> [r47:18709] mca:rmaps: computing ranks by core for job [63529,1]
>>> [r47:18709] mca:rmaps:rank_by: found 16 objects on node r47 with 2 procs
>>> [r47:18709] mca:rmaps:rank_by: assigned rank 0
>>> [r47:18709] mca:rmaps:rank_by: assigned rank 1
>>> [r47:18709] mca:rmaps:rank_by: found 16 objects on node r57 with 0 procs
>>> [r47:18709] mca:rmaps:rank_by: found 16 objects on node r58 with 0 procs
>>> [r47:18709] mca:rmaps:rank_by: found 16 objects on node r59 with 0 procs
>>> [r47:18709] mca:rmaps: compute bindings for job [63529,1] with policy CORE[4008]
>>> [r47:18709] mca:rmaps: bindings for job [63529,1] - bind in place
>>> [r47:18709] mca:rmaps: bind in place for job [63529,1] with bindings CORE
>>> [r47:18709] [[63529,0],0] reset_usage: node r47 has 2 procs on it
>>> [r47:18709] [[63529,0],0] reset_usage: ignoring proc [[63529,1],0]
>>> [r47:18709] [[63529,0],0] reset_usage: ignoring proc [[63529,1],1]
>>> [r47:18709] BINDING PROC [[63529,1],0] TO Core NUMBER 0
>>> [r47:18709] [[63529,0],0] BOUND PROC [[63529,1],0] TO 0[Core:0] on node r47
>>> [r47:18709] BINDING PROC [[63529,1],1] TO Core NUMBER 1
>>> [r47:18709] [[63529,0],0] BOUND PROC [[63529,1],1] TO 1[Core:1] on node r47
>>> r47
>>> r47
>>> [r47:18709] mca: base: close: component resilient closed
>>> [r47:18709] mca: base: close: unloading component resilient
>>> [r47:18709] mca: base: close: component rank_file closed
>>> [r47:18709] mca: base: close: unloading component rank_file
>>> [r47:18709] mca: base: close: component staged closed
>>> [r47:18709] mca: base: close: unloading component staged
>>> [r47:18709] mca: base: close: component ppr closed
>>> [r47:18709] mca: base: close: unloading component ppr
>>> [r47:18709] mca: base: close: component seq closed
>>> [r47:18709] mca: base: close: unloading component seq
>>> [r47:18709] mca: base: close: component round_robin closed
>>> [r47:18709] mca: base: close: unloading component round_robin
>>> [r47:18709] mca: base: close: component mindist closed
>>> [r47:18709] mca: base: close: unloading component mindist
>>>
>>> With 1.10.2:
>>>
>>> $ /apps/openmpi/1.10.2/bin/mpirun -np 2 -mca rmaps_base_verbose 1000 hostname
>>> [r47:18733] mca: base: components_register: registering rmaps components
>>> [r47:18733] mca: base: components_register: found loaded component resilient
>>> [r47:18733] mca: base: components_register: component resilient register function successful
>>> [r47:18733] mca: base: components_register: found loaded component rank_file
>>> [r47:18733] mca: base: components_register: component rank_file register function successful
>>> [r47:18733] mca: base: components_register: found loaded component staged
>>> [r47:18733] mca: base: components_register: component staged has no register or open function
>>> [r47:18733] mca: base: components_register: found loaded component ppr
>>> [r47:18733] mca: base: components_register: component ppr register function successful
>>> [r47:18733] mca: base: components_register: found loaded component seq
>>> [r47:18733] mca: base: components_register: component seq register function successful
>>> [r47:18733] mca: base: components_register: found loaded component round_robin
>>> [r47:18733] mca: base: components_register: component round_robin register function successful
>>> [r47:18733] mca: base: components_register: found loaded component mindist
>>> [r47:18733] mca: base: components_register: component mindist register function successful
>>> [r47:18733] [[63505,0],0] rmaps:base set policy with core
>>> [r47:18733] mca: base: components_open: opening rmaps components
>>> [r47:18733] mca: base: components_open: found loaded component resilient
>>> [r47:18733] mca: base: components_open: component resilient open function successful
>>> [r47:18733] mca: base: components_open: found loaded component rank_file
>>> [r47:18733] mca: base: components_open: component rank_file open function successful
>>> [r47:18733] mca: base: components_open: found loaded component staged
>>> [r47:18733] mca: base: components_open: component staged open function successful
>>> [r47:18733] mca: base: components_open: found loaded component ppr
>>> [r47:18733] mca: base: components_open: component ppr open function successful
>>> [r47:18733] mca: base: components_open: found loaded component seq
>>> [r47:18733] mca: base: components_open: component seq open function successful
>>> [r47:18733] mca: base: components_open: found loaded component round_robin
>>> [r47:18733] mca: base: components_open: component round_robin open function successful
>>> [r47:18733] mca: base: components_open: found loaded component mindist
>>> [r47:18733] mca: base: components_open: component mindist open function successful
>>> [r47:18733] mca:rmaps:select: checking available component resilient
>>> [r47:18733] mca:rmaps:select: Querying component [resilient]
>>> [r47:18733] mca:rmaps:select: checking available component rank_file
>>> [r47:18733] mca:rmaps:select: Querying component [rank_file]
>>> [r47:18733] mca:rmaps:select: checking available component staged
>>> [r47:18733] mca:rmaps:select: Querying component [staged]
>>> [r47:18733] mca:rmaps:select: checking available component ppr
>>> [r47:18733] mca:rmaps:select: Querying component [ppr]
>>> [r47:18733] mca:rmaps:select: checking available component seq
>>> [r47:18733] mca:rmaps:select: Querying component [seq]
>>> [r47:18733] mca:rmaps:select: checking available component round_robin
>>> [r47:18733] mca:rmaps:select: Querying component [round_robin]
>>> [r47:18733] mca:rmaps:select: checking available component mindist
>>> [r47:18733] mca:rmaps:select: Querying component [mindist]
>>> [r47:18733] [[63505,0],0]: Final mapper priorities
>>> [r47:18733]  Mapper: ppr Priority: 90
>>> [r47:18733]  Mapper: seq Priority: 60
>>> [r47:18733]  Mapper: resilient Priority: 40
>>> [r47:18733]  Mapper: mindist Priority: 20
>>> [r47:18733]  Mapper: round_robin Priority: 10
>>> [r47:18733]  Mapper: staged Priority: 5
>>> [r47:18733]  Mapper: rank_file Priority: 0
>>> [r47:18733] mca:rmaps: mapping job [63505,1]
>>> [r47:18733] mca:rmaps: creating new map for job [63505,1]
>>> [r47:18733] mca:rmaps: nprocs 2
>>> [r47:18733] mca:rmaps mapping given - using default
>>> [r47:18733] mca:rmaps:ppr: job [63505,1] not using ppr mapper
>>> [r47:18733] mca:rmaps:seq: job [63505,1] not using seq mapper
>>> [r47:18733] mca:rmaps:resilient: cannot perform initial map of job [63505,1] - no fault groups
>>> [r47:18733] mca:rmaps:mindist: job [63505,1] not using mindist mapper
>>> [r47:18733] mca:rmaps:rr: mapping job [63505,1]
>>> [r47:18733] AVAILABLE NODES FOR MAPPING:
>>> [r47:18733]     node: r47 daemon: 0
>>> [r47:18733]     node: r57 daemon: 1
>>> [r47:18733]     node: r58 daemon: 2
>>> [r47:18733]     node: r59 daemon: 3
>>> [r47:18733] mca:rmaps:rr: mapping no-span by Core for job [63505,1] slots 64 num_procs 2
>>> [r47:18733] mca:rmaps:rr: found 16 Core objects on node r47
>>> [r47:18733] mca:rmaps:rr: assigning proc to object 0
>>> --------------------------------------------------------------------------
>>> A request for multiple cpus-per-proc was given, but a directive
>>> was also give to map to an object level that has less cpus than
>>> requested ones:
>>>
>>>    #cpus-per-proc:  1
>>>    number of cpus:  0
>>>    map-by:          BYCORE:NOOVERSUBSCRIBE
>>>
>>> Please specify a mapping level that has more cpus, or else let us
>>> define a default mapping that will allow multiple cpus-per-proc.
>>> --------------------------------------------------------------------------
>>> [r47:18733] mca: base: close: component resilient closed
>>> [r47:18733] mca: base: close: unloading component resilient
>>> [r47:18733] mca: base: close: component rank_file closed
>>> [r47:18733] mca: base: close: unloading component rank_file
>>> [r47:18733] mca: base: close: component staged closed
>>> [r47:18733] mca: base: close: unloading component staged
>>> [r47:18733] mca: base: close: component ppr closed
>>> [r47:18733] mca: base: close: unloading component ppr
>>> [r47:18733] mca: base: close: component seq closed
>>> [r47:18733] mca: base: close: unloading component seq
>>> [r47:18733] mca: base: close: component round_robin closed
>>> [r47:18733] mca: base: close: unloading component round_robin
>>> [r47:18733] mca: base: close: component mindist closed
>>> [r47:18733] mca: base: close: unloading component mindist
>>>
>>> These are both in the same PBS Pro job. And the cpuset definitely has
>>> all cores available:
>>>
>>> $ cat /cgroup/cpuset/pbspro/4347646.r-man2/cpuset.cpus
>>> 0-15
>>>
>>> Is there something here I'm missing?
>>>
>>> Cheers,
>>> Ben
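On Gilles' observation that the problem does not reproduce with taskset: one plausible explanation (an assumption on my part, not verified against the hwloc sources) is that hwloc derives the allowed CPU set from the cpuset cgroup, whereas taskset only changes the scheduler affinity mask, which hwloc reports as the process binding rather than as the allowed set. A sketch that prints both, again assuming hwloc 1.x:

/* allowed_vs_bound.c - compile with: cc allowed_vs_bound.c -lhwloc (hwloc 1.x) */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t bound = hwloc_bitmap_alloc();
    char *allowed_str, *bound_str;

    hwloc_topology_init(&topo);
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);

    /* Allowed set: what the cpuset cgroup permits (what df_search_cores() tests against) */
    hwloc_bitmap_asprintf(&allowed_str, hwloc_topology_get_allowed_cpuset(topo));

    /* Binding: what taskset / sched_setaffinity changes - a different notion */
    hwloc_get_cpubind(topo, bound, HWLOC_CPUBIND_PROCESS);
    hwloc_bitmap_asprintf(&bound_str, bound);

    printf("allowed (cgroup cpuset): %s\n", allowed_str);
    printf("bound   (affinity mask): %s\n", bound_str);

    free(allowed_str);
    free(bound_str);
    hwloc_bitmap_free(bound);
    hwloc_topology_destroy(topo);
    return 0;
}

If that guess is right, running this under "taskset -c 0-15" would still show all 32 PUs in the allowed line, which would explain why only the cgroup case trips the old check.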