Hi, Were there any changes to rmaps in going to 1.10.2? An otherwise-identical setup that worked in 1.10.0 fails to launch in 1.10.2, complaining that there's no CPUs available in a socket...
With 1.10.0: $ /apps/openmpi/1.10.0/bin/mpirun -np 2 -mca rmaps_base_verbose 1000 hostname [r47:18709] mca: base: components_register: registering rmaps components [r47:18709] mca: base: components_register: found loaded component resilient [r47:18709] mca: base: components_register: component resilient register function successful [r47:18709] mca: base: components_register: found loaded component rank_file [r47:18709] mca: base: components_register: component rank_file register function successful [r47:18709] mca: base: components_register: found loaded component staged [r47:18709] mca: base: components_register: component staged has no register or open function [r47:18709] mca: base: components_register: found loaded component ppr [r47:18709] mca: base: components_register: component ppr register function successful [r47:18709] mca: base: components_register: found loaded component seq [r47:18709] mca: base: components_register: component seq register function successful [r47:18709] mca: base: components_register: found loaded component round_robin [r47:18709] mca: base: components_register: component round_robin register function successful [r47:18709] mca: base: components_register: found loaded component mindist [r47:18709] mca: base: components_register: component mindist register function successful [r47:18709] [[63529,0],0] rmaps:base set policy with core [r47:18709] mca: base: components_open: opening rmaps components [r47:18709] mca: base: components_open: found loaded component resilient [r47:18709] mca: base: components_open: component resilient open function successful [r47:18709] mca: base: components_open: found loaded component rank_file [r47:18709] mca: base: components_open: component rank_file open function successful [r47:18709] mca: base: components_open: found loaded component staged [r47:18709] mca: base: components_open: component staged open function successful [r47:18709] mca: base: components_open: found loaded component ppr [r47:18709] mca: base: components_open: component ppr open function successful [r47:18709] mca: base: components_open: found loaded component seq [r47:18709] mca: base: components_open: component seq open function successful [r47:18709] mca: base: components_open: found loaded component round_robin [r47:18709] mca: base: components_open: component round_robin open function successful [r47:18709] mca: base: components_open: found loaded component mindist [r47:18709] mca: base: components_open: component mindist open function successful [r47:18709] mca:rmaps:select: checking available component resilient [r47:18709] mca:rmaps:select: Querying component [resilient] [r47:18709] mca:rmaps:select: checking available component rank_file [r47:18709] mca:rmaps:select: Querying component [rank_file] [r47:18709] mca:rmaps:select: checking available component staged [r47:18709] mca:rmaps:select: Querying component [staged] [r47:18709] mca:rmaps:select: checking available component ppr [r47:18709] mca:rmaps:select: Querying component [ppr] [r47:18709] mca:rmaps:select: checking available component seq [r47:18709] mca:rmaps:select: Querying component [seq] [r47:18709] mca:rmaps:select: checking available component round_robin [r47:18709] mca:rmaps:select: Querying component [round_robin] [r47:18709] mca:rmaps:select: checking available component mindist [r47:18709] mca:rmaps:select: Querying component [mindist] [r47:18709] [[63529,0],0]: Final mapper priorities [r47:18709] Mapper: ppr Priority: 90 [r47:18709] Mapper: seq Priority: 60 [r47:18709] Mapper: resilient Priority: 40 [r47:18709] Mapper: mindist Priority: 20 [r47:18709] Mapper: round_robin Priority: 10 [r47:18709] Mapper: staged Priority: 5 [r47:18709] Mapper: rank_file Priority: 0 [r47:18709] mca:rmaps: mapping job [63529,1] [r47:18709] mca:rmaps: creating new map for job [63529,1] [r47:18709] mca:rmaps: nprocs 2 [r47:18709] mca:rmaps mapping given - using default [r47:18709] mca:rmaps:ppr: job [63529,1] not using ppr mapper [r47:18709] mca:rmaps:seq: job [63529,1] not using seq mapper [r47:18709] mca:rmaps:resilient: cannot perform initial map of job [63529,1] - no fault groups [r47:18709] mca:rmaps:mindist: job [63529,1] not using mindist mapper [r47:18709] mca:rmaps:rr: mapping job [63529,1] [r47:18709] AVAILABLE NODES FOR MAPPING: [r47:18709] node: r47 daemon: 0 [r47:18709] node: r57 daemon: 1 [r47:18709] node: r58 daemon: 2 [r47:18709] node: r59 daemon: 3 [r47:18709] mca:rmaps:rr: mapping no-span by Core for job [63529,1] slots 64 num_procs 2 [r47:18709] mca:rmaps:rr: found 16 Core objects on node r47 [r47:18709] mca:rmaps:rr: assigning proc to object 0 [r47:18709] mca:rmaps:rr: assigning proc to object 1 [r47:18709] mca:rmaps: computing ranks by core for job [63529,1] [r47:18709] mca:rmaps:rank_by: found 16 objects on node r47 with 2 procs [r47:18709] mca:rmaps:rank_by: assigned rank 0 [r47:18709] mca:rmaps:rank_by: assigned rank 1 [r47:18709] mca:rmaps:rank_by: found 16 objects on node r57 with 0 procs [r47:18709] mca:rmaps:rank_by: found 16 objects on node r58 with 0 procs [r47:18709] mca:rmaps:rank_by: found 16 objects on node r59 with 0 procs [r47:18709] mca:rmaps: compute bindings for job [63529,1] with policy CORE[4008] [r47:18709] mca:rmaps: bindings for job [63529,1] - bind in place [r47:18709] mca:rmaps: bind in place for job [63529,1] with bindings CORE [r47:18709] [[63529,0],0] reset_usage: node r47 has 2 procs on it [r47:18709] [[63529,0],0] reset_usage: ignoring proc [[63529,1],0] [r47:18709] [[63529,0],0] reset_usage: ignoring proc [[63529,1],1] [r47:18709] BINDING PROC [[63529,1],0] TO Core NUMBER 0 [r47:18709] [[63529,0],0] BOUND PROC [[63529,1],0] TO 0[Core:0] on node r47 [r47:18709] BINDING PROC [[63529,1],1] TO Core NUMBER 1 [r47:18709] [[63529,0],0] BOUND PROC [[63529,1],1] TO 1[Core:1] on node r47 r47 r47 [r47:18709] mca: base: close: component resilient closed [r47:18709] mca: base: close: unloading component resilient [r47:18709] mca: base: close: component rank_file closed [r47:18709] mca: base: close: unloading component rank_file [r47:18709] mca: base: close: component staged closed [r47:18709] mca: base: close: unloading component staged [r47:18709] mca: base: close: component ppr closed [r47:18709] mca: base: close: unloading component ppr [r47:18709] mca: base: close: component seq closed [r47:18709] mca: base: close: unloading component seq [r47:18709] mca: base: close: component round_robin closed [r47:18709] mca: base: close: unloading component round_robin [r47:18709] mca: base: close: component mindist closed [r47:18709] mca: base: close: unloading component mindist With 1.10.2: $ /apps/openmpi/1.10.2/bin/mpirun -np 2 -mca rmaps_base_verbose 1000 hostname [r47:18733] mca: base: components_register: registering rmaps components [r47:18733] mca: base: components_register: found loaded component resilient [r47:18733] mca: base: components_register: component resilient register function successful [r47:18733] mca: base: components_register: found loaded component rank_file [r47:18733] mca: base: components_register: component rank_file register function successful [r47:18733] mca: base: components_register: found loaded component staged [r47:18733] mca: base: components_register: component staged has no register or open function [r47:18733] mca: base: components_register: found loaded component ppr [r47:18733] mca: base: components_register: component ppr register function successful [r47:18733] mca: base: components_register: found loaded component seq [r47:18733] mca: base: components_register: component seq register function successful [r47:18733] mca: base: components_register: found loaded component round_robin [r47:18733] mca: base: components_register: component round_robin register function successful [r47:18733] mca: base: components_register: found loaded component mindist [r47:18733] mca: base: components_register: component mindist register function successful [r47:18733] [[63505,0],0] rmaps:base set policy with core [r47:18733] mca: base: components_open: opening rmaps components [r47:18733] mca: base: components_open: found loaded component resilient [r47:18733] mca: base: components_open: component resilient open function successful [r47:18733] mca: base: components_open: found loaded component rank_file [r47:18733] mca: base: components_open: component rank_file open function successful [r47:18733] mca: base: components_open: found loaded component staged [r47:18733] mca: base: components_open: component staged open function successful [r47:18733] mca: base: components_open: found loaded component ppr [r47:18733] mca: base: components_open: component ppr open function successful [r47:18733] mca: base: components_open: found loaded component seq [r47:18733] mca: base: components_open: component seq open function successful [r47:18733] mca: base: components_open: found loaded component round_robin [r47:18733] mca: base: components_open: component round_robin open function successful [r47:18733] mca: base: components_open: found loaded component mindist [r47:18733] mca: base: components_open: component mindist open function successful [r47:18733] mca:rmaps:select: checking available component resilient [r47:18733] mca:rmaps:select: Querying component [resilient] [r47:18733] mca:rmaps:select: checking available component rank_file [r47:18733] mca:rmaps:select: Querying component [rank_file] [r47:18733] mca:rmaps:select: checking available component staged [r47:18733] mca:rmaps:select: Querying component [staged] [r47:18733] mca:rmaps:select: checking available component ppr [r47:18733] mca:rmaps:select: Querying component [ppr] [r47:18733] mca:rmaps:select: checking available component seq [r47:18733] mca:rmaps:select: Querying component [seq] [r47:18733] mca:rmaps:select: checking available component round_robin [r47:18733] mca:rmaps:select: Querying component [round_robin] [r47:18733] mca:rmaps:select: checking available component mindist [r47:18733] mca:rmaps:select: Querying component [mindist] [r47:18733] [[63505,0],0]: Final mapper priorities [r47:18733] Mapper: ppr Priority: 90 [r47:18733] Mapper: seq Priority: 60 [r47:18733] Mapper: resilient Priority: 40 [r47:18733] Mapper: mindist Priority: 20 [r47:18733] Mapper: round_robin Priority: 10 [r47:18733] Mapper: staged Priority: 5 [r47:18733] Mapper: rank_file Priority: 0 [r47:18733] mca:rmaps: mapping job [63505,1] [r47:18733] mca:rmaps: creating new map for job [63505,1] [r47:18733] mca:rmaps: nprocs 2 [r47:18733] mca:rmaps mapping given - using default [r47:18733] mca:rmaps:ppr: job [63505,1] not using ppr mapper [r47:18733] mca:rmaps:seq: job [63505,1] not using seq mapper [r47:18733] mca:rmaps:resilient: cannot perform initial map of job [63505,1] - no fault groups [r47:18733] mca:rmaps:mindist: job [63505,1] not using mindist mapper [r47:18733] mca:rmaps:rr: mapping job [63505,1] [r47:18733] AVAILABLE NODES FOR MAPPING: [r47:18733] node: r47 daemon: 0 [r47:18733] node: r57 daemon: 1 [r47:18733] node: r58 daemon: 2 [r47:18733] node: r59 daemon: 3 [r47:18733] mca:rmaps:rr: mapping no-span by Core for job [63505,1] slots 64 num_procs 2 [r47:18733] mca:rmaps:rr: found 16 Core objects on node r47 [r47:18733] mca:rmaps:rr: assigning proc to object 0 -------------------------------------------------------------------------- A request for multiple cpus-per-proc was given, but a directive was also give to map to an object level that has less cpus than requested ones: #cpus-per-proc: 1 number of cpus: 0 map-by: BYCORE:NOOVERSUBSCRIBE Please specify a mapping level that has more cpus, or else let us define a default mapping that will allow multiple cpus-per-proc. -------------------------------------------------------------------------- [r47:18733] mca: base: close: component resilient closed [r47:18733] mca: base: close: unloading component resilient [r47:18733] mca: base: close: component rank_file closed [r47:18733] mca: base: close: unloading component rank_file [r47:18733] mca: base: close: component staged closed [r47:18733] mca: base: close: unloading component staged [r47:18733] mca: base: close: component ppr closed [r47:18733] mca: base: close: unloading component ppr [r47:18733] mca: base: close: component seq closed [r47:18733] mca: base: close: unloading component seq [r47:18733] mca: base: close: component round_robin closed [r47:18733] mca: base: close: unloading component round_robin [r47:18733] mca: base: close: component mindist closed [r47:18733] mca: base: close: unloading component mindist There are both in the same PBS Pro job. And the cpuset definitely has all cores available: $ cat /cgroup/cpuset/pbspro/4347646.r-man2/cpuset.cpus 0-15 Is there something here I'm missing? Cheers, Ben