What version of slurm is this? I might try to debug it here. I’m not sure where the problem lies just yet.
> On Oct 3, 2015, at 8:59 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>
> Here is the output of lstopo. In short, (0,16) are core 0, (1,17) are core 1, etc.
>
> Machine (64GB)
>   NUMANode L#0 (P#0 32GB)
>     Socket L#0 + L3 L#0 (20MB)
>       L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
>         PU L#0 (P#0)
>         PU L#1 (P#16)
>       L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
>         PU L#2 (P#1)
>         PU L#3 (P#17)
>       L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
>         PU L#4 (P#2)
>         PU L#5 (P#18)
>       L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
>         PU L#6 (P#3)
>         PU L#7 (P#19)
>       L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
>         PU L#8 (P#4)
>         PU L#9 (P#20)
>       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
>         PU L#10 (P#5)
>         PU L#11 (P#21)
>       L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
>         PU L#12 (P#6)
>         PU L#13 (P#22)
>       L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
>         PU L#14 (P#7)
>         PU L#15 (P#23)
>     HostBridge L#0
>       PCIBridge
>         PCI 8086:1521
>           Net L#0 "eth0"
>         PCI 8086:1521
>           Net L#1 "eth1"
>       PCIBridge
>         PCI 15b3:1003
>           Net L#2 "ib0"
>           OpenFabrics L#3 "mlx4_0"
>       PCIBridge
>         PCI 102b:0532
>       PCI 8086:1d02
>         Block L#4 "sda"
>   NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
>     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
>       PU L#16 (P#8)
>       PU L#17 (P#24)
>     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
>       PU L#18 (P#9)
>       PU L#19 (P#25)
>     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
>       PU L#20 (P#10)
>       PU L#21 (P#26)
>     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
>       PU L#22 (P#11)
>       PU L#23 (P#27)
>     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
>       PU L#24 (P#12)
>       PU L#25 (P#28)
>     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
>       PU L#26 (P#13)
>       PU L#27 (P#29)
>     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
>       PU L#28 (P#14)
>       PU L#29 (P#30)
>     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
>       PU L#30 (P#15)
>       PU L#31 (P#31)
>
>
> On 10/03/2015 05:46 PM, Ralph Castain wrote:
>> Maybe I'm just misreading your HT map - that slurm nodelist syntax is a new one to me, but they tend to change things around. Could you run lstopo on one of those compute nodes and send the output?
>>
>> I'm just suspicious because I'm not seeing a clear pairing of HT numbers in your output, but HT numbering is BIOS-specific and I may just not be understanding your particular pattern. Our error message is clearly indicating that we are seeing individual HTs (and not complete cores) assigned, and I don't know the source of that confusion.
>>
>>> On Oct 3, 2015, at 8:28 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>
>>> On 10/03/2015 04:38 PM, Ralph Castain wrote:
>>>> If mpirun isn't trying to do any binding, then you will of course get the right mapping as we'll just inherit whatever we received.
>>> Yes. I meant that whatever you received (what SLURM gives) is a correct cpu map and assigns _whole_ CPUs, not single HTs, to MPI processes. In the case mentioned earlier openmpi should start 6 tasks on c1-30. If HTs were treated as separate and independent cores, sched_getaffinity of an MPI process started on c1-30 would return a map with only 6 entries. In my case it returns a map with 12 entries - 2 for each core. So each process is in fact allocated both HTs, not only one. Is what I'm saying correct?
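To double-check which OS processor numbers (P#) are hardware threads of the same core on a given node, one can also walk the hwloc topology directly instead of reading lstopo output by eye. A minimal sketch (not part of this thread's test code; it assumes hwloc headers and library are installed, built e.g. with gcc list_siblings.c -lhwloc):

/* list_siblings.c: for every core, print the OS indices of its PUs
 * (hardware threads), e.g. "Core L#0: PUs 0 16" on the nodes above. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int i, ncores;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    for (i = 0; i < ncores; i++) {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
        unsigned pu;
        printf("Core L#%u: PUs", core->logical_index);
        /* iterate over the OS indices of all PUs contained in this core */
        hwloc_bitmap_foreach_begin(pu, core->cpuset)
            printf(" %u", pu);
        hwloc_bitmap_foreach_end();
        printf("\n");
    }

    hwloc_topology_destroy(topo);
    return 0;
}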
>>>
>>>> Looking at your output, it's pretty clear that you are getting independent HTs assigned and not full cores.
>>> How do you mean? Is the above understanding wrong? I would expect that on c1-30 with --bind-to core openmpi should bind to logical cores 0 and 16 (rank 0), 1 and 17 (rank 1), and so on. All those logical cores are available in the sched_getaffinity map, and there are twice as many logical cores as there are MPI processes started on the node.
>>>
>>>> My guess is that something in slurm has changed such that it detects that HT has been enabled, and then begins treating the HTs as completely independent cpus.
>>>>
>>>> Try changing "-bind-to core" to "-bind-to hwthread -use-hwthread-cpus" and see if that works
>>>>
>>> I have, and the binding is wrong. For example, I got this output
>>>
>>> rank 0 @ compute-1-30.local 0,
>>> rank 1 @ compute-1-30.local 16,
>>>
>>> which means that two ranks have been bound to the same physical core (logical cores 0 and 16 are two HTs of the same core). If I use --bind-to core, I get the following correct binding
>>>
>>> rank 0 @ compute-1-30.local 0, 16,
>>>
>>> The problem is that many other ranks get a bad binding, with a 'rank XXX is not bound (or bound to all available processors)' warning.
>>>
>>> But I think I was not entirely correct saying that 1.10.1rc1 did not fix things. It still might have improved something, but not everything. Consider this job:
>>>
>>> SLURM_JOB_CPUS_PER_NODE='5,4,6,5(x2),7,5,9,5,7,6'
>>> SLURM_JOB_NODELIST='c8-[31,34],c9-[30-32,35-36],c10-[31-34]'
>>>
>>> If I run 32 tasks as follows (with 1.10.1rc1)
>>>
>>> mpirun --hetero-nodes --report-bindings --bind-to core -np 32 ./affinity
>>>
>>> I get the following error:
>>>
>>> --------------------------------------------------------------------------
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>>
>>>    Bind to:     CORE
>>>    Node:        c9-31
>>>    #processes:  2
>>>    #cpus:       1
>>>
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>> --------------------------------------------------------------------------
>>>
>>> If I now use --bind-to core:overload-allowed, then openmpi starts and _most_ of the threads are bound correctly (i.e., the map contains two logical cores in ALL cases), except in this case that required the overload flag:
>>>
>>> rank 15 @ compute-9-31.local 1, 17,
>>> rank 16 @ compute-9-31.local 11, 27,
>>> rank 17 @ compute-9-31.local 2, 18,
>>> rank 18 @ compute-9-31.local 12, 28,
>>> rank 19 @ compute-9-31.local 1, 17,
>>>
>>> Note that the pair 1,17 is used twice. The original SLURM-delivered map (no binding) on this node is
>>>
>>> rank 15 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>> rank 16 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>> rank 17 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>> rank 18 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>> rank 19 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>>
>>> Why does openmpi use cores (1,17) twice instead of using core (13,29)? Clearly, the original SLURM-delivered map has 5 CPUs included, enough for 5 MPI processes.
>>>
>>> Cheers,
>>>
>>> Marcin
>>>
>>>>
>>>>> On Oct 3, 2015, at 7:12 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>
>>>>> On 10/03/2015 01:06 PM, Ralph Castain wrote:
>>>>>> Thanks Marcin. Looking at this, I'm guessing that Slurm may be treating HTs as "cores" - i.e., as independent cpus. Any chance that is true?
>>>>> Not to the best of my knowledge, and at least not intentionally. SLURM starts as many processes as there are physical cores, not threads. To verify this, consider this test case:
>>>>>
>>>>> SLURM_JOB_CPUS_PER_NODE='6,8(x2),10'
>>>>> SLURM_JOB_NODELIST='c1-[30-31],c2-[32,34]'
>>>>>
>>>>> If I now execute only one mpi process WITH NO BINDING, it will go onto c1-30 and should have a map with 6 CPUs (12 hw threads). I run
>>>>>
>>>>> mpirun --bind-to none -np 1 ./affinity
>>>>> rank 0 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>>
>>>>> I have attached the affinity.c program FYI. Clearly, sched_getaffinity in my test code returns the correct map.
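The attachment itself is not reproduced in the archive. For readers following along, a minimal affinity-reporting program of this kind (a sketch only, not necessarily identical to the attached affinity.c) could look like this; build with mpicc:

/* affinity sketch: each rank prints the CPUs in its sched_getaffinity mask,
 * one line per rank, in the "rank N @ host cpu, cpu, ..." format seen in
 * this thread. A reconstruction for illustration, not the original attachment. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, cpu;
    char host[256], buf[8192] = "";
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    /* query the affinity mask of the calling process (pid 0 == self) */
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask)) {
            char tmp[16];
            snprintf(tmp, sizeof(tmp), "%d, ", cpu);
            strcat(buf, tmp);
        }
    }
    printf("rank %d @ %s %s\n", rank, host, buf);

    MPI_Finalize();
    return 0;
}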
>>>>>
>>>>> Now if I try to start all 32 processes in this example (still no binding):
>>>>>
>>>>> rank 0 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>> rank 1 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>> rank 10 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 11 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 12 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 13 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 6 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 2 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>> rank 7 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 8 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 3 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>> rank 14 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 4 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>> rank 15 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 9 @ compute-1-31.local 2, 3, 7, 11, 12, 13, 14, 15, 18, 19, 23, 27, 28, 29, 30, 31,
>>>>> rank 5 @ compute-1-30.local 0, 1, 3, 4, 5, 6, 16, 17, 19, 20, 21, 22,
>>>>> rank 16 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 17 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 29 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 30 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 18 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 19 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 31 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 20 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 22 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 21 @ compute-2-32.local 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30,
>>>>> rank 23 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 24 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 25 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 26 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 27 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>> rank 28 @ compute-2-34.local 0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 30, 31,
>>>>>
>>>>> Still looks ok to me. If I now turn the binding on, openmpi fails:
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> A request was made to bind to that would result in binding more
>>>>> processes than cpus on a resource:
>>>>>
>>>>>    Bind to:     CORE
>>>>>    Node:        c1-31
>>>>>    #processes:  2
>>>>>    #cpus:       1
>>>>>
>>>>> You can override this protection by adding the "overload-allowed"
>>>>> option to your binding directive.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> The above tests were done with 1.10.1rc1, so it does not fix the problem.
>>>>>
>>>>> Marcin
>>>>>
>>>>>> I'm wondering because bind-to core will attempt to bind your proc to both HTs on the core. For some reason, we thought that 8,24 were HTs on the same core, which is why we tried to bind to that pair of HTs. We got an error because HT #24 was not allocated to us on node c6, but HT #8 was.
>>>>>>
>>>>>>> On Oct 3, 2015, at 2:43 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi, Ralph,
>>>>>>>
>>>>>>> I submit my slurm job as follows
>>>>>>>
>>>>>>> salloc --ntasks=64 --mem-per-cpu=2G --time=1:0:0
>>>>>>>
>>>>>>> Effectively, the allocated CPU cores are spread among many cluster nodes. SLURM uses cgroups to limit the CPU cores available for mpi processes running on a given cluster node. Compute nodes are 2-socket, 8-core E5-2670 systems with HyperThreading on:
>>>>>>>
>>>>>>> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
>>>>>>> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
>>>>>>> node distances:
>>>>>>> node   0   1
>>>>>>>   0:  10  21
>>>>>>>   1:  21  10
>>>>>>>
>>>>>>> I run the MPI program with the command
>>>>>>>
>>>>>>> mpirun --report-bindings --bind-to core -np 64 ./affinity
>>>>>>>
>>>>>>> The program simply runs sched_getaffinity for each process and prints out the result.
>>>>>>>
>>>>>>> -----------
>>>>>>> TEST RUN 1
>>>>>>> -----------
>>>>>>> For this particular job the problem is more severe: openmpi fails to run at all with the error
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> Open MPI tried to bind a new process, but something went wrong. The
>>>>>>> process was killed without launching the target application. Your job
>>>>>>> will now abort.
>>>>>>>
>>>>>>>   Local host:        c6-6
>>>>>>>   Application name:  ./affinity
>>>>>>>   Error message:     hwloc_set_cpubind returned "Error" for bitmap "8,24"
>>>>>>>   Location:          odls_default_module.c:551
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> These are the SLURM environment variables:
>>>>>>>
>>>>>>> SLURM_JOBID=12712225
>>>>>>> SLURM_JOB_CPUS_PER_NODE='3(x2),2,1(x3),2(x2),1,3(x3),5,1,4,1,3,2,3,7,1,5,6,1'
>>>>>>> SLURM_JOB_ID=12712225
>>>>>>> SLURM_JOB_NODELIST='c6-[3,6-8,12,14,17,22-23],c8-[4,7,9,17,20,28],c15-[5,10,18,20,22-24,28],c16-11'
>>>>>>> SLURM_JOB_NUM_NODES=24
>>>>>>> SLURM_JOB_PARTITION=normal
>>>>>>> SLURM_MEM_PER_CPU=2048
>>>>>>> SLURM_NNODES=24
>>>>>>> SLURM_NODELIST='c6-[3,6-8,12,14,17,22-23],c8-[4,7,9,17,20,28],c15-[5,10,18,20,22-24,28],c16-11'
>>>>>>> SLURM_NODE_ALIASES='(null)'
>>>>>>> SLURM_NPROCS=64
>>>>>>> SLURM_NTASKS=64
>>>>>>> SLURM_SUBMIT_DIR=/cluster/home/marcink
>>>>>>> SLURM_SUBMIT_HOST=login-0-2.local
>>>>>>> SLURM_TASKS_PER_NODE='3(x2),2,1(x3),2(x2),1,3(x3),5,1,4,1,3,2,3,7,1,5,6,1'
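As an aside, the compressed SLURM_JOB_CPUS_PER_NODE / SLURM_TASKS_PER_NODE syntax above expands to one CPU count per node of SLURM_JOB_NODELIST, with N(xR) meaning the count N repeated for R consecutive nodes. A small illustrative sketch of that expansion (a hypothetical helper, not part of this thread's code):

/* expand_counts.c: expand SLURM's compressed per-node count syntax,
 * e.g. "6,8(x2),10" -> "6 8 8 10" (one entry per node, in nodelist order).
 * Illustrative sketch only. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *spec = getenv("SLURM_JOB_CPUS_PER_NODE");
    char *copy, *tok;

    if (!spec)
        spec = "6,8(x2),10";   /* example value taken from this thread */

    copy = strdup(spec);
    for (tok = strtok(copy, ","); tok; tok = strtok(NULL, ",")) {
        int count = 0, repeat = 1, i;
        /* each token is either "N" or "N(xR)" */
        if (sscanf(tok, "%d(x%d)", &count, &repeat) < 1)
            continue;
        for (i = 0; i < repeat; i++)
            printf("%d ", count);
    }
    printf("\n");
    free(copy);
    return 0;
}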
>>>>>>>
>>>>>>> There are also a lot of warnings like
>>>>>>>
>>>>>>> [compute-6-6.local:20158] MCW rank 4 is not bound (or bound to all available processors)
>>>>>>>
>>>>>>> -----------
>>>>>>> TEST RUN 2
>>>>>>> -----------
>>>>>>>
>>>>>>> In another allocation I got a different error
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> A request was made to bind to that would result in binding more
>>>>>>> processes than cpus on a resource:
>>>>>>>
>>>>>>>    Bind to:     CORE
>>>>>>>    Node:        c6-19
>>>>>>>    #processes:  2
>>>>>>>    #cpus:       1
>>>>>>>
>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>> option to your binding directive.
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> and the allocation was the following
>>>>>>>
>>>>>>> SLURM_JOBID=12712250
>>>>>>> SLURM_JOB_CPUS_PER_NODE='3(x2),2,1,15,1,3,16,2,1,3(x2),2,5,4'
>>>>>>> SLURM_JOB_ID=12712250
>>>>>>> SLURM_JOB_NODELIST='c6-[3,6-8,12,14,17,19,22-23],c8-[4,7,9,17,28]'
>>>>>>> SLURM_JOB_NUM_NODES=15
>>>>>>> SLURM_JOB_PARTITION=normal
>>>>>>> SLURM_MEM_PER_CPU=2048
>>>>>>> SLURM_NNODES=15
>>>>>>> SLURM_NODELIST='c6-[3,6-8,12,14,17,19,22-23],c8-[4,7,9,17,28]'
>>>>>>> SLURM_NODE_ALIASES='(null)'
>>>>>>> SLURM_NPROCS=64
>>>>>>> SLURM_NTASKS=64
>>>>>>> SLURM_SUBMIT_DIR=/cluster/home/marcink
>>>>>>> SLURM_SUBMIT_HOST=login-0-2.local
>>>>>>> SLURM_TASKS_PER_NODE='3(x2),2,1,15,1,3,16,2,1,3(x2),2,5,4'
>>>>>>>
>>>>>>> If in this case I run on only 32 cores
>>>>>>>
>>>>>>> mpirun --report-bindings --bind-to core -np 32 ./affinity
>>>>>>>
>>>>>>> the process starts, but I get the original binding problem:
>>>>>>>
>>>>>>> [compute-6-8.local:31414] MCW rank 8 is not bound (or bound to all available processors)
>>>>>>>
>>>>>>> Running with --hetero-nodes yields exactly the same results.
>>>>>>>
>>>>>>> Hope the above is useful. The problem with binding under SLURM with CPU cores spread over nodes seems to be very reproducible; OpenMPI very often dies with an error like the ones above. These tests were run with openmpi-1.8.8 and 1.10.0, both giving the same results.
>>>>>>>
>>>>>>> One more suggestion. The warning message (MCW rank 8 is not bound...) is ONLY displayed when I use --report-bindings. It is never shown if I leave out this option, and although the binding is wrong the user is not notified. I think it would be better to show this warning in all cases where binding fails.
>>>>>>>
>>>>>>> Let me know if you need more information. I can help to debug this - it is a rather crucial issue.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Marcin
>>>>>>>
>>>>>>> On 10/02/2015 11:49 PM, Ralph Castain wrote:
>>>>>>>> Can you please send me the allocation request you made (so I can see what you specified on the cmd line), and the mpirun cmd line?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>>> On Oct 2, 2015, at 8:25 AM, Marcin Krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I fail to make OpenMPI bind to cores correctly when running from within SLURM-allocated CPU resources spread over a range of compute nodes in an otherwise homogeneous cluster. I have found this thread
>>>>>>>>>
>>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/06/24682.php
>>>>>>>>>
>>>>>>>>> and did try to use what Ralph suggested there (--hetero-nodes), but it does not work (v. 1.10.0). When running with --report-bindings I get messages like
>>>>>>>>>
>>>>>>>>> [compute-9-11.local:27571] MCW rank 10 is not bound (or bound to all available processors)
>>>>>>>>>
>>>>>>>>> for all ranks outside of my first physical compute node. Moreover, everything works as expected if I ask SLURM to assign entire compute nodes. So it does look like Ralph's diagnosis presented in that thread is correct, just the --hetero-nodes switch does not work for me.
>>>>>>>>>
>>>>>>>>> I have written a short code that uses sched_getaffinity to print the effective bindings: all MPI ranks except those on the first node are bound to all CPU cores allocated by SLURM.
>>>>>>>>>
>>>>>>>>> Do I have to do something besides --hetero-nodes, or is this a problem that needs further investigation?
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>> Marcin