I filed an issue to track this problem here: https://github.com/open-mpi/ompi/issues/978
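For context: the `./affinity` binary invoked throughout the quoted thread below is a small MPI test that prints each rank's hostname and inherited affinity mask. Its source is not part of the thread; the following is a minimal sketch of what such a program might look like (assuming Linux's sched_getaffinity(2); Marcin's actual program may differ), matching the "rank N @ host cpu, cpu," output format quoted below:

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, c;
        char host[64] = {0};
        cpu_set_t mask;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(host, sizeof(host) - 1);
        /* the affinity mask inherited from mpirun / SLURM */
        sched_getaffinity(0, sizeof(mask), &mask);
        printf("rank %d @ %s ", rank, host);
        for (c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &mask))
                printf("%d, ", c);
        printf("\n");
        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and run under mpirun --bind-to core on a hyperthreaded node, each rank should print the two hardware threads of one core, e.g. "rank 0 @ compute-1-2.local 1, 17,".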
> On Oct 5, 2015, at 1:01 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Thanks Marcin. I think we have three things we need to address:
>
> 1. the warning needs to be emitted regardless of whether or not --report-bindings was given. Not sure how that warning got “covered” by the option, but it is clearly a bug
>
> 2. improve the warning to include binding info - relatively easy to do
>
> 3. fix the mapping/binding under asymmetric topologies. Given further info and consideration, I’m increasingly pushed towards the “fallback to the map-by core default” solution. It provides a predictable and consistent pattern. The other solution is technically viable, but leads to an unpredictable “opportunistic” result that might cause odd application behavior. If the user specifies a mapping option and we can’t do it because of asymmetry, then error out.
>
> HTH
> Ralph
>
>> On Oct 5, 2015, at 9:36 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>
>> Hi, Gilles
>>
>>> you mentioned you had one failure with 1.10.1rc1 and -bind-to core
>>> could you please send the full details (script, allocation and output)
>>> in your slurm script, you can do
>>> srun -N $SLURM_NNODES -n $SLURM_NNODES --cpu_bind=none -l grep Cpus_allowed_list /proc/self/status
>>> before invoking mpirun
>>
>> It was an interactive job allocated with
>>
>> salloc --account=staff --ntasks=32 --mem-per-cpu=2G --time=120:0:0
>>
>> The SLURM environment is the following:
>>
>> SLURM_JOBID=12714491
>> SLURM_JOB_CPUS_PER_NODE='4,2,5(x2),4,7,5'
>> SLURM_JOB_ID=12714491
>> SLURM_JOB_NODELIST='c1-[2,4,8,13,16,23,26]'
>> SLURM_JOB_NUM_NODES=7
>> SLURM_JOB_PARTITION=normal
>> SLURM_MEM_PER_CPU=2048
>> SLURM_NNODES=7
>> SLURM_NODELIST='c1-[2,4,8,13,16,23,26]'
>> SLURM_NODE_ALIASES='(null)'
>> SLURM_NPROCS=32
>> SLURM_NTASKS=32
>> SLURM_SUBMIT_DIR=/cluster/home/marcink
>> SLURM_SUBMIT_HOST=login-0-1.local
>> SLURM_TASKS_PER_NODE='4,2,5(x2),4,7,5'
>>
>> The output of the command you asked for is
>>
>> 0: c1-2.local Cpus_allowed_list: 1-4,17-20
>> 1: c1-4.local Cpus_allowed_list: 1,15,17,31
>> 2: c1-8.local Cpus_allowed_list: 0,5,9,13-14,16,21,25,29-30
>> 3: c1-13.local Cpus_allowed_list: 3-7,19-23
>> 4: c1-16.local Cpus_allowed_list: 12-15,28-31
>> 5: c1-23.local Cpus_allowed_list: 2-4,8,13-15,18-20,24,29-31
>> 6: c1-26.local Cpus_allowed_list: 1,6,11,13,15,17,22,27,29,31
>>
>> Running with the command
>>
>> mpirun --mca rmaps_base_verbose 10 --hetero-nodes --bind-to core --report-bindings --map-by socket -np 32 ./affinity
>>
>> I have attached two output files: one for the original 1.10.1rc1, one for the patched version.
>>
>> When I said 'failed in one case' I was not precise. I got an error on node c1-8, which was the first one to have a different number of MPI processes on the two sockets. It would also have failed on some later nodes; because of the error we just never got there.
>>
>> Let me know if you need more.
>>
>> Marcin
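A note on the compressed syntax in SLURM_JOB_CPUS_PER_NODE / SLURM_TASKS_PER_NODE above: "5(x2)" means the value 5 repeats for 2 consecutive nodes, so '4,2,5(x2),4,7,5' expands to 4 2 5 5 4 7 5, i.e. 32 CPUs over the 7 nodes of the job. A standalone sketch of expanding it (the input string is the one from the thread; not a SLURM utility):

    #include <stdio.h>
    #include <string.h>

    /* Expand SLURM's compressed per-node counts, e.g. "4,2,5(x2),4,7,5"
     * -> 4 2 5 5 4 7 5 ("(xN)" repeats the preceding count N times). */
    int main(void)
    {
        char spec[] = "4,2,5(x2),4,7,5";
        char *tok;
        for (tok = strtok(spec, ","); tok != NULL; tok = strtok(NULL, ",")) {
            int count, repeat = 1, i;
            if (sscanf(tok, "%d(x%d)", &count, &repeat) < 1)
                continue;   /* malformed token: skip */
            for (i = 0; i < repeat; i++)
                printf("%d ", count);
        }
        printf("\n");   /* prints: 4 2 5 5 4 7 5 */
        return 0;
    }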
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 10/4/2015 11:55 PM, marcin.krotkiewski wrote:
>>>> Hi, all,
>>>>
>>>> I played a bit more and it seems that the problem results from
>>>>
>>>> trg_obj = opal_hwloc_base_find_min_bound_target_under_obj()
>>>>
>>>> called in rmaps_base_binding.c / bind_downwards being wrong. I do not know the reason, but I think I know when the problem happens (at least on 1.10.1rc1). It seems that by default openmpi maps by socket. The error happens when, for a given compute node, a different number of cores is used on each socket. Consider the previously studied case (the debug outputs I sent in the last post). c1-8, which was the source of the error, has 5 MPI processes assigned, and the cpuset is the following:
>>>>
>>>> 0, 5, 9, 13, 14, 16, 21, 25, 29, 30
>>>>
>>>> Cores 0 and 5 are on socket 0; cores 9, 13, 14 are on socket 1. Binding progresses correctly up to and including core 13 (see the end of file out.1.10.1rc2, before the error). That is 2 cores on socket 0, and 2 cores on socket 1. The error is thrown when core 14 should be bound - an extra core on socket 1 with no corresponding core on socket 0. At that point the returned trg_obj points to the first core on the node (os_index 0, socket 0).
>>>>
>>>> I have submitted a few other jobs and I always had an error in this situation. Moreover, if I now use --map-by core instead of socket, the error is gone, and I get my expected binding:
>>>>
>>>> rank 0 @ compute-1-2.local 1, 17,
>>>> rank 1 @ compute-1-2.local 2, 18,
>>>> rank 2 @ compute-1-2.local 3, 19,
>>>> rank 3 @ compute-1-2.local 4, 20,
>>>> rank 4 @ compute-1-4.local 1, 17,
>>>> rank 5 @ compute-1-4.local 15, 31,
>>>> rank 6 @ compute-1-8.local 0, 16,
>>>> rank 7 @ compute-1-8.local 5, 21,
>>>> rank 8 @ compute-1-8.local 9, 25,
>>>> rank 9 @ compute-1-8.local 13, 29,
>>>> rank 10 @ compute-1-8.local 14, 30,
>>>> rank 11 @ compute-1-13.local 3, 19,
>>>> rank 12 @ compute-1-13.local 4, 20,
>>>> rank 13 @ compute-1-13.local 5, 21,
>>>> rank 14 @ compute-1-13.local 6, 22,
>>>> rank 15 @ compute-1-13.local 7, 23,
>>>> rank 16 @ compute-1-16.local 12, 28,
>>>> rank 17 @ compute-1-16.local 13, 29,
>>>> rank 18 @ compute-1-16.local 14, 30,
>>>> rank 19 @ compute-1-16.local 15, 31,
>>>> rank 20 @ compute-1-23.local 2, 18,
>>>> rank 29 @ compute-1-26.local 11, 27,
>>>> rank 21 @ compute-1-23.local 3, 19,
>>>> rank 30 @ compute-1-26.local 13, 29,
>>>> rank 22 @ compute-1-23.local 4, 20,
>>>> rank 31 @ compute-1-26.local 15, 31,
>>>> rank 23 @ compute-1-23.local 8, 24,
>>>> rank 27 @ compute-1-26.local 1, 17,
>>>> rank 24 @ compute-1-23.local 13, 29,
>>>> rank 28 @ compute-1-26.local 6, 22,
>>>> rank 25 @ compute-1-23.local 14, 30,
>>>> rank 26 @ compute-1-23.local 15, 31,
>>>>
>>>> Using --map-by core seems to fix the issue on 1.8.8, 1.10.0 and 1.10.1rc1. However, there is still a difference in behavior between 1.10.1rc1 and earlier versions. In the SLURM job described in the last post, 1.10.1rc1 fails to bind in only 1 case, while the earlier versions fail in 21 out of 32 cases. You mentioned there was a bug in hwloc. Not sure if it can explain the difference in behavior.
>>>>
>>>> Hope this helps to nail this down.
>>>>
>>>> Marcin
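The failure mode Marcin describes can be reproduced with a toy model of round-robin by-socket mapping. This is an illustration only, not Open MPI's actual rmaps code: with cores {0,5} on socket 0 and {9,13,14} on socket 1, naive alternation between sockets runs socket 0 dry at the fifth rank, exactly where c1-8 failed:

    #include <stdio.h>

    /* Toy model of round-robin mapping by socket on c1-8's cpuset
     * (illustration only, not Open MPI's rmaps code). */
    int main(void)
    {
        int socket0[] = {0, 5};
        int socket1[] = {9, 13, 14};
        int *cores[2] = {socket0, socket1};
        int ncores[2] = {2, 3};   /* cores available per socket  */
        int next[2]   = {0, 0};   /* next unused core per socket */
        int rank, s;

        for (rank = 0; rank < 5; rank++) {
            s = rank % 2;         /* alternate between sockets */
            if (next[s] >= ncores[s]) {
                /* socket 0 is exhausted while core 14 on socket 1 is
                 * still free: the point where c1-8 failed */
                printf("rank %d: socket %d has no free core left\n", rank, s);
                return 1;
            }
            printf("rank %d -> socket %d, core %d\n",
                   rank, s, cores[s][next[s]++]);
        }
        return 0;
    }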
>>>> On 10/04/2015 09:55 AM, Gilles Gouaillardet wrote:
>>>>> Ralph,
>>>>>
>>>>> I suspect ompi tries to bind to threads outside the cpuset. This could be pretty similar to a previous issue when ompi tried to bind to cores outside the cpuset.
>>>>> /* when a core has more than one thread, would ompi assume all the threads are available if the core is available? */
>>>>> I will investigate this starting tomorrow.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Sunday, October 4, 2015, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Thanks - please go ahead and release that allocation as I’m not going to get to this immediately. I’ve got several hot irons in the fire right now, and I’m not sure when I’ll get a chance to track this down.
>>>>>
>>>>> Gilles or anyone else who might have time - feel free to take a gander and see if something pops out at you.
>>>>>
>>>>> Ralph
>>>>>
>>>>>> On Oct 3, 2015, at 11:05 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>
>>>>>> Done. I have compiled 1.10.0 and 1.10.1rc1 with --enable-debug and executed
>>>>>>
>>>>>> mpirun --mca rmaps_base_verbose 10 --hetero-nodes --report-bindings --bind-to core -np 32 ./affinity
>>>>>>
>>>>>> In the case of 1.10.1rc1 I have also added :overload-allowed - output is in a separate file. This option did not make much difference for 1.10.0, so I did not attach it here.
>>>>>>
>>>>>> The first thing I noted for 1.10.0 are lines like
>>>>>>
>>>>>> [login-0-1.local:03399] [[37945,0],0] GOT 1 CPUS
>>>>>> [login-0-1.local:03399] [[37945,0],0] PROC [[37945,1],27] BITMAP
>>>>>> [login-0-1.local:03399] [[37945,0],0] PROC [[37945,1],27] ON c1-26 IS NOT BOUND
>>>>>>
>>>>>> with an empty BITMAP.
>>>>>>
>>>>>> The SLURM environment is
>>>>>>
>>>>>> set | grep SLURM
>>>>>> SLURM_JOBID=12714491
>>>>>> SLURM_JOB_CPUS_PER_NODE='4,2,5(x2),4,7,5'
>>>>>> SLURM_JOB_ID=12714491
>>>>>> SLURM_JOB_NODELIST='c1-[2,4,8,13,16,23,26]'
>>>>>> SLURM_JOB_NUM_NODES=7
>>>>>> SLURM_JOB_PARTITION=normal
>>>>>> SLURM_MEM_PER_CPU=2048
>>>>>> SLURM_NNODES=7
>>>>>> SLURM_NODELIST='c1-[2,4,8,13,16,23,26]'
>>>>>> SLURM_NODE_ALIASES='(null)'
>>>>>> SLURM_NPROCS=32
>>>>>> SLURM_NTASKS=32
>>>>>> SLURM_SUBMIT_DIR=/cluster/home/marcink
>>>>>> SLURM_SUBMIT_HOST=login-0-1.local
>>>>>> SLURM_TASKS_PER_NODE='4,2,5(x2),4,7,5'
>>>>>>
>>>>>> I have submitted an interactive job on screen for 120 hours now to work with one example, and not change it for every post :)
>>>>>>
>>>>>> If you need anything else, let me know. I could introduce some patch/printfs and recompile, if you need it.
>>>>>>
>>>>>> Marcin
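An "empty BITMAP" in the verbose lines above means an empty cpuset: the set of PUs a process would be bound to contains nothing from the available CPUs. The mechanics can be shown with hwloc's bitmap API directly. This only illustrates the bitmap arithmetic, not the actual code path inside Open MPI:

    #include <hwloc.h>
    #include <stdio.h>

    /* Intersect a job's cpuset with the PUs of one core and test the
     * result. Compile with: cc empty_bitmap.c -lhwloc */
    int main(void)
    {
        hwloc_bitmap_t cpuset = hwloc_bitmap_alloc();
        hwloc_bitmap_t core   = hwloc_bitmap_alloc();
        char buf[128];

        /* c1-8's Cpus_allowed_list from the thread */
        hwloc_bitmap_list_sscanf(cpuset, "0,5,9,13-14,16,21,25,29-30");
        /* PUs of core 1 (HT pair 1,17), which lies outside that cpuset */
        hwloc_bitmap_list_sscanf(core, "1,17");

        hwloc_bitmap_and(core, core, cpuset);
        hwloc_bitmap_list_snprintf(buf, sizeof(buf), core);
        printf("intersection: '%s'%s\n", buf,
               hwloc_bitmap_iszero(core) ? "  (empty -> NOT BOUND)" : "");

        hwloc_bitmap_free(cpuset);
        hwloc_bitmap_free(core);
        return 0;
    }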
>>>>>> On 10/03/2015 07:17 PM, Ralph Castain wrote:
>>>>>>> Rats - just realized I have no way to test this as none of the machines I can access are setup for cgroup-based multi-tenant. Is this a debug version of OMPI? If not, can you rebuild OMPI with --enable-debug?
>>>>>>>
>>>>>>> Then please run it with --mca rmaps_base_verbose 10 and pass along the output.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ralph
>>>>>>>
>>>>>>>> On Oct 3, 2015, at 10:09 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> What version of slurm is this? I might try to debug it here. I’m not sure where the problem lies just yet.
>>>>>>>>
>>>>>>>>> On Oct 3, 2015, at 8:59 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Here is the output of lstopo. In short, (0,16) are core 0, (1,17) are core 1, etc.
>>>>>>>>>
>>>>>>>>> Machine (64GB)
>>>>>>>>>   NUMANode L#0 (P#0 32GB)
>>>>>>>>>     Socket L#0 + L3 L#0 (20MB)
>>>>>>>>>       L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
>>>>>>>>>         PU L#0 (P#0)
>>>>>>>>>         PU L#1 (P#16)
>>>>>>>>>       L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
>>>>>>>>>         PU L#2 (P#1)
>>>>>>>>>         PU L#3 (P#17)
>>>>>>>>>       L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
>>>>>>>>>         PU L#4 (P#2)
>>>>>>>>>         PU L#5 (P#18)
>>>>>>>>>       L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
>>>>>>>>>         PU L#6 (P#3)
>>>>>>>>>         PU L#7 (P#19)
>>>>>>>>>       L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
>>>>>>>>>         PU L#8 (P#4)
>>>>>>>>>         PU L#9 (P#20)
>>>>>>>>>       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
>>>>>>>>>         PU L#10 (P#5)
>>>>>>>>>         PU L#11 (P#21)
>>>>>>>>>       L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
>>>>>>>>>         PU L#12 (P#6)
>>>>>>>>>         PU L#13 (P#22)
>>>>>>>>>       L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
>>>>>>>>>         PU L#14 (P#7)
>>>>>>>>>         PU L#15 (P#23)
>>>>>>>>>     HostBridge L#0
>>>>>>>>>       PCIBridge
>>>>>>>>>         PCI 8086:1521
>>>>>>>>>           Net L#0 "eth0"
>>>>>>>>>         PCI 8086:1521
>>>>>>>>>           Net L#1 "eth1"
>>>>>>>>>       PCIBridge
>>>>>>>>>         PCI 15b3:1003
>>>>>>>>>           Net L#2 "ib0"
>>>>>>>>>           OpenFabrics L#3 "mlx4_0"
>>>>>>>>>       PCIBridge
>>>>>>>>>         PCI 102b:0532
>>>>>>>>>       PCI 8086:1d02
>>>>>>>>>         Block L#4 "sda"
>>>>>>>>>   NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
>>>>>>>>>     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
>>>>>>>>>       PU L#16 (P#8)
>>>>>>>>>       PU L#17 (P#24)
>>>>>>>>>     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
>>>>>>>>>       PU L#18 (P#9)
>>>>>>>>>       PU L#19 (P#25)
>>>>>>>>>     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
>>>>>>>>>       PU L#20 (P#10)
>>>>>>>>>       PU L#21 (P#26)
>>>>>>>>>     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
>>>>>>>>>       PU L#22 (P#11)
>>>>>>>>>       PU L#23 (P#27)
>>>>>>>>>     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
>>>>>>>>>       PU L#24 (P#12)
>>>>>>>>>       PU L#25 (P#28)
>>>>>>>>>     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
>>>>>>>>>       PU L#26 (P#13)
>>>>>>>>>       PU L#27 (P#29)
>>>>>>>>>     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
>>>>>>>>>       PU L#28 (P#14)
>>>>>>>>>       PU L#29 (P#30)
>>>>>>>>>     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
>>>>>>>>>       PU L#30 (P#15)
>>>>>>>>>       PU L#31 (P#31)
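The core-to-PU pairing this lstopo output encodes (core 0 = PUs 0 and 16, core 1 = PUs 1 and 17, ...) can also be queried programmatically with hwloc. A small sketch, not part of the original thread:

    #include <hwloc.h>
    #include <stdio.h>

    /* Print each core with the os_indexes (P#) of its hardware threads.
     * Compile with: cc corepus.c -lhwloc */
    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_obj_t core, pu;
        int i, ncores;

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);
        ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        for (i = 0; i < ncores; i++) {
            core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
            printf("Core L#%u: PUs", core->logical_index);
            pu = NULL;
            while ((pu = hwloc_get_next_obj_inside_cpuset_by_type(
                            topo, core->cpuset, HWLOC_OBJ_PU, pu)) != NULL)
                printf(" P#%u", pu->os_index);
            printf("\n");
        }
        hwloc_topology_destroy(topo);
        return 0;
    }

On the machine above this should print "Core L#0: PUs P#0 P#16", "Core L#1: PUs P#1 P#17", and so on.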
>>>>>>>>>
>>>>>>>>> On 10/03/2015 05:46 PM, Ralph Castain wrote:
>>>>>>>>>> Maybe I’m just misreading your HT map - that slurm nodelist syntax is a new one to me, but they tend to change things around. Could you run lstopo on one of those compute nodes and send the output?
>>>>>>>>>>
>>>>>>>>>> I’m just suspicious because I’m not seeing a clear pairing of HT numbers in your output, but HT numbering is BIOS-specific and I may just not be understanding your particular pattern. Our error message is clearly indicating that we are seeing individual HTs (and not complete cores) assigned, and I don’t know the source of that confusion.
>>>>>>>>>>
>>>>>>>>>>> On Oct 3, 2015, at 8:28 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/03/2015 04:38 PM, Ralph Castain wrote:
>>>>>>>>>>>> If mpirun isn’t trying to do any binding, then you will of course get the right mapping as we’ll just inherit whatever we received.
>>>>>>>>>>>
>>>>>>>>>>> Yes. I meant that whatever you received (what SLURM gives) is a correct cpu map and assigns _whole_ CPUs, not a single HT, to MPI processes. In the case mentioned earlier openmpi should start 6 tasks on c1-30. If HTs were treated as separate and independent cores, sched_getaffinity of an MPI process started on c1-30 would return a map with only 6 entries. In my case it returns a map with 12 entries - 2 for each core. So one process is in fact allocated both HTs, not only one. Is what I'm saying correct?
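This claim (every core in the map contributes both of its HTs) can be checked mechanically: walk the cores with hwloc and flag any core whose PUs are only partially present in the inherited affinity mask. A sketch, not part of the original thread; compile with -lhwloc:

    #define _GNU_SOURCE
    #include <hwloc.h>
    #include <sched.h>
    #include <stdio.h>

    /* For every core with at least one PU in this process's affinity
     * mask, check whether ALL of its PUs are in the mask. If nothing is
     * printed, SLURM handed out whole cores. */
    int main(void)
    {
        cpu_set_t mask;
        hwloc_topology_t topo;
        hwloc_obj_t core, pu;
        int i, ncores;

        sched_getaffinity(0, sizeof(mask), &mask);
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);
        ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        for (i = 0; i < ncores; i++) {
            int in = 0, total = 0;
            core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
            pu = NULL;
            while ((pu = hwloc_get_next_obj_inside_cpuset_by_type(
                            topo, core->cpuset, HWLOC_OBJ_PU, pu)) != NULL) {
                total++;
                if (CPU_ISSET(pu->os_index, &mask))
                    in++;
            }
            if (in > 0 && in < total)
                printf("core L#%u: only %d of %d PUs in the mask\n",
                       core->logical_index, in, total);
        }
        hwloc_topology_destroy(topo);
        return 0;
    }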
>>>>>>>>>>>
>>>>>>>>>>>> Looking at your output, it’s pretty clear that you are getting independent HTs assigned and not full cores.
>>>>>>>>>>>
>>>>>>>>>>> How do you mean? Is the above understanding wrong? I would expect that on c1-30 with --bind-to core openmpi should bind to logical cores 0 and 16 (rank 0), 1 and 17 (rank 1), and so on. All those logical cores are available in the sched_getaffinity map, and there are twice as many logical cores as there are MPI processes started on the node.
>>>>>>>>>>>
>>>>>>>>>>>> My guess is that something in slurm has changed such that it detects that HT has been enabled, and then begins treating the HTs as completely independent cpus.
>>>>>>>>>>>>
>>>>>>>>>>>> Try changing “-bind-to core” to “-bind-to hwthread -use-hwthread-cpus” and see if that works
>>>>>>>>>>>
>>>>>>>>>>> I have, and the binding is wrong. For example, I got this output
>>>>>>>>>>>
>>>>>>>>>>> rank 0 @ compute-1-30.local 0,
>>>>>>>>>>> rank 1 @ compute-1-30.local 16,
>>>>>>>>>>>
>>>>>>>>>>> which means that two ranks have been bound to the same physical core (logical cores 0 and 16 are two HTs of the same core). If I use --bind-to core, I get the following correct binding
>>>>>>>>>>>
>>>>>>>>>>> rank 0 @ compute-1-30.local 0, 16,
>>>>>>>>>>>
>>>>>>>>>>> The problem is that many other ranks get bad binding with a 'rank XXX is not bound (or bound to all available processors)' warning.
>>>>>>>>>>>
>>>>>>>>>>> But I think I was not entirely correct saying that 1.10.1rc1 did not fix things. It still might have improved something, but not everything. Consider this job:
>>>>>>>>>>>
>>>>>>>>>>> SLURM_JOB_CPUS_PER_NODE='5,4,6,5(x2),7,5,9,5,7,6'
>>>>>>>>>>> SLURM_JOB_NODELIST='c8-[31,34],c9-[30-32,35-36],c10-[31-34]'
>>>>>>>>>>>
>>>>>>>>>>> If I run 32 tasks as follows (with 1.10.1rc1)
>>>>>>>>>>>
>>>>>>>>>>> mpirun --hetero-nodes --report-bindings --bind-to core -np 32 ./affinity
>>>>>>>>>>>
>>>>>>>>>>> I get the following error:
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> A request was made to bind to that would result in binding more
>>>>>>>>>>> processes than cpus on a resource:
>>>>>>>>>>>
>>>>>>>>>>>    Bind to:     CORE
>>>>>>>>>>>    Node:        c9-31
>>>>>>>>>>>    #processes:  2
>>>>>>>>>>>    #cpus:       1
>>>>>>>>>>>
>>>>>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>>>>>> option to your binding directive.
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> If I now use --bind-to core:overload-allowed, then openmpi starts and _most_ of the threads are bound correctly (i.e., the map contains two logical cores in ALL cases), except this case that required the overload flag:
>>>>>>>>>>>
>>>>>>>>>>> rank 15 @ compute-9-31.local 1, 17,
>>>>>>>>>>> rank 16 @ compute-9-31.local 11, 27,
>>>>>>>>>>> rank 17 @ compute-9-31.local 2, 18,
>>>>>>>>>>> rank 18 @ compute-9-31.local 12, 28,
>>>>>>>>>>> rank 19 @ compute-9-31.local 1, 17,
>>>>>>>>>>>
>>>>>>>>>>> Note that pair 1,17 is used twice. The original SLURM-delivered map (no binding) on this node is
>>>>>>>>>>>
>>>>>>>>>>> rank 15 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>>>>>>>>>> rank 16 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>>>>>>>>>> rank 17 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>>>>>>>>>> rank 18 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>>>>>>>>>> rank 19 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27, 28, 29,
>>>>>>>>>>>
>>>>>>>>>>> Why does openmpi use cores (1,17) twice instead of using core (13,29)? Clearly, the original SLURM-delivered map has 5 full cores included, enough for 5 MPI processes.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Marcin
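The double-booking shown above (ranks 15 and 19 on the same core) can be detected automatically: gather every rank's affinity mask and hostname on rank 0 and report any two ranks on the same node with intersecting masks. A sketch, assuming masks fit in a cpu_set_t; with --bind-to core and no overload, any reported pair indicates a binding bug:

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, size, i, j;
        cpu_set_t mask;
        char host[64] = {0};
        cpu_set_t *masks = NULL;
        char (*hosts)[64] = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        sched_getaffinity(0, sizeof(mask), &mask);
        gethostname(host, sizeof(host) - 1);

        if (rank == 0) {
            masks = malloc(size * sizeof(cpu_set_t));
            hosts = malloc(size * sizeof(*hosts));
        }
        MPI_Gather(&mask, sizeof(cpu_set_t), MPI_BYTE,
                   masks, sizeof(cpu_set_t), MPI_BYTE, 0, MPI_COMM_WORLD);
        MPI_Gather(host, 64, MPI_CHAR, hosts, 64, MPI_CHAR, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (i = 0; i < size; i++)
                for (j = i + 1; j < size; j++) {
                    cpu_set_t common;
                    if (strcmp(hosts[i], hosts[j]) != 0)
                        continue;   /* different nodes cannot overlap */
                    CPU_AND(&common, &masks[i], &masks[j]);
                    if (CPU_COUNT(&common) > 0)
                        printf("ranks %d and %d share cpus on %s\n",
                               i, j, hosts[i]);
                }
            free(masks);
            free(hosts);
        }
        MPI_Finalize();
        return 0;
    }

Run in place of ./affinity, this would have flagged "ranks 15 and 19 share cpus on compute-9-31.local" in the job above.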
>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 3, 2015, at 7:12 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/03/2015 01:06 PM, Ralph Castain wrote:
>>>>>>>>>>>>>> Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating HTs as “cores” - i.e., as independent cpus. Any chance that is true?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not to the best of my knowledge, and at least not intentionally. SLURM starts as many processes as there are physical cores, not threads. To verify this, consider this test case:

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/