Hi Gilles,

    You mentioned you had one failure with 1.10.1rc1 and -bind-to core.
    Could you please send the full details (script, allocation, and
    output)? In your SLURM script, you can do

    srun -N $SLURM_NNODES -n $SLURM_NNODES --cpu_bind=none -l grep Cpus_allowed_list /proc/self/status

    before invoking mpirun.

It was an interactive job allocated with

salloc --account=staff --ntasks=32 --mem-per-cpu=2G --time=120:0:0

The SLURM environment is the following:

SLURM_JOBID=12714491
SLURM_JOB_CPUS_PER_NODE='4,2,5(x2),4,7,5'
SLURM_JOB_ID=12714491
SLURM_JOB_NODELIST='c1-[2,4,8,13,16,23,26]'
SLURM_JOB_NUM_NODES=7
SLURM_JOB_PARTITION=normal
SLURM_MEM_PER_CPU=2048
SLURM_NNODES=7
SLURM_NODELIST='c1-[2,4,8,13,16,23,26]'
SLURM_NODE_ALIASES='(null)'
SLURM_NPROCS=32
SLURM_NTASKS=32
SLURM_SUBMIT_DIR=/cluster/home/marcink
SLURM_SUBMIT_HOST=login-0-1.local
SLURM_TASKS_PER_NODE='4,2,5(x2),4,7,5'

The output of the command you asked for is

0: c1-2.local  Cpus_allowed_list:        1-4,17-20
1: c1-4.local  Cpus_allowed_list:        1,15,17,31
2: c1-8.local  Cpus_allowed_list:        0,5,9,13-14,16,21,25,29-30
3: c1-13.local  Cpus_allowed_list:       3-7,19-23
4: c1-16.local  Cpus_allowed_list:       12-15,28-31
5: c1-23.local  Cpus_allowed_list:       2-4,8,13-15,18-20,24,29-31
6: c1-26.local  Cpus_allowed_list:       1,6,11,13,15,17,22,27,29,31

Running with command

mpirun --mca rmaps_base_verbose 10 --hetero-nodes --bind-to core --report-bindings --map-by socket -np 32 ./affinity

I have attached two output files: one for the original 1.10.1rc1, one for the patched version.

When I said 'failed in one case' I was not precise. I got an error on node c1-8, which was the first node to have a different number of MPI processes on the two sockets. It would likely also have failed on some later nodes; because of the error, we simply never got there.

Let me know if you need more.

Marcin

Cheers,

Gilles

On 10/4/2015 11:55 PM, marcin.krotkiewski wrote:
Hi, all,

I played a bit more and it seems that the problem results from

trg_obj = opal_hwloc_base_find_min_bound_target_under_obj()

called in rmaps_base_binding.c / bind_downwards, returning the wrong object. I do not know the reason yet, but I think I know when the problem happens (at least on 1.10.1rc1). By default, Open MPI seems to map by socket. The error happens when, on a given compute node, a different number of cores is used on each socket. Consider the previously studied case (the debug outputs I sent in my last post): c1-8, which was the source of the error, has 5 MPI processes assigned, and the cpuset is the following:

0, 5, 9, 13, 14, 16, 21, 25, 29, 30

Cores 0 and 5 are on socket 0; cores 9, 13, and 14 are on socket 1. Binding progresses correctly up to and including core 13 (see the end of file out.1.10.1rc2, before the error): that is, 2 cores on socket 0 and 2 cores on socket 1. The error is thrown when core 14 should be bound, the extra core on socket 1 with no corresponding core on socket 0. At that point the returned trg_obj points to the first core on the node (os_index 0, socket 0).
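
To make the placement order concrete, here is a toy model of the round-robin-by-socket behavior described above for the c1-8 cpuset (an illustration only, not Open MPI source):

/* toy model: map-by socket on c1-8, cores {0,5} on socket 0 and
 * {9,13,14} on socket 1; procs alternate sockets until one runs dry */
#include <stdio.h>

int main(void)
{
    int socket0[] = {0, 5}, socket1[] = {9, 13, 14};
    int used0 = 0, used1 = 0;

    for (int proc = 0; proc < 5; proc++) {
        if (proc % 2 == 0 && used0 < 2)     /* even procs try socket 0 */
            printf("proc %d -> socket 0, core %d\n", proc, socket0[used0++]);
        else if (used1 < 3)                 /* otherwise fall to socket 1 */
            printf("proc %d -> socket 1, core %d\n", proc, socket1[used1++]);
    }
    return 0;
}

This prints cores in the order 0, 9, 5, 13, 14, matching the debug log below. Proc 4 is the "extra" core-14 case: the correct target is on socket 1, but the buggy find_min_bound_target_under_obj() returns the first core on the node instead.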

I have submitted a few other jobs and I always got an error in this situation. Moreover, if I now use --map-by core instead of socket, the error is gone and I get my expected binding:

rank 0 @ compute-1-2.local  1, 17,
rank 1 @ compute-1-2.local  2, 18,
rank 2 @ compute-1-2.local  3, 19,
rank 3 @ compute-1-2.local  4, 20,
rank 4 @ compute-1-4.local  1, 17,
rank 5 @ compute-1-4.local  15, 31,
rank 6 @ compute-1-8.local  0, 16,
rank 7 @ compute-1-8.local  5, 21,
rank 8 @ compute-1-8.local  9, 25,
rank 9 @ compute-1-8.local  13, 29,
rank 10 @ compute-1-8.local  14, 30,
rank 11 @ compute-1-13.local  3, 19,
rank 12 @ compute-1-13.local  4, 20,
rank 13 @ compute-1-13.local  5, 21,
rank 14 @ compute-1-13.local  6, 22,
rank 15 @ compute-1-13.local  7, 23,
rank 16 @ compute-1-16.local  12, 28,
rank 17 @ compute-1-16.local  13, 29,
rank 18 @ compute-1-16.local  14, 30,
rank 19 @ compute-1-16.local  15, 31,
rank 20 @ compute-1-23.local  2, 18,
rank 29 @ compute-1-26.local  11, 27,
rank 21 @ compute-1-23.local  3, 19,
rank 30 @ compute-1-26.local  13, 29,
rank 22 @ compute-1-23.local  4, 20,
rank 31 @ compute-1-26.local  15, 31,
rank 23 @ compute-1-23.local  8, 24,
rank 27 @ compute-1-26.local  1, 17,
rank 24 @ compute-1-23.local  13, 29,
rank 28 @ compute-1-26.local  6, 22,
rank 25 @ compute-1-23.local  14, 30,
rank 26 @ compute-1-23.local  15, 31,
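
For reference (the ./affinity program itself is not shown anywhere in this thread), a minimal sketch of a program producing this output format, with each rank printing its host name and its sched_getaffinity mask, could look like:

#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char host[256];
    gethostname(host, sizeof(host));

    cpu_set_t mask;
    sched_getaffinity(0, sizeof(mask), &mask);   /* 0 = calling process */

    /* print every logical CPU this rank may run on */
    printf("rank %d @ %s  ", rank, host);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf("%d, ", cpu);
    printf("\n");

    MPI_Finalize();
    return 0;
}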

Using --map-by core seems to fix the issue on 1.8.8, 1.10.0, and 1.10.1rc1. However, there is still a difference in behavior between 1.10.1rc1 and the earlier versions: in the SLURM job described in my last post, 1.10.1rc1 fails to bind in only 1 case, while the earlier versions fail in 21 out of 32 cases. You mentioned there was a bug in hwloc; I am not sure whether it can explain the difference in behavior.

Hope this helps to nail this down.

Marcin

On 10/04/2015 09:55 AM, Gilles Gouaillardet wrote:
Ralph,

I suspect OMPI tries to bind to threads outside the cpuset.
This could be pretty similar to a previous issue, when OMPI tried to bind to cores outside the cpuset. /* when a core has more than one thread, would OMPI assume all the threads are available if the core is available? */
I will investigate this starting tomorrow.
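
A minimal sketch of that check with the hwloc 1.x API (an illustration only, not the actual OMPI code path): flag any core whose hwthreads are only partially inside the allowed cpuset.

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    /* WHOLE_SYSTEM keeps PUs outside our cgroup cpuset visible (hwloc 1.x) */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);

    hwloc_const_cpuset_t allowed = hwloc_topology_get_allowed_cpuset(topo);
    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    for (int i = 0; i < ncores; i++) {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
        /* core partially allowed: some, but not all, of its PUs usable */
        if (hwloc_bitmap_intersects(core->cpuset, allowed) &&
            !hwloc_bitmap_isincluded(core->cpuset, allowed))
            printf("core P#%u: only some of its hwthreads are allowed\n",
                   core->os_index);
    }
    hwloc_topology_destroy(topo);
    return 0;
}

A core flagged by this loop is exactly the situation described above: the core "is available", but not all of its threads are.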

Cheers,

Gilles

On Sunday, October 4, 2015, Ralph Castain <r...@open-mpi.org> wrote:

    Thanks - please go ahead and release that allocation as I’m not
    going to get to this immediately. I’ve got several hot irons in
    the fire right now, and I’m not sure when I’ll get a chance to
    track this down.

    Gilles or anyone else who might have time - feel free to take a
    gander and see if something pops out at you.

    Ralph


    On Oct 3, 2015, at 11:05 AM, marcin.krotkiewski
    <marcin.krotkiew...@gmail.com> wrote:


    Done. I have compiled 1.10.0 and 1.10.1rc1 with --enable-debug
    and executed

    mpirun --mca rmaps_base_verbose 10 --hetero-nodes
    --report-bindings --bind-to core -np 32 ./affinity

    In the case of 1.10.1rc1 I have also added :overload-allowed -
    that output is in a separate file. This option did not make much
    difference for 1.10.0, so I did not attach it here.

    The first thing I noted for 1.10.0 is lines like

    [login-0-1.local:03399] [[37945,0],0] GOT 1 CPUS
    [login-0-1.local:03399] [[37945,0],0] PROC [[37945,1],27] BITMAP
    [login-0-1.local:03399] [[37945,0],0] PROC [[37945,1],27] ON
    c1-26 IS NOT BOUND

    with an empty BITMAP.

    The SLURM environment is

    set | grep SLURM
    SLURM_JOBID=12714491
    SLURM_JOB_CPUS_PER_NODE='4,2,5(x2),4,7,5'
    SLURM_JOB_ID=12714491
    SLURM_JOB_NODELIST='c1-[2,4,8,13,16,23,26]'
    SLURM_JOB_NUM_NODES=7
    SLURM_JOB_PARTITION=normal
    SLURM_MEM_PER_CPU=2048
    SLURM_NNODES=7
    SLURM_NODELIST='c1-[2,4,8,13,16,23,26]'
    SLURM_NODE_ALIASES='(null)'
    SLURM_NPROCS=32
    SLURM_NTASKS=32
    SLURM_SUBMIT_DIR=/cluster/home/marcink
    SLURM_SUBMIT_HOST=login-0-1.local
    SLURM_TASKS_PER_NODE='4,2,5(x2),4,7,5'

    I have now submitted an interactive job in screen for 120 hours,
    so we can work with one example and not change it with every post :)

    If you need anything else, let me know. I could also introduce
    some patches/printfs and recompile if that helps.

    Marcin



    On 10/03/2015 07:17 PM, Ralph Castain wrote:
    Rats - just realized I have no way to test this, as none of the
    machines I can access are set up for cgroup-based multi-tenancy.
    Is this a debug version of OMPI? If not, can you rebuild OMPI
    with --enable-debug?

    Then please run it with --mca rmaps_base_verbose 10 and pass
    along the output.

    Thanks
    Ralph


    On Oct 3, 2015, at 10:09 AM, Ralph Castain <r...@open-mpi.org> wrote:

    What version of slurm is this? I might try to debug it here.
    I’m not sure where the problem lies just yet.


    On Oct 3, 2015, at 8:59 AM, marcin.krotkiewski
    <marcin.krotkiew...@gmail.com> wrote:

    Here is the output of lstopo. In short, PUs (0,16) form core 0,
    (1,17) core 1, etc.

    Machine (64GB)
      NUMANode L#0 (P#0 32GB)
        Socket L#0 + L3 L#0 (20MB)
          L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
            PU L#0 (P#0)
            PU L#1 (P#16)
          L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
            PU L#2 (P#1)
            PU L#3 (P#17)
          L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
            PU L#4 (P#2)
            PU L#5 (P#18)
          L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
            PU L#6 (P#3)
            PU L#7 (P#19)
          L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
            PU L#8 (P#4)
            PU L#9 (P#20)
          L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
            PU L#10 (P#5)
            PU L#11 (P#21)
          L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
            PU L#12 (P#6)
            PU L#13 (P#22)
          L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
            PU L#14 (P#7)
            PU L#15 (P#23)
        HostBridge L#0
          PCIBridge
            PCI 8086:1521
              Net L#0 "eth0"
            PCI 8086:1521
              Net L#1 "eth1"
          PCIBridge
            PCI 15b3:1003
              Net L#2 "ib0"
              OpenFabrics L#3 "mlx4_0"
          PCIBridge
            PCI 102b:0532
          PCI 8086:1d02
            Block L#4 "sda"
      NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
        L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
          PU L#16 (P#8)
          PU L#17 (P#24)
        L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
          PU L#18 (P#9)
          PU L#19 (P#25)
        L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
          PU L#20 (P#10)
          PU L#21 (P#26)
        L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
          PU L#22 (P#11)
          PU L#23 (P#27)
        L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
          PU L#24 (P#12)
          PU L#25 (P#28)
        L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
          PU L#26 (P#13)
          PU L#27 (P#29)
        L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
          PU L#28 (P#14)
          PU L#29 (P#30)
        L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
          PU L#30 (P#15)
          PU L#31 (P#31)
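
    For completeness, the same pairing can be read programmatically
    with the hwloc 1.x API (the library OMPI uses internally); a
    minimal sketch, printing each core's hwthread OS indexes:

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* for each core, list the OS indexes of its PUs; assumes PUs
         * are direct children of cores, as in the dump above */
        int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        for (int i = 0; i < ncores; i++) {
            hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
            printf("core L#%u: PUs", core->logical_index);
            for (unsigned j = 0; j < core->arity; j++)
                printf(" P#%u", core->children[j]->os_index);
            printf("\n");
        }
        hwloc_topology_destroy(topo);
        return 0;
    }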



    On 10/03/2015 05:46 PM, Ralph Castain wrote:
    Maybe I’m just misreading your HT map - that slurm nodelist
    syntax is a new one to me, but they tend to change things
    around. Could you run lstopo on one of those compute nodes
    and send the output?

    I’m just suspicious because I’m not seeing a clear pairing
    of HT numbers in your output, but HT numbering is
    BIOS-specific and I may just not be understanding your
    particular pattern. Our error message is clearly indicating
    that we are seeing individual HTs (and not complete cores)
    assigned, and I don’t know the source of that confusion.


    On Oct 3, 2015, at 8:28 AM, marcin.krotkiewski
    <marcin.krotkiew...@gmail.com> wrote:


    On 10/03/2015 04:38 PM, Ralph Castain wrote:
    If mpirun isn’t trying to do any binding, then you will
    of course get the right mapping as we’ll just inherit
    whatever we received.
    Yes. I meant that whatever you received (what SLURM gives) is a
    correct cpu map, and that it assigns _whole_ CPUs, not single
    HTs, to MPI processes. In the case mentioned earlier, Open MPI
    should start 6 tasks on c1-30. If HTs were treated as separate
    and independent cores, sched_getaffinity of an MPI process
    started on c1-30 would return a map with only 6 entries. In my
    case it returns a map with 12 entries - 2 for each core. So each
    process is in fact allocated both HTs, not only one. Is what I'm
    saying correct?
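
    A quick way to check this programmatically, as a sketch that
    assumes the PU numbering of these nodes (P#p and P#p+16 are HT
    siblings, per the lstopo output above):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        sched_getaffinity(0, sizeof(mask), &mask);  /* 0 = this process */
        /* on these nodes, PUs p and p+16 are the two HTs of core p */
        for (int core = 0; core < 16; core++)
            if (CPU_ISSET(core, &mask) != CPU_ISSET(core + 16, &mask))
                printf("core %d: only one of its two HTs allocated\n", core);
        printf("logical CPUs in mask: %d\n", CPU_COUNT(&mask));
        return 0;
    }

    If SLURM hands out whole cores, the loop prints nothing and the
    count is twice the number of cores allocated to the task (12 in
    the 6-task c1-30 case).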

    Looking at your output, it’s pretty clear that you are
    getting independent HTs assigned and not full cores.
    How do you mean? Is the above understanding wrong? I would
    expect that on c1-30, with --bind-to core, Open MPI should bind
    to logical cores 0 and 16 (rank 0), 1 and 17 (rank 1), and so
    on. All those logical cores are available in the
    sched_getaffinity map, and there are twice as many logical cores
    as there are MPI processes started on the node.

    My guess is that something in slurm has changed such that
    it detects that HT has been enabled, and then begins
    treating the HTs as completely independent cpus.

    Try changing “-bind-to core” to “-bind-to hwthread
    -use-hwthread-cpus” and see if that works.

    I have, and the binding is wrong. For example, I got this
    output:

    rank 0 @ compute-1-30.local 0,
    rank 1 @ compute-1-30.local 16,

    This means that two ranks have been bound to the same physical
    core (logical CPUs 0 and 16 are the two HTs of the same core).
    If I use --bind-to core, I get the following correct binding:

    rank 0 @ compute-1-30.local 0, 16,

    The problem is that many other ranks get bad bindings, with a
    'rank XXX is not bound (or bound to all available processors)'
    warning.

    But I think I was not entirely correct in saying that 1.10.1rc1
    did not fix things. It still might have improved something, but
    not everything. Consider this job:

    SLURM_JOB_CPUS_PER_NODE='5,4,6,5(x2),7,5,9,5,7,6'
    SLURM_JOB_NODELIST='c8-[31,34],c9-[30-32,35-36],c10-[31-34]'

    If I run 32 tasks as follows (with 1.10.1rc1)

    mpirun --hetero-nodes --report-bindings --bind-to core -np
    32 ./affinity

    I get the following error:

    --------------------------------------------------------------------------
    A request was made to bind to that would result in binding
    more
    processes than cpus on a resource:

       Bind to:     CORE
       Node:        c9-31
       #processes:  2
       #cpus:       1

    You can override this protection by adding the
    "overload-allowed"
    option to your binding directive.
    --------------------------------------------------------------------------


    If I now use --bind-to core:overload-allowed, then Open MPI
    starts and _most_ of the processes are bound correctly (i.e.,
    the map contains two logical cores in ALL cases), except for
    this case, which required the overload flag:

    rank 15 @ compute-9-31.local 1, 17,
    rank 16 @ compute-9-31.local 11, 27,
    rank 17 @ compute-9-31.local 2, 18,
    rank 18 @ compute-9-31.local 12, 28,
    rank 19 @ compute-9-31.local 1, 17,

    Note that the pair (1,17) is used twice. The original
    SLURM-delivered map (no binding) on this node is

    rank 15 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27,
    28, 29,
    rank 16 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27,
    28, 29,
    rank 17 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27,
    28, 29,
    rank 18 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27,
    28, 29,
    rank 19 @ compute-9-31.local 1, 2, 11, 12, 13, 17, 18, 27,
    28, 29,

    Why does Open MPI use cores (1,17) twice instead of using core
    (13,29)? Clearly, the original SLURM-delivered map includes 5
    CPUs, enough for 5 MPI processes.

    Cheers,

    Marcin



    On Oct 3, 2015, at 7:12 AM, marcin.krotkiewski
    <marcin.krotkiew...@gmail.com> wrote:


    On 10/03/2015 01:06 PM, Ralph Castain wrote:
    Thanks Marcin. Looking at this, I’m guessing that Slurm
    may be treating HTs as “cores” - i.e., as independent
    cpus. Any chance that is true?
    Not to the best of my knowledge, and at least not
    intentionally. SLURM starts as many processes as there
    are physical cores, not threads. To verify this,
    consider this test case:




[login-0-1.local:14686] mca: base: components_register: registering rmaps 
components
[login-0-1.local:14686] mca: base: components_register: found loaded component 
round_robin
[login-0-1.local:14686] mca: base: components_register: component round_robin 
register function successful
[login-0-1.local:14686] mca: base: components_register: found loaded component 
rank_file
[login-0-1.local:14686] mca: base: components_register: component rank_file 
register function successful
[login-0-1.local:14686] mca: base: components_register: found loaded component 
seq
[login-0-1.local:14686] mca: base: components_register: component seq register 
function successful
[login-0-1.local:14686] mca: base: components_register: found loaded component 
resilient
[login-0-1.local:14686] mca: base: components_register: component resilient 
register function successful
[login-0-1.local:14686] mca: base: components_register: found loaded component 
staged
[login-0-1.local:14686] mca: base: components_register: component staged has no 
register or open function
[login-0-1.local:14686] mca: base: components_register: found loaded component 
mindist
[login-0-1.local:14686] mca: base: components_register: component mindist 
register function successful
[login-0-1.local:14686] mca: base: components_register: found loaded component 
ppr
[login-0-1.local:14686] mca: base: components_register: component ppr register 
function successful
[login-0-1.local:14686] [[40992,0],0] rmaps:base set policy with socket
[login-0-1.local:14686] mca: base: components_open: opening rmaps components
[login-0-1.local:14686] mca: base: components_open: found loaded component 
round_robin
[login-0-1.local:14686] mca: base: components_open: component round_robin open 
function successful
[login-0-1.local:14686] mca: base: components_open: found loaded component 
rank_file
[login-0-1.local:14686] mca: base: components_open: component rank_file open 
function successful
[login-0-1.local:14686] mca: base: components_open: found loaded component seq
[login-0-1.local:14686] mca: base: components_open: component seq open function 
successful
[login-0-1.local:14686] mca: base: components_open: found loaded component 
resilient
[login-0-1.local:14686] mca: base: components_open: component resilient open 
function successful
[login-0-1.local:14686] mca: base: components_open: found loaded component 
staged
[login-0-1.local:14686] mca: base: components_open: component staged open 
function successful
[login-0-1.local:14686] mca: base: components_open: found loaded component 
mindist
[login-0-1.local:14686] mca: base: components_open: component mindist open 
function successful
[login-0-1.local:14686] mca: base: components_open: found loaded component ppr
[login-0-1.local:14686] mca: base: components_open: component ppr open function 
successful
[login-0-1.local:14686] mca:rmaps:select: checking available component 
round_robin
[login-0-1.local:14686] mca:rmaps:select: Querying component [round_robin]
[login-0-1.local:14686] mca:rmaps:select: checking available component rank_file
[login-0-1.local:14686] mca:rmaps:select: Querying component [rank_file]
[login-0-1.local:14686] mca:rmaps:select: checking available component seq
[login-0-1.local:14686] mca:rmaps:select: Querying component [seq]
[login-0-1.local:14686] mca:rmaps:select: checking available component resilient
[login-0-1.local:14686] mca:rmaps:select: Querying component [resilient]
[login-0-1.local:14686] mca:rmaps:select: checking available component staged
[login-0-1.local:14686] mca:rmaps:select: Querying component [staged]
[login-0-1.local:14686] mca:rmaps:select: checking available component mindist
[login-0-1.local:14686] mca:rmaps:select: Querying component [mindist]
[login-0-1.local:14686] mca:rmaps:select: checking available component ppr
[login-0-1.local:14686] mca:rmaps:select: Querying component [ppr]
[login-0-1.local:14686] [[40992,0],0]: Final mapper priorities
[login-0-1.local:14686]         Mapper: ppr Priority: 90
[login-0-1.local:14686]         Mapper: seq Priority: 60
[login-0-1.local:14686]         Mapper: resilient Priority: 40
[login-0-1.local:14686]         Mapper: mindist Priority: 20
[login-0-1.local:14686]         Mapper: round_robin Priority: 10
[login-0-1.local:14686]         Mapper: staged Priority: 5
[login-0-1.local:14686]         Mapper: rank_file Priority: 0
[login-0-1.local:14686] mca:rmaps: mapping job [40992,1]
[login-0-1.local:14686] mca:rmaps: creating new map for job [40992,1]
[login-0-1.local:14686] mca:rmaps: nprocs 32
[login-0-1.local:14686] mca:rmaps mapping given - using default
[login-0-1.local:14686] mca:rmaps:ppr: job [40992,1] not using ppr mapper
[login-0-1.local:14686] [[40992,0],0] rmaps:seq called on job [40992,1]
[login-0-1.local:14686] mca:rmaps:seq: job [40992,1] not using seq mapper
[login-0-1.local:14686] mca:rmaps:resilient: cannot perform initial map of job 
[40992,1] - no fault groups
[login-0-1.local:14686] mca:rmaps:mindist: job [40992,1] not using mindist 
mapper
[login-0-1.local:14686] mca:rmaps:rr: mapping job [40992,1]
[login-0-1.local:14686] [[40992,0],0] Starting with 7 nodes in list
[login-0-1.local:14686] [[40992,0],0] Filtering thru apps
[login-0-1.local:14686] [[40992,0],0] Retained 7 nodes in list
[login-0-1.local:14686] [[40992,0],0] node c1-2 has 4 slots available
[login-0-1.local:14686] [[40992,0],0] node c1-4 has 2 slots available
[login-0-1.local:14686] [[40992,0],0] node c1-8 has 5 slots available
[login-0-1.local:14686] [[40992,0],0] node c1-13 has 5 slots available
[login-0-1.local:14686] [[40992,0],0] node c1-16 has 4 slots available
[login-0-1.local:14686] [[40992,0],0] node c1-23 has 7 slots available
[login-0-1.local:14686] [[40992,0],0] node c1-26 has 5 slots available
[login-0-1.local:14686] AVAILABLE NODES FOR MAPPING:
[login-0-1.local:14686]     node: c1-2 daemon: 1
[login-0-1.local:14686]     node: c1-4 daemon: 2
[login-0-1.local:14686]     node: c1-8 daemon: 3
[login-0-1.local:14686]     node: c1-13 daemon: 4
[login-0-1.local:14686]     node: c1-16 daemon: 5
[login-0-1.local:14686]     node: c1-23 daemon: 6
[login-0-1.local:14686]     node: c1-26 daemon: 7
[login-0-1.local:14686] [[40992,0],0] Starting bookmark at node c1-2
[login-0-1.local:14686] [[40992,0],0] Starting at node c1-2
[login-0-1.local:14686] mca:rmaps:rr: mapping no-span by Socket for job 
[40992,1] slots 32 num_procs 32
[login-0-1.local:14686] mca:rmaps:rr: found 1 Socket objects on node c1-2
[login-0-1.local:14686] mca:rmaps:rr: found 2 Socket objects on node c1-4
[login-0-1.local:14686] mca:rmaps:rr: found 2 Socket objects on node c1-8
[login-0-1.local:14686] mca:rmaps:rr: found 1 Socket objects on node c1-13
[login-0-1.local:14686] mca:rmaps:rr: found 1 Socket objects on node c1-16
[login-0-1.local:14686] mca:rmaps:rr: found 2 Socket objects on node c1-23
[login-0-1.local:14686] mca:rmaps:rr: found 2 Socket objects on node c1-26
[login-0-1.local:14686] mca:rmaps: computing ranks by socket for job [40992,1]
[login-0-1.local:14686] mca:rmaps:rank_by: found 1 objects on node c1-2 with 4 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 0
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 1
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 2
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 3
[login-0-1.local:14686] mca:rmaps:rank_by: found 2 objects on node c1-4 with 2 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 4
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 5
[login-0-1.local:14686] mca:rmaps:rank_by: found 2 objects on node c1-8 with 5 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 6
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 7
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 8
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 9
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 10
[login-0-1.local:14686] mca:rmaps:rank_by: found 1 objects on node c1-13 with 5 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 11
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 12
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 13
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 14
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 15
[login-0-1.local:14686] mca:rmaps:rank_by: found 1 objects on node c1-16 with 4 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 16
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 17
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 18
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 19
[login-0-1.local:14686] mca:rmaps:rank_by: found 2 objects on node c1-23 with 7 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 20
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 21
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 22
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 23
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 24
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 25
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 26
[login-0-1.local:14686] mca:rmaps:rank_by: found 2 objects on node c1-26 with 5 
procs
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 27
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 28
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 29
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 30
[login-0-1.local:14686] mca:rmaps:rank_by: assigned rank 31
[login-0-1.local:14686] [[40992,0],0] rmaps:base:compute_usage
[login-0-1.local:14686] mca:rmaps: compute bindings for job [40992,1] with 
policy CORE[4008]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-2 has 4 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],0]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],1]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],2]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],3]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],0] BITMAP 1,17
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],0][c1-2] TO socket 
0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],1] BITMAP 2,18
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],1][c1-2] TO socket 
0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],2] BITMAP 3,19
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],2][c1-2] TO socket 
0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],3] BITMAP 4,20
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],3][c1-2] TO socket 
0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-4 has 2 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],4]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],5]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],4] BITMAP 1,17
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],4][c1-4] TO socket 
0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],5] BITMAP 15,31
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],5][c1-4] TO socket 
1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-8 has 5 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],6]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],7]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],8]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],9]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],10]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],6] BITMAP 0,16
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],6][c1-8] TO socket 
0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],7] BITMAP 9,25
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],7][c1-8] TO socket 
1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],8] BITMAP 5,21
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],8][c1-8] TO socket 
0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],9] BITMAP 13,29
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],9][c1-8] TO socket 
1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],10] BITMAP 14,30
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],10][c1-8] TO socket 
1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-13 has 5 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],11]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],12]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],13]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],14]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],15]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],11] BITMAP 3,19
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],11][c1-13] TO 
socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],12] BITMAP 4,20
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],12][c1-13] TO 
socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],13] BITMAP 5,21
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],13][c1-13] TO 
socket 0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],14] BITMAP 6,22
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],14][c1-13] TO 
socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],15] BITMAP 7,23
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],15][c1-13] TO 
socket 0[core 7[hwt 0-1]]: [../../../../../../../BB][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-16 has 4 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],16]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],17]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],18]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],19]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],16] BITMAP 12,28
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],16][c1-16] TO 
socket 1[core 12[hwt 0-1]]: [../../../../../../../..][../../../../BB/../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],17] BITMAP 13,29
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],17][c1-16] TO 
socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],18] BITMAP 14,30
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],18][c1-16] TO 
socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],19] BITMAP 15,31
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],19][c1-16] TO 
socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-23 has 7 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],20]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],21]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],22]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],23]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],24]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],25]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],26]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],20] BITMAP 2,18
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],20][c1-23] TO 
socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],21] BITMAP 8,24
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],21][c1-23] TO 
socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],22] BITMAP 3,19
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],22][c1-23] TO 
socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],23] BITMAP 13,29
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],23][c1-23] TO 
socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],24] BITMAP 4,20
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],24][c1-23] TO 
socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],25] BITMAP 14,30
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],25][c1-23] TO 
socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],26] BITMAP 15,31
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],26][c1-23] TO 
socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB]
[login-0-1.local:14686] [[40992,0],0] reset_usage: node c1-26 has 5 procs on it
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],27]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],28]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],29]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],30]
[login-0-1.local:14686] [[40992,0],0] reset_usage: ignoring proc [[40992,1],31]
[login-0-1.local:14686] [[40992,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:14686] mca:rmaps: bind downward for job [40992,1] with 
bindings CORE
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],27] BITMAP 1,17
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],27][c1-26] TO 
socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],28] BITMAP 11,27
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],28][c1-26] TO 
socket 1[core 11[hwt 0-1]]: [../../../../../../../..][../../../BB/../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],29] BITMAP 6,22
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],29][c1-26] TO 
socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],30] BITMAP 13,29
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],30][c1-26] TO 
socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..]
[login-0-1.local:14686] [[40992,0],0] GOT 1 CPUS
[login-0-1.local:14686] [[40992,0],0] PROC [[40992,1],31] BITMAP 15,31
[login-0-1.local:14686] [[40992,0],0] BOUND PROC [[40992,1],31][c1-26] TO 
socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB]
[compute-1-4.local:07121] MCW rank 4 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-1-4.local:07121] MCW rank 5 bound to socket 1[core 15[hwt 0-1]]: 
[../../../../../../../..][../../../../../../../BB]
[compute-1-2.local:12116] MCW rank 0 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-1-2.local:12116] MCW rank 1 bound to socket 0[core 2[hwt 0-1]]: 
[../../BB/../../../../..][../../../../../../../..]
[compute-1-16.local:10183] MCW rank 16 bound to socket 1[core 12[hwt 0-1]]: 
[../../../../../../../..][../../../../BB/../../..]
[compute-1-2.local:12116] MCW rank 2 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
[compute-1-13.local:27052] MCW rank 11 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
[compute-1-16.local:10183] MCW rank 17 bound to socket 1[core 13[hwt 0-1]]: 
[../../../../../../../..][../../../../../BB/../..]
[compute-1-16.local:10183] MCW rank 18 bound to socket 1[core 14[hwt 0-1]]: 
[../../../../../../../..][../../../../../../BB/..]
[compute-1-2.local:12116] MCW rank 3 bound to socket 0[core 4[hwt 0-1]]: 
[../../../../BB/../../..][../../../../../../../..]
[compute-1-13.local:27052] MCW rank 12 bound to socket 0[core 4[hwt 0-1]]: 
[../../../../BB/../../..][../../../../../../../..]
[compute-1-13.local:27052] MCW rank 13 bound to socket 0[core 5[hwt 0-1]]: 
[../../../../../BB/../..][../../../../../../../..]
[compute-1-16.local:10183] MCW rank 19 bound to socket 1[core 15[hwt 0-1]]: 
[../../../../../../../..][../../../../../../../BB]
[compute-1-13.local:27052] MCW rank 14 bound to socket 0[core 6[hwt 0-1]]: 
[../../../../../../BB/..][../../../../../../../..]
[compute-1-13.local:27052] MCW rank 15 bound to socket 0[core 7[hwt 0-1]]: 
[../../../../../../../BB][../../../../../../../..]
[compute-1-8.local:15116] MCW rank 8 bound to socket 0[core 5[hwt 0-1]]: 
[../../../../../BB/../..][../../../../../../../..]
[compute-1-8.local:15116] MCW rank 9 bound to socket 1[core 13[hwt 0-1]]: 
[../../../../../../../..][../../../../../BB/../..]
[compute-1-8.local:15116] MCW rank 10 bound to socket 1[core 14[hwt 0-1]]: 
[../../../../../../../..][../../../../../../BB/..]
[compute-1-8.local:15116] MCW rank 6 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../../../..][../../../../../../../..]
[compute-1-8.local:15116] MCW rank 7 bound to socket 1[core 9[hwt 0-1]]: 
[../../../../../../../..][../BB/../../../../../..]
[compute-1-26.local:32321] MCW rank 31 bound to socket 1[core 15[hwt 0-1]]: 
[../../../../../../../..][../../../../../../../BB]
[compute-1-26.local:32321] MCW rank 27 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-1-26.local:32321] MCW rank 28 bound to socket 1[core 11[hwt 0-1]]: 
[../../../../../../../..][../../../BB/../../../..]
[compute-1-26.local:32321] MCW rank 29 bound to socket 0[core 6[hwt 0-1]]: 
[../../../../../../BB/..][../../../../../../../..]
[compute-1-26.local:32321] MCW rank 30 bound to socket 1[core 13[hwt 0-1]]: 
[../../../../../../../..][../../../../../BB/../..]
[compute-1-23.local:19935] MCW rank 20 bound to socket 0[core 2[hwt 0-1]]: 
[../../BB/../../../../..][../../../../../../../..]
[compute-1-23.local:19935] MCW rank 21 bound to socket 1[core 8[hwt 0-1]]: 
[../../../../../../../..][BB/../../../../../../..]
[compute-1-23.local:19935] MCW rank 22 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
[compute-1-23.local:19935] MCW rank 23 bound to socket 1[core 13[hwt 0-1]]: 
[../../../../../../../..][../../../../../BB/../..]
[compute-1-23.local:19935] MCW rank 24 bound to socket 0[core 4[hwt 0-1]]: 
[../../../../BB/../../..][../../../../../../../..]
[compute-1-23.local:19935] MCW rank 25 bound to socket 1[core 14[hwt 0-1]]: 
[../../../../../../../..][../../../../../../BB/..]
[compute-1-23.local:19935] MCW rank 26 bound to socket 1[core 15[hwt 0-1]]: 
[../../../../../../../..][../../../../../../../BB]
rank 0 @ compute-1-2.local  1, 17,
rank 1 @ compute-1-2.local  2, 18,
rank 2 @ compute-1-2.local  3, 19,
rank 3 @ compute-1-2.local  4, 20,
rank 4 @ compute-1-4.local  1, 17,
rank 5 @ compute-1-4.local  15, 31,
rank 6 @ compute-1-8.local  0, 16,
rank 7 @ compute-1-8.local  9, 25,
rank 8 @ compute-1-8.local  5, 21,
rank 9 @ compute-1-8.local  13, 29,
rank 10 @ compute-1-8.local  14, 30,
rank 11 @ compute-1-13.local  3, 19,
rank 12 @ compute-1-13.local  4, 20,
rank 13 @ compute-1-13.local  5, 21,
rank 14 @ compute-1-13.local  6, 22,
rank 15 @ compute-1-13.local  7, 23,
rank 16 @ compute-1-16.local  12, 28,
rank 17 @ compute-1-16.local  13, 29,
rank 18 @ compute-1-16.local  14, 30,
rank 19 @ compute-1-16.local  15, 31,
rank 20 @ compute-1-23.local  2, 18,
rank 21 @ compute-1-23.local  8, 24,
rank 22 @ compute-1-23.local  3, 19,
rank 23 @ compute-1-23.local  13, 29,
rank 24 @ compute-1-23.local  4, 20,
rank 25 @ compute-1-23.local  14, 30,
rank 26 @ compute-1-23.local  15, 31,
rank 27 @ compute-1-26.local  1, 17,
rank 28 @ compute-1-26.local  11, 27,
rank 29 @ compute-1-26.local  6, 22,
rank 30 @ compute-1-26.local  13, 29,
rank 31 @ compute-1-26.local  15, 31,
[login-0-1.local:14686] mca: base: close: component round_robin closed
[login-0-1.local:14686] mca: base: close: unloading component round_robin
[login-0-1.local:14686] mca: base: close: component rank_file closed
[login-0-1.local:14686] mca: base: close: unloading component rank_file
[login-0-1.local:14686] mca: base: close: component seq closed
[login-0-1.local:14686] mca: base: close: unloading component seq
[login-0-1.local:14686] mca: base: close: component resilient closed
[login-0-1.local:14686] mca: base: close: unloading component resilient
[login-0-1.local:14686] mca: base: close: component staged closed
[login-0-1.local:14686] mca: base: close: unloading component staged
[login-0-1.local:14686] mca: base: close: component mindist closed
[login-0-1.local:14686] mca: base: close: unloading component mindist
[login-0-1.local:14686] mca: base: close: component ppr closed
[login-0-1.local:14686] mca: base: close: unloading component ppr
[login-0-1.local:03004] mca: base: components_register: registering rmaps 
components
[login-0-1.local:03004] mca: base: components_register: found loaded component 
round_robin
[login-0-1.local:03004] mca: base: components_register: component round_robin 
register function successful
[login-0-1.local:03004] mca: base: components_register: found loaded component 
rank_file
[login-0-1.local:03004] mca: base: components_register: component rank_file 
register function successful
[login-0-1.local:03004] mca: base: components_register: found loaded component 
seq
[login-0-1.local:03004] mca: base: components_register: component seq register 
function successful
[login-0-1.local:03004] mca: base: components_register: found loaded component 
resilient
[login-0-1.local:03004] mca: base: components_register: component resilient 
register function successful
[login-0-1.local:03004] mca: base: components_register: found loaded component 
staged
[login-0-1.local:03004] mca: base: components_register: component staged has no 
register or open function
[login-0-1.local:03004] mca: base: components_register: found loaded component 
mindist
[login-0-1.local:03004] mca: base: components_register: component mindist 
register function successful
[login-0-1.local:03004] mca: base: components_register: found loaded component 
ppr
[login-0-1.local:03004] mca: base: components_register: component ppr register 
function successful
[login-0-1.local:03004] [[37570,0],0] rmaps:base set policy with NULL
[login-0-1.local:03004] mca: base: components_open: opening rmaps components
[login-0-1.local:03004] mca: base: components_open: found loaded component 
round_robin
[login-0-1.local:03004] mca: base: components_open: component round_robin open 
function successful
[login-0-1.local:03004] mca: base: components_open: found loaded component 
rank_file
[login-0-1.local:03004] mca: base: components_open: component rank_file open 
function successful
[login-0-1.local:03004] mca: base: components_open: found loaded component seq
[login-0-1.local:03004] mca: base: components_open: component seq open function 
successful
[login-0-1.local:03004] mca: base: components_open: found loaded component 
resilient
[login-0-1.local:03004] mca: base: components_open: component resilient open 
function successful
[login-0-1.local:03004] mca: base: components_open: found loaded component 
staged
[login-0-1.local:03004] mca: base: components_open: component staged open 
function successful
[login-0-1.local:03004] mca: base: components_open: found loaded component 
mindist
[login-0-1.local:03004] mca: base: components_open: component mindist open 
function successful
[login-0-1.local:03004] mca: base: components_open: found loaded component ppr
[login-0-1.local:03004] mca: base: components_open: component ppr open function 
successful
[login-0-1.local:03004] mca:rmaps:select: checking available component 
round_robin
[login-0-1.local:03004] mca:rmaps:select: Querying component [round_robin]
[login-0-1.local:03004] mca:rmaps:select: checking available component rank_file
[login-0-1.local:03004] mca:rmaps:select: Querying component [rank_file]
[login-0-1.local:03004] mca:rmaps:select: checking available component seq
[login-0-1.local:03004] mca:rmaps:select: Querying component [seq]
[login-0-1.local:03004] mca:rmaps:select: checking available component resilient
[login-0-1.local:03004] mca:rmaps:select: Querying component [resilient]
[login-0-1.local:03004] mca:rmaps:select: checking available component staged
[login-0-1.local:03004] mca:rmaps:select: Querying component [staged]
[login-0-1.local:03004] mca:rmaps:select: checking available component mindist
[login-0-1.local:03004] mca:rmaps:select: Querying component [mindist]
[login-0-1.local:03004] mca:rmaps:select: checking available component ppr
[login-0-1.local:03004] mca:rmaps:select: Querying component [ppr]
[login-0-1.local:03004] [[37570,0],0]: Final mapper priorities
[login-0-1.local:03004]         Mapper: ppr Priority: 90
[login-0-1.local:03004]         Mapper: seq Priority: 60
[login-0-1.local:03004]         Mapper: resilient Priority: 40
[login-0-1.local:03004]         Mapper: mindist Priority: 20
[login-0-1.local:03004]         Mapper: round_robin Priority: 10
[login-0-1.local:03004]         Mapper: staged Priority: 5
[login-0-1.local:03004]         Mapper: rank_file Priority: 0
[login-0-1.local:03004] mca:rmaps: mapping job [37570,1]
[login-0-1.local:03004] mca:rmaps: creating new map for job [37570,1]
[login-0-1.local:03004] mca:rmaps: nprocs 32
[login-0-1.local:03004] mca:rmaps[139] mapping not given - using bysocket
[login-0-1.local:03004] mca:rmaps:ppr: job [37570,1] not using ppr mapper
[login-0-1.local:03004] [[37570,0],0] rmaps:seq called on job [37570,1]
[login-0-1.local:03004] mca:rmaps:seq: job [37570,1] not using seq mapper
[login-0-1.local:03004] mca:rmaps:resilient: cannot perform initial map of job 
[37570,1] - no fault groups
[login-0-1.local:03004] mca:rmaps:mindist: job [37570,1] not using mindist 
mapper
[login-0-1.local:03004] mca:rmaps:rr: mapping job [37570,1]
[login-0-1.local:03004] [[37570,0],0] Starting with 7 nodes in list
[login-0-1.local:03004] [[37570,0],0] Filtering thru apps
[login-0-1.local:03004] [[37570,0],0] Retained 7 nodes in list
[login-0-1.local:03004] [[37570,0],0] node c1-2 has 4 slots available
[login-0-1.local:03004] [[37570,0],0] node c1-4 has 2 slots available
[login-0-1.local:03004] [[37570,0],0] node c1-8 has 5 slots available
[login-0-1.local:03004] [[37570,0],0] node c1-13 has 5 slots available
[login-0-1.local:03004] [[37570,0],0] node c1-16 has 4 slots available
[login-0-1.local:03004] [[37570,0],0] node c1-23 has 7 slots available
[login-0-1.local:03004] [[37570,0],0] node c1-26 has 5 slots available
[login-0-1.local:03004] AVAILABLE NODES FOR MAPPING:
[login-0-1.local:03004]     node: c1-2 daemon: 1
[login-0-1.local:03004]     node: c1-4 daemon: 2
[login-0-1.local:03004]     node: c1-8 daemon: 3
[login-0-1.local:03004]     node: c1-13 daemon: 4
[login-0-1.local:03004]     node: c1-16 daemon: 5
[login-0-1.local:03004]     node: c1-23 daemon: 6
[login-0-1.local:03004]     node: c1-26 daemon: 7
[login-0-1.local:03004] [[37570,0],0] Starting bookmark at node c1-2
[login-0-1.local:03004] [[37570,0],0] Starting at node c1-2
[login-0-1.local:03004] mca:rmaps:rr: mapping no-span by Socket for job 
[37570,1] slots 32 num_procs 32
[login-0-1.local:03004] mca:rmaps:rr: found 1 Socket objects on node c1-2
[login-0-1.local:03004] mca:rmaps:rr: found 2 Socket objects on node c1-4
[login-0-1.local:03004] mca:rmaps:rr: found 2 Socket objects on node c1-8
[login-0-1.local:03004] mca:rmaps:rr: found 1 Socket objects on node c1-13
[login-0-1.local:03004] mca:rmaps:rr: found 1 Socket objects on node c1-16
[login-0-1.local:03004] mca:rmaps:rr: found 2 Socket objects on node c1-23
[login-0-1.local:03004] mca:rmaps:rr: found 2 Socket objects on node c1-26
[login-0-1.local:03004] mca:rmaps:base: computing vpids by slot for job 
[37570,1]
[login-0-1.local:03004] mca:rmaps:base: assigning rank 0 to node c1-2
[login-0-1.local:03004] mca:rmaps:base: assigning rank 1 to node c1-2
[login-0-1.local:03004] mca:rmaps:base: assigning rank 2 to node c1-2
[login-0-1.local:03004] mca:rmaps:base: assigning rank 3 to node c1-2
[login-0-1.local:03004] mca:rmaps:base: assigning rank 4 to node c1-4
[login-0-1.local:03004] mca:rmaps:base: assigning rank 5 to node c1-4
[login-0-1.local:03004] mca:rmaps:base: assigning rank 6 to node c1-8
[login-0-1.local:03004] mca:rmaps:base: assigning rank 7 to node c1-8
[login-0-1.local:03004] mca:rmaps:base: assigning rank 8 to node c1-8
[login-0-1.local:03004] mca:rmaps:base: assigning rank 9 to node c1-8
[login-0-1.local:03004] mca:rmaps:base: assigning rank 10 to node c1-8
[login-0-1.local:03004] mca:rmaps:base: assigning rank 11 to node c1-13
[login-0-1.local:03004] mca:rmaps:base: assigning rank 12 to node c1-13
[login-0-1.local:03004] mca:rmaps:base: assigning rank 13 to node c1-13
[login-0-1.local:03004] mca:rmaps:base: assigning rank 14 to node c1-13
[login-0-1.local:03004] mca:rmaps:base: assigning rank 15 to node c1-13
[login-0-1.local:03004] mca:rmaps:base: assigning rank 16 to node c1-16
[login-0-1.local:03004] mca:rmaps:base: assigning rank 17 to node c1-16
[login-0-1.local:03004] mca:rmaps:base: assigning rank 18 to node c1-16
[login-0-1.local:03004] mca:rmaps:base: assigning rank 19 to node c1-16
[login-0-1.local:03004] mca:rmaps:base: assigning rank 20 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 21 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 22 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 23 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 24 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 25 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 26 to node c1-23
[login-0-1.local:03004] mca:rmaps:base: assigning rank 27 to node c1-26
[login-0-1.local:03004] mca:rmaps:base: assigning rank 28 to node c1-26
[login-0-1.local:03004] mca:rmaps:base: assigning rank 29 to node c1-26
[login-0-1.local:03004] mca:rmaps:base: assigning rank 30 to node c1-26
[login-0-1.local:03004] mca:rmaps:base: assigning rank 31 to node c1-26
[login-0-1.local:03004] [[37570,0],0] rmaps:base:compute_usage
[login-0-1.local:03004] mca:rmaps: compute bindings for job [37570,1] with 
policy CORE[4008]
[login-0-1.local:03004] [[37570,0],0] reset_usage: node c1-2 has 4 procs on it
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],0]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],1]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],2]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],3]
[login-0-1.local:03004] [[37570,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:03004] mca:rmaps: bind downward for job [37570,1] with 
bindings CORE
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],0] BITMAP 1,17
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],0][c1-2] TO socket 
0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],1] BITMAP 2,18
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],1][c1-2] TO socket 
0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],2] BITMAP 3,19
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],2][c1-2] TO socket 
0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],3] BITMAP 4,20
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],3][c1-2] TO socket 
0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] reset_usage: node c1-4 has 2 procs on it
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],4]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],5]
[login-0-1.local:03004] [[37570,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:03004] mca:rmaps: bind downward for job [37570,1] with 
bindings CORE
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],4] BITMAP 1,17
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],4][c1-4] TO socket 
0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],5] BITMAP 15,31
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],5][c1-4] TO socket 
1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB]
[login-0-1.local:03004] [[37570,0],0] reset_usage: node c1-8 has 5 procs on it
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],6]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],7]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],8]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],9]
[login-0-1.local:03004] [[37570,0],0] reset_usage: ignoring proc [[37570,1],10]
[login-0-1.local:03004] [[37570,0],0] bind_depth: 6 map_depth 2
[login-0-1.local:03004] mca:rmaps: bind downward for job [37570,1] with 
bindings CORE
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],6] BITMAP 0,16
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],6][c1-8] TO socket 
0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],7] BITMAP 9,25
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],7][c1-8] TO socket 
1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],8] BITMAP 5,21
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],8][c1-8] TO socket 
0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
[login-0-1.local:03004] [[37570,0],0] PROC [[37570,1],9] BITMAP 13,29
[login-0-1.local:03004] [[37570,0],0] BOUND PROC [[37570,1],9][c1-8] TO socket 
1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..]
[login-0-1.local:03004] [[37570,0],0] GOT 1 CPUS
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        c1-8
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
[login-0-1.local:03004] mca: base: close: component round_robin closed
[login-0-1.local:03004] mca: base: close: unloading component round_robin
[login-0-1.local:03004] mca: base: close: component rank_file closed
[login-0-1.local:03004] mca: base: close: unloading component rank_file
[login-0-1.local:03004] mca: base: close: component seq closed
[login-0-1.local:03004] mca: base: close: unloading component seq
[login-0-1.local:03004] mca: base: close: component resilient closed
[login-0-1.local:03004] mca: base: close: unloading component resilient
[login-0-1.local:03004] mca: base: close: component staged closed
[login-0-1.local:03004] mca: base: close: unloading component staged
[login-0-1.local:03004] mca: base: close: component mindist closed
[login-0-1.local:03004] mca: base: close: unloading component mindist
[login-0-1.local:03004] mca: base: close: component ppr closed
[login-0-1.local:03004] mca: base: close: unloading component ppr
