Bill,

Were you able to get a core file and analyze the stack with gdb?

I suspect the error occurs in mca_btl_sm_add_procs, but this is just my best guess. If this is correct, can you check the value of mca_btl_sm_component.num_smp_procs?

As a workaround, can you try
mpirun --mca btl ^sm ...

I do not see how I can tackle the root cause without being able to reproduce the issue :-(

Can you try to reproduce the issue with the smallest hostfile, and then run lstopo on all the nodes?

BTW, you are not mixing 32-bit and 64-bit OSes, are you?

Cheers,

Gilles

On Wednesday, June 24, 2015, Lane, William < william.l...@cshs.org> wrote:

> Gilles,
>
> All the blades only have two core Xeons (without hyperthreading)
> populating both their sockets. All the x3550 nodes have hyperthreading
> capable Xeons and Sandybridge server CPU's. It's possible hyperthreading
> has been disabled on some of these nodes though. The 3-0-n nodes are all
> IBM x3550 nodes while the 3-6-n nodes are all blade nodes.
>
> I have run this exact same test code successfully in the past on another
> cluster (~200 nodes of Sunfire X2100 2x dual-core Opterons) w/no issues on
> upwards of 390 slots. I even tested it recently on OpenMPI 1.8.5 on another
> smaller R&D cluster consisting of 10 Sunfire X2100 nodes (w/2 dual core
> Opterons apiece). On this particular cluster I've had success running this
> code on < 132 slots.
>
> Anyway, here's the results of the following mpirun:
>
> mpirun -np 132 -display-devel-map --prefix /hpc/apps/mpi/openmpi/1.8.6/
> --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes
> --bind-to core /hpc/home/lanew/mpi/openmpi/ProcessColors3 >> out.txt 2>&1
>
> --------------------------------------------------------------------------
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
> Node: csclprd3-6-1
>
> This usually is due to not having the required NUMA support installed
> on the node. In some Linux distributions, the required support is
> contained in the libnumactl and libnumactl-devel packages.
> This is a warning only; your job will continue, though performance may be
> degraded.
> --------------------------------------------------------------------------
> Data for JOB [51718,1] offset 0
>
> Mapper requested: NULL Last mapper: round_robin Mapping policy: BYSOCKET Ranking policy: SLOT
> Binding policy: CORE Cpu set: NULL PPR: NULL Cpus-per-rank: 1
> Num new daemons: 0 New daemon starting vpid INVALID
> Num nodes: 15
>
> Data for node: csclprd3-6-1 Launch id: -1 State: 0
> Daemon: [[51718,0],1] Daemon launched: True
> Num slots: 4 Slots in use: 4 Oversubscribed: FALSE
> Num slots allocated: 4 Max slots: 0
> Username on node: NULL
> Num procs: 4 Next node_rank: 4
> Data for proc: [[51718,1],0]
> Pid: 0 Local rank: 0 Node rank: 0 App rank: 0
> State: INITIALIZED App_context: 0
> Locale: [B/B][./.]
> Binding: [B/.][./.]
> Data for proc: [[51718,1],1]
> Pid: 0 Local rank: 1 Node rank: 1 App rank: 1
> State: INITIALIZED App_context: 0
> Locale: [./.][B/B]
> Binding: [./.][B/.]
> Data for proc: [[51718,1],2]
> Pid: 0 Local rank: 2 Node rank: 2 App rank: 2
> State: INITIALIZED App_context: 0
> Locale: [B/B][./.]
> Binding: [./B][./.]
> Data for proc: [[51718,1],3]
> Pid: 0 Local rank: 3 Node rank: 3 App rank: 3
> State: INITIALIZED App_context: 0
> Locale: [./.][B/B]
> Binding: [./.][./B]
>
> Data for node: csclprd3-6-5 Launch id: -1 State: 0
> Daemon: [[51718,0],2] Daemon launched: True
> Num slots: 4 Slots in use: 4 Oversubscribed: FALSE
> Num slots allocated: 4 Max slots: 0
> Username on node: NULL
> Num procs: 4 Next node_rank: 4
> Data for proc: [[51718,1],4]
> Pid: 0 Local rank: 0 Node rank: 0 App rank: 4
> State: INITIALIZED App_context: 0
> Locale: [B/B][./.]
> Binding: [B/.][./.]
> Data for proc: [[51718,1],5] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 5 > State: INITIALIZED App_context: 0 > Locale: [./.][B/B] > Binding: [./.][B/.] > Data for proc: [[51718,1],6] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 6 > State: INITIALIZED App_context: 0 > Locale: [B/B][./.] > Binding: [./B][./.] > Data for proc: [[51718,1],7] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 7 > State: INITIALIZED App_context: 0 > Locale: [./.][B/B] > Binding: [./.][./B] > > Data for node: csclprd3-0-0 Launch id: -1 State: 0 > Daemon: [[51718,0],3] Daemon launched: True > Num slots: 12 Slots in use: 12 Oversubscribed: FALSE > Num slots allocated: 12 Max slots: 0 > Username on node: NULL > Num procs: 12 Next node_rank: 12 > Data for proc: [[51718,1],8] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 8 > State: INITIALIZED App_context: 0 > Locale: [B/B/B/B/B/B][./././././.] > Binding: [B/././././.][./././././.] > Data for proc: [[51718,1],9] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 9 > State: INITIALIZED App_context: 0 > Locale: [./././././.][B/B/B/B/B/B] > Binding: [./././././.][B/././././.] > Data for proc: [[51718,1],10] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 10 > State: INITIALIZED App_context: 0 > Locale: [B/B/B/B/B/B][./././././.] > Binding: [./B/./././.][./././././.] > Data for proc: [[51718,1],11] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 11 > State: INITIALIZED App_context: 0 > Locale: [./././././.][B/B/B/B/B/B] > Binding: [./././././.][./B/./././.] > Data for proc: [[51718,1],12] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 12 > State: INITIALIZED App_context: 0 > Locale: [B/B/B/B/B/B][./././././.] > Binding: [././B/././.][./././././.] > Data for proc: [[51718,1],13] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 13 > State: INITIALIZED App_context: 0 > Locale: [./././././.][B/B/B/B/B/B] > Binding: [./././././.][././B/././.] 
> Data for proc: [[51718,1],14] > Pid: 0 Local rank: 6 Node rank: 6 App rank: 14 > State: INITIALIZED App_context: 0 > Locale: [B/B/B/B/B/B][./././././.] > Binding: [./././B/./.][./././././.] > Data for proc: [[51718,1],15] > Pid: 0 Local rank: 7 Node rank: 7 App rank: 15 > State: INITIALIZED App_context: 0 > Locale: [./././././.][B/B/B/B/B/B] > Binding: [./././././.][./././B/./.] > Data for proc: [[51718,1],16] > Pid: 0 Local rank: 8 Node rank: 8 App rank: 16 > State: INITIALIZED App_context: 0 > Locale: [B/B/B/B/B/B][./././././.] > Binding: [././././B/.][./././././.] > Data for proc: [[51718,1],17] > Pid: 0 Local rank: 9 Node rank: 9 App rank: 17 > State: INITIALIZED App_context: 0 > Locale: [./././././.][B/B/B/B/B/B] > Binding: [./././././.][././././B/.] > Data for proc: [[51718,1],18] > Pid: 0 Local rank: 10 Node rank: 10 App rank: 18 > State: INITIALIZED App_context: 0 > Locale: [B/B/B/B/B/B][./././././.] > Binding: [./././././B][./././././.] > Data for proc: [[51718,1],19] > Pid: 0 Local rank: 11 Node rank: 11 App rank: 19 > State: INITIALIZED App_context: 0 > Locale: [./././././.][B/B/B/B/B/B] > Binding: [./././././.][./././././B] > > Data for node: csclprd3-0-1 Launch id: -1 State: 0 > Daemon: [[51718,0],4] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],20] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 20 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [B/././././.] > Data for proc: [[51718,1],21] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 21 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./B/./././.] > Data for proc: [[51718,1],22] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 22 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././B/././.] 
> Data for proc: [[51718,1],23] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 23 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././B/./.] > Data for proc: [[51718,1],24] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 24 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././././B/.] > Data for proc: [[51718,1],25] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 25 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././././B] > > Data for node: csclprd3-0-2 Launch id: -1 State: 0 > Daemon: [[51718,0],5] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],26] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 26 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [B/././././.] > Data for proc: [[51718,1],27] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 27 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./B/./././.] > Data for proc: [[51718,1],28] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 28 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././B/././.] > Data for proc: [[51718,1],29] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 29 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././B/./.] > Data for proc: [[51718,1],30] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 30 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././././B/.] 
> Data for proc: [[51718,1],31] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 31 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././././B] > > Data for node: csclprd3-0-3 Launch id: -1 State: 0 > Daemon: [[51718,0],6] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],32] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 32 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [B/././././.] > Data for proc: [[51718,1],33] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 33 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./B/./././.] > Data for proc: [[51718,1],34] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 34 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././B/././.] > Data for proc: [[51718,1],35] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 35 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././B/./.] > Data for proc: [[51718,1],36] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 36 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././././B/.] > Data for proc: [[51718,1],37] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 37 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././././B] > > Data for node: csclprd3-0-4 Launch id: -1 State: 0 > Daemon: [[51718,0],7] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],38] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 38 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [B/././././.] > Data for proc: [[51718,1],39] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 39 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./B/./././.] 
> Data for proc: [[51718,1],40] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 40 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././B/././.] > Data for proc: [[51718,1],41] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 41 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././B/./.] > Data for proc: [[51718,1],42] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 42 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././././B/.] > Data for proc: [[51718,1],43] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 43 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././././B] > > Data for node: csclprd3-0-5 Launch id: -1 State: 0 > Daemon: [[51718,0],8] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],44] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 44 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [B/././././.] > Data for proc: [[51718,1],45] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 45 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./B/./././.] > Data for proc: [[51718,1],46] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 46 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././B/././.] > Data for proc: [[51718,1],47] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 47 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././B/./.] > Data for proc: [[51718,1],48] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 48 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././././B/.] 
> Data for proc: [[51718,1],49] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 49 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././././B] > > Data for node: csclprd3-0-6 Launch id: -1 State: 0 > Daemon: [[51718,0],9] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],50] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 50 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [B/././././.] > Data for proc: [[51718,1],51] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 51 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./B/./././.] > Data for proc: [[51718,1],52] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 52 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././B/././.] > Data for proc: [[51718,1],53] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 53 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././B/./.] > Data for proc: [[51718,1],54] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 54 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [././././B/.] > Data for proc: [[51718,1],55] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 55 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [./././././B] > > Data for node: csclprd3-0-7 Launch id: -1 State: 0 > Daemon: [[51718,0],10] Daemon launched: True > Num slots: 16 Slots in use: 16 Oversubscribed: FALSE > Num slots allocated: 16 Max slots: 0 > Username on node: NULL > Num procs: 16 Next node_rank: 16 > Data for proc: [[51718,1],56] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 56 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [BB/../../../../../../..][../../../../../../../..] 
> Data for proc: [[51718,1],57] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 57 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][BB/../../../../../../..] > Data for proc: [[51718,1],58] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 58 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../BB/../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],59] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 59 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../BB/../../../../../..] > Data for proc: [[51718,1],60] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 60 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../BB/../../../../..][../../../../../../../..] > Data for proc: [[51718,1],61] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 61 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../BB/../../../../..] > Data for proc: [[51718,1],62] > Pid: 0 Local rank: 6 Node rank: 6 App rank: 62 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../BB/../../../..][../../../../../../../..] > Data for proc: [[51718,1],63] > Pid: 0 Local rank: 7 Node rank: 7 App rank: 63 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../BB/../../../..] > Data for proc: [[51718,1],64] > Pid: 0 Local rank: 8 Node rank: 8 App rank: 64 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../BB/../../..][../../../../../../../..] 
> Data for proc: [[51718,1],65] > Pid: 0 Local rank: 9 Node rank: 9 App rank: 65 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../BB/../../..] > Data for proc: [[51718,1],66] > Pid: 0 Local rank: 10 Node rank: 10 App rank: 66 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../BB/../..][../../../../../../../..] > Data for proc: [[51718,1],67] > Pid: 0 Local rank: 11 Node rank: 11 App rank: 67 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../BB/../..] > Data for proc: [[51718,1],68] > Pid: 0 Local rank: 12 Node rank: 12 App rank: 68 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../BB/..][../../../../../../../..] > Data for proc: [[51718,1],69] > Pid: 0 Local rank: 13 Node rank: 13 App rank: 69 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../BB/..] > Data for proc: [[51718,1],70] > Pid: 0 Local rank: 14 Node rank: 14 App rank: 70 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../../BB][../../../../../../../..] 
> Data for proc: [[51718,1],71] > Pid: 0 Local rank: 15 Node rank: 15 App rank: 71 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../../BB] > > Data for node: csclprd3-0-8 Launch id: -1 State: 0 > Daemon: [[51718,0],11] Daemon launched: True > Num slots: 16 Slots in use: 16 Oversubscribed: FALSE > Num slots allocated: 16 Max slots: 0 > Username on node: NULL > Num procs: 16 Next node_rank: 16 > Data for proc: [[51718,1],72] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 72 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [BB/../../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],73] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 73 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][BB/../../../../../../..] > Data for proc: [[51718,1],74] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 74 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../BB/../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],75] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 75 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../BB/../../../../../..] > Data for proc: [[51718,1],76] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 76 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../BB/../../../../..][../../../../../../../..] > Data for proc: [[51718,1],77] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 77 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../BB/../../../../..] 
> Data for proc: [[51718,1],78] > Pid: 0 Local rank: 6 Node rank: 6 App rank: 78 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../BB/../../../..][../../../../../../../..] > Data for proc: [[51718,1],79] > Pid: 0 Local rank: 7 Node rank: 7 App rank: 79 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../BB/../../../..] > Data for proc: [[51718,1],80] > Pid: 0 Local rank: 8 Node rank: 8 App rank: 80 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../BB/../../..][../../../../../../../..] > Data for proc: [[51718,1],81] > Pid: 0 Local rank: 9 Node rank: 9 App rank: 81 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../BB/../../..] > Data for proc: [[51718,1],82] > Pid: 0 Local rank: 10 Node rank: 10 App rank: 82 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../BB/../..][../../../../../../../..] > Data for proc: [[51718,1],83] > Pid: 0 Local rank: 11 Node rank: 11 App rank: 83 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../BB/../..] > Data for proc: [[51718,1],84] > Pid: 0 Local rank: 12 Node rank: 12 App rank: 84 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../BB/..][../../../../../../../..] > Data for proc: [[51718,1],85] > Pid: 0 Local rank: 13 Node rank: 13 App rank: 85 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../BB/..] 
> Data for proc: [[51718,1],86] > Pid: 0 Local rank: 14 Node rank: 14 App rank: 86 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../../BB][../../../../../../../..] > Data for proc: [[51718,1],87] > Pid: 0 Local rank: 15 Node rank: 15 App rank: 87 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../../BB] > > Data for node: csclprd3-0-10 Launch id: -1 State: 0 > Daemon: [[51718,0],12] Daemon launched: True > Num slots: 16 Slots in use: 16 Oversubscribed: FALSE > Num slots allocated: 16 Max slots: 0 > Username on node: NULL > Num procs: 16 Next node_rank: 16 > Data for proc: [[51718,1],88] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 88 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [BB/../../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],89] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 89 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][BB/../../../../../../..] > Data for proc: [[51718,1],90] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 90 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../BB/../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],91] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 91 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../BB/../../../../../..] > Data for proc: [[51718,1],92] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 92 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../BB/../../../../..][../../../../../../../..] 
> Data for proc: [[51718,1],93] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 93 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../BB/../../../../..] > Data for proc: [[51718,1],94] > Pid: 0 Local rank: 6 Node rank: 6 App rank: 94 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../BB/../../../..][../../../../../../../..] > Data for proc: [[51718,1],95] > Pid: 0 Local rank: 7 Node rank: 7 App rank: 95 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../BB/../../../..] > Data for proc: [[51718,1],96] > Pid: 0 Local rank: 8 Node rank: 8 App rank: 96 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../BB/../../..][../../../../../../../..] > Data for proc: [[51718,1],97] > Pid: 0 Local rank: 9 Node rank: 9 App rank: 97 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../BB/../../..] > Data for proc: [[51718,1],98] > Pid: 0 Local rank: 10 Node rank: 10 App rank: 98 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../BB/../..][../../../../../../../..] > Data for proc: [[51718,1],99] > Pid: 0 Local rank: 11 Node rank: 11 App rank: 99 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../BB/../..] > Data for proc: [[51718,1],100] > Pid: 0 Local rank: 12 Node rank: 12 App rank: 100 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../BB/..][../../../../../../../..] 
> Data for proc: [[51718,1],101] > Pid: 0 Local rank: 13 Node rank: 13 App rank: 101 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../BB/..] > Data for proc: [[51718,1],102] > Pid: 0 Local rank: 14 Node rank: 14 App rank: 102 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../../BB][../../../../../../../..] > Data for proc: [[51718,1],103] > Pid: 0 Local rank: 15 Node rank: 15 App rank: 103 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../../BB] > > Data for node: csclprd3-0-11 Launch id: -1 State: 0 > Daemon: [[51718,0],13] Daemon launched: True > Num slots: 16 Slots in use: 16 Oversubscribed: FALSE > Num slots allocated: 16 Max slots: 0 > Username on node: NULL > Num procs: 16 Next node_rank: 16 > Data for proc: [[51718,1],104] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 104 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [BB/../../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],105] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 105 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][BB/../../../../../../..] > Data for proc: [[51718,1],106] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 106 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../BB/../../../../../..][../../../../../../../..] > Data for proc: [[51718,1],107] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 107 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../BB/../../../../../..] 
> Data for proc: [[51718,1],108] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 108 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../BB/../../../../..][../../../../../../../..] > Data for proc: [[51718,1],109] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 109 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../BB/../../../../..] > Data for proc: [[51718,1],110] > Pid: 0 Local rank: 6 Node rank: 6 App rank: 110 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../BB/../../../..][../../../../../../../..] > Data for proc: [[51718,1],111] > Pid: 0 Local rank: 7 Node rank: 7 App rank: 111 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../BB/../../../..] > Data for proc: [[51718,1],112] > Pid: 0 Local rank: 8 Node rank: 8 App rank: 112 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../BB/../../..][../../../../../../../..] > Data for proc: [[51718,1],113] > Pid: 0 Local rank: 9 Node rank: 9 App rank: 113 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../BB/../../..] > Data for proc: [[51718,1],114] > Pid: 0 Local rank: 10 Node rank: 10 App rank: 114 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../BB/../..][../../../../../../../..] > Data for proc: [[51718,1],115] > Pid: 0 Local rank: 11 Node rank: 11 App rank: 115 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../BB/../..] 
> Data for proc: [[51718,1],116] > Pid: 0 Local rank: 12 Node rank: 12 App rank: 116 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../BB/..][../../../../../../../..] > Data for proc: [[51718,1],117] > Pid: 0 Local rank: 13 Node rank: 13 App rank: 117 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../BB/..] > Data for proc: [[51718,1],118] > Pid: 0 Local rank: 14 Node rank: 14 App rank: 118 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > Binding: [../../../../../../../BB][../../../../../../../..] > Data for proc: [[51718,1],119] > Pid: 0 Local rank: 15 Node rank: 15 App rank: 119 > State: INITIALIZED App_context: 0 > Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > Binding: [../../../../../../../..][../../../../../../../BB] > > Data for node: csclprd3-0-12 Launch id: -1 State: 0 > Daemon: [[51718,0],14] Daemon launched: True > Num slots: 6 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 6 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],120] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 120 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [BB/../../../../..] > Data for proc: [[51718,1],121] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 121 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [../BB/../../../..] > Data for proc: [[51718,1],122] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 122 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [../../BB/../../..] > Data for proc: [[51718,1],123] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 123 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [../../../BB/../..] 
> Data for proc: [[51718,1],124] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 124 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [../../../../BB/..] > Data for proc: [[51718,1],125] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 125 > State: INITIALIZED App_context: 0 > Locale: UNKNOWN > Binding: [../../../../../BB] > > Data for node: csclprd3-0-13 Launch id: -1 State: 0 > Daemon: [[51718,0],15] Daemon launched: True > Num slots: 12 Slots in use: 6 Oversubscribed: FALSE > Num slots allocated: 12 Max slots: 0 > Username on node: NULL > Num procs: 6 Next node_rank: 6 > Data for proc: [[51718,1],126] > Pid: 0 Local rank: 0 Node rank: 0 App rank: 126 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB][../../../../../..] > Binding: [BB/../../../../..][../../../../../..] > Data for proc: [[51718,1],127] > Pid: 0 Local rank: 1 Node rank: 1 App rank: 127 > State: INITIALIZED App_context: 0 > Locale: [../../../../../..][BB/BB/BB/BB/BB/BB] > Binding: [../../../../../..][BB/../../../../..] > Data for proc: [[51718,1],128] > Pid: 0 Local rank: 2 Node rank: 2 App rank: 128 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB][../../../../../..] > Binding: [../BB/../../../..][../../../../../..] > Data for proc: [[51718,1],129] > Pid: 0 Local rank: 3 Node rank: 3 App rank: 129 > State: INITIALIZED App_context: 0 > Locale: [../../../../../..][BB/BB/BB/BB/BB/BB] > Binding: [../../../../../..][../BB/../../../..] > Data for proc: [[51718,1],130] > Pid: 0 Local rank: 4 Node rank: 4 App rank: 130 > State: INITIALIZED App_context: 0 > Locale: [BB/BB/BB/BB/BB/BB][../../../../../..] > Binding: [../../BB/../../..][../../../../../..] > Data for proc: [[51718,1],131] > Pid: 0 Local rank: 5 Node rank: 5 App rank: 131 > State: INITIALIZED App_context: 0 > Locale: [../../../../../..][BB/BB/BB/BB/BB/BB] > Binding: [../../../../../..][../../BB/../../..] 
> [csclprd3-0-13:31619] *** Process received signal *** > [csclprd3-0-13:31619] Signal: Bus error (7) > [csclprd3-0-13:31619] Signal code: Non-existant physical address (2) > [csclprd3-0-13:31619] Failing at address: 0x7f1374267a00 > [csclprd3-0-13:31620] *** Process received signal *** > [csclprd3-0-13:31620] Signal: Bus error (7) > [csclprd3-0-13:31620] Signal code: Non-existant physical address (2) > [csclprd3-0-13:31620] Failing at address: 0x7fcc702a7980 > [csclprd3-0-13:31615] *** Process received signal *** > [csclprd3-0-13:31615] Signal: Bus error (7) > [csclprd3-0-13:31615] Signal code: Non-existant physical address (2) > [csclprd3-0-13:31615] Failing at address: 0x7f8128367880 > [csclprd3-0-13:31616] *** Process received signal *** > [csclprd3-0-13:31616] Signal: Bus error (7) > [csclprd3-0-13:31616] Signal code: Non-existant physical address (2) > [csclprd3-0-13:31616] Failing at address: 0x7fe674227a00 > [csclprd3-0-13:31617] *** Process received signal *** > [csclprd3-0-13:31617] Signal: Bus error (7) > [csclprd3-0-13:31617] Signal code: Non-existant physical address (2) > [csclprd3-0-13:31617] Failing at address: 0x7f061c32db80 > [csclprd3-0-13:31618] *** Process received signal *** > [csclprd3-0-13:31618] Signal: Bus error (7) > [csclprd3-0-13:31618] Signal code: Non-existant physical address (2) > [csclprd3-0-13:31618] Failing at address: 0x7fb8402eaa80 > [csclprd3-0-13:31618] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fb851851500] > [csclprd3-0-13:31618] [ 1] [csclprd3-0-13:31616] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fe6843a4500] > [csclprd3-0-13:31616] [ 1] [csclprd3-0-13:31620] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fcc80c54500] > [csclprd3-0-13:31620] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fcc80fc9f61] > [csclprd3-0-13:31620] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fcc80fca047] > [csclprd3-0-13:31620] [ 3] [csclprd3-0-13:31615] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7f81385ca500] > 
[csclprd3-0-13:31615] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f813893ff61] > [csclprd3-0-13:31615] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f8138940047] > [csclprd3-0-13:31615] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fb851bc6f61] > [csclprd3-0-13:31618] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fb851bc7047] > [csclprd3-0-13:31618] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fb851ab4670] > [csclprd3-0-13:31618] [ 4] [csclprd3-0-13:31617] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7f062cfe5500] > [csclprd3-0-13:31617] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f062d35af61] > [csclprd3-0-13:31617] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f062d35b047] > [csclprd3-0-13:31617] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f062d248670] > [csclprd3-0-13:31617] [ 4] [csclprd3-0-13:31619] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7f1384fde500] > [csclprd3-0-13:31619] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f1385353f61] > [csclprd3-0-13:31619] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fe684719f61] > [csclprd3-0-13:31616] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fe68471a047] > [csclprd3-0-13:31616] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fe684607670] > [csclprd3-0-13:31616] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f1385354047] > [csclprd3-0-13:31619] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f1385241670] > [csclprd3-0-13:31619] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f13852425ab] > [csclprd3-0-13:31619] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f1385242751] > [csclprd3-0-13:31619] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f13853501c9] > 
[csclprd3-0-13:31619] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f1385336628] > [csclprd3-0-13:31619] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fcc80eb7670] > [csclprd3-0-13:31620] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fcc80eb85ab] > [csclprd3-0-13:31620] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fcc80eb8751] > [csclprd3-0-13:31620] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fcc80fc61c9] > [csclprd3-0-13:31620] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fcc80fac628] > [csclprd3-0-13:31620] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fcc8111fd61] > [csclprd3-0-13:31620] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f813882d670] > [csclprd3-0-13:31615] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f813882e5ab] > [csclprd3-0-13:31615] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f813882e751] > [csclprd3-0-13:31615] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f813893c1c9] > [csclprd3-0-13:31615] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f8138922628] > [csclprd3-0-13:31615] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f8138a95d61] > [csclprd3-0-13:31615] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f813885d747] > [csclprd3-0-13:31615] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fb851ab55ab] > [csclprd3-0-13:31618] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fb851ab5751] > [csclprd3-0-13:31618] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fb851bc31c9] > [csclprd3-0-13:31618] [ 7] > 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fb851ba9628] > [csclprd3-0-13:31618] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fb851d1cd61] > [csclprd3-0-13:31618] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fb851ae4747] > [csclprd3-0-13:31618] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f062d2495ab] > [csclprd3-0-13:31617] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f062d249751] > [csclprd3-0-13:31617] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f062d3571c9] > [csclprd3-0-13:31617] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f062d33d628] > [csclprd3-0-13:31617] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f062d4b0d61] > [csclprd3-0-13:31617] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f062d278747] > [csclprd3-0-13:31617] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fe6846085ab] > [csclprd3-0-13:31616] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fe684608751] > [csclprd3-0-13:31616] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fe6847161c9] > [csclprd3-0-13:31616] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fe6846fc628] > [csclprd3-0-13:31616] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fe68486fd61] > [csclprd3-0-13:31616] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fe684637747] > [csclprd3-0-13:31616] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fe68467750b] > [csclprd3-0-13:31616] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:31616] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fe684021cdd] > [csclprd3-0-13:31616] [13] > 
/hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:31616] *** End of error message *** > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f062d2b850b] > [csclprd3-0-13:31617] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:31617] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f062cc62cdd] > [csclprd3-0-13:31617] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:31617] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f13854a9d61] > [csclprd3-0-13:31619] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f1385271747] > [csclprd3-0-13:31619] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f13852b150b] > [csclprd3-0-13:31619] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:31619] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f1384c5bcdd] > [csclprd3-0-13:31619] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:31619] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fcc80ee7747] > [csclprd3-0-13:31620] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fcc80f2750b] > [csclprd3-0-13:31620] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:31620] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fcc808d1cdd] > [csclprd3-0-13:31620] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:31620] *** End of error message *** > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f813889d50b] > [csclprd3-0-13:31615] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:31615] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f8138247cdd] > [csclprd3-0-13:31615] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:31615] *** End of error message 
***
> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fb851b2450b]
> [csclprd3-0-13:31618] [11]
> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
> [csclprd3-0-13:31618] [12]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fb8514cecdd]
> [csclprd3-0-13:31618] [13]
> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
> [csclprd3-0-13:31618] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13
> exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
> ------------------------------
> *From:* users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
> *Sent:* Tuesday, June 23, 2015 6:20 PM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots =
> crash
>
> Wow - that is one sick puppy! I see that some nodes are reporting
> not-bound for their procs, and the rest are binding to socket (as they
> should). Some of your nodes clearly do not have hyper threads enabled (or
> only have single-thread cores on them), and have 2 cores/socket. Other
> nodes have 8 cores/socket with hyper threads enabled, while still others
> have 6 cores/socket and HT enabled.
>
> I don't see anyone binding to a single HT if multiple HTs/core are
> available. I think you are being fooled by those nodes that either don't
> have HT enabled, or have only 1 HT/core.
>
> In both cases, it is node 13 that is the node that fails. I also note
> that you said everything works okay with < 132 ranks, and node 13 hosts
> ranks 127-131. So node 13 would host ranks even if you reduced the number
> in the job to 131. This would imply that it probably isn't something wrong
> with the node itself.
>
> Is there any way you could run a job of this size on a homogeneous
> cluster? The procs all show bindings that look right, but I'm wondering if
> the heterogeneity is the source of the trouble here. We may be
> communicating the binding pattern incorrectly and giving bad info to the
> backend daemon.
>
> Also, rather than --report-bindings, use "--display-devel-map" on the
> command line and let's see what the mapper thinks it did. If there is a
> problem with placement, that is where it would exist.
>
>
> On Tue, Jun 23, 2015 at 5:12 PM, Lane, William <william.l...@cshs.org> wrote:
>
>> Ralph,
>>
>> There is something funny going on: the traces from the
>> runs w/the debug build aren't showing any differences from
>> what I got earlier. However, I did do a run w/the --bind-to core
>> switch and was surprised to see that hyperthreading cores were
>> sometimes being used.
>>
>> Here are the traces that I have:
>>
>> mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/
>> --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes
>> /hpc/home/lanew/mpi/openmpi/ProcessColors3
>> [csclprd3-0-5:16802] MCW rank 44 is not bound (or bound to all available
>> processors)
>> [csclprd3-0-5:16802] MCW rank 45 is not bound (or bound to all available
>> processors)
>> [csclprd3-0-5:16802] MCW rank 46 is not bound (or bound to all available
>> processors)
>> [csclprd3-6-5:12480] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
>> 0[core 1[hwt 0]]: [B/B][./.]
>> [csclprd3-6-5:12480] MCW rank 5 bound to socket 1[core 2[hwt 0]], socket
>> 1[core 3[hwt 0]]: [./.][B/B]
>> [csclprd3-6-5:12480] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket
>> 0[core 1[hwt 0]]: [B/B][./.]
>> [csclprd3-6-5:12480] MCW rank 7 bound to socket 1[core 2[hwt 0]], socket >> 1[core 3[hwt 0]]: [./.][B/B] >> [csclprd3-0-5:16802] MCW rank 47 is not bound (or bound to all available >> processors) >> [csclprd3-0-5:16802] MCW rank 48 is not bound (or bound to all available >> processors) >> [csclprd3-0-5:16802] MCW rank 49 is not bound (or bound to all available >> processors) >> [csclprd3-0-1:14318] MCW rank 22 is not bound (or bound to all available >> processors) >> [csclprd3-0-1:14318] MCW rank 23 is not bound (or bound to all available >> processors) >> [csclprd3-0-1:14318] MCW rank 24 is not bound (or bound to all available >> processors) >> [csclprd3-6-1:24682] MCW rank 3 bound to socket 1[core 2[hwt 0]], socket >> 1[core 3[hwt 0]]: [./.][B/B] >> [csclprd3-6-1:24682] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]]: [B/B][./.] >> [csclprd3-0-1:14318] MCW rank 25 is not bound (or bound to all available >> processors) >> [csclprd3-0-1:14318] MCW rank 20 is not bound (or bound to all available >> processors) >> [csclprd3-0-3:13827] MCW rank 34 is not bound (or bound to all available >> processors) >> [csclprd3-0-1:14318] MCW rank 21 is not bound (or bound to all available >> processors) >> [csclprd3-0-3:13827] MCW rank 35 is not bound (or bound to all available >> processors) >> [csclprd3-6-1:24682] MCW rank 1 bound to socket 1[core 2[hwt 0]], socket >> 1[core 3[hwt 0]]: [./.][B/B] >> [csclprd3-0-3:13827] MCW rank 36 is not bound (or bound to all available >> processors) >> [csclprd3-6-1:24682] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]]: [B/B][./.] 
>> [csclprd3-0-6:30371] MCW rank 51 is not bound (or bound to all available >> processors) >> [csclprd3-0-6:30371] MCW rank 52 is not bound (or bound to all available >> processors) >> [csclprd3-0-6:30371] MCW rank 53 is not bound (or bound to all available >> processors) >> [csclprd3-0-2:05825] MCW rank 30 is not bound (or bound to all available >> processors) >> [csclprd3-0-6:30371] MCW rank 54 is not bound (or bound to all available >> processors) >> [csclprd3-0-3:13827] MCW rank 37 is not bound (or bound to all available >> processors) >> [csclprd3-0-2:05825] MCW rank 31 is not bound (or bound to all available >> processors) >> [csclprd3-0-3:13827] MCW rank 32 is not bound (or bound to all available >> processors) >> [csclprd3-0-6:30371] MCW rank 55 is not bound (or bound to all available >> processors) >> [csclprd3-0-3:13827] MCW rank 33 is not bound (or bound to all available >> processors) >> [csclprd3-0-6:30371] MCW rank 50 is not bound (or bound to all available >> processors) >> [csclprd3-0-2:05825] MCW rank 26 is not bound (or bound to all available >> processors) >> [csclprd3-0-2:05825] MCW rank 27 is not bound (or bound to all available >> processors) >> [csclprd3-0-2:05825] MCW rank 28 is not bound (or bound to all available >> processors) >> [csclprd3-0-2:05825] MCW rank 29 is not bound (or bound to all available >> processors) >> [csclprd3-0-12:12383] MCW rank 121 is not bound (or bound to all >> available processors) >> [csclprd3-0-12:12383] MCW rank 122 is not bound (or bound to all >> available processors) >> [csclprd3-0-12:12383] MCW rank 123 is not bound (or bound to all >> available processors) >> [csclprd3-0-12:12383] MCW rank 124 is not bound (or bound to all >> available processors) >> [csclprd3-0-12:12383] MCW rank 125 is not bound (or bound to all >> available processors) >> [csclprd3-0-12:12383] MCW rank 120 is not bound (or bound to all >> available processors) >> [csclprd3-0-0:31079] MCW rank 13 bound to socket 1[core 6[hwt 0]], socket 
>> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >> [csclprd3-0-0:31079] MCW rank 14 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >> 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >> [csclprd3-0-0:31079] MCW rank 15 bound to socket 1[core 6[hwt 0]], socket >> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >> [csclprd3-0-0:31079] MCW rank 16 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >> 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >> [csclprd3-0-7:20515] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-10:19096] MCW rank 100 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>> [csclprd3-0-7:20515] MCW rank 69 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-10:19096] MCW rank 101 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-0:31079] MCW rank 17 bound to socket 1[core 6[hwt 0]], socket >> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >> [csclprd3-0-7:20515] MCW rank 70 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-10:19096] MCW rank 102 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-11:31636] MCW rank 116 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>> [csclprd3-0-11:31636] MCW rank 117 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-0:31079] MCW rank 18 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >> 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >> [csclprd3-0-11:31636] MCW rank 118 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-0:31079] MCW rank 19 bound to socket 1[core 6[hwt 0]], socket >> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >> [csclprd3-0-7:20515] MCW rank 71 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-10:19096] MCW rank 103 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-0:31079] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >> 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: 
[B/B/B/B/B/B][./././././.] >> [csclprd3-0-0:31079] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket >> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >> [csclprd3-0-10:19096] MCW rank 88 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-11:31636] MCW rank 119 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-7:20515] MCW rank 56 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-0:31079] MCW rank 10 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >> 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] 
>> [csclprd3-0-7:20515] MCW rank 57 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-10:19096] MCW rank 89 bound to socket 1[core 8[hwt 0-1]], >> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> [csclprd3-0-11:31636] MCW rank 104 bound to socket 0[core 0[hwt 0-1]], >> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> [csclprd3-0-0:31079] MCW rank 11 bound to socket 1[core 6[hwt 0]], socket >> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >> [csclprd3-0-0:31079] MCW rank 12 bound to socket 0[core 0[hwt 0]], socket >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >> 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >> [csclprd3-0-4:30348] MCW rank 42 is not bound (or bound to all >> >