I think trying with --mca btl ^sm makes a lot of sense and may solve the problem. I also noticed that we are having trouble with the topology of several of the nodes: the map shows only one socket, non-HT, where you say we should see two sockets with HT enabled. In those cases, the locality is reported as "unknown". Since those procs are on nodes remote from the one being impacted, I don't think it should cause a problem. However, it isn't correct, and that raises flags.
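For reading the map strings quickly, here is a minimal sketch that summarizes a Locale/Binding string. It assumes the notation as it appears in the map in this thread: each bracket pair is a socket, slashes separate cores, and each character within a core is one hardware thread. The function name `describe_mask` is made up for illustration and is not part of Open MPI.

```python
import re


def describe_mask(mask: str) -> dict:
    """Summarize a map string such as '[B/././././.]'.

    Assumed notation: one [...] group per socket, '/' between cores,
    one character per hardware thread ('B' = bound, '.' = not bound).
    """
    sockets = re.findall(r"\[([^\]]*)\]", mask)
    if not sockets:
        raise ValueError(f"no bracketed socket groups in {mask!r}")
    cores_per_socket = [group.split("/") for group in sockets]
    # Collect the distinct thread counts seen across all cores.
    threads = {len(core) for cores in cores_per_socket for core in cores}
    return {
        "sockets": len(sockets),
        "cores_per_socket": [len(cores) for cores in cores_per_socket],
        "threads_per_core": sorted(threads),
    }
```

Applied to a binding such as `[B/././././.]`, this reports one socket, six cores, and one thread per core, which is the "single socket, non-HT" picture described above.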
My best guess of the root cause of that error is either we are getting bad topology info on those nodes, or we have a bug that is mishandling this scenario. It would probably be good to get this error fixed to ensure it isn't the source of the eventual crash, even though I'm not sure they are related.

Bill: Can we examine one of the problem nodes? Let's pick csclprd3-0-1 (or take another one from your list - just look for one where "locality" is reported as "unknown" for the procs in the output map). Can you run lstopo on that node and send us the output? In the above map, it is reporting a single socket with 6 cores, non-HT. Is that what lstopo shows when run on the node? Is it what you expected?

On Wed, Jun 24, 2015 at 4:07 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Bill,
>
> were you able to get a core file and analyze the stack with gdb ?
>
> I suspect the error occurs in mca_btl_sm_add_procs but this is just my
> best guess.
> if this is correct, can you check the value of
> mca_btl_sm_component.num_smp_procs ?
>
> as a workaround, can you try
> mpirun --mca btl ^sm ...
>
> I do not see how I can tackle the root cause without being able to
> reproduce the issue :-(
>
> can you try to reproduce the issue with the smallest hostfile, and then
> run lstopo on all the nodes ?
> btw, you are not mixing 32 bits and 64 bits OS, are you ?
>
> Cheers,
>
> Gilles
>
> mca_btl_sm_add_procs(
>
> int mca_btl_sm_add_procs(
>
> On Wednesday, June 24, 2015, Lane, William <william.l...@cshs.org> wrote:
>
>> Gilles,
>>
>> All the blades only have two core Xeons (without hyperthreading)
>> populating both their sockets. All the x3550 nodes have hyperthreading
>> capable Xeons and Sandybridge server CPU's. It's possible hyperthreading
>> has been disabled on some of these nodes though. The 3-0-n nodes are all
>> IBM x3550 nodes while the 3-6-n nodes are all blade nodes.
>>
>> I have run this exact same test code successfully in the past on another
>> cluster (~200 nodes of Sunfire X2100 2x dual-core Opterons) w/no issues
>> on upwards of 390 slots. I even tested it recently on OpenMPI 1.8.5 on
>> another smaller R&D cluster consisting of 10 Sunfire X2100 nodes (w/2
>> dual core Opterons apiece). On this particular cluster I've had success
>> running this code on < 132 slots.
>>
>> Anyway, here's the results of the following mpirun:
>>
>> mpirun -np 132 -display-devel-map --prefix /hpc/apps/mpi/openmpi/1.8.6/
>> --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes
>> --bind-to core /hpc/home/lanew/mpi/openmpi/ProcessColors3 >> out.txt 2>&1
>>
>> --------------------------------------------------------------------------
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node does NOT
>> support binding memory to the process location.
>>
>> Node: csclprd3-6-1
>>
>> This usually is due to not having the required NUMA support installed
>> on the node. In some Linux distributions, the required support is
>> contained in the libnumactl and libnumactl-devel packages.
>> This is a warning only; your job will continue, though performance may be
>> degraded.
>> -------------------------------------------------------------------------- >> Data for JOB [51718,1] offset 0 >> >> Mapper requested: NULL Last mapper: round_robin Mapping policy: >> BYSOCKET Ranking policy: SLOT >> Binding policy: CORE Cpu set: NULL PPR: NULL Cpus-per-rank: 1 >> Num new daemons: 0 New daemon starting vpid INVALID >> Num nodes: 15 >> >> Data for node: csclprd3-6-1 Launch id: -1 State: 0 >> Daemon: [[51718,0],1] Daemon launched: True >> Num slots: 4 Slots in use: 4 Oversubscribed: FALSE >> Num slots allocated: 4 Max slots: 0 >> Username on node: NULL >> Num procs: 4 Next node_rank: 4 >> Data for proc: [[51718,1],0] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 0 >> State: INITIALIZED App_context: 0 >> Locale: [B/B][./.] >> Binding: [B/.][./.] >> Data for proc: [[51718,1],1] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 1 >> State: INITIALIZED App_context: 0 >> Locale: [./.][B/B] >> Binding: [./.][B/.] >> Data for proc: [[51718,1],2] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 2 >> State: INITIALIZED App_context: 0 >> Locale: [B/B][./.] >> Binding: [./B][./.] >> Data for proc: [[51718,1],3] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 3 >> State: INITIALIZED App_context: 0 >> Locale: [./.][B/B] >> Binding: [./.][./B] >> >> Data for node: csclprd3-6-5 Launch id: -1 State: 0 >> Daemon: [[51718,0],2] Daemon launched: True >> Num slots: 4 Slots in use: 4 Oversubscribed: FALSE >> Num slots allocated: 4 Max slots: 0 >> Username on node: NULL >> Num procs: 4 Next node_rank: 4 >> Data for proc: [[51718,1],4] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 4 >> State: INITIALIZED App_context: 0 >> Locale: [B/B][./.] >> Binding: [B/.][./.] >> Data for proc: [[51718,1],5] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 5 >> State: INITIALIZED App_context: 0 >> Locale: [./.][B/B] >> Binding: [./.][B/.] >> Data for proc: [[51718,1],6] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 6 >> State: INITIALIZED App_context: 0 >> Locale: [B/B][./.] 
>> Binding: [./B][./.] >> Data for proc: [[51718,1],7] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 7 >> State: INITIALIZED App_context: 0 >> Locale: [./.][B/B] >> Binding: [./.][./B] >> >> Data for node: csclprd3-0-0 Launch id: -1 State: 0 >> Daemon: [[51718,0],3] Daemon launched: True >> Num slots: 12 Slots in use: 12 Oversubscribed: FALSE >> Num slots allocated: 12 Max slots: 0 >> Username on node: NULL >> Num procs: 12 Next node_rank: 12 >> Data for proc: [[51718,1],8] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 8 >> State: INITIALIZED App_context: 0 >> Locale: [B/B/B/B/B/B][./././././.] >> Binding: [B/././././.][./././././.] >> Data for proc: [[51718,1],9] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 9 >> State: INITIALIZED App_context: 0 >> Locale: [./././././.][B/B/B/B/B/B] >> Binding: [./././././.][B/././././.] >> Data for proc: [[51718,1],10] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 10 >> State: INITIALIZED App_context: 0 >> Locale: [B/B/B/B/B/B][./././././.] >> Binding: [./B/./././.][./././././.] >> Data for proc: [[51718,1],11] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 11 >> State: INITIALIZED App_context: 0 >> Locale: [./././././.][B/B/B/B/B/B] >> Binding: [./././././.][./B/./././.] >> Data for proc: [[51718,1],12] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 12 >> State: INITIALIZED App_context: 0 >> Locale: [B/B/B/B/B/B][./././././.] >> Binding: [././B/././.][./././././.] >> Data for proc: [[51718,1],13] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 13 >> State: INITIALIZED App_context: 0 >> Locale: [./././././.][B/B/B/B/B/B] >> Binding: [./././././.][././B/././.] >> Data for proc: [[51718,1],14] >> Pid: 0 Local rank: 6 Node rank: 6 App rank: 14 >> State: INITIALIZED App_context: 0 >> Locale: [B/B/B/B/B/B][./././././.] >> Binding: [./././B/./.][./././././.] 
>> Data for proc: [[51718,1],15] >> Pid: 0 Local rank: 7 Node rank: 7 App rank: 15 >> State: INITIALIZED App_context: 0 >> Locale: [./././././.][B/B/B/B/B/B] >> Binding: [./././././.][./././B/./.] >> Data for proc: [[51718,1],16] >> Pid: 0 Local rank: 8 Node rank: 8 App rank: 16 >> State: INITIALIZED App_context: 0 >> Locale: [B/B/B/B/B/B][./././././.] >> Binding: [././././B/.][./././././.] >> Data for proc: [[51718,1],17] >> Pid: 0 Local rank: 9 Node rank: 9 App rank: 17 >> State: INITIALIZED App_context: 0 >> Locale: [./././././.][B/B/B/B/B/B] >> Binding: [./././././.][././././B/.] >> Data for proc: [[51718,1],18] >> Pid: 0 Local rank: 10 Node rank: 10 App rank: 18 >> State: INITIALIZED App_context: 0 >> Locale: [B/B/B/B/B/B][./././././.] >> Binding: [./././././B][./././././.] >> Data for proc: [[51718,1],19] >> Pid: 0 Local rank: 11 Node rank: 11 App rank: 19 >> State: INITIALIZED App_context: 0 >> Locale: [./././././.][B/B/B/B/B/B] >> Binding: [./././././.][./././././B] >> >> Data for node: csclprd3-0-1 Launch id: -1 State: 0 >> Daemon: [[51718,0],4] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],20] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 20 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [B/././././.] >> Data for proc: [[51718,1],21] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 21 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./B/./././.] >> Data for proc: [[51718,1],22] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 22 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././B/././.] >> Data for proc: [[51718,1],23] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 23 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././B/./.] 
>> Data for proc: [[51718,1],24] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 24 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././././B/.] >> Data for proc: [[51718,1],25] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 25 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././././B] >> >> Data for node: csclprd3-0-2 Launch id: -1 State: 0 >> Daemon: [[51718,0],5] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],26] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 26 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [B/././././.] >> Data for proc: [[51718,1],27] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 27 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./B/./././.] >> Data for proc: [[51718,1],28] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 28 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././B/././.] >> Data for proc: [[51718,1],29] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 29 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././B/./.] >> Data for proc: [[51718,1],30] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 30 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././././B/.] >> Data for proc: [[51718,1],31] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 31 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././././B] >> >> Data for node: csclprd3-0-3 Launch id: -1 State: 0 >> Daemon: [[51718,0],6] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],32] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 32 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [B/././././.] 
>> Data for proc: [[51718,1],33] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 33 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./B/./././.] >> Data for proc: [[51718,1],34] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 34 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././B/././.] >> Data for proc: [[51718,1],35] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 35 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././B/./.] >> Data for proc: [[51718,1],36] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 36 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././././B/.] >> Data for proc: [[51718,1],37] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 37 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././././B] >> >> Data for node: csclprd3-0-4 Launch id: -1 State: 0 >> Daemon: [[51718,0],7] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],38] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 38 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [B/././././.] >> Data for proc: [[51718,1],39] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 39 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./B/./././.] >> Data for proc: [[51718,1],40] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 40 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././B/././.] >> Data for proc: [[51718,1],41] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 41 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././B/./.] >> Data for proc: [[51718,1],42] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 42 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././././B/.] 
>> Data for proc: [[51718,1],43] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 43 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././././B] >> >> Data for node: csclprd3-0-5 Launch id: -1 State: 0 >> Daemon: [[51718,0],8] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],44] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 44 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [B/././././.] >> Data for proc: [[51718,1],45] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 45 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./B/./././.] >> Data for proc: [[51718,1],46] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 46 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././B/././.] >> Data for proc: [[51718,1],47] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 47 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././B/./.] >> Data for proc: [[51718,1],48] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 48 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././././B/.] >> Data for proc: [[51718,1],49] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 49 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././././B] >> >> Data for node: csclprd3-0-6 Launch id: -1 State: 0 >> Daemon: [[51718,0],9] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],50] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 50 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [B/././././.] >> Data for proc: [[51718,1],51] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 51 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./B/./././.] 
>> Data for proc: [[51718,1],52] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 52 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././B/././.] >> Data for proc: [[51718,1],53] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 53 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././B/./.] >> Data for proc: [[51718,1],54] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 54 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [././././B/.] >> Data for proc: [[51718,1],55] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 55 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [./././././B] >> >> Data for node: csclprd3-0-7 Launch id: -1 State: 0 >> Daemon: [[51718,0],10] Daemon launched: True >> Num slots: 16 Slots in use: 16 Oversubscribed: FALSE >> Num slots allocated: 16 Max slots: 0 >> Username on node: NULL >> Num procs: 16 Next node_rank: 16 >> Data for proc: [[51718,1],56] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 56 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [BB/../../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],57] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 57 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][BB/../../../../../../..] >> Data for proc: [[51718,1],58] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 58 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../BB/../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],59] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 59 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../BB/../../../../../..] 
>> Data for proc: [[51718,1],60] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 60 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../BB/../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],61] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 61 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../BB/../../../../..] >> Data for proc: [[51718,1],62] >> Pid: 0 Local rank: 6 Node rank: 6 App rank: 62 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../BB/../../../..][../../../../../../../..] >> Data for proc: [[51718,1],63] >> Pid: 0 Local rank: 7 Node rank: 7 App rank: 63 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../BB/../../../..] >> Data for proc: [[51718,1],64] >> Pid: 0 Local rank: 8 Node rank: 8 App rank: 64 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../BB/../../..][../../../../../../../..] >> Data for proc: [[51718,1],65] >> Pid: 0 Local rank: 9 Node rank: 9 App rank: 65 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../BB/../../..] >> Data for proc: [[51718,1],66] >> Pid: 0 Local rank: 10 Node rank: 10 App rank: 66 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../BB/../..][../../../../../../../..] >> Data for proc: [[51718,1],67] >> Pid: 0 Local rank: 11 Node rank: 11 App rank: 67 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../BB/../..] 
>> Data for proc: [[51718,1],68] >> Pid: 0 Local rank: 12 Node rank: 12 App rank: 68 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../BB/..][../../../../../../../..] >> Data for proc: [[51718,1],69] >> Pid: 0 Local rank: 13 Node rank: 13 App rank: 69 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../BB/..] >> Data for proc: [[51718,1],70] >> Pid: 0 Local rank: 14 Node rank: 14 App rank: 70 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../../BB][../../../../../../../..] >> Data for proc: [[51718,1],71] >> Pid: 0 Local rank: 15 Node rank: 15 App rank: 71 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../../BB] >> >> Data for node: csclprd3-0-8 Launch id: -1 State: 0 >> Daemon: [[51718,0],11] Daemon launched: True >> Num slots: 16 Slots in use: 16 Oversubscribed: FALSE >> Num slots allocated: 16 Max slots: 0 >> Username on node: NULL >> Num procs: 16 Next node_rank: 16 >> Data for proc: [[51718,1],72] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 72 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [BB/../../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],73] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 73 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][BB/../../../../../../..] >> Data for proc: [[51718,1],74] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 74 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../BB/../../../../../..][../../../../../../../..] 
>> Data for proc: [[51718,1],75] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 75 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../BB/../../../../../..] >> Data for proc: [[51718,1],76] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 76 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../BB/../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],77] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 77 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../BB/../../../../..] >> Data for proc: [[51718,1],78] >> Pid: 0 Local rank: 6 Node rank: 6 App rank: 78 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../BB/../../../..][../../../../../../../..] >> Data for proc: [[51718,1],79] >> Pid: 0 Local rank: 7 Node rank: 7 App rank: 79 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../BB/../../../..] >> Data for proc: [[51718,1],80] >> Pid: 0 Local rank: 8 Node rank: 8 App rank: 80 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../BB/../../..][../../../../../../../..] >> Data for proc: [[51718,1],81] >> Pid: 0 Local rank: 9 Node rank: 9 App rank: 81 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../BB/../../..] >> Data for proc: [[51718,1],82] >> Pid: 0 Local rank: 10 Node rank: 10 App rank: 82 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../BB/../..][../../../../../../../..] 
>> Data for proc: [[51718,1],83] >> Pid: 0 Local rank: 11 Node rank: 11 App rank: 83 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../BB/../..] >> Data for proc: [[51718,1],84] >> Pid: 0 Local rank: 12 Node rank: 12 App rank: 84 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../BB/..][../../../../../../../..] >> Data for proc: [[51718,1],85] >> Pid: 0 Local rank: 13 Node rank: 13 App rank: 85 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../BB/..] >> Data for proc: [[51718,1],86] >> Pid: 0 Local rank: 14 Node rank: 14 App rank: 86 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../../BB][../../../../../../../..] >> Data for proc: [[51718,1],87] >> Pid: 0 Local rank: 15 Node rank: 15 App rank: 87 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../../BB] >> >> Data for node: csclprd3-0-10 Launch id: -1 State: 0 >> Daemon: [[51718,0],12] Daemon launched: True >> Num slots: 16 Slots in use: 16 Oversubscribed: FALSE >> Num slots allocated: 16 Max slots: 0 >> Username on node: NULL >> Num procs: 16 Next node_rank: 16 >> Data for proc: [[51718,1],88] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 88 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [BB/../../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],89] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 89 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][BB/../../../../../../..] 
>> Data for proc: [[51718,1],90] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 90 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../BB/../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],91] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 91 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../BB/../../../../../..] >> Data for proc: [[51718,1],92] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 92 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../BB/../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],93] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 93 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../BB/../../../../..] >> Data for proc: [[51718,1],94] >> Pid: 0 Local rank: 6 Node rank: 6 App rank: 94 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../BB/../../../..][../../../../../../../..] >> Data for proc: [[51718,1],95] >> Pid: 0 Local rank: 7 Node rank: 7 App rank: 95 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../BB/../../../..] >> Data for proc: [[51718,1],96] >> Pid: 0 Local rank: 8 Node rank: 8 App rank: 96 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../BB/../../..][../../../../../../../..] >> Data for proc: [[51718,1],97] >> Pid: 0 Local rank: 9 Node rank: 9 App rank: 97 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../BB/../../..] 
>> Data for proc: [[51718,1],98] >> Pid: 0 Local rank: 10 Node rank: 10 App rank: 98 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../BB/../..][../../../../../../../..] >> Data for proc: [[51718,1],99] >> Pid: 0 Local rank: 11 Node rank: 11 App rank: 99 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../BB/../..] >> Data for proc: [[51718,1],100] >> Pid: 0 Local rank: 12 Node rank: 12 App rank: 100 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../BB/..][../../../../../../../..] >> Data for proc: [[51718,1],101] >> Pid: 0 Local rank: 13 Node rank: 13 App rank: 101 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../BB/..] >> Data for proc: [[51718,1],102] >> Pid: 0 Local rank: 14 Node rank: 14 App rank: 102 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../../BB][../../../../../../../..] >> Data for proc: [[51718,1],103] >> Pid: 0 Local rank: 15 Node rank: 15 App rank: 103 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../../BB] >> >> Data for node: csclprd3-0-11 Launch id: -1 State: 0 >> Daemon: [[51718,0],13] Daemon launched: True >> Num slots: 16 Slots in use: 16 Oversubscribed: FALSE >> Num slots allocated: 16 Max slots: 0 >> Username on node: NULL >> Num procs: 16 Next node_rank: 16 >> Data for proc: [[51718,1],104] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 104 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>> Binding: [BB/../../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],105] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 105 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][BB/../../../../../../..] >> Data for proc: [[51718,1],106] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 106 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../BB/../../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],107] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 107 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../BB/../../../../../..] >> Data for proc: [[51718,1],108] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 108 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../BB/../../../../..][../../../../../../../..] >> Data for proc: [[51718,1],109] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 109 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../BB/../../../../..] >> Data for proc: [[51718,1],110] >> Pid: 0 Local rank: 6 Node rank: 6 App rank: 110 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../BB/../../../..][../../../../../../../..] >> Data for proc: [[51718,1],111] >> Pid: 0 Local rank: 7 Node rank: 7 App rank: 111 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../BB/../../../..] >> Data for proc: [[51718,1],112] >> Pid: 0 Local rank: 8 Node rank: 8 App rank: 112 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>> Binding: [../../../../BB/../../..][../../../../../../../..] >> Data for proc: [[51718,1],113] >> Pid: 0 Local rank: 9 Node rank: 9 App rank: 113 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../BB/../../..] >> Data for proc: [[51718,1],114] >> Pid: 0 Local rank: 10 Node rank: 10 App rank: 114 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../BB/../..][../../../../../../../..] >> Data for proc: [[51718,1],115] >> Pid: 0 Local rank: 11 Node rank: 11 App rank: 115 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../BB/../..] >> Data for proc: [[51718,1],116] >> Pid: 0 Local rank: 12 Node rank: 12 App rank: 116 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../BB/..][../../../../../../../..] >> Data for proc: [[51718,1],117] >> Pid: 0 Local rank: 13 Node rank: 13 App rank: 117 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../BB/..] >> Data for proc: [[51718,1],118] >> Pid: 0 Local rank: 14 Node rank: 14 App rank: 118 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >> Binding: [../../../../../../../BB][../../../../../../../..] 
>> Data for proc: [[51718,1],119] >> Pid: 0 Local rank: 15 Node rank: 15 App rank: 119 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../../../..][../../../../../../../BB] >> >> Data for node: csclprd3-0-12 Launch id: -1 State: 0 >> Daemon: [[51718,0],14] Daemon launched: True >> Num slots: 6 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 6 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],120] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 120 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [BB/../../../../..] >> Data for proc: [[51718,1],121] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 121 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [../BB/../../../..] >> Data for proc: [[51718,1],122] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 122 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [../../BB/../../..] >> Data for proc: [[51718,1],123] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 123 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [../../../BB/../..] >> Data for proc: [[51718,1],124] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 124 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [../../../../BB/..] >> Data for proc: [[51718,1],125] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 125 >> State: INITIALIZED App_context: 0 >> Locale: UNKNOWN >> Binding: [../../../../../BB] >> >> Data for node: csclprd3-0-13 Launch id: -1 State: 0 >> Daemon: [[51718,0],15] Daemon launched: True >> Num slots: 12 Slots in use: 6 Oversubscribed: FALSE >> Num slots allocated: 12 Max slots: 0 >> Username on node: NULL >> Num procs: 6 Next node_rank: 6 >> Data for proc: [[51718,1],126] >> Pid: 0 Local rank: 0 Node rank: 0 App rank: 126 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB][../../../../../..] 
>> Binding: [BB/../../../../..][../../../../../..] >> Data for proc: [[51718,1],127] >> Pid: 0 Local rank: 1 Node rank: 1 App rank: 127 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../..][BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../..][BB/../../../../..] >> Data for proc: [[51718,1],128] >> Pid: 0 Local rank: 2 Node rank: 2 App rank: 128 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB][../../../../../..] >> Binding: [../BB/../../../..][../../../../../..] >> Data for proc: [[51718,1],129] >> Pid: 0 Local rank: 3 Node rank: 3 App rank: 129 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../..][BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../..][../BB/../../../..] >> Data for proc: [[51718,1],130] >> Pid: 0 Local rank: 4 Node rank: 4 App rank: 130 >> State: INITIALIZED App_context: 0 >> Locale: [BB/BB/BB/BB/BB/BB][../../../../../..] >> Binding: [../../BB/../../..][../../../../../..] >> Data for proc: [[51718,1],131] >> Pid: 0 Local rank: 5 Node rank: 5 App rank: 131 >> State: INITIALIZED App_context: 0 >> Locale: [../../../../../..][BB/BB/BB/BB/BB/BB] >> Binding: [../../../../../..][../../BB/../../..] 
>> [csclprd3-0-13:31619] *** Process received signal *** >> [csclprd3-0-13:31619] Signal: Bus error (7) >> [csclprd3-0-13:31619] Signal code: Non-existant physical address (2) >> [csclprd3-0-13:31619] Failing at address: 0x7f1374267a00 >> [csclprd3-0-13:31620] *** Process received signal *** >> [csclprd3-0-13:31620] Signal: Bus error (7) >> [csclprd3-0-13:31620] Signal code: Non-existant physical address (2) >> [csclprd3-0-13:31620] Failing at address: 0x7fcc702a7980 >> [csclprd3-0-13:31615] *** Process received signal *** >> [csclprd3-0-13:31615] Signal: Bus error (7) >> [csclprd3-0-13:31615] Signal code: Non-existant physical address (2) >> [csclprd3-0-13:31615] Failing at address: 0x7f8128367880 >> [csclprd3-0-13:31616] *** Process received signal *** >> [csclprd3-0-13:31616] Signal: Bus error (7) >> [csclprd3-0-13:31616] Signal code: Non-existant physical address (2) >> [csclprd3-0-13:31616] Failing at address: 0x7fe674227a00 >> [csclprd3-0-13:31617] *** Process received signal *** >> [csclprd3-0-13:31617] Signal: Bus error (7) >> [csclprd3-0-13:31617] Signal code: Non-existant physical address (2) >> [csclprd3-0-13:31617] Failing at address: 0x7f061c32db80 >> [csclprd3-0-13:31618] *** Process received signal *** >> [csclprd3-0-13:31618] Signal: Bus error (7) >> [csclprd3-0-13:31618] Signal code: Non-existant physical address (2) >> [csclprd3-0-13:31618] Failing at address: 0x7fb8402eaa80 >> [csclprd3-0-13:31618] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fb851851500] >> [csclprd3-0-13:31618] [ 1] [csclprd3-0-13:31616] [ 0] >> /lib64/libpthread.so.0(+0xf500)[0x7fe6843a4500] >> [csclprd3-0-13:31616] [ 1] [csclprd3-0-13:31620] [ 0] >> /lib64/libpthread.so.0(+0xf500)[0x7fcc80c54500] >> [csclprd3-0-13:31620] [ 1] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fcc80fc9f61] >> [csclprd3-0-13:31620] [ 2] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fcc80fca047] >> [csclprd3-0-13:31620] [ 3] [csclprd3-0-13:31615] [ 0] >> 
/lib64/libpthread.so.0(+0xf500)[0x7f81385ca500] >> [csclprd3-0-13:31615] [ 1] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f813893ff61] >> [csclprd3-0-13:31615] [ 2] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f8138940047] >> [csclprd3-0-13:31615] [ 3] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fb851bc6f61] >> [csclprd3-0-13:31618] [ 2] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fb851bc7047] >> [csclprd3-0-13:31618] [ 3] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fb851ab4670] >> [csclprd3-0-13:31618] [ 4] [csclprd3-0-13:31617] [ 0] >> /lib64/libpthread.so.0(+0xf500)[0x7f062cfe5500] >> [csclprd3-0-13:31617] [ 1] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f062d35af61] >> [csclprd3-0-13:31617] [ 2] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f062d35b047] >> [csclprd3-0-13:31617] [ 3] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f062d248670] >> [csclprd3-0-13:31617] [ 4] [csclprd3-0-13:31619] [ 0] >> /lib64/libpthread.so.0(+0xf500)[0x7f1384fde500] >> [csclprd3-0-13:31619] [ 1] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f1385353f61] >> [csclprd3-0-13:31619] [ 2] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fe684719f61] >> [csclprd3-0-13:31616] [ 2] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fe68471a047] >> [csclprd3-0-13:31616] [ 3] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fe684607670] >> [csclprd3-0-13:31616] [ 4] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f1385354047] >> [csclprd3-0-13:31619] [ 3] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f1385241670] >> [csclprd3-0-13:31619] [ 4] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f13852425ab] >> [csclprd3-0-13:31619] [ 5] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f1385242751] >> [csclprd3-0-13:31619] [ 6] >> 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f13853501c9] >> [csclprd3-0-13:31619] [ 7] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f1385336628] >> [csclprd3-0-13:31619] [ 8] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fcc80eb7670] >> [csclprd3-0-13:31620] [ 4] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fcc80eb85ab] >> [csclprd3-0-13:31620] [ 5] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fcc80eb8751] >> [csclprd3-0-13:31620] [ 6] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fcc80fc61c9] >> [csclprd3-0-13:31620] [ 7] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fcc80fac628] >> [csclprd3-0-13:31620] [ 8] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fcc8111fd61] >> [csclprd3-0-13:31620] [ 9] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f813882d670] >> [csclprd3-0-13:31615] [ 4] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f813882e5ab] >> [csclprd3-0-13:31615] [ 5] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f813882e751] >> [csclprd3-0-13:31615] [ 6] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f813893c1c9] >> [csclprd3-0-13:31615] [ 7] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f8138922628] >> [csclprd3-0-13:31615] [ 8] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f8138a95d61] >> [csclprd3-0-13:31615] [ 9] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f813885d747] >> [csclprd3-0-13:31615] [10] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fb851ab55ab] >> [csclprd3-0-13:31618] [ 5] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fb851ab5751] >> [csclprd3-0-13:31618] [ 6] >> 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fb851bc31c9] >> [csclprd3-0-13:31618] [ 7] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fb851ba9628] >> [csclprd3-0-13:31618] [ 8] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fb851d1cd61] >> [csclprd3-0-13:31618] [ 9] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fb851ae4747] >> [csclprd3-0-13:31618] [10] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f062d2495ab] >> [csclprd3-0-13:31617] [ 5] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f062d249751] >> [csclprd3-0-13:31617] [ 6] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f062d3571c9] >> [csclprd3-0-13:31617] [ 7] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f062d33d628] >> [csclprd3-0-13:31617] [ 8] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f062d4b0d61] >> [csclprd3-0-13:31617] [ 9] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f062d278747] >> [csclprd3-0-13:31617] [10] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fe6846085ab] >> [csclprd3-0-13:31616] [ 5] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fe684608751] >> [csclprd3-0-13:31616] [ 6] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fe6847161c9] >> [csclprd3-0-13:31616] [ 7] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fe6846fc628] >> [csclprd3-0-13:31616] [ 8] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fe68486fd61] >> [csclprd3-0-13:31616] [ 9] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fe684637747] >> [csclprd3-0-13:31616] [10] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fe68467750b] >> [csclprd3-0-13:31616] [11] >> 
/hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >> [csclprd3-0-13:31616] [12] >> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fe684021cdd] >> [csclprd3-0-13:31616] [13] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >> [csclprd3-0-13:31616] *** End of error message *** >> >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f062d2b850b] >> [csclprd3-0-13:31617] [11] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >> [csclprd3-0-13:31617] [12] >> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f062cc62cdd] >> [csclprd3-0-13:31617] [13] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >> [csclprd3-0-13:31617] *** End of error message *** >> >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f13854a9d61] >> [csclprd3-0-13:31619] [ 9] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f1385271747] >> [csclprd3-0-13:31619] [10] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f13852b150b] >> [csclprd3-0-13:31619] [11] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >> [csclprd3-0-13:31619] [12] >> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f1384c5bcdd] >> [csclprd3-0-13:31619] [13] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >> [csclprd3-0-13:31619] *** End of error message *** >> >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fcc80ee7747] >> [csclprd3-0-13:31620] [10] >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fcc80f2750b] >> [csclprd3-0-13:31620] [11] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >> [csclprd3-0-13:31620] [12] >> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fcc808d1cdd] >> [csclprd3-0-13:31620] [13] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >> [csclprd3-0-13:31620] *** End of error message *** >> >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f813889d50b] >> [csclprd3-0-13:31615] [11] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >> 
[csclprd3-0-13:31615] [12] >> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f8138247cdd] >> [csclprd3-0-13:31615] [13] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >> [csclprd3-0-13:31615] *** End of error message *** >> >> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fb851b2450b] >> [csclprd3-0-13:31618] [11] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >> [csclprd3-0-13:31618] [12] >> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fb8514cecdd] >> [csclprd3-0-13:31618] [13] >> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >> [csclprd3-0-13:31618] *** End of error message *** >> -------------------------------------------------------------------------- >> mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 >> exited on signal 7 (Bus error). >> -------------------------------------------------------------------------- >> >> ------------------------------ >> *From:* users [users-boun...@open-mpi.org] on behalf of Ralph Castain [ >> r...@open-mpi.org] >> *Sent:* Tuesday, June 23, 2015 6:20 PM >> *To:* Open MPI Users >> *Subject:* Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = >> crash >> >> Wow - that is one sick puppy! I see that some nodes are reporting >> not-bound for their procs, and the rest are binding to socket (as they >> should). Some of your nodes clearly do not have hyper threads enabled (or >> only have single-thread cores on them), and have 2 cores/socket. Other >> nodes have 8 cores/socket with hyper threads enabled, while still others >> have 6 cores/socket and HT enabled. >> >> I don't see anyone binding to a single HT if multiple HTs/core are >> available. I think you are being fooled by those nodes that either don't >> have HT enabled, or have only 1 HT/core. >> >> In both cases, it is node 13 that is the node that fails. I also note >> that you said everything works okay with < 132 ranks, and node 13 hosts >> ranks 127-131. 
So node 13 would host ranks even if you reduced the number >> in the job to 131. This would imply that it probably isn't something wrong >> with the node itself. >> >> Is there any way you could run a job of this size on a homogeneous >> cluster? The procs all show bindings that look right, but I'm wondering if >> the heterogeneity is the source of the trouble here. We may be >> communicating the binding pattern incorrectly and giving bad info to the >> backend daemon. >> >> Also, rather than --report-bindings, use "--display-devel-map" on the >> command line and let's see what the mapper thinks it did. If there is a >> problem with placement, that is where it would exist. >> >> >> On Tue, Jun 23, 2015 at 5:12 PM, Lane, William <william.l...@cshs.org> >> wrote: >> >>> Ralph, >>> >>> There is something funny going on, the trace from the >>> runs w/the debug build aren't showing any differences from >>> what I got earlier. However, I did do a run w/the --bind-to core >>> switch and was surprised to see that hyperthreading cores were >>> sometimes being used. >>> >>> Here's the traces that I have: >>> >>> mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ >>> --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes >>> /hpc/home/lanew/mpi/openmpi/ProcessColors3 >>> [csclprd3-0-5:16802] MCW rank 44 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-5:16802] MCW rank 45 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-5:16802] MCW rank 46 is not bound (or bound to all available >>> processors) >>> [csclprd3-6-5:12480] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket >>> 0[core 1[hwt 0]]: [B/B][./.] >>> [csclprd3-6-5:12480] MCW rank 5 bound to socket 1[core 2[hwt 0]], socket >>> 1[core 3[hwt 0]]: [./.][B/B] >>> [csclprd3-6-5:12480] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket >>> 0[core 1[hwt 0]]: [B/B][./.] 
>>> [csclprd3-6-5:12480] MCW rank 7 bound to socket 1[core 2[hwt 0]], socket >>> 1[core 3[hwt 0]]: [./.][B/B] >>> [csclprd3-0-5:16802] MCW rank 47 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-5:16802] MCW rank 48 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-5:16802] MCW rank 49 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-1:14318] MCW rank 22 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-1:14318] MCW rank 23 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-1:14318] MCW rank 24 is not bound (or bound to all available >>> processors) >>> [csclprd3-6-1:24682] MCW rank 3 bound to socket 1[core 2[hwt 0]], socket >>> 1[core 3[hwt 0]]: [./.][B/B] >>> [csclprd3-6-1:24682] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket >>> 0[core 1[hwt 0]]: [B/B][./.] >>> [csclprd3-0-1:14318] MCW rank 25 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-1:14318] MCW rank 20 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-3:13827] MCW rank 34 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-1:14318] MCW rank 21 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-3:13827] MCW rank 35 is not bound (or bound to all available >>> processors) >>> [csclprd3-6-1:24682] MCW rank 1 bound to socket 1[core 2[hwt 0]], socket >>> 1[core 3[hwt 0]]: [./.][B/B] >>> [csclprd3-0-3:13827] MCW rank 36 is not bound (or bound to all available >>> processors) >>> [csclprd3-6-1:24682] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket >>> 0[core 1[hwt 0]]: [B/B][./.] 
>>> [csclprd3-0-6:30371] MCW rank 51 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-6:30371] MCW rank 52 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-6:30371] MCW rank 53 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-2:05825] MCW rank 30 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-6:30371] MCW rank 54 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-3:13827] MCW rank 37 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-2:05825] MCW rank 31 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-3:13827] MCW rank 32 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-6:30371] MCW rank 55 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-3:13827] MCW rank 33 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-6:30371] MCW rank 50 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-2:05825] MCW rank 26 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-2:05825] MCW rank 27 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-2:05825] MCW rank 28 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-2:05825] MCW rank 29 is not bound (or bound to all available >>> processors) >>> [csclprd3-0-12:12383] MCW rank 121 is not bound (or bound to all >>> available processors) >>> [csclprd3-0-12:12383] MCW rank 122 is not bound (or bound to all >>> available processors) >>> [csclprd3-0-12:12383] MCW rank 123 is not bound (or bound to all >>> available processors) >>> [csclprd3-0-12:12383] MCW rank 124 is not bound (or bound to all >>> available processors) >>> [csclprd3-0-12:12383] MCW rank 125 is not bound (or bound to all >>> available processors) >>> [csclprd3-0-12:12383] MCW rank 120 is not bound (or bound to all >>> available processors) >>> [csclprd3-0-0:31079] MCW rank 
13 bound to socket 1[core 6[hwt 0]], >>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>> [./././././.][B/B/B/B/B/B] >>> [csclprd3-0-0:31079] MCW rank 14 bound to socket 0[core 0[hwt 0]], >>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >>> [csclprd3-0-0:31079] MCW rank 15 bound to socket 1[core 6[hwt 0]], >>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>> [./././././.][B/B/B/B/B/B] >>> [csclprd3-0-0:31079] MCW rank 16 bound to socket 0[core 0[hwt 0]], >>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >>> [csclprd3-0-7:20515] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-10:19096] MCW rank 100 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>> [csclprd3-0-7:20515] MCW rank 69 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-10:19096] MCW rank 101 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-0:31079] MCW rank 17 bound to socket 1[core 6[hwt 0]], >>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>> [./././././.][B/B/B/B/B/B] >>> [csclprd3-0-7:20515] MCW rank 70 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-10:19096] MCW rank 102 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-11:31636] MCW rank 116 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>> [csclprd3-0-11:31636] MCW rank 117 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-0:31079] MCW rank 18 bound to socket 0[core 0[hwt 0]], >>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >>> [csclprd3-0-11:31636] MCW rank 118 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-0:31079] MCW rank 19 bound to socket 1[core 6[hwt 0]], >>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>> [./././././.][B/B/B/B/B/B] >>> [csclprd3-0-7:20515] MCW rank 71 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-10:19096] MCW rank 103 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-0:31079] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket >>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket >>> 0[core 4[hwt 0]], socket 0[core 
5[hwt 0]]: [B/B/B/B/B/B][./././././.] >>> [csclprd3-0-0:31079] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket >>> 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket >>> 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] >>> [csclprd3-0-10:19096] MCW rank 88 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-11:31636] MCW rank 119 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-7:20515] MCW rank 56 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-0:31079] MCW rank 10 bound to socket 0[core 0[hwt 0]], >>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] 
>>> [csclprd3-0-7:20515] MCW rank 57 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-10:19096] MCW rank 89 bound to socket 1[core 8[hwt 0-1]], >>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>> [csclprd3-0-11:31636] MCW rank 104 bound to socket 0[core 0[hwt 0-1]], >>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>> [csclprd3-0-0:31079] MCW rank 11 bound to socket 1[core 6[hwt 0]], >>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>> [./././././.][B/B/B/B/B/B] >>> [csclprd3-0-0:31079] MCW rank 12 bound to socket 0[core 0[hwt 0]], >>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] >>> [csclprd3-0-4:30348] MCW rank 42 is not bound (or bound to all