Ralph, something funny is going on: the traces from the runs with the debug build aren't showing any differences from what I got earlier. However, I did do a run with the --bind-to core switch and was surprised to see that hyperthreaded cores were sometimes being used.
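One way to see at a glance which nodes end up with ranks bound to more than one hardware thread per core is to scan the -report-bindings output for "hwt 0-1" style entries. The following is only a rough sketch, assuming the line format seen in these traces ("[host:pid] MCW rank N bound to ...: mask" or "... is not bound"); the function name summarize_bindings and the three category labels are made up for illustration and are not part of Open MPI.

```python
import re
from collections import defaultdict

# One -report-bindings entry. Several entries may share a physical line,
# so the pattern is applied with finditer over the whole text.
# DOTALL lets an entry that was wrapped across a line break still match.
ENTRY_RE = re.compile(
    r"\[(?P<host>[\w.-]+):(?P<pid>\d+)\] MCW rank (?P<rank>\d+) "
    r"(?:is not bound|bound to (?P<desc>.*?): (?P<mask>\S+))",
    re.DOTALL,
)

# "hwt 0-1" means a range of hardware threads on one core is in the binding.
MULTI_HWT_RE = re.compile(r"hwt\s+\d+-\d+")


def summarize_bindings(text):
    """Count, per host, ranks that are unbound, bound with a single
    hardware thread per core, or bound with several threads per core.
    (Hypothetical helper; the categories are labels chosen for this sketch.)
    """
    summary = defaultdict(lambda: {"unbound": 0, "one_hwt": 0, "multi_hwt": 0})
    for match in ENTRY_RE.finditer(text):
        host = match.group("host")
        desc = match.group("desc")
        if desc is None:
            # The "is not bound (or bound to all available processors)" case.
            summary[host]["unbound"] += 1
        elif MULTI_HWT_RE.search(desc):
            summary[host]["multi_hwt"] += 1
        else:
            summary[host]["one_hwt"] += 1
    return dict(summary)


if __name__ == "__main__":
    sample = (
        "[csclprd3-0-5:16802] MCW rank 44 is not bound (or bound to all "
        "available processors) "
        "[csclprd3-6-5:12480] MCW rank 4 bound to socket 0[core 0[hwt 0]], "
        "socket 0[core 1[hwt 0]]: [B/B][./.] "
        "[csclprd3-0-7:20515] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], "
        "socket 0[core 1[hwt 0-1]]: [BB/BB][../..]"
    )
    for host, counts in sorted(summarize_bindings(sample).items()):
        print(host, counts)
```

Feeding the saved mpirun output to summarize_bindings should give one row per node, which makes it easier to compare the nodes that report "hwt 0-1" against the ones that report "hwt 0" or no binding at all.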
Here are the traces that I have:

mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes /hpc/home/lanew/mpi/openmpi/ProcessColors3

[csclprd3-0-5:16802] MCW rank 44 is not bound (or bound to all available processors) [csclprd3-0-5:16802] MCW rank 45 is not bound (or bound to all available processors) [csclprd3-0-5:16802] MCW rank 46 is not bound (or bound to all available processors) [csclprd3-6-5:12480] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] [csclprd3-6-5:12480] MCW rank 5 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-6-5:12480] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] [csclprd3-6-5:12480] MCW rank 7 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-0-5:16802] MCW rank 47 is not bound (or bound to all available processors) [csclprd3-0-5:16802] MCW rank 48 is not bound (or bound to all available processors) [csclprd3-0-5:16802] MCW rank 49 is not bound (or bound to all available processors) [csclprd3-0-1:14318] MCW rank 22 is not bound (or bound to all available processors) [csclprd3-0-1:14318] MCW rank 23 is not bound (or bound to all available processors) [csclprd3-0-1:14318] MCW rank 24 is not bound (or bound to all available processors) [csclprd3-6-1:24682] MCW rank 3 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-6-1:24682] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
[csclprd3-0-1:14318] MCW rank 25 is not bound (or bound to all available processors) [csclprd3-0-1:14318] MCW rank 20 is not bound (or bound to all available processors) [csclprd3-0-3:13827] MCW rank 34 is not bound (or bound to all available processors) [csclprd3-0-1:14318] MCW rank 21 is not bound (or bound to all available processors) [csclprd3-0-3:13827] MCW rank 35 is not bound (or bound to all available processors) [csclprd3-6-1:24682] MCW rank 1 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-0-3:13827] MCW rank 36 is not bound (or bound to all available processors) [csclprd3-6-1:24682] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] [csclprd3-0-6:30371] MCW rank 51 is not bound (or bound to all available processors) [csclprd3-0-6:30371] MCW rank 52 is not bound (or bound to all available processors) [csclprd3-0-6:30371] MCW rank 53 is not bound (or bound to all available processors) [csclprd3-0-2:05825] MCW rank 30 is not bound (or bound to all available processors) [csclprd3-0-6:30371] MCW rank 54 is not bound (or bound to all available processors) [csclprd3-0-3:13827] MCW rank 37 is not bound (or bound to all available processors) [csclprd3-0-2:05825] MCW rank 31 is not bound (or bound to all available processors) [csclprd3-0-3:13827] MCW rank 32 is not bound (or bound to all available processors) [csclprd3-0-6:30371] MCW rank 55 is not bound (or bound to all available processors) [csclprd3-0-3:13827] MCW rank 33 is not bound (or bound to all available processors) [csclprd3-0-6:30371] MCW rank 50 is not bound (or bound to all available processors) [csclprd3-0-2:05825] MCW rank 26 is not bound (or bound to all available processors) [csclprd3-0-2:05825] MCW rank 27 is not bound (or bound to all available processors) [csclprd3-0-2:05825] MCW rank 28 is not bound (or bound to all available processors) [csclprd3-0-2:05825] MCW rank 29 is not bound (or bound to all available processors) 
[csclprd3-0-12:12383] MCW rank 121 is not bound (or bound to all available processors) [csclprd3-0-12:12383] MCW rank 122 is not bound (or bound to all available processors) [csclprd3-0-12:12383] MCW rank 123 is not bound (or bound to all available processors) [csclprd3-0-12:12383] MCW rank 124 is not bound (or bound to all available processors) [csclprd3-0-12:12383] MCW rank 125 is not bound (or bound to all available processors) [csclprd3-0-12:12383] MCW rank 120 is not bound (or bound to all available processors) [csclprd3-0-0:31079] MCW rank 13 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:31079] MCW rank 14 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-0:31079] MCW rank 15 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:31079] MCW rank 16 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-7:20515] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-10:19096] MCW rank 100 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:20515] MCW rank 69 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 101 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-0:31079] MCW rank 17 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-7:20515] MCW rank 70 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:19096] MCW rank 102 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-11:31636] MCW rank 116 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:31636] MCW rank 117 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-0:31079] MCW rank 18 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-11:31636] MCW rank 118 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-0:31079] MCW rank 19 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-7:20515] MCW rank 71 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 103 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-0:31079] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-0:31079] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-10:19096] MCW rank 88 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-11:31636] MCW rank 119 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 56 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-0:31079] MCW rank 10 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-7:20515] MCW rank 57 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 89 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:31636] MCW rank 104 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-0:31079] MCW rank 11 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:31079] MCW rank 12 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-4:30348] MCW rank 42 is not bound (or bound to all available processors) [csclprd3-0-4:30348] MCW rank 43 is not bound (or bound to all available processors) [csclprd3-0-10:19096] MCW rank 90 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-4:30348] MCW rank 38 is not bound (or bound to all available processors) [csclprd3-0-7:20515] MCW rank 58 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-4:30348] MCW rank 39 is not bound (or bound to all available processors) [csclprd3-0-4:30348] MCW rank 40 is not bound (or bound to all available processors) [csclprd3-0-4:30348] MCW rank 41 is not bound (or bound to all available processors) [csclprd3-0-11:31636] MCW rank 105 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-13:29118] MCW rank 127 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB] [csclprd3-0-13:29118] MCW rank 128 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..] [csclprd3-0-13:29118] MCW rank 129 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB] [csclprd3-0-13:29118] MCW rank 130 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..] [csclprd3-0-8:15542] MCW rank 84 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-13:29118] MCW rank 131 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB] [csclprd3-0-8:15542] MCW rank 85 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-13:29118] MCW rank 126 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..] [csclprd3-0-8:15542] MCW rank 86 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-8:15542] MCW rank 87 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 59 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 91 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:31636] MCW rank 106 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-8:15542] MCW rank 72 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:20515] MCW rank 60 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-10:19096] MCW rank 92 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:31636] MCW rank 107 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 61 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:31636] MCW rank 108 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-10:19096] MCW rank 93 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:15542] MCW rank 73 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 62 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:19096] MCW rank 94 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-11:31636] MCW rank 109 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 63 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 95 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:31636] MCW rank 110 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-8:15542] MCW rank 74 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:20515] MCW rank 64 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-10:19096] MCW rank 96 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:31636] MCW rank 111 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 65 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 97 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:31636] MCW rank 112 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-8:15542] MCW rank 75 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 66 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:19096] MCW rank 98 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:31636] MCW rank 113 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:20515] MCW rank 67 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:19096] MCW rank 99 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:31636] MCW rank 114 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 
0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-8:15542] MCW rank 76 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:31636] MCW rank 115 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:15542] MCW rank 77 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:15542] MCW rank 78 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-8:15542] MCW rank 79 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:15542] MCW rank 80 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-8:15542] MCW rank 81 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:15542] MCW rank 82 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-8:15542] MCW rank 83 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-13:29120] *** Process received signal *** [csclprd3-0-13:29120] Signal: Bus error (7) [csclprd3-0-13:29120] Signal code: Non-existant physical address (2) [csclprd3-0-13:29120] Failing at address: 0x7f181832ba80 [csclprd3-0-13:29121] *** Process received signal *** [csclprd3-0-13:29121] Signal: Bus error (7) [csclprd3-0-13:29121] Signal code: Non-existant physical address (2) [csclprd3-0-13:29121] Failing at address: 0x7f5ca82a7980 [csclprd3-0-13:29122] *** Process received signal *** [csclprd3-0-13:29122] Signal: Bus error (7) [csclprd3-0-13:29122] Signal code: Non-existant physical address (2) [csclprd3-0-13:29122] Failing at address: 0x7fac6ba24980 [csclprd3-0-13:29123] *** Process received signal *** [csclprd3-0-13:29123] Signal: Bus error (7) [csclprd3-0-13:29123] Signal code: Non-existant physical address (2) [csclprd3-0-13:29123] Failing at address: 0x7faa24267a00 [csclprd3-0-13:29125] *** Process received signal *** [csclprd3-0-13:29125] Signal: Bus error (7) [csclprd3-0-13:29125] Signal code: Non-existant physical address (2) [csclprd3-0-13:29125] Failing at address: 0x7fa493ae7a00 [csclprd3-0-13:29119] *** Process received signal *** [csclprd3-0-13:29119] Signal: Bus error (7) [csclprd3-0-13:29119] Signal code: Non-existant physical address (2) [csclprd3-0-13:29119] Failing at address: 0x7fed7436ba80 [csclprd3-0-13:29120] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f182913e500] [csclprd3-0-13:29120] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f18294b3f61] [csclprd3-0-13:29120] [ 2] [csclprd3-0-13:29121] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f5cb8803500] [csclprd3-0-13:29121] [ 1] 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f5cb8b78f61] [csclprd3-0-13:29121] [ 2] [csclprd3-0-13:29122] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fac7b20c500] [csclprd3-0-13:29122] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fac7b581f61] [csclprd3-0-13:29122] [ 2] [csclprd3-0-13:29123] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7faa33edd500] [csclprd3-0-13:29123] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7faa34252f61] [csclprd3-0-13:29123] [ 2] [csclprd3-0-13:29125] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fa4a3097500] [csclprd3-0-13:29125] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fa4a340cf61] [csclprd3-0-13:29125] [ 2] [csclprd3-0-13:29119] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fed85c95500] [csclprd3-0-13:29119] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fed8600af61] [csclprd3-0-13:29119] [ 2] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fa4a340d047] [csclprd3-0-13:29125] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fa4a32fa670] [csclprd3-0-13:29125] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fa4a32fb5ab] [csclprd3-0-13:29125] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fa4a32fb751] [csclprd3-0-13:29125] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f18294b4047] [csclprd3-0-13:29120] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f18293a1670] [csclprd3-0-13:29120] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f18293a25ab] [csclprd3-0-13:29120] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f18293a2751] [csclprd3-0-13:29120] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f5cb8b79047] [csclprd3-0-13:29121] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f5cb8a66670] [csclprd3-0-13:29121] [ 4] 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f5cb8a675ab] [csclprd3-0-13:29121] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f5cb8a67751] [csclprd3-0-13:29121] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fac7b582047] [csclprd3-0-13:29122] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fac7b46f670] [csclprd3-0-13:29122] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fac7b4705ab] [csclprd3-0-13:29122] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fac7b470751] [csclprd3-0-13:29122] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7faa34253047] [csclprd3-0-13:29123] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7faa34140670] [csclprd3-0-13:29123] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7faa341415ab] [csclprd3-0-13:29123] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7faa34141751] [csclprd3-0-13:29123] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fed8600b047] [csclprd3-0-13:29119] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fed85ef8670] [csclprd3-0-13:29119] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fed85ef95ab] [csclprd3-0-13:29119] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fed85ef9751] [csclprd3-0-13:29119] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fed860071c9] [csclprd3-0-13:29119] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fed85fed628] [csclprd3-0-13:29119] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fed86160d61] [csclprd3-0-13:29119] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7faa3424f1c9] [csclprd3-0-13:29123] [ 7] 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7faa34235628] [csclprd3-0-13:29123] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7faa343a8d61] [csclprd3-0-13:29123] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fa4a34091c9] [csclprd3-0-13:29125] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fa4a33ef628] [csclprd3-0-13:29125] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fa4a3562d61] [csclprd3-0-13:29125] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f18294b01c9] [csclprd3-0-13:29120] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f1829496628] [csclprd3-0-13:29120] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f5cb8b751c9] [csclprd3-0-13:29121] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f5cb8b5b628] [csclprd3-0-13:29121] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f5cb8cced61] [csclprd3-0-13:29121] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f5cb8a96747] [csclprd3-0-13:29121] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fac7b57e1c9] [csclprd3-0-13:29122] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fac7b564628] [csclprd3-0-13:29122] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fac7b6d7d61] [csclprd3-0-13:29122] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fac7b49f747] [csclprd3-0-13:29122] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fed85f28747] [csclprd3-0-13:29119] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fed85f6850b] [csclprd3-0-13:29119] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29119] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fed85912cdd] [csclprd3-0-13:29119] 
[13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29119] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7faa34170747] [csclprd3-0-13:29123] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7faa341b050b] [csclprd3-0-13:29123] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29123] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7faa33b5acdd] [csclprd3-0-13:29123] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29123] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fa4a332a747] [csclprd3-0-13:29125] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fa4a336a50b] [csclprd3-0-13:29125] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29125] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa4a2d14cdd] [csclprd3-0-13:29125] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29125] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f1829609d61] [csclprd3-0-13:29120] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f18293d1747] [csclprd3-0-13:29120] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f182941150b] [csclprd3-0-13:29120] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29120] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f1828dbbcdd] [csclprd3-0-13:29120] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29120] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f5cb8ad650b] [csclprd3-0-13:29121] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29121] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f5cb8480cdd] [csclprd3-0-13:29121] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] 
[csclprd3-0-13:29121] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fac7b4df50b] [csclprd3-0-13:29122] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29122] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fac7ae89cdd] [csclprd3-0-13:29122] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29122] *** End of error message *** -------------------------------------------------------------------------- mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 exited on signal 7 (Bus error). -------------------------------------------------------------------------- 2. mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes --bind-to core /hpc/home/lanew/mpi/openmpi/ProcessColors3 -------------------------------------------------------------------------- WARNING: a request was made to bind a process. While the system supports binding the process itself, at least one node does NOT support binding memory to the process location. Node: csclprd3-6-1 This usually is due to not having the required NUMA support installed on the node. In some Linux distributions, the required support is contained in the libnumactl and libnumactl-devel packages. This is a warning only; your job will continue, though performance may be degraded. -------------------------------------------------------------------------- [csclprd3-6-1:24853] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.] [csclprd3-6-1:24853] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.] [csclprd3-6-1:24853] MCW rank 2 bound to socket 0[core 1[hwt 0]]: [./B][./.] [csclprd3-6-1:24853] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B] [csclprd3-6-5:12646] MCW rank 4 bound to socket 0[core 0[hwt 0]]: [B/.][./.] [csclprd3-6-5:12646] MCW rank 5 bound to socket 1[core 2[hwt 0]]: [./.][B/.] 
[csclprd3-6-5:12646] MCW rank 6 bound to socket 0[core 1[hwt 0]]: [./B][./.] [csclprd3-6-5:12646] MCW rank 7 bound to socket 1[core 3[hwt 0]]: [./.][./B] [csclprd3-0-1:14499] MCW rank 24 bound to socket 0[core 4[hwt 0]]: [././././B/.] [csclprd3-0-1:14499] MCW rank 25 bound to socket 0[core 5[hwt 0]]: [./././././B] [csclprd3-0-1:14499] MCW rank 20 bound to socket 0[core 0[hwt 0]]: [B/././././.] [csclprd3-0-5:16978] MCW rank 44 bound to socket 0[core 0[hwt 0]]: [B/././././.] [csclprd3-0-5:16978] MCW rank 45 bound to socket 0[core 1[hwt 0]]: [./B/./././.] [csclprd3-0-1:14499] MCW rank 21 bound to socket 0[core 1[hwt 0]]: [./B/./././.] [csclprd3-0-5:16978] MCW rank 46 bound to socket 0[core 2[hwt 0]]: [././B/././.] [csclprd3-0-1:14499] MCW rank 22 bound to socket 0[core 2[hwt 0]]: [././B/././.] [csclprd3-0-1:14499] MCW rank 23 bound to socket 0[core 3[hwt 0]]: [./././B/./.] [csclprd3-0-5:16978] MCW rank 47 bound to socket 0[core 3[hwt 0]]: [./././B/./.] [csclprd3-0-5:16978] MCW rank 48 bound to socket 0[core 4[hwt 0]]: [././././B/.] [csclprd3-0-5:16978] MCW rank 49 bound to socket 0[core 5[hwt 0]]: [./././././B] [csclprd3-0-6:30547] MCW rank 51 bound to socket 0[core 1[hwt 0]]: [./B/./././.] [csclprd3-0-2:06006] MCW rank 30 bound to socket 0[core 4[hwt 0]]: [././././B/.] [csclprd3-0-6:30547] MCW rank 52 bound to socket 0[core 2[hwt 0]]: [././B/././.] [csclprd3-0-2:06006] MCW rank 31 bound to socket 0[core 5[hwt 0]]: [./././././B] [csclprd3-0-6:30547] MCW rank 53 bound to socket 0[core 3[hwt 0]]: [./././B/./.] [csclprd3-0-2:06006] MCW rank 26 bound to socket 0[core 0[hwt 0]]: [B/././././.] [csclprd3-0-6:30547] MCW rank 54 bound to socket 0[core 4[hwt 0]]: [././././B/.] [csclprd3-0-2:06006] MCW rank 27 bound to socket 0[core 1[hwt 0]]: [./B/./././.] [csclprd3-0-2:06006] MCW rank 28 bound to socket 0[core 2[hwt 0]]: [././B/././.] 
[csclprd3-0-6:30547] MCW rank 55 bound to socket 0[core 5[hwt 0]]: [./././././B] [csclprd3-0-3:14008] MCW rank 34 bound to socket 0[core 2[hwt 0]]: [././B/././.] [csclprd3-0-6:30547] MCW rank 50 bound to socket 0[core 0[hwt 0]]: [B/././././.] [csclprd3-0-3:14008] MCW rank 35 bound to socket 0[core 3[hwt 0]]: [./././B/./.] [csclprd3-0-3:14008] MCW rank 36 bound to socket 0[core 4[hwt 0]]: [././././B/.] [csclprd3-0-3:14008] MCW rank 37 bound to socket 0[core 5[hwt 0]]: [./././././B] [csclprd3-0-3:14008] MCW rank 32 bound to socket 0[core 0[hwt 0]]: [B/././././.] [csclprd3-0-3:14008] MCW rank 33 bound to socket 0[core 1[hwt 0]]: [./B/./././.] [csclprd3-0-2:06006] MCW rank 29 bound to socket 0[core 3[hwt 0]]: [./././B/./.] [csclprd3-0-12:12559] MCW rank 120 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..] [csclprd3-0-12:12559] MCW rank 121 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..] [csclprd3-0-12:12559] MCW rank 122 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../..] [csclprd3-0-12:12559] MCW rank 123 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../..] [csclprd3-0-12:12559] MCW rank 124 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/..] [csclprd3-0-12:12559] MCW rank 125 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB] [csclprd3-0-0:31325] MCW rank 8 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.] [csclprd3-0-0:31325] MCW rank 9 bound to socket 1[core 6[hwt 0]]: [./././././.][B/././././.] [csclprd3-0-0:31325] MCW rank 10 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.] [csclprd3-0-7:20792] MCW rank 68 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..] [csclprd3-0-7:20792] MCW rank 69 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..] [csclprd3-0-0:31325] MCW rank 11 bound to socket 1[core 7[hwt 0]]: [./././././.][./B/./././.] 
[csclprd3-0-10:19372] MCW rank 100 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 101 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..] [csclprd3-0-11:31905] MCW rank 116 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 117 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..] [csclprd3-0-7:20792] MCW rank 70 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 102 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 118 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB][../../../../../../../..] [csclprd3-0-7:20792] MCW rank 71 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB] [csclprd3-0-10:19372] MCW rank 103 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB] [csclprd3-0-0:31325] MCW rank 12 bound to socket 0[core 2[hwt 0]]: [././B/././.][./././././.] [csclprd3-0-11:31905] MCW rank 119 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB] [csclprd3-0-0:31325] MCW rank 13 bound to socket 1[core 8[hwt 0]]: [./././././.][././B/././.] [csclprd3-0-7:20792] MCW rank 56 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 88 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 104 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 89 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..] 
[csclprd3-0-7:20792] MCW rank 57 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..] [csclprd3-0-10:19372] MCW rank 90 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 105 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..] [csclprd3-0-0:31325] MCW rank 14 bound to socket 0[core 3[hwt 0]]: [./././B/./.][./././././.] [csclprd3-0-7:20792] MCW rank 58 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 91 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..] [csclprd3-0-0:31325] MCW rank 15 bound to socket 1[core 9[hwt 0]]: [./././././.][./././B/./.] [csclprd3-0-7:20792] MCW rank 59 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..] [csclprd3-0-10:19372] MCW rank 92 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..] [csclprd3-0-0:31325] MCW rank 16 bound to socket 0[core 4[hwt 0]]: [././././B/.][./././././.] [csclprd3-0-11:31905] MCW rank 106 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..] [csclprd3-0-0:31325] MCW rank 17 bound to socket 1[core 10[hwt 0]]: [./././././.][././././B/.] [csclprd3-0-7:20792] MCW rank 60 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 93 bound to socket 1[core 10[hwt 0-1]]: [../../../../../../../..][../../BB/../../../../..] [csclprd3-0-0:31325] MCW rank 18 bound to socket 0[core 5[hwt 0]]: [./././././B][./././././.] [csclprd3-0-11:31905] MCW rank 107 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..] [csclprd3-0-7:20792] MCW rank 61 bound to socket 1[core 10[hwt 0-1]]: [../../../../../../../..][../../BB/../../../../..] 
[csclprd3-0-10:19372] MCW rank 94 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 108 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..] [csclprd3-0-7:20792] MCW rank 62 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 109 bound to socket 1[core 10[hwt 0-1]]: [../../../../../../../..][../../BB/../../../../..] [csclprd3-0-7:20792] MCW rank 63 bound to socket 1[core 11[hwt 0-1]]: [../../../../../../../..][../../../BB/../../../..] [csclprd3-0-10:19372] MCW rank 95 bound to socket 1[core 11[hwt 0-1]]: [../../../../../../../..][../../../BB/../../../..] [csclprd3-0-11:31905] MCW rank 110 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..] [csclprd3-0-7:20792] MCW rank 64 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 96 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 111 bound to socket 1[core 11[hwt 0-1]]: [../../../../../../../..][../../../BB/../../../..] [csclprd3-0-0:31325] MCW rank 19 bound to socket 1[core 11[hwt 0]]: [./././././.][./././././B] [csclprd3-0-4:30528] MCW rank 42 bound to socket 0[core 4[hwt 0]]: [././././B/.] [csclprd3-0-4:30528] MCW rank 43 bound to socket 0[core 5[hwt 0]]: [./././././B] [csclprd3-0-4:30528] MCW rank 38 bound to socket 0[core 0[hwt 0]]: [B/././././.] [csclprd3-0-4:30528] MCW rank 39 bound to socket 0[core 1[hwt 0]]: [./B/./././.] [csclprd3-0-4:30528] MCW rank 40 bound to socket 0[core 2[hwt 0]]: [././B/././.] [csclprd3-0-4:30528] MCW rank 41 bound to socket 0[core 3[hwt 0]]: [./././B/./.] [csclprd3-0-13:29240] MCW rank 127 bound to socket 1[core 6[hwt 0-1]]: [../../../../../..][BB/../../../../..] 
[csclprd3-0-8:15818] MCW rank 76 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..] [csclprd3-0-13:29240] MCW rank 128 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..][../../../../../..] [csclprd3-0-8:15818] MCW rank 77 bound to socket 1[core 10[hwt 0-1]]: [../../../../../../../..][../../BB/../../../../..] [csclprd3-0-13:29240] MCW rank 129 bound to socket 1[core 7[hwt 0-1]]: [../../../../../..][../BB/../../../..] [csclprd3-0-8:15818] MCW rank 78 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..] [csclprd3-0-13:29240] MCW rank 130 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../..][../../../../../..] [csclprd3-0-8:15818] MCW rank 79 bound to socket 1[core 11[hwt 0-1]]: [../../../../../../../..][../../../BB/../../../..] [csclprd3-0-13:29240] MCW rank 131 bound to socket 1[core 8[hwt 0-1]]: [../../../../../..][../../BB/../../..] [csclprd3-0-8:15818] MCW rank 80 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..] [csclprd3-0-13:29240] MCW rank 126 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..][../../../../../..] [csclprd3-0-8:15818] MCW rank 81 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../..][../../../../BB/../../..] [csclprd3-0-8:15818] MCW rank 82 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..] [csclprd3-0-8:15818] MCW rank 83 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..] [csclprd3-0-8:15818] MCW rank 84 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..] [csclprd3-0-8:15818] MCW rank 85 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..] [csclprd3-0-8:15818] MCW rank 86 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB][../../../../../../../..] 
[csclprd3-0-8:15818] MCW rank 87 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB] [csclprd3-0-8:15818] MCW rank 72 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 97 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../..][../../../../BB/../../..] [csclprd3-0-11:31905] MCW rank 112 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..] [csclprd3-0-7:20792] MCW rank 65 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../..][../../../../BB/../../..] [csclprd3-0-8:15818] MCW rank 73 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..] [csclprd3-0-10:19372] MCW rank 98 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 113 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../..][../../../../BB/../../..] [csclprd3-0-8:15818] MCW rank 74 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..] [csclprd3-0-7:20792] MCW rank 66 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..] [csclprd3-0-10:19372] MCW rank 99 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..] [csclprd3-0-11:31905] MCW rank 114 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..] [csclprd3-0-11:31905] MCW rank 115 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..] [csclprd3-0-8:15818] MCW rank 75 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..] [csclprd3-0-7:20792] MCW rank 67 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..] 
[csclprd3-0-13:29244] *** Process received signal *** [csclprd3-0-13:29244] Signal: Bus error (7) [csclprd3-0-13:29244] Signal code: Non-existant physical address (2) [csclprd3-0-13:29244] Failing at address: 0x7f67c02a7980 [csclprd3-0-13:29245] *** Process received signal *** [csclprd3-0-13:29245] Signal: Bus error (7) [csclprd3-0-13:29245] Signal code: Non-existant physical address (2) [csclprd3-0-13:29245] Failing at address: 0x7f6390225900 [csclprd3-0-13:29247] *** Process received signal *** [csclprd3-0-13:29247] Signal: Bus error (7) [csclprd3-0-13:29247] Signal code: Non-existant physical address (2) [csclprd3-0-13:29247] Failing at address: 0x7ff4842e8980 [csclprd3-0-13:29241] *** Process received signal *** [csclprd3-0-13:29241] Signal: Bus error (7) [csclprd3-0-13:29241] Signal code: Non-existant physical address (2) [csclprd3-0-13:29241] Failing at address: 0x7fbd7c36ba80 [csclprd3-0-13:29242] *** Process received signal *** [csclprd3-0-13:29242] Signal: Bus error (7) [csclprd3-0-13:29242] Signal code: Non-existant physical address (2) [csclprd3-0-13:29242] Failing at address: 0x7f6773728a80 [csclprd3-0-13:29243] *** Process received signal *** [csclprd3-0-13:29243] Signal: Bus error (7) [csclprd3-0-13:29243] Signal code: Non-existant physical address (2) [csclprd3-0-13:29243] Failing at address: 0x7fbd7ea60980 [csclprd3-0-13:29244] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f67cfa7b500] [csclprd3-0-13:29244] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f67cfdf0f61] [csclprd3-0-13:29244] [ 2] [csclprd3-0-13:29245] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f639fac4500] [csclprd3-0-13:29245] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f639fe39f61] [csclprd3-0-13:29245] [ 2] [csclprd3-0-13:29247] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ff493ea8500] [csclprd3-0-13:29247] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7ff49421df61] [csclprd3-0-13:29247] [ 2] [csclprd3-0-13:29243] [ 0] 
/lib64/libpthread.so.0(+0xf500)[0x7fbd8e1b0500] [csclprd3-0-13:29243] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fbd8e525f61] [csclprd3-0-13:29243] [ 2] [csclprd3-0-13:29241] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fbd8cd79500] [csclprd3-0-13:29241] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fbd8d0eef61] [csclprd3-0-13:29241] [ 2] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fbd8d0ef047] [csclprd3-0-13:29241] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fbd8cfdc670] [csclprd3-0-13:29241] [ 4] [csclprd3-0-13:29242] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f6782cd0500] [csclprd3-0-13:29242] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f6783045f61] [csclprd3-0-13:29242] [ 2] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f6783046047] [csclprd3-0-13:29242] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fbd8e526047] [csclprd3-0-13:29243] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fbd8e413670] [csclprd3-0-13:29243] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fbd8e4145ab] [csclprd3-0-13:29243] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fbd8e414751] [csclprd3-0-13:29243] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fbd8e5221c9] [csclprd3-0-13:29243] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fbd8e508628] [csclprd3-0-13:29243] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fbd8cfdd5ab] [csclprd3-0-13:29241] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fbd8cfdd751] [csclprd3-0-13:29241] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fbd8d0eb1c9] [csclprd3-0-13:29241] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fbd8d0d1628] [csclprd3-0-13:29241] [ 8] 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fbd8d244d61] [csclprd3-0-13:29241] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7ff49421e047] [csclprd3-0-13:29247] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7ff49410b670] [csclprd3-0-13:29247] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7ff49410c5ab] [csclprd3-0-13:29247] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7ff49410c751] [csclprd3-0-13:29247] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7ff49421a1c9] [csclprd3-0-13:29247] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7ff494200628] [csclprd3-0-13:29247] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7ff494373d61] [csclprd3-0-13:29247] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f67cfdf1047] [csclprd3-0-13:29244] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f67cfcde670] [csclprd3-0-13:29244] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f67cfcdf5ab] [csclprd3-0-13:29244] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f67cfcdf751] [csclprd3-0-13:29244] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f67cfded1c9] [csclprd3-0-13:29244] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f67cfdd3628] [csclprd3-0-13:29244] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f67cff46d61] [csclprd3-0-13:29244] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f639fe3a047] [csclprd3-0-13:29245] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f639fd27670] [csclprd3-0-13:29245] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f639fd285ab] [csclprd3-0-13:29245] [ 5] 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f639fd28751] [csclprd3-0-13:29245] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f639fe361c9] [csclprd3-0-13:29245] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f639fe1c628] [csclprd3-0-13:29245] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f639ff8fd61] [csclprd3-0-13:29245] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f6782f33670] [csclprd3-0-13:29242] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f6782f345ab] [csclprd3-0-13:29242] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f6782f34751] [csclprd3-0-13:29242] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f67830421c9] [csclprd3-0-13:29242] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f6783028628] [csclprd3-0-13:29242] [ 8] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fbd8d00c747] [csclprd3-0-13:29241] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fbd8d04c50b] [csclprd3-0-13:29241] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29241] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbd8c9f6cdd] [csclprd3-0-13:29241] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29241] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7ff49413b747] [csclprd3-0-13:29247] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7ff49417b50b] [csclprd3-0-13:29247] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29247] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff493b25cdd] [csclprd3-0-13:29247] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29247] *** End of error message *** 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f67cfd0e747] [csclprd3-0-13:29244] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f67cfd4e50b] [csclprd3-0-13:29244] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29244] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f67cf6f8cdd] [csclprd3-0-13:29244] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29244] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f678319bd61] [csclprd3-0-13:29242] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f6782f63747] [csclprd3-0-13:29242] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f6782fa350b] [csclprd3-0-13:29242] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29242] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f678294dcdd] [csclprd3-0-13:29242] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29242] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f639fd57747] [csclprd3-0-13:29245] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f639fd9750b] [csclprd3-0-13:29245] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29245] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f639f741cdd] [csclprd3-0-13:29245] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:29245] *** End of error message *** /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fbd8e67bd61] [csclprd3-0-13:29243] [ 9] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fbd8e443747] [csclprd3-0-13:29243] [10] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fbd8e48350b] [csclprd3-0-13:29243] [11] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:29243] [12] 
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbd8de2dcdd]
[csclprd3-0-13:29243] [13] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:29243] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 exited on signal 7 (Bus error).
--------------------------------------------------------------------------
[lanew@csclprd3s1 openmpi]$

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: Tuesday, June 23, 2015 2:54 PM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

You shouldn't need any special flags for mpicc or mpirun to replicate the problem. This will just let us see the line numbers associated with the crash so we can narrow down the problem. Once we get that, we may need to rerun with specific params to narrow it down further.

BTW: when you get the backtrace, could you check the other threads as well? There are several threads running underneath now, and it would help to get the backtrace for each of them just to ensure there isn't something funny going on.

Thanks
Ralph

On Tue, Jun 23, 2015 at 12:19 PM, Lane, William <william.l...@cshs.org> wrote:

Ralph,

I've had OpenMPI 1.8.6 installed on our cluster w/the --enable-debug option.
Here's what I think are the relevant flags returned from ompi_info:

openMPI 1.8.6 build info
Fort MPI_SIZEOF: no
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: no
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
Sparse Groups: no
Internal debug support: yes
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
mpirun default --prefix: no

Do I need to compile my OpenMPI C test code w/any special switches passed to mpicc? Are there any special switches I should use with mpirun to run my job?

Thanks for your help w/these issues.

-Bill L.
________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: Friday, June 19, 2015 6:40 AM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

Good point.

William: can you rebuild OMPI with --enable-debug and run this again so we can see where the code is breaking?

Thanks
Ralph

On Jun 19, 2015, at 6:11 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Ralph,

I got that, but I cannot read the stack trace (optimized build). My best bet is to reproduce the issue and then find how and why ompi_free_list_t is segfaulting; that's why I requested info about the environment. IIRC, ompi_free_list_t differs between master and v1.8, so an incorrect backport could be the root cause.

Cheers,
Gilles

On Friday, June 19, 2015, Ralph Castain <r...@open-mpi.org> wrote:

Gilles, I was fooled too, but that isn't the issue.
The problem is that ompi_free_list is segfaulting:

[csclprd3-0-13:30901] *** Process received signal ***
[csclprd3-0-13:30901] Signal: Bus error (7)
[csclprd3-0-13:30901] Signal code: Non-existant physical address (2)
[csclprd3-0-13:30901] Failing at address: 0x7ff404351d80
[csclprd3-0-13:30901] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ff41453c500]
[csclprd3-0-13:30901] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xd4fea)[0x7ff41481efea]
[csclprd3-0-13:30901] [ 2] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x219)[0x7ff41479f009]
[csclprd3-0-13:30901] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7ff41479f110]
[csclprd3-0-13:30901] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7ff41480f68e]
[csclprd3-0-13:30901] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7ff4148e3715]
[csclprd3-0-13:30901] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7ff4147b9ad6]
[csclprd3-0-13:30901] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7ff4147d8c60]
[csclprd3-0-13:30901] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:30901] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff4141b9cdd]
[csclprd3-0-13:30901] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:30901] *** End of error message ***

On Jun 19, 2015, at 5:52 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Lane,

could you please describe your configuration?
How many sockets per node?
How many cores per socket?
How many threads per core?
What is the minimum number of nodes needed to reproduce the issue?
Do all the nodes have the same configuration?
If yes, what happens without --hetero-nodes?
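As a rough aid for the topology questions above, the bracketed mask at the end of each -report-bindings line can be read back into socket, core, and hardware-thread counts. The sketch below assumes the mask layout seen in this thread (one [...] group per socket, cores separated by "/", one character per hardware thread); the function name topology_from_bindings is hypothetical and not part of Open MPI. The mask only reflects what Open MPI detected on the node, so it is a cross-check rather than an authoritative hardware description.

```python
import re


def topology_from_bindings(text):
    """Return (sockets, cores_per_socket, threads_per_core).

    `text` may be a bare mask such as "[B/B][./.]" or a full
    -report-bindings line that ends with such a mask.
    Assumption: one [...] group per socket, "/" between cores,
    one character ("B" or ".") per hardware thread.
    """
    # Keep only what follows the last ": " so host/PID prefixes are ignored.
    mask = text.rsplit(": ", 1)[-1]
    sockets = re.findall(r"\[([B./]+)\]", mask)
    if not sockets:
        raise ValueError(f"no binding mask found in {text!r}")

    cores = [group.split("/") for group in sockets]
    cores_per_socket = {len(c) for c in cores}
    threads_per_core = {len(t) for c in cores for t in c}
    if len(cores_per_socket) != 1 or len(threads_per_core) != 1:
        raise ValueError(f"asymmetric mask: {mask!r}")

    return len(sockets), cores_per_socket.pop(), threads_per_core.pop()


if __name__ == "__main__":
    # Masks copied from the output posted in this thread.
    for mask in ("[B/B][./.]",
                 "[B/B/B/B/B/B][./././././.]",
                 "[BB/BB/BB/BB/BB/BB][../../../../../..]"):
        print(mask, topology_from_bindings(mask))
```

Lines reporting "is not bound (or bound to all available processors)" carry no mask, so the function raises ValueError for them; those nodes would need to be checked another way.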
Cheers,
Gilles

On Friday, June 19, 2015, Lane, William <william.l...@cshs.org> wrote:

Ralph,

I created a hostfile that just has the names of the hosts while specifying no slot information whatsoever (e.g. csclprd3-0-0) and received the following errors:

mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes /hpc/home/lanew/mpi/openmpi/ProcessColors3

[csclprd3-6-5:14770] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] [csclprd3-6-5:14770] MCW rank 5 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-6-5:14770] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] [csclprd3-6-5:14770] MCW rank 7 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-0-1:16437] MCW rank 24 is not bound (or bound to all available processors) [csclprd3-0-5:18925] MCW rank 48 is not bound (or bound to all available processors) [csclprd3-0-1:16437] MCW rank 25 is not bound (or bound to all available processors) [csclprd3-0-5:18925] MCW rank 49 is not bound (or bound to all available processors) [csclprd3-0-1:16437] MCW rank 20 is not bound (or bound to all available processors) [csclprd3-0-1:16437] MCW rank 21 is not bound (or bound to all available processors) [csclprd3-0-5:18925] MCW rank 44 is not bound (or bound to all available processors) [csclprd3-0-5:18925] MCW rank 45 is not bound (or bound to all available processors) [csclprd3-0-1:16437] MCW rank 22 is not bound (or bound to all available processors) [csclprd3-0-1:16437] MCW rank 23 is not bound (or bound to all available processors) [csclprd3-0-5:18925] MCW rank 46 is not bound (or bound to all available processors) [csclprd3-0-5:18925] MCW rank 47 is not bound (or bound to all available processors) [csclprd3-0-3:15946] MCW rank 36 is not bound (or bound to all available processors)
[csclprd3-0-3:15946] MCW rank 37 is not bound (or bound to all available processors) [csclprd3-0-3:15946] MCW rank 32 is not bound (or bound to all available processors) [csclprd3-0-3:15946] MCW rank 33 is not bound (or bound to all available processors) [csclprd3-0-3:15946] MCW rank 34 is not bound (or bound to all available processors) [csclprd3-0-3:15946] MCW rank 35 is not bound (or bound to all available processors) [csclprd3-0-12:09165] MCW rank 124 is not bound (or bound to all available processors) [csclprd3-0-12:09165] MCW rank 125 is not bound (or bound to all available processors) [csclprd3-0-12:09165] MCW rank 120 is not bound (or bound to all available processors) [csclprd3-0-12:09165] MCW rank 121 is not bound (or bound to all available processors) [csclprd3-0-12:09165] MCW rank 122 is not bound (or bound to all available processors) [csclprd3-0-12:09165] MCW rank 123 is not bound (or bound to all available processors) [csclprd3-6-1:27030] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] [csclprd3-6-1:27030] MCW rank 1 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-6-1:27030] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.] 
[csclprd3-6-1:27030] MCW rank 3 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B] [csclprd3-0-2:07944] MCW rank 30 is not bound (or bound to all available processors) [csclprd3-0-6:32510] MCW rank 54 is not bound (or bound to all available processors) [csclprd3-0-2:07944] MCW rank 31 is not bound (or bound to all available processors) [csclprd3-0-6:32510] MCW rank 55 is not bound (or bound to all available processors) [csclprd3-0-2:07944] MCW rank 26 is not bound (or bound to all available processors) [csclprd3-0-6:32510] MCW rank 50 is not bound (or bound to all available processors) [csclprd3-0-6:32510] MCW rank 51 is not bound (or bound to all available processors) [csclprd3-0-2:07944] MCW rank 27 is not bound (or bound to all available processors) [csclprd3-0-2:07944] MCW rank 28 is not bound (or bound to all available processors) [csclprd3-0-6:32510] MCW rank 52 is not bound (or bound to all available processors) [csclprd3-0-6:32510] MCW rank 53 is not bound (or bound to all available processors) [csclprd3-0-2:07944] MCW rank 29 is not bound (or bound to all available processors) [csclprd3-0-0:00453] MCW rank 11 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:00453] MCW rank 12 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
[csclprd3-0-0:00453] MCW rank 13 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:00453] MCW rank 14 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-0:00453] MCW rank 15 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:00453] MCW rank 16 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-7:22146] MCW rank 64 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:22146] MCW rank 65 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-0:00453] MCW rank 17 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:00453] MCW rank 18 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] 
[csclprd3-0-11:00885] MCW rank 116 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:00885] MCW rank 117 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:20752] MCW rank 100 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:20752] MCW rank 101 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-0:00453] MCW rank 19 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-7:22146] MCW rank 66 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..]
[csclprd3-0-11:00885] MCW rank 118 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-0:00453] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [csclprd3-0-10:20752] MCW rank 102 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-0:00453] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] [csclprd3-0-0:00453] MCW rank 10 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] 
[csclprd3-0-4:32449] MCW rank 42 is not bound (or bound to all available processors) [csclprd3-0-4:32449] MCW rank 43 is not bound (or bound to all available processors) [csclprd3-0-4:32449] MCW rank 38 is not bound (or bound to all available processors) [csclprd3-0-4:32449] MCW rank 39 is not bound (or bound to all available processors) [csclprd3-0-4:32449] MCW rank 40 is not bound (or bound to all available processors) [csclprd3-0-4:32449] MCW rank 41 is not bound (or bound to all available processors) [csclprd3-0-13:30897] MCW rank 126 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..] [csclprd3-0-8:17159] MCW rank 80 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-13:30897] MCW rank 127 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB] [csclprd3-0-8:17159] MCW rank 81 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-13:30897] MCW rank 128 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..] [csclprd3-0-8:17159] MCW rank 82 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..]
[csclprd3-0-13:30897] MCW rank 129 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB] [csclprd3-0-8:17159] MCW rank 83 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-13:30897] MCW rank 130 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..] [csclprd3-0-13:30897] MCW rank 131 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB] [csclprd3-0-8:17159] MCW rank 84 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-8:17159] MCW rank 85 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:00885] MCW rank 119 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:20752] MCW rank 103 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:17159] MCW rank 86 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-7:22146] MCW rank 67 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:00885] MCW rank 104 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:20752] MCW rank 88 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-8:17159] MCW rank 87 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:00885] MCW rank 105 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-10:20752] MCW rank 89 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:17159] MCW rank 72 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt
0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:22146] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:00885] MCW rank 106 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:20752] MCW rank 90 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-8:17159] MCW rank 73 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:00885] MCW rank 107 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:22146] MCW rank 69 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-8:17159] MCW rank 74 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-11:00885] MCW rank 108 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:22146] MCW rank 57 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-11:00885] MCW rank 114 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-10:20752] MCW rank 98 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-11:00885] MCW rank 115 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:22146] MCW rank 58 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..]
[csclprd3-0-10:20752] MCW rank 99 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:22146] MCW rank 59 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:22146] MCW rank 60 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] [csclprd3-0-7:22146] MCW rank 61 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-7:22146] MCW rank 62 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
[csclprd3-0-7:22146] MCW rank 63 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] [csclprd3-0-13:30901] *** Process received signal *** [csclprd3-0-13:30901] Signal: Bus error (7) [csclprd3-0-13:30901] Signal code: Non-existant physical address (2) [csclprd3-0-13:30901] Failing at address: 0x7ff404351d80 [csclprd3-0-13:30901] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ff41453c500] [csclprd3-0-13:30901] [ 1] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xd4fea)[0x7ff41481efea] [csclprd3-0-13:30901] [ 2] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x219)[0x7ff41479f009] [csclprd3-0-13:30901] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7ff41479f110] [csclprd3-0-13:30901] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7ff41480f68e] [csclprd3-0-13:30901] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7ff4148e3715] [csclprd3-0-13:30901] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7ff4147b9ad6] [csclprd3-0-13:30901] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7ff4147d8c60] [csclprd3-0-13:30901] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] [csclprd3-0-13:30901] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff4141b9cdd] [csclprd3-0-13:30901] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] [csclprd3-0-13:30901] *** End of error message *** ________________________________ From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org] Sent: Thursday, June 18, 2015 5:26 PM To: Open MPI Users Subject: Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash FWIW: I don't think this actually has anything to do with the #procs you are trying 
to run. Instead, I expect it has to do with confusion over how many cores it can bind across. When you tell it to --use-hwthread-cpus, you are asking us to map processes to hwthreads, not cores. I don't know which nodes are which, but it could be that we are getting incorrect info somewhere.

Given that you are limiting the number of procs to the number of cores, is there some reason why you are asking us to --use-hwthread-cpus? Why not just leave it at the default core level? I also suspect that you would have no problems if you --bind-to none. Does that in fact work?

On Jun 18, 2015, at 4:54 PM, Lane, William <william.l...@cshs.org> wrote:

I'm having a strange problem w/OpenMPI 1.8.6. If I run my OpenMPI test code (compiled against OpenMPI 1.8.6 libraries) on < 131 slots I get no issues. Anything over 131 errors out:

mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ --hostfile hostfile-single --mca btl_tcp_if_include eth0 --hetero-nodes --use-hwthread-cpus /hpc/home/lanew/mpi/openmpi/ProcessColors3

The hostfile has the number of slots restricted to the number of cores, while max-slots includes the hyperthreading cores (e.g. csclprd3-0-0 slots=6 max-slots=12). The nodes are a mix of IBM x3550 nodes; some are Sandy Bridge systems and others are older Xeons.

I would like to add that the submit node from which I am launching mpirun has the open files soft limit (ulimit -a) set to 1024, while the hard limit (ulimit -Ha) is set to 4096. I know open file limits were an issue w/an older version of OpenMPI. The compute nodes all have both their hard and soft open files limits set to 4096.
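For comparing the open-file limits mentioned above across the submit node and the compute nodes, a small script can report the soft and hard values as seen by a process on each host. This is only a sketch using Python's standard resource module (Unix only); the helper names open_file_limits and describe are made up for illustration, and nothing here establishes that these limits are related to the crash.

```python
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # Corresponds to the "open files" row of `ulimit -a` (soft)
    # and `ulimit -Ha` (hard) in the shell that launched this script.
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Running it through the same launch path as the MPI job (rather than an interactive login) shows the limits the launched processes actually inherit, which can differ from what an interactive shell reports.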
Here's the output (csclprd3-0-13 is the last node listed in the hostfile hostfile-single):

[csclprd3-0-13:28765] Signal: Bus error (7) [csclprd3-0-13:28765] Signal code: Non-existant physical address (2) [csclprd3-0-13:28765] Failing at address: 0x7f30002a8980 [csclprd3-0-13:28766] *** Process received signal *** [csclprd3-0-13:28766] Signal: Bus error (7) [csclprd3-0-13:28766] Signal code: Non-existant physical address (2) [csclprd3-0-13:28766] Failing at address: 0x7fe137662880 [csclprd3-0-13:28768] *** Process received signal *** [csclprd3-0-13:28768] Signal: Bus error (7) [csclprd3-0-13:28768] Signal code: Non-existant physical address (2) [csclprd3-0-13:28768] Failing at address: 0x7f9b40228a80 [csclprd3-0-13:28770] *** Process received signal ***
[csclprd3-0-13:28770] Signal: Bus error (7) [csclprd3-0-13:28770] Signal code: Non-existant physical address (2) [csclprd3-0-13:28770] Failing at address: 0x7f0de7f2bb00 [csclprd3-0-13:28767] *** Process received signal *** [csclprd3-0-13:28767] Signal: Bus error (7) [csclprd3-0-13:28767] Signal code: Non-existant physical address (2) [csclprd3-0-13:28767] Failing at address: 0x7f9b6c2e8980 [csclprd3-0-13:28764] *** Process received signal *** [csclprd3-0-13:28764] Signal: Bus error (7) [csclprd3-0-13:28764] Signal code: Non-existant physical address (2) [csclprd3-0-13:28768] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f9b513ad110] [csclprd3-0-13:28768] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x219)[0x7f0df77b6009] [csclprd3-0-13:28770] [ 3] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f0df77b6110] [csclprd3-0-13:28770] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f9b5141d68e] [csclprd3-0-13:28768] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f9b514f1715] [csclprd3-0-13:28768] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f30115ea68e] [csclprd3-0-13:28765] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f30116be715] [csclprd3-0-13:28765] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f9b7bb3b68e] [csclprd3-0-13:28767] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f9b7bc0f715] [csclprd3-0-13:28767] [ 6] [csclprd3-0-13:28764] [ 4] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7fa946bb768e] [csclprd3-0-13:28764] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7fe146d4068e] [csclprd3-0-13:28766] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f0df782668e] [csclprd3-0-13:28770] [ 5] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f0df78fa715] 
[csclprd3-0-13:28770] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f0df77d0ad6]
[csclprd3-0-13:28770] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7fe146e14715]
[csclprd3-0-13:28766] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fe146ceaad6]
[csclprd3-0-13:28766] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f9b513c7ad6]
[csclprd3-0-13:28768] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f9b513e6c60]
[csclprd3-0-13:28768] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:28768] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9b50dc7cdd]
[csclprd3-0-13:28768] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:28768] *** End of error message ***
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f3011594ad6]
[csclprd3-0-13:28765] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f30115b3c60]
[csclprd3-0-13:28765] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:28765] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f3010f94cdd]
[csclprd3-0-13:28765] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:28765] *** End of error message ***
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f9b7bae5ad6]
[csclprd3-0-13:28767] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f9b7bb04c60]
[csclprd3-0-13:28767] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:28767] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9b7b4e5cdd]
[csclprd3-0-13:28767] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:28767] *** End of error message ***
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7fa946c8b715]
[csclprd3-0-13:28764] [ 6] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fa946b61ad6]
[csclprd3-0-13:28764] [ 7] /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f0df77efc60]
[csclprd3-0-13:28770] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:28770] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0df71d0cdd]
[csclprd3-0-13:28770] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:28770] *** End of error message ***
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7fe146d09c60]
[csclprd3-0-13:28766] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:28766] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fe1466eacdd]
[csclprd3-0-13:28766] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:28766] *** End of error message ***
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7fa946b80c60]
[csclprd3-0-13:28764] [ 8] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0]
[csclprd3-0-13:28764] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa946561cdd]
[csclprd3-0-13:28764] [10] /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999]
[csclprd3-0-13:28764] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 exited on signal 7 (Bus error).

Could a lack of the necessary NUMA libraries or the wrong version of NUMA libraries be contributing to this?

IMPORTANT WARNING: This message is intended for the use of the person or entity to which it is addressed and may contain information that is privileged and confidential, the disclosure of which is governed by applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this information is strictly prohibited. Thank you for your cooperation.

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27159.php