Wow - that is one sick puppy! I see that some nodes are reporting not-bound for their procs, and the rest are binding to socket (as they should). Some of your nodes clearly do not have hyper threads enabled (or only have single-thread cores on them), and have 2 cores/socket. Other nodes have 8 cores/socket with hyper threads enabled, while still others have 6 cores/socket and HT enabled.
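If it helps to check that per-node picture against the trace, here is a rough Python sketch that groups the --report-bindings lines by host and reads the socket/core/HT layout off the trailing mask (e.g. [BB/BB][../..]). This is not part of Open MPI: the `summarize` helper and both regexes are my own assumptions, based only on the two line shapes that appear in the quoted trace, and a topology of None just means every rank on that host reported "not bound".

```python
import re
from collections import defaultdict

# Hypothetical helper, not an Open MPI tool. The patterns cover only the two
# line shapes seen in the quoted --report-bindings trace.
UNBOUND = re.compile(r"\[([\w.-]+):\d+\] MCW rank (\d+) is not bound")
BOUND = re.compile(
    r"\[([\w.-]+):\d+\] MCW rank (\d+) bound to .*?:\s*((?:\[[B./]+\])+)"
)


def summarize(text):
    """Group ranks by host; infer (sockets, cores/socket, hwt/core) from the mask."""
    # Drop mail-quote markers and re-join lines that the mail client wrapped.
    flat = " ".join(text.replace(">", " ").split())
    hosts = defaultdict(lambda: {"ranks": [], "topology": None})
    for host, rank in UNBOUND.findall(flat):
        hosts[host]["ranks"].append(int(rank))
    for host, rank, mask in BOUND.findall(flat):
        sockets = re.findall(r"\[([B./]+)\]", mask)  # one bracket group per socket
        cores = sockets[0].split("/")                # one field per core
        hosts[host]["ranks"].append(int(rank))
        hosts[host]["topology"] = (len(sockets), len(cores), len(cores[0]))
    return {
        host: {"ranks": sorted(info["ranks"]), "topology": info["topology"]}
        for host, info in hosts.items()
    }


# Three entries copied from the trace, quote markers and wrapping included.
sample = """
> [csclprd3-0-5:16802] MCW rank 44 is not bound (or bound to all available
> processors)
> [csclprd3-6-5:12480] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B][./.]
> [csclprd3-0-13:29118] MCW rank 126 bound to socket 0[core 0[hwt 0-1]],
> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]:
> [BB/BB/BB/BB/BB/BB][../../../../../..]
"""

for host, info in sorted(summarize(sample).items()):
    print(host, info["topology"], info["ranks"])
```

Feeding it the whole trace instead of the three sample entries should give one line per node, which makes the mix of topologies easier to see at a glance.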
I don't see anyone binding to a single HT if multiple HTs/core are available. I think you are being fooled by those nodes that either don't have HT enabled, or have only 1 HT/core.

In both cases, it is node 13 that fails. I also note that you said everything works okay with < 132 ranks, and node 13 hosts ranks 127-131. So node 13 would host ranks even if you reduced the number in the job to 131. This would imply that it probably isn't something wrong with the node itself.

Is there any way you could run a job of this size on a homogeneous cluster? The procs all show bindings that look right, but I'm wondering if the heterogeneity is the source of the trouble here. We may be communicating the binding pattern incorrectly and giving bad info to the backend daemon.

Also, rather than --report-bindings, use "--display-devel-map" on the command line and let's see what the mapper thinks it did. If there is a problem with placement, that is where it would exist.

On Tue, Jun 23, 2015 at 5:12 PM, Lane, William <william.l...@cshs.org> wrote:

> Ralph,
>
> There is something funny going on, the trace from the
> runs w/the debug build aren't showing any differences from
> what I got earlier. However, I did do a run w/the --bind-to core
> switch and was surprised to see that hyperthreading cores were
> sometimes being used.
>
> Here's the traces that I have:
>
> mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/
> --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes
> /hpc/home/lanew/mpi/openmpi/ProcessColors3
> [csclprd3-0-5:16802] MCW rank 44 is not bound (or bound to all available
> processors)
> [csclprd3-0-5:16802] MCW rank 45 is not bound (or bound to all available
> processors)
> [csclprd3-0-5:16802] MCW rank 46 is not bound (or bound to all available
> processors)
> [csclprd3-6-5:12480] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B][./.]
> [csclprd3-6-5:12480] MCW rank 5 bound to socket 1[core 2[hwt 0]], socket > 1[core 3[hwt 0]]: [./.][B/B] > [csclprd3-6-5:12480] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]]: [B/B][./.] > [csclprd3-6-5:12480] MCW rank 7 bound to socket 1[core 2[hwt 0]], socket > 1[core 3[hwt 0]]: [./.][B/B] > [csclprd3-0-5:16802] MCW rank 47 is not bound (or bound to all available > processors) > [csclprd3-0-5:16802] MCW rank 48 is not bound (or bound to all available > processors) > [csclprd3-0-5:16802] MCW rank 49 is not bound (or bound to all available > processors) > [csclprd3-0-1:14318] MCW rank 22 is not bound (or bound to all available > processors) > [csclprd3-0-1:14318] MCW rank 23 is not bound (or bound to all available > processors) > [csclprd3-0-1:14318] MCW rank 24 is not bound (or bound to all available > processors) > [csclprd3-6-1:24682] MCW rank 3 bound to socket 1[core 2[hwt 0]], socket > 1[core 3[hwt 0]]: [./.][B/B] > [csclprd3-6-1:24682] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]]: [B/B][./.] > [csclprd3-0-1:14318] MCW rank 25 is not bound (or bound to all available > processors) > [csclprd3-0-1:14318] MCW rank 20 is not bound (or bound to all available > processors) > [csclprd3-0-3:13827] MCW rank 34 is not bound (or bound to all available > processors) > [csclprd3-0-1:14318] MCW rank 21 is not bound (or bound to all available > processors) > [csclprd3-0-3:13827] MCW rank 35 is not bound (or bound to all available > processors) > [csclprd3-6-1:24682] MCW rank 1 bound to socket 1[core 2[hwt 0]], socket > 1[core 3[hwt 0]]: [./.][B/B] > [csclprd3-0-3:13827] MCW rank 36 is not bound (or bound to all available > processors) > [csclprd3-6-1:24682] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]]: [B/B][./.] 
> [csclprd3-0-6:30371] MCW rank 51 is not bound (or bound to all available > processors) > [csclprd3-0-6:30371] MCW rank 52 is not bound (or bound to all available > processors) > [csclprd3-0-6:30371] MCW rank 53 is not bound (or bound to all available > processors) > [csclprd3-0-2:05825] MCW rank 30 is not bound (or bound to all available > processors) > [csclprd3-0-6:30371] MCW rank 54 is not bound (or bound to all available > processors) > [csclprd3-0-3:13827] MCW rank 37 is not bound (or bound to all available > processors) > [csclprd3-0-2:05825] MCW rank 31 is not bound (or bound to all available > processors) > [csclprd3-0-3:13827] MCW rank 32 is not bound (or bound to all available > processors) > [csclprd3-0-6:30371] MCW rank 55 is not bound (or bound to all available > processors) > [csclprd3-0-3:13827] MCW rank 33 is not bound (or bound to all available > processors) > [csclprd3-0-6:30371] MCW rank 50 is not bound (or bound to all available > processors) > [csclprd3-0-2:05825] MCW rank 26 is not bound (or bound to all available > processors) > [csclprd3-0-2:05825] MCW rank 27 is not bound (or bound to all available > processors) > [csclprd3-0-2:05825] MCW rank 28 is not bound (or bound to all available > processors) > [csclprd3-0-2:05825] MCW rank 29 is not bound (or bound to all available > processors) > [csclprd3-0-12:12383] MCW rank 121 is not bound (or bound to all available > processors) > [csclprd3-0-12:12383] MCW rank 122 is not bound (or bound to all available > processors) > [csclprd3-0-12:12383] MCW rank 123 is not bound (or bound to all available > processors) > [csclprd3-0-12:12383] MCW rank 124 is not bound (or bound to all available > processors) > [csclprd3-0-12:12383] MCW rank 125 is not bound (or bound to all available > processors) > [csclprd3-0-12:12383] MCW rank 120 is not bound (or bound to all available > processors) > [csclprd3-0-0:31079] MCW rank 13 bound to socket 1[core 6[hwt 0]], socket > 1[core 7[hwt 0]], socket 1[core 8[hwt 
0]], socket 1[core 9[hwt 0]], socket > 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] > [csclprd3-0-0:31079] MCW rank 14 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket > 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] > [csclprd3-0-0:31079] MCW rank 15 bound to socket 1[core 6[hwt 0]], socket > 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket > 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] > [csclprd3-0-0:31079] MCW rank 16 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket > 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] > [csclprd3-0-7:20515] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-10:19096] MCW rank 100 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-7:20515] MCW rank 69 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 101 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-0:31079] MCW rank 17 bound to socket 1[core 6[hwt 0]], socket > 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket > 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] > [csclprd3-0-7:20515] MCW rank 70 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-10:19096] MCW rank 102 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-11:31636] MCW rank 116 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-11:31636] MCW rank 117 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-0:31079] MCW rank 18 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket > 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] > [csclprd3-0-11:31636] MCW rank 118 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-0:31079] MCW rank 19 bound to socket 1[core 6[hwt 0]], socket > 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket > 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] > [csclprd3-0-7:20515] MCW rank 71 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 103 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-0:31079] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket > 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] 
> [csclprd3-0-0:31079] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket > 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket > 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] > [csclprd3-0-10:19096] MCW rank 88 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-11:31636] MCW rank 119 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 56 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-0:31079] MCW rank 10 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket > 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] 
> [csclprd3-0-7:20515] MCW rank 57 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 89 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-11:31636] MCW rank 104 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-0:31079] MCW rank 11 bound to socket 1[core 6[hwt 0]], socket > 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket > 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B] > [csclprd3-0-0:31079] MCW rank 12 bound to socket 0[core 0[hwt 0]], socket > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket > 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] > [csclprd3-0-4:30348] MCW rank 42 is not bound (or bound to all available > processors) > [csclprd3-0-4:30348] MCW rank 43 is not bound (or bound to all available > processors) > [csclprd3-0-10:19096] MCW rank 90 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-4:30348] MCW rank 38 is not bound (or bound to all available > processors) > [csclprd3-0-7:20515] MCW rank 58 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-4:30348] MCW rank 39 is not bound (or bound to all available > processors) > [csclprd3-0-4:30348] MCW rank 40 is not bound (or bound to all available > processors) > [csclprd3-0-4:30348] MCW rank 41 is not bound (or bound to all available > processors) > [csclprd3-0-11:31636] MCW rank 105 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-13:29118] MCW rank 127 bound to socket 1[core 6[hwt 0-1]], > socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt > 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: > [../../../../../..][BB/BB/BB/BB/BB/BB] > [csclprd3-0-13:29118] MCW rank 128 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB][../../../../../..] 
> [csclprd3-0-13:29118] MCW rank 129 bound to socket 1[core 6[hwt 0-1]], > socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt > 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: > [../../../../../..][BB/BB/BB/BB/BB/BB] > [csclprd3-0-13:29118] MCW rank 130 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB][../../../../../..] > [csclprd3-0-8:15542] MCW rank 84 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-13:29118] MCW rank 131 bound to socket 1[core 6[hwt 0-1]], > socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt > 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: > [../../../../../..][BB/BB/BB/BB/BB/BB] > [csclprd3-0-8:15542] MCW rank 85 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-13:29118] MCW rank 126 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB][../../../../../..] > [csclprd3-0-8:15542] MCW rank 86 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-8:15542] MCW rank 87 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 59 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 91 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-11:31636] MCW rank 106 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-8:15542] MCW rank 72 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-7:20515] MCW rank 60 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-10:19096] MCW rank 92 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-11:31636] MCW rank 107 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 61 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-11:31636] MCW rank 108 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-10:19096] MCW rank 93 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-8:15542] MCW rank 73 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 62 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-10:19096] MCW rank 94 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-11:31636] MCW rank 109 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 63 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 95 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-11:31636] MCW rank 110 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-8:15542] MCW rank 74 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-7:20515] MCW rank 64 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-10:19096] MCW rank 96 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-11:31636] MCW rank 111 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 65 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 97 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-11:31636] MCW rank 112 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-8:15542] MCW rank 75 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 66 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-10:19096] MCW rank 98 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-11:31636] MCW rank 113 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-7:20515] MCW rank 67 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-10:19096] MCW rank 99 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-11:31636] MCW rank 114 bound to socket 
0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-8:15542] MCW rank 76 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-11:31636] MCW rank 115 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-8:15542] MCW rank 77 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-8:15542] MCW rank 78 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-8:15542] MCW rank 79 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-8:15542] MCW rank 80 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] > [csclprd3-0-8:15542] MCW rank 81 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-8:15542] MCW rank 82 bound to socket 0[core 0[hwt 0-1]], > socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt > 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core > 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
> [csclprd3-0-8:15542] MCW rank 83 bound to socket 1[core 8[hwt 0-1]], > socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt > 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket > 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] > [csclprd3-0-13:29120] *** Process received signal *** > [csclprd3-0-13:29120] Signal: Bus error (7) > [csclprd3-0-13:29120] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29120] Failing at address: 0x7f181832ba80 > [csclprd3-0-13:29121] *** Process received signal *** > [csclprd3-0-13:29121] Signal: Bus error (7) > [csclprd3-0-13:29121] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29121] Failing at address: 0x7f5ca82a7980 > [csclprd3-0-13:29122] *** Process received signal *** > [csclprd3-0-13:29122] Signal: Bus error (7) > [csclprd3-0-13:29122] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29122] Failing at address: 0x7fac6ba24980 > [csclprd3-0-13:29123] *** Process received signal *** > [csclprd3-0-13:29123] Signal: Bus error (7) > [csclprd3-0-13:29123] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29123] Failing at address: 0x7faa24267a00 > [csclprd3-0-13:29125] *** Process received signal *** > [csclprd3-0-13:29125] Signal: Bus error (7) > [csclprd3-0-13:29125] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29125] Failing at address: 0x7fa493ae7a00 > [csclprd3-0-13:29119] *** Process received signal *** > [csclprd3-0-13:29119] Signal: Bus error (7) > [csclprd3-0-13:29119] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29119] Failing at address: 0x7fed7436ba80 > [csclprd3-0-13:29120] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f182913e500] > [csclprd3-0-13:29120] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f18294b3f61] > [csclprd3-0-13:29120] [ 2] [csclprd3-0-13:29121] [ 0] > 
/lib64/libpthread.so.0(+0xf500)[0x7f5cb8803500] > [csclprd3-0-13:29121] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f5cb8b78f61] > [csclprd3-0-13:29121] [ 2] [csclprd3-0-13:29122] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fac7b20c500] > [csclprd3-0-13:29122] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fac7b581f61] > [csclprd3-0-13:29122] [ 2] [csclprd3-0-13:29123] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7faa33edd500] > [csclprd3-0-13:29123] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7faa34252f61] > [csclprd3-0-13:29123] [ 2] [csclprd3-0-13:29125] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fa4a3097500] > [csclprd3-0-13:29125] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fa4a340cf61] > [csclprd3-0-13:29125] [ 2] [csclprd3-0-13:29119] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fed85c95500] > [csclprd3-0-13:29119] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fed8600af61] > [csclprd3-0-13:29119] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fa4a340d047] > [csclprd3-0-13:29125] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fa4a32fa670] > [csclprd3-0-13:29125] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fa4a32fb5ab] > [csclprd3-0-13:29125] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fa4a32fb751] > [csclprd3-0-13:29125] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f18294b4047] > [csclprd3-0-13:29120] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f18293a1670] > [csclprd3-0-13:29120] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f18293a25ab] > [csclprd3-0-13:29120] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f18293a2751] > [csclprd3-0-13:29120] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f5cb8b79047] > [csclprd3-0-13:29121] [ 3] 
> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f5cb8a66670] > [csclprd3-0-13:29121] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f5cb8a675ab] > [csclprd3-0-13:29121] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f5cb8a67751] > [csclprd3-0-13:29121] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fac7b582047] > [csclprd3-0-13:29122] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fac7b46f670] > [csclprd3-0-13:29122] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fac7b4705ab] > [csclprd3-0-13:29122] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fac7b470751] > [csclprd3-0-13:29122] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7faa34253047] > [csclprd3-0-13:29123] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7faa34140670] > [csclprd3-0-13:29123] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7faa341415ab] > [csclprd3-0-13:29123] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7faa34141751] > [csclprd3-0-13:29123] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fed8600b047] > [csclprd3-0-13:29119] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fed85ef8670] > [csclprd3-0-13:29119] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fed85ef95ab] > [csclprd3-0-13:29119] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fed85ef9751] > [csclprd3-0-13:29119] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fed860071c9] > [csclprd3-0-13:29119] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fed85fed628] > [csclprd3-0-13:29119] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fed86160d61] > 
[csclprd3-0-13:29119] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7faa3424f1c9] > [csclprd3-0-13:29123] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7faa34235628] > [csclprd3-0-13:29123] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7faa343a8d61] > [csclprd3-0-13:29123] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fa4a34091c9] > [csclprd3-0-13:29125] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fa4a33ef628] > [csclprd3-0-13:29125] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fa4a3562d61] > [csclprd3-0-13:29125] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f18294b01c9] > [csclprd3-0-13:29120] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f1829496628] > [csclprd3-0-13:29120] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f5cb8b751c9] > [csclprd3-0-13:29121] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f5cb8b5b628] > [csclprd3-0-13:29121] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f5cb8cced61] > [csclprd3-0-13:29121] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f5cb8a96747] > [csclprd3-0-13:29121] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fac7b57e1c9] > [csclprd3-0-13:29122] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fac7b564628] > [csclprd3-0-13:29122] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fac7b6d7d61] > [csclprd3-0-13:29122] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fac7b49f747] > [csclprd3-0-13:29122] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fed85f28747] > [csclprd3-0-13:29119] [10] > 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fed85f6850b] > [csclprd3-0-13:29119] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29119] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fed85912cdd] > [csclprd3-0-13:29119] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29119] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7faa34170747] > [csclprd3-0-13:29123] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7faa341b050b] > [csclprd3-0-13:29123] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29123] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7faa33b5acdd] > [csclprd3-0-13:29123] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29123] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fa4a332a747] > [csclprd3-0-13:29125] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fa4a336a50b] > [csclprd3-0-13:29125] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29125] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa4a2d14cdd] > [csclprd3-0-13:29125] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29125] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f1829609d61] > [csclprd3-0-13:29120] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f18293d1747] > [csclprd3-0-13:29120] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f182941150b] > [csclprd3-0-13:29120] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29120] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f1828dbbcdd] > [csclprd3-0-13:29120] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29120] *** End of error 
message *** > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f5cb8ad650b] > [csclprd3-0-13:29121] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29121] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f5cb8480cdd] > [csclprd3-0-13:29121] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29121] *** End of error message *** > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fac7b4df50b] > [csclprd3-0-13:29122] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29122] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fac7ae89cdd] > [csclprd3-0-13:29122] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29122] *** End of error message *** > -------------------------------------------------------------------------- > mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 > exited on signal 7 (Bus error). > -------------------------------------------------------------------------- > > > 2. mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ > --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes > --bind-to core /hpc/home/lanew/mpi/openmpi/ProcessColors3 > -------------------------------------------------------------------------- > WARNING: a request was made to bind a process. While the system > supports binding the process itself, at least one node does NOT > support binding memory to the process location. > > Node: csclprd3-6-1 > > This usually is due to not having the required NUMA support installed > on the node. In some Linux distributions, the required support is > contained in the libnumactl and libnumactl-devel packages. > This is a warning only; your job will continue, though performance may be > degraded. > -------------------------------------------------------------------------- > [csclprd3-6-1:24853] MCW rank 0 bound to socket 0[core 0[hwt 0]]: > [B/.][./.] 
> [csclprd3-6-1:24853] MCW rank 1 bound to socket 1[core 2[hwt 0]]: > [./.][B/.] > [csclprd3-6-1:24853] MCW rank 2 bound to socket 0[core 1[hwt 0]]: > [./B][./.] > [csclprd3-6-1:24853] MCW rank 3 bound to socket 1[core 3[hwt 0]]: > [./.][./B] > [csclprd3-6-5:12646] MCW rank 4 bound to socket 0[core 0[hwt 0]]: > [B/.][./.] > [csclprd3-6-5:12646] MCW rank 5 bound to socket 1[core 2[hwt 0]]: > [./.][B/.] > [csclprd3-6-5:12646] MCW rank 6 bound to socket 0[core 1[hwt 0]]: > [./B][./.] > [csclprd3-6-5:12646] MCW rank 7 bound to socket 1[core 3[hwt 0]]: > [./.][./B] > [csclprd3-0-1:14499] MCW rank 24 bound to socket 0[core 4[hwt 0]]: > [././././B/.] > [csclprd3-0-1:14499] MCW rank 25 bound to socket 0[core 5[hwt 0]]: > [./././././B] > [csclprd3-0-1:14499] MCW rank 20 bound to socket 0[core 0[hwt 0]]: > [B/././././.] > [csclprd3-0-5:16978] MCW rank 44 bound to socket 0[core 0[hwt 0]]: > [B/././././.] > [csclprd3-0-5:16978] MCW rank 45 bound to socket 0[core 1[hwt 0]]: > [./B/./././.] > [csclprd3-0-1:14499] MCW rank 21 bound to socket 0[core 1[hwt 0]]: > [./B/./././.] > [csclprd3-0-5:16978] MCW rank 46 bound to socket 0[core 2[hwt 0]]: > [././B/././.] > [csclprd3-0-1:14499] MCW rank 22 bound to socket 0[core 2[hwt 0]]: > [././B/././.] > [csclprd3-0-1:14499] MCW rank 23 bound to socket 0[core 3[hwt 0]]: > [./././B/./.] > [csclprd3-0-5:16978] MCW rank 47 bound to socket 0[core 3[hwt 0]]: > [./././B/./.] > [csclprd3-0-5:16978] MCW rank 48 bound to socket 0[core 4[hwt 0]]: > [././././B/.] > [csclprd3-0-5:16978] MCW rank 49 bound to socket 0[core 5[hwt 0]]: > [./././././B] > [csclprd3-0-6:30547] MCW rank 51 bound to socket 0[core 1[hwt 0]]: > [./B/./././.] > [csclprd3-0-2:06006] MCW rank 30 bound to socket 0[core 4[hwt 0]]: > [././././B/.] > [csclprd3-0-6:30547] MCW rank 52 bound to socket 0[core 2[hwt 0]]: > [././B/././.] 
> [csclprd3-0-2:06006] MCW rank 31 bound to socket 0[core 5[hwt 0]]: > [./././././B] > [csclprd3-0-6:30547] MCW rank 53 bound to socket 0[core 3[hwt 0]]: > [./././B/./.] > [csclprd3-0-2:06006] MCW rank 26 bound to socket 0[core 0[hwt 0]]: > [B/././././.] > [csclprd3-0-6:30547] MCW rank 54 bound to socket 0[core 4[hwt 0]]: > [././././B/.] > [csclprd3-0-2:06006] MCW rank 27 bound to socket 0[core 1[hwt 0]]: > [./B/./././.] > [csclprd3-0-2:06006] MCW rank 28 bound to socket 0[core 2[hwt 0]]: > [././B/././.] > [csclprd3-0-6:30547] MCW rank 55 bound to socket 0[core 5[hwt 0]]: > [./././././B] > [csclprd3-0-3:14008] MCW rank 34 bound to socket 0[core 2[hwt 0]]: > [././B/././.] > [csclprd3-0-6:30547] MCW rank 50 bound to socket 0[core 0[hwt 0]]: > [B/././././.] > [csclprd3-0-3:14008] MCW rank 35 bound to socket 0[core 3[hwt 0]]: > [./././B/./.] > [csclprd3-0-3:14008] MCW rank 36 bound to socket 0[core 4[hwt 0]]: > [././././B/.] > [csclprd3-0-3:14008] MCW rank 37 bound to socket 0[core 5[hwt 0]]: > [./././././B] > [csclprd3-0-3:14008] MCW rank 32 bound to socket 0[core 0[hwt 0]]: > [B/././././.] > [csclprd3-0-3:14008] MCW rank 33 bound to socket 0[core 1[hwt 0]]: > [./B/./././.] > [csclprd3-0-2:06006] MCW rank 29 bound to socket 0[core 3[hwt 0]]: > [./././B/./.] > [csclprd3-0-12:12559] MCW rank 120 bound to socket 0[core 0[hwt 0-1]]: > [BB/../../../../..] > [csclprd3-0-12:12559] MCW rank 121 bound to socket 0[core 1[hwt 0-1]]: > [../BB/../../../..] > [csclprd3-0-12:12559] MCW rank 122 bound to socket 0[core 2[hwt 0-1]]: > [../../BB/../../..] > [csclprd3-0-12:12559] MCW rank 123 bound to socket 0[core 3[hwt 0-1]]: > [../../../BB/../..] > [csclprd3-0-12:12559] MCW rank 124 bound to socket 0[core 4[hwt 0-1]]: > [../../../../BB/..] > [csclprd3-0-12:12559] MCW rank 125 bound to socket 0[core 5[hwt 0-1]]: > [../../../../../BB] > [csclprd3-0-0:31325] MCW rank 8 bound to socket 0[core 0[hwt 0]]: > [B/././././.][./././././.] 
> [csclprd3-0-0:31325] MCW rank 9 bound to socket 1[core 6[hwt 0]]: > [./././././.][B/././././.] > [csclprd3-0-0:31325] MCW rank 10 bound to socket 0[core 1[hwt 0]]: > [./B/./././.][./././././.] > [csclprd3-0-7:20792] MCW rank 68 bound to socket 0[core 6[hwt 0-1]]: > [../../../../../../BB/..][../../../../../../../..] > [csclprd3-0-7:20792] MCW rank 69 bound to socket 1[core 14[hwt 0-1]]: > [../../../../../../../..][../../../../../../BB/..] > [csclprd3-0-0:31325] MCW rank 11 bound to socket 1[core 7[hwt 0]]: > [./././././.][./B/./././.] > [csclprd3-0-10:19372] MCW rank 100 bound to socket 0[core 6[hwt 0-1]]: > [../../../../../../BB/..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 101 bound to socket 1[core 14[hwt 0-1]]: > [../../../../../../../..][../../../../../../BB/..] > [csclprd3-0-11:31905] MCW rank 116 bound to socket 0[core 6[hwt 0-1]]: > [../../../../../../BB/..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 117 bound to socket 1[core 14[hwt 0-1]]: > [../../../../../../../..][../../../../../../BB/..] > [csclprd3-0-7:20792] MCW rank 70 bound to socket 0[core 7[hwt 0-1]]: > [../../../../../../../BB][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 102 bound to socket 0[core 7[hwt 0-1]]: > [../../../../../../../BB][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 118 bound to socket 0[core 7[hwt 0-1]]: > [../../../../../../../BB][../../../../../../../..] > [csclprd3-0-7:20792] MCW rank 71 bound to socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][../../../../../../../BB] > [csclprd3-0-10:19372] MCW rank 103 bound to socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][../../../../../../../BB] > [csclprd3-0-0:31325] MCW rank 12 bound to socket 0[core 2[hwt 0]]: > [././B/././.][./././././.] > [csclprd3-0-11:31905] MCW rank 119 bound to socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][../../../../../../../BB] > [csclprd3-0-0:31325] MCW rank 13 bound to socket 1[core 8[hwt 0]]: > [./././././.][././B/././.] 
> [csclprd3-0-7:20792] MCW rank 56 bound to socket 0[core 0[hwt 0-1]]: > [BB/../../../../../../..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 88 bound to socket 0[core 0[hwt 0-1]]: > [BB/../../../../../../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 104 bound to socket 0[core 0[hwt 0-1]]: > [BB/../../../../../../..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 89 bound to socket 1[core 8[hwt 0-1]]: > [../../../../../../../..][BB/../../../../../../..] > [csclprd3-0-7:20792] MCW rank 57 bound to socket 1[core 8[hwt 0-1]]: > [../../../../../../../..][BB/../../../../../../..] > [csclprd3-0-10:19372] MCW rank 90 bound to socket 0[core 1[hwt 0-1]]: > [../BB/../../../../../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 105 bound to socket 1[core 8[hwt 0-1]]: > [../../../../../../../..][BB/../../../../../../..] > [csclprd3-0-0:31325] MCW rank 14 bound to socket 0[core 3[hwt 0]]: > [./././B/./.][./././././.] > [csclprd3-0-7:20792] MCW rank 58 bound to socket 0[core 1[hwt 0-1]]: > [../BB/../../../../../..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 91 bound to socket 1[core 9[hwt 0-1]]: > [../../../../../../../..][../BB/../../../../../..] > [csclprd3-0-0:31325] MCW rank 15 bound to socket 1[core 9[hwt 0]]: > [./././././.][./././B/./.] > [csclprd3-0-7:20792] MCW rank 59 bound to socket 1[core 9[hwt 0-1]]: > [../../../../../../../..][../BB/../../../../../..] > [csclprd3-0-10:19372] MCW rank 92 bound to socket 0[core 2[hwt 0-1]]: > [../../BB/../../../../..][../../../../../../../..] > [csclprd3-0-0:31325] MCW rank 16 bound to socket 0[core 4[hwt 0]]: > [././././B/.][./././././.] > [csclprd3-0-11:31905] MCW rank 106 bound to socket 0[core 1[hwt 0-1]]: > [../BB/../../../../../..][../../../../../../../..] > [csclprd3-0-0:31325] MCW rank 17 bound to socket 1[core 10[hwt 0]]: > [./././././.][././././B/.] 
> [csclprd3-0-7:20792] MCW rank 60 bound to socket 0[core 2[hwt 0-1]]: > [../../BB/../../../../..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 93 bound to socket 1[core 10[hwt 0-1]]: > [../../../../../../../..][../../BB/../../../../..] > [csclprd3-0-0:31325] MCW rank 18 bound to socket 0[core 5[hwt 0]]: > [./././././B][./././././.] > [csclprd3-0-11:31905] MCW rank 107 bound to socket 1[core 9[hwt 0-1]]: > [../../../../../../../..][../BB/../../../../../..] > [csclprd3-0-7:20792] MCW rank 61 bound to socket 1[core 10[hwt 0-1]]: > [../../../../../../../..][../../BB/../../../../..] > [csclprd3-0-10:19372] MCW rank 94 bound to socket 0[core 3[hwt 0-1]]: > [../../../BB/../../../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 108 bound to socket 0[core 2[hwt 0-1]]: > [../../BB/../../../../..][../../../../../../../..] > [csclprd3-0-7:20792] MCW rank 62 bound to socket 0[core 3[hwt 0-1]]: > [../../../BB/../../../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 109 bound to socket 1[core 10[hwt 0-1]]: > [../../../../../../../..][../../BB/../../../../..] > [csclprd3-0-7:20792] MCW rank 63 bound to socket 1[core 11[hwt 0-1]]: > [../../../../../../../..][../../../BB/../../../..] > [csclprd3-0-10:19372] MCW rank 95 bound to socket 1[core 11[hwt 0-1]]: > [../.../../../../../../..][../../../BB/../../../..] > [csclprd3-0-11:31905] MCW rank 110 bound to socket 0[core 3[hwt 0-1]]: > [../../../BB/../../../..][../../../../../../../..] > [csclprd3-0-7:20792] MCW rank 64 bound to socket 0[core 4[hwt 0-1]]: > [../../../../BB/../../..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 96 bound to socket 0[core 4[hwt 0-1]]: > [../../../../BB/../../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 111 bound to socket 1[core 11[hwt 0-1]]: > [../../../../../../../..][../../../BB/../../../..] 
> [csclprd3-0-0:31325] MCW rank 19 bound to socket 1[core 11[hwt 0]]: > [./././././.][./././././B] > [csclprd3-0-4:30528] MCW rank 42 bound to socket 0[core 4[hwt 0]]: > [././././B/.] > [csclprd3-0-4:30528] MCW rank 43 bound to socket 0[core 5[hwt 0]]: > [./././././B] > [csclprd3-0-4:30528] MCW rank 38 bound to socket 0[core 0[hwt 0]]: > [B/././././.] > [csclprd3-0-4:30528] MCW rank 39 bound to socket 0[core 1[hwt 0]]: > [./B/./././.] > [csclprd3-0-4:30528] MCW rank 40 bound to socket 0[core 2[hwt 0]]: > [././B/././.] > [csclprd3-0-4:30528] MCW rank 41 bound to socket 0[core 3[hwt 0]]: > [./././B/./.] > [csclprd3-0-13:29240] MCW rank 127 bound to socket 1[core 6[hwt 0-1]]: > [../../../../../..][BB/../../../../..] > [csclprd3-0-8:15818] MCW rank 76 bound to socket 0[core 2[hwt 0-1]]: > [../../BB/../../../../..][../../../../../../../..] > [csclprd3-0-13:29240] MCW rank 128 bound to socket 0[core 1[hwt 0-1]]: > [../BB/../../../..][../../../../../..] > [csclprd3-0-8:15818] MCW rank 77 bound to socket 1[core 10[hwt 0-1]]: > [../../../../../../../..][../../BB/../../../../..] > [csclprd3-0-13:29240] MCW rank 129 bound to socket 1[core 7[hwt 0-1]]: > [../../../../../..][../BB/../../../..] > [csclprd3-0-8:15818] MCW rank 78 bound to socket 0[core 3[hwt 0-1]]: > [../../../BB/../../../..][../../../../../../../..] > [csclprd3-0-13:29240] MCW rank 130 bound to socket 0[core 2[hwt 0-1]]: > [../../BB/../../..][../../../../../..] > [csclprd3-0-8:15818] MCW rank 79 bound to socket 1[core 11[hwt 0-1]]: > [../../../../../../../..][../../../BB/../../../..] > [csclprd3-0-13:29240] MCW rank 131 bound to socket 1[core 8[hwt 0-1]]: > [../../../../../..][../../BB/../../..] > [csclprd3-0-8:15818] MCW rank 80 bound to socket 0[core 4[hwt 0-1]]: > [../../../../BB/../../..][../../../../../../../..] > [csclprd3-0-13:29240] MCW rank 126 bound to socket 0[core 0[hwt 0-1]]: > [BB/../../../../..][../../../../../..] 
> [csclprd3-0-8:15818] MCW rank 81 bound to socket 1[core 12[hwt 0-1]]: > [../../../../../../../..][../../../../BB/../../..] > [csclprd3-0-8:15818] MCW rank 82 bound to socket 0[core 5[hwt 0-1]]: > [../../../../../BB/../..][../../../../../../../..] > [csclprd3-0-8:15818] MCW rank 83 bound to socket 1[core 13[hwt 0-1]]: > [../../../../../../../..][../../../../../BB/../..] > [csclprd3-0-8:15818] MCW rank 84 bound to socket 0[core 6[hwt 0-1]]: > [../../../../../../BB/..][../../../../../../../..] > [csclprd3-0-8:15818] MCW rank 85 bound to socket 1[core 14[hwt 0-1]]: > [../../../../../../../..][../../../../../../BB/..] > [csclprd3-0-8:15818] MCW rank 86 bound to socket 0[core 7[hwt 0-1]]: > [../../../../../../../BB][../../../../../../../..] > [csclprd3-0-8:15818] MCW rank 87 bound to socket 1[core 15[hwt 0-1]]: > [../../../../../../../..][../../../../../../../BB] > [csclprd3-0-8:15818] MCW rank 72 bound to socket 0[core 0[hwt 0-1]]: > [BB/../../../../../../..][../../../../../../../..] > [csclprd3-0-10:19372] MCW rank 97 bound to socket 1[core 12[hwt 0-1]]: > [../../../../../../../..][../../../../BB/../../..] > [csclprd3-0-11:31905] MCW rank 112 bound to socket 0[core 4[hwt 0-1]]: > [../../../../BB/../../..][../../../../../../../..] > [csclprd3-0-7:20792] MCW rank 65 bound to socket 1[core 12[hwt 0-1]]: > [../../../../../../../..][../../../../BB/../../..] > [csclprd3-0-8:15818] MCW rank 73 bound to socket 1[core 8[hwt 0-1]]: > [../../../../../../../..][BB/../../../../../../..] > [csclprd3-0-10:19372] MCW rank 98 bound to socket 0[core 5[hwt 0-1]]: > [../../../../../BB/../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 113 bound to socket 1[core 12[hwt 0-1]]: > [../../../../../../../..][../../../../BB/../../..] > [csclprd3-0-8:15818] MCW rank 74 bound to socket 0[core 1[hwt 0-1]]: > [../BB/../../../../../..][../../../../../../../..] > [csclprd3-0-7:20792] MCW rank 66 bound to socket 0[core 5[hwt 0-1]]: > [../../../../../BB/../..][../../../../../../../..] 
> [csclprd3-0-10:19372] MCW rank 99 bound to socket 1[core 13[hwt 0-1]]: > [../../../../../../../..][../../../../../BB/../..] > [csclprd3-0-11:31905] MCW rank 114 bound to socket 0[core 5[hwt 0-1]]: > [../../../../../BB/../..][../../../../../../../..] > [csclprd3-0-11:31905] MCW rank 115 bound to socket 1[core 13[hwt 0-1]]: > [../../../../../../../..][../../../../../BB/../..] > [csclprd3-0-8:15818] MCW rank 75 bound to socket 1[core 9[hwt 0-1]]: > [../../../../../../../..][../BB/../../../../../..] > [csclprd3-0-7:20792] MCW rank 67 bound to socket 1[core 13[hwt 0-1]]: > [../../../../../../../..][../../../../../BB/../..] > [csclprd3-0-13:29244] *** Process received signal *** > [csclprd3-0-13:29244] Signal: Bus error (7) > [csclprd3-0-13:29244] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29244] Failing at address: 0x7f67c02a7980 > [csclprd3-0-13:29245] *** Process received signal *** > [csclprd3-0-13:29245] Signal: Bus error (7) > [csclprd3-0-13:29245] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29245] Failing at address: 0x7f6390225900 > [csclprd3-0-13:29247] *** Process received signal *** > [csclprd3-0-13:29247] Signal: Bus error (7) > [csclprd3-0-13:29247] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29247] Failing at address: 0x7ff4842e8980 > [csclprd3-0-13:29241] *** Process received signal *** > [csclprd3-0-13:29241] Signal: Bus error (7) > [csclprd3-0-13:29241] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29241] Failing at address: 0x7fbd7c36ba80 > [csclprd3-0-13:29242] *** Process received signal *** > [csclprd3-0-13:29242] Signal: Bus error (7) > [csclprd3-0-13:29242] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29242] Failing at address: 0x7f6773728a80 > [csclprd3-0-13:29243] *** Process received signal *** > [csclprd3-0-13:29243] Signal: Bus error (7) > [csclprd3-0-13:29243] Signal code: Non-existant physical address (2) > [csclprd3-0-13:29243] Failing at 
address: 0x7fbd7ea60980 > [csclprd3-0-13:29244] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7f67cfa7b500] > [csclprd3-0-13:29244] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f67cfdf0f61] > [csclprd3-0-13:29244] [ 2] [csclprd3-0-13:29245] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7f639fac4500] > [csclprd3-0-13:29245] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f639fe39f61] > [csclprd3-0-13:29245] [ 2] [csclprd3-0-13:29247] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7ff493ea8500] > [csclprd3-0-13:29247] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7ff49421df61] > [csclprd3-0-13:29247] [ 2] [csclprd3-0-13:29243] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fbd8e1b0500] > [csclprd3-0-13:29243] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fbd8e525f61] > [csclprd3-0-13:29243] [ 2] [csclprd3-0-13:29241] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7fbd8cd79500] > [csclprd3-0-13:29241] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7fbd8d0eef61] > [csclprd3-0-13:29241] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fbd8d0ef047] > [csclprd3-0-13:29241] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fbd8cfdc670] > [csclprd3-0-13:29241] [ 4] [csclprd3-0-13:29242] [ 0] > /lib64/libpthread.so.0(+0xf500)[0x7f6782cd0500] > [csclprd3-0-13:29242] [ 1] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x167f61)[0x7f6783045f61] > [csclprd3-0-13:29242] [ 2] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f6783046047] > [csclprd3-0-13:29242] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7fbd8e526047] > [csclprd3-0-13:29243] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7fbd8e413670] > [csclprd3-0-13:29243] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fbd8e4145ab] > [csclprd3-0-13:29243] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fbd8e414751] > 
[csclprd3-0-13:29243] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fbd8e5221c9] > [csclprd3-0-13:29243] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fbd8e508628] > [csclprd3-0-13:29243] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7fbd8cfdd5ab] > [csclprd3-0-13:29241] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7fbd8cfdd751] > [csclprd3-0-13:29241] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7fbd8d0eb1c9] > [csclprd3-0-13:29241] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7fbd8d0d1628] > [csclprd3-0-13:29241] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fbd8d244d61] > [csclprd3-0-13:29241] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7ff49421e047] > [csclprd3-0-13:29247] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7ff49410b670] > [csclprd3-0-13:29247] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7ff49410c5ab] > [csclprd3-0-13:29247] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7ff49410c751] > [csclprd3-0-13:29247] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7ff49421a1c9] > [csclprd3-0-13:29247] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7ff494200628] > [csclprd3-0-13:29247] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7ff494373d61] > [csclprd3-0-13:29247] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f67cfdf1047] > [csclprd3-0-13:29244] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f67cfcde670] > [csclprd3-0-13:29244] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f67cfcdf5ab] > [csclprd3-0-13:29244] [ 5] > 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f67cfcdf751] > [csclprd3-0-13:29244] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f67cfded1c9] > [csclprd3-0-13:29244] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f67cfdd3628] > [csclprd3-0-13:29244] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f67cff46d61] > [csclprd3-0-13:29244] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x168047)[0x7f639fe3a047] > [csclprd3-0-13:29245] [ 3] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f639fd27670] > [csclprd3-0-13:29245] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f639fd285ab] > [csclprd3-0-13:29245] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f639fd28751] > [csclprd3-0-13:29245] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f639fe361c9] > [csclprd3-0-13:29245] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f639fe1c628] > [csclprd3-0-13:29245] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f639ff8fd61] > [csclprd3-0-13:29245] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x55670)[0x7f6782f33670] > [csclprd3-0-13:29242] [ 4] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x3b9)[0x7f6782f345ab] > [csclprd3-0-13:29242] [ 5] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0xfb)[0x7f6782f34751] > [csclprd3-0-13:29242] [ 6] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_btl_sm_add_procs+0x671)[0x7f67830421c9] > [csclprd3-0-13:29242] [ 7] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0x14a628)[0x7f6783028628] > [csclprd3-0-13:29242] [ 8] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fbd8d00c747] > [csclprd3-0-13:29241] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fbd8d04c50b] > 
[csclprd3-0-13:29241] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29241] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbd8c9f6cdd] > [csclprd3-0-13:29241] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29241] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7ff49413b747] > [csclprd3-0-13:29247] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7ff49417b50b] > [csclprd3-0-13:29247] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29247] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff493b25cdd] > [csclprd3-0-13:29247] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29247] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f67cfd0e747] > [csclprd3-0-13:29244] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f67cfd4e50b] > [csclprd3-0-13:29244] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29244] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f67cf6f8cdd] > [csclprd3-0-13:29244] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29244] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7f678319bd61] > [csclprd3-0-13:29242] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f6782f63747] > [csclprd3-0-13:29242] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f6782fa350b] > [csclprd3-0-13:29242] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29242] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f678294dcdd] > [csclprd3-0-13:29242] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29242] *** End of error message *** > > 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7f639fd57747] > [csclprd3-0-13:29245] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7f639fd9750b] > [csclprd3-0-13:29245] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29245] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f639f741cdd] > [csclprd3-0-13:29245] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29245] *** End of error message *** > > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xff)[0x7fbd8e67bd61] > [csclprd3-0-13:29243] [ 9] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0xbda)[0x7fbd8e443747] > [csclprd3-0-13:29243] [10] > /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x185)[0x7fbd8e48350b] > [csclprd3-0-13:29243] [11] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] > [csclprd3-0-13:29243] [12] > /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbd8de2dcdd] > [csclprd3-0-13:29243] [13] > /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] > [csclprd3-0-13:29243] *** End of error message *** > -------------------------------------------------------------------------- > mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 > exited on signal 7 (Bus error). > -------------------------------------------------------------------------- > [lanew@csclprd3s1 openmpi]$ > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------ > *From:* users [users-boun...@open-mpi.org] on behalf of Ralph Castain [ > r...@open-mpi.org] > *Sent:* Tuesday, June 23, 2015 2:54 PM > *To:* Open MPI Users > *Subject:* Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = > crash > > You shouldn't need any special flags for mpicc or mpirun to replicate > the problem. This will just let us see the line numbers associated with the > crash so we can narrow down the problem. 
Once we get that, we may need to > rerun with specific params to narrow it down further. > > BTW: when you get the backtrace, could you check the other threads as > well? There are several threads running underneath now, and it would help > to get the backtrace for each of them just to ensure there isn't something > funny going on. > > Thanks > Ralph > > > On Tue, Jun 23, 2015 at 12:19 PM, Lane, William <william.l...@cshs.org> > wrote: > >> Ralph, >> >> I've had OpenMPI 1.8.6 installed on our cluster w/the --enable-debug >> option. Here's what I think are the relevant flags returned from >> ompi_info: >> >> openMPI 1.8.6 build info >> Fort MPI_SIZEOF: no >> C profiling: yes >> C++ profiling: yes >> Fort mpif.h profiling: yes >> Fort use mpi profiling: yes >> Fort use mpi_f08 prof: no >> C++ exceptions: no >> Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI >> progress: no, ORTE progress: yes, Event lib: yes) >> Sparse Groups: no >> Internal debug support: yes >> MPI interface warnings: yes >> MPI parameter check: runtime >> Memory profiling support: no >> Memory debugging support: no >> dl support: yes >> Heterogeneous support: no >> mpirun default --prefix: no >> >> Do I need to compile my OpenMPI C test code w/any special >> switches passed to mpicc? >> >> Are there any special switches I should use with mpirun to run my job? >> >> Thanks for your help w/these issues. >> >> -Bill L. >> ------------------------------ >> *From:* users [users-boun...@open-mpi.org] on behalf of Ralph Castain [ >> r...@open-mpi.org] >> *Sent:* Friday, June 19, 2015 6:40 AM >> >> *To:* Open MPI Users >> *Subject:* Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = >> crash >> >> Good point >> >> William: can you rebuild OMPI with --enable-debug and run this again so >> we can see where the code is breaking? 
>> >> Thanks >> Ralph >> >> >> On Jun 19, 2015, at 6:11 AM, Gilles Gouaillardet < >> gilles.gouaillar...@gmail.com> wrote: >> >> Ralph, >> >> I got that, but I cannot read the stack trace (optimized build) >> my best bet is to reproduce the issue, and then find how and why >> ompi_free_list_t is segfault'ing. >> that's why I requested info about the environment >> >> iirc, ompi_free_list_t are different between master and v1.8, so an >> incorrect back port could be the root cause. >> >> Cheers, >> >> Gilles >> >> On Friday, June 19, 2015, Ralph Castain <r...@open-mpi.org> wrote: >> >>> Gilles >>> >>> I was fooled too, but that isn’t the issue. The problem is that >>> ompi_free_list is segfaulting: >>> >>> [csclprd3-0-13:30901] *** Process received signal *** >>>> [csclprd3-0-13:30901] Signal: Bus error (7) >>>> [csclprd3-0-13:30901] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:30901] Failing at address: 0x7ff404351d80 >>>> [csclprd3-0-13:30901] [ 0] >>>> /lib64/libpthread.so.0(+0xf500)[0x7ff41453c500] >>>> [csclprd3-0-13:30901] [ 1] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xd4fea)[0x7ff41481efea] >>>> [csclprd3-0-13:30901] [ 2] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x219)[0x7ff41479f009] >>>> [csclprd3-0-13:30901] [ 3] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7ff41479f110] >>>> [csclprd3-0-13:30901] [ 4] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7ff41480f68e] >>>> [csclprd3-0-13:30901] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7ff4148e3715] >>>> [csclprd3-0-13:30901] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7ff4147b9ad6] >>>> [csclprd3-0-13:30901] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7ff4147d8c60] >>>> [csclprd3-0-13:30901] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:30901] [ 9] >>>> 
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff4141b9cdd] >>>> [csclprd3-0-13:30901] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:30901] *** End of error message *** >>>> >>> >>> >>> >>> On Jun 19, 2015, at 5:52 AM, Gilles Gouaillardet < >>> gilles.gouaillar...@gmail.com> wrote: >>> >>> Lane, >>> >>> could you please describe your configuration? >>> how many sockets per node? >>> how many cores per socket? >>> how many threads per core? >>> what is the minimum number of nodes needed to reproduce the issue? >>> do all the nodes have the same configuration? >>> if yes, what happens without --hetero-nodes? >>> >>> Cheers, >>> >>> Gilles >>> >>> On Friday, June 19, 2015, Lane, William <william.l...@cshs.org> wrote: >>> >>>> Ralph, >>>> >>>> I created a hostfile that just has the names of the hosts while >>>> specifying no slot information whatsoever (e.g. csclprd3-0-0) >>>> and received the following errors: >>>> >>>> mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ >>>> --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 --hetero-nodes >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3 >>>> >>>> [csclprd3-6-5:14770] MCW rank 4 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]]: [B/B][./.] >>>> [csclprd3-6-5:14770] MCW rank 5 bound to socket 1[core 2[hwt 0]], >>>> socket 1[core 3[hwt 0]]: [./.][B/B] >>>> [csclprd3-6-5:14770] MCW rank 6 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]]: [B/B][./.] 
>>>> [csclprd3-6-5:14770] MCW rank 7 bound to socket 1[core 2[hwt 0]], >>>> socket 1[core 3[hwt 0]]: [./.][B/B] >>>> [csclprd3-0-1:16437] MCW rank 24 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-5:18925] MCW rank 48 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-1:16437] MCW rank 25 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-5:18925] MCW rank 49 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-1:16437] MCW rank 20 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-1:16437] MCW rank 21 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-5:18925] MCW rank 44 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-5:18925] MCW rank 45 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-1:16437] MCW rank 22 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-1:16437] MCW rank 23 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-5:18925] MCW rank 46 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-5:18925] MCW rank 47 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-3:15946] MCW rank 36 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-3:15946] MCW rank 37 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-3:15946] MCW rank 32 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-3:15946] MCW rank 33 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-3:15946] MCW rank 34 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-3:15946] MCW rank 35 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-12:09165] MCW rank 124 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-12:09165] MCW rank 125 is not bound (or bound to all >>>> 
available processors) >>>> [csclprd3-0-12:09165] MCW rank 120 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-12:09165] MCW rank 121 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-12:09165] MCW rank 122 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-12:09165] MCW rank 123 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-6-1:27030] MCW rank 0 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]]: [B/B][./.] >>>> [csclprd3-6-1:27030] MCW rank 1 bound to socket 1[core 2[hwt 0]], >>>> socket 1[core 3[hwt 0]]: [./.][B/B] >>>> [csclprd3-6-1:27030] MCW rank 2 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]]: [B/B][./.] >>>> [csclprd3-6-1:27030] MCW rank 3 bound to socket 1[core 2[hwt 0]], >>>> socket 1[core 3[hwt 0]]: [./.][B/B] >>>> [csclprd3-0-2:07944] MCW rank 30 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-6:32510] MCW rank 54 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-2:07944] MCW rank 31 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-6:32510] MCW rank 55 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-2:07944] MCW rank 26 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-6:32510] MCW rank 50 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-6:32510] MCW rank 51 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-2:07944] MCW rank 27 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-2:07944] MCW rank 28 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-6:32510] MCW rank 52 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-6:32510] MCW rank 53 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-2:07944] MCW rank 29 is not bound (or bound to all >>>> available processors) 
>>>> [csclprd3-0-0:00453] MCW rank 11 bound to socket 1[core 6[hwt 0]], >>>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>>> socket1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>>> [./././././.][B/B/B/B/B/B] >>>> [csclprd3-0-0:00453] MCW rank 12 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: >>>> [B/B/B/B/B/B][./././././.] >>>> [csclprd3-0-0:00453] MCW rank 13 bound to socket 1[core 6[hwt 0]], >>>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>>> [./././././.][B/B/B/B/B/B] >>>> [csclprd3-0-0:00453] MCW rank 14 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: >>>> [B/B/B/B/B/B][./././././.] >>>> [csclprd3-0-0:00453] MCW rank 15 bound to socket 1[core 6[hwt 0]], >>>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>>> [./././././.][B/B/B/B/B/B] >>>> [csclprd3-0-0:00453] MCW rank 16 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: >>>> [B/B/B/B/B/B][./././././.] >>>> [csclprd3-0-7:22146] MCW rank 64 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-7:22146] MCW rank 65 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-0:00453] MCW rank 17 bound to socket 1[core 6[hwt 0]], >>>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>>> [./././././.][B/B/B/B/B/B] >>>> [csclprd3-0-0:00453] MCW rank 18 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: >>>> [B/B/B/B/B/B][./././././.] >>>> [csclprd3-0-11:00885] MCW rank 116 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-11:00885] MCW rank 117 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]],socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-10:20752] MCW rank 100 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-10:20752] MCW rank 101 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-0:00453] MCW rank 19 bound to socket 1[core 6[hwt 0]], >>>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>>> [./././././.][B/B/B/B/B/B] >>>> [csclprd3-0-7:22146] MCW rank 66 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-11:00885] MCW rank 118 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-0:00453] MCW rank 8 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: >>>> [B/B/B/B/B/B][./././././.] >>>> [csclprd3-0-10:20752] MCW rank 102 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-0:00453] MCW rank 9 bound to socket 1[core 6[hwt 0]], >>>> socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], >>>> socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: >>>> [./././././.][B/B/B/B/B/B] >>>> [csclprd3-0-0:00453] MCW rank 10 bound to socket 0[core 0[hwt 0]], >>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], >>>> socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: >>>> [B/B/B/B/B/B][./././././.] >>>> [csclprd3-0-4:32449] MCW rank 42 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-4:32449] MCW rank 43 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-4:32449] MCW rank 38 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-4:32449] MCW rank 39 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-4:32449] MCW rank 40 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-4:32449] MCW rank 41 is not bound (or bound to all >>>> available processors) >>>> [csclprd3-0-13:30897] MCW rank 126 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB][../../../../../..] >>>> [csclprd3-0-8:17159] MCW rank 80 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-13:30897] MCW rank 127 bound to socket 1[core 6[hwt 0-1]], >>>> socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt >>>> 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: >>>> [../../../../../..][BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 81 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], >>>> socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: >>>> [../../../../../..][BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 81 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-13:30897] MCW rank 128 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB][../../../../../..] >>>> [csclprd3-0-8:17159] MCW rank 82 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-13:30897] MCW rank 129 bound to socket 1[core 6[hwt 0-1]], >>>> socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt >>>> 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: >>>> [../../../../../..][BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 83 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-13:30897] MCW rank 130 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB][../../../../../..] >>>> [csclprd3-0-13:30897] MCW rank 131 bound to socket 1[core 6[hwt 0-1]], >>>> socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt >>>> 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: >>>> [../../../../../..][BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 84 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-8:17159] MCW rank 85 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-11:00885] MCW rank 119 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-10:20752] MCW rank 103 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 86 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-7:22146] MCW rank 67 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core >>>> 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-11:00885] MCW rank 104 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..][csclprd3-0-10:20752] MCW >>>> rank 88 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], >>>> socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt >>>> 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core >>>> 7[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-8:17159] MCW rank 87 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-11:00885] MCW rank 105 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-10:20752] MCW rank 89 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 72 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-7:22146] MCW rank 68 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-11:00885] MCW rank 106 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-10:20752] MCW rank 90 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-8:17159] MCW rank 73 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-11:00885] MCW rank 107 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-7:22146] MCW rank 69 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-8:17159] MCW rank 74 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> 
[BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-11:00885] MCW rank 108 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..]BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-7:22146] MCW rank 57 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-11:00885] MCW rank 114 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-10:20752] MCW rank 98 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-11:00885] MCW rank 115 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-7:22146] MCW rank 58 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-10:20752] MCW rank 99 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-7:22146] MCW rank 59 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-7:22146] MCW rank 60 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] 
>>>> [csclprd3-0-7:22146] MCW rank 61 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-7:22146] MCW rank 62 bound to socket 0[core 0[hwt 0-1]], >>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt >>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core >>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: >>>> [BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../..] >>>> [csclprd3-0-7:22146] MCW rank 63 bound to socket 1[core 8[hwt 0-1]], >>>> socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt >>>> 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket >>>> 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB] >>>> [csclprd3-0-13:30901] *** Process received signal *** >>>> [csclprd3-0-13:30901] Signal: Bus error (7) >>>> [csclprd3-0-13:30901] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:30901] Failing at address: 0x7ff404351d80 >>>> [csclprd3-0-13:30901] [ 0] >>>> /lib64/libpthread.so.0(+0xf500)[0x7ff41453c500] >>>> [csclprd3-0-13:30901] [ 1] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xd4fea)[0x7ff41481efea] >>>> [csclprd3-0-13:30901] [ 2] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x219)[0x7ff41479f009] >>>> [csclprd3-0-13:30901] [ 3] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7ff41479f110] >>>> [csclprd3-0-13:30901] [ 4] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7ff41480f68e] >>>> [csclprd3-0-13:30901] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7ff4148e3715] >>>> [csclprd3-0-13:30901] [ 6] >>>> 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7ff4147b9ad6] >>>> [csclprd3-0-13:30901] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7ff4147d8c60] >>>> [csclprd3-0-13:30901] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:30901] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff4141b9cdd] >>>> [csclprd3-0-13:30901] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:30901] *** End of error message *** >>>> >>>> ------------------------------ >>>> *From:* users [users-boun...@open-mpi.org] on behalf of Ralph Castain [ >>>> r...@open-mpi.org] >>>> *Sent:* Thursday, June 18, 2015 5:26 PM >>>> *To:* Open MPI Users >>>> *Subject:* Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots >>>> = crash >>>> >>>> FWIW: I don’t think this actually has anything to do with the #procs >>>> you are trying to run. Instead, I expect it has to do with confusion over >>>> how many cores it can bind across. When you tell it to use-hwthread-cpus, >>>> you are asking us to map processes to hwthreads, and not cores. I don’t >>>> know which nodes are which, but it could be that we are getting incorrect >>>> info somewhere. >>>> >>>> Given that you are limiting the number of procs to the number of >>>> cores, is there some reason why you are asking us to use-hwthread-cpus? Why >>>> not just leave it at the default core level? >>>> >>>> I also suspect that you would have no problems if you --bind-to none - >>>> does that in fact work? >>>> >>>> >>>> On Jun 18, 2015, at 4:54 PM, Lane, William <william.l...@cshs.org> >>>> wrote: >>>> >>>> I'm having a strange problem w/OpenMPI 1.8.6. If I run >>>> my OpenMPI test code (compiled against OpenMPI 1.8.6 >>>> libraries) on < 131 slots I get no issues. 
Anything over 131 >>>> errors out: >>>> >>>> mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ >>>> --hostfile hostfile-single --mca btl_tcp_if_include eth0 --hetero-nodes >>>> --use-hwthread-cpus /hpc/home/lanew/mpi/openmpi/ProcessColors3 >>>> >>>> The hostfile has the number of slots restricted >>>> to the number of cores, while the max-slots includes >>>> the hyperthreading cores (e.g. csclprd3-0-0 slots=6 >>>> max-slots=12). >>>> >>>> The nodes are a mix of IBM x3550 nodes some >>>> are Sandybridges and others are older Xeons. >>>> >>>> I would like to add that the submit node from >>>> which I am launching mpirun has the open files >>>> soft limit (ulimit -a) set to 1024, while the hard limit >>>> (ulimit -Ha) is set to 4096. I know open file limits >>>> were an issue w/an older version of OpenMPI. The >>>> compute nodes all have their hard open files limit >>>> and soft open files limits set to 4096. >>>> >>>> Here's the output (csclprd3-0-13 is the last node >>>> listed in the hostfile hostfile-single): >>>> >>>> [csclprd3-0-13:28765] Signal: Bus error (7) >>>> [csclprd3-0-13:28765] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28765] Failing at address: 0x7f30002a8980 >>>> [csclprd3-0-13:28766] *** Process received signal *** >>>> [csclprd3-0-13:28766] Signal: Bus error (7) >>>> [csclprd3-0-13:28766] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28766] Failing at address: 0x7fe137662880 >>>> [csclprd3-0-13:28768] *** Process received signal *** >>>> [csclprd3-0-13:28768] Signal: Bus error (7) >>>> [csclprd3-0-13:28768] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28768] Failing at address: 0x7f9b40228a80 >>>> [csclprd3-0-13:28770] *** Process received signal *** >>>> [csclprd3-0-13:28770] Signal: Bus error (7) >>>> [csclprd3-0-13:28770] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28770] Failing at address: 0x7f0de7f2bb00 >>>> [csclprd3-0-13:28767] *** 
Process received signal *** >>>> [csclprd3-0-13:28767] Signal: Bus error (7) >>>> [csclprd3-0-13:28767] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28767] Failing at address: 0x7f9b6c2e8980 >>>> [csclprd3-0-13:28764] *** Process received signal *** >>>> [csclprd3-0-13:28764] Signal: Bus error (7) >>>> [csclprd3-0-13:28764] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28765] Signal: Bus error (7) >>>> [csclprd3-0-13:28765] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28765] Failing at address: 0x7f30002a8980 >>>> [csclprd3-0-13:28766] *** Process received signal *** >>>> [csclprd3-0-13:28766] Signal: Bus error (7) >>>> [csclprd3-0-13:28766] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28766] Failing at address: 0x7fe137662880 >>>> [csclprd3-0-13:28768] *** Process received signal *** >>>> [csclprd3-0-13:28768] Signal: Bus error (7) >>>> [csclprd3-0-13:28768] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28768] Failing at address: 0x7f9b40228a80 >>>> [csclprd3-0-13:28770] *** Process received signal *** >>>> [csclprd3-0-13:28770] Signal: Bus error (7) >>>> [csclprd3-0-13:28770] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28770] Failing at address: 0x7f0de7f2bb00 >>>> [csclprd3-0-13:28767] *** Process received signal *** >>>> [csclprd3-0-13:28767] Signal: Bus error (7) >>>> [csclprd3-0-13:28767] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28767] Failing at address: 0x7f9b6c2e8980 >>>> [csclprd3-0-13:28764] *** Process received signal *** >>>> [csclprd3-0-13:28764] Signal: Bus error (7) >>>> [csclprd3-0-13:28764] Signal code: Non-existant physical address (2) >>>> [csclprd3-0-13:28768] [ 3] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f9b513ad110] >>>> [csclprd3-0-13:28768] [ 4] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_grow+0x219)[0x7f0df77b6009] >>>> 
[csclprd3-0-13:28770] [ 3] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f0df77b6110] >>>> [csclprd3-0-13:28770] [ 4] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f9b5141d68e] >>>> [csclprd3-0-13:28768] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f9b514f1715] >>>> [csclprd3-0-13:28768] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f30115ea68e] >>>> [csclprd3-0-13:28765] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f30116be715] >>>> [csclprd3-0-13:28765] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f9b7bb3b68e] >>>> [csclprd3-0-13:28767] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f9b7bc0f715] >>>> [csclprd3-0-13:28767] [ 6] [csclprd3-0-13:28764] [ 4] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7fa946bb768e] >>>> [csclprd3-0-13:28764] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7fe146d4068e] >>>> [csclprd3-0-13:28766] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(+0xc568e)[0x7f0df782668e] >>>> [csclprd3-0-13:28770] [ 5] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7f0df78fa715] >>>> [csclprd3-0-13:28770] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f0df77d0ad6] >>>> [csclprd3-0-13:28770] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7fe146e14715] >>>> [csclprd3-0-13:28766] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fe146ceaad6] >>>> [csclprd3-0-13:28766] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f9b513c7ad6] >>>> [csclprd3-0-13:28768] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f9b513e6c60] >>>> [csclprd3-0-13:28768] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> 
[csclprd3-0-13:28768] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9b50dc7cdd] >>>> [csclprd3-0-13:28768] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28768] *** End of error message *** >>>> >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f3011594ad6] >>>> [csclprd3-0-13:28765] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f30115b3c60] >>>> [csclprd3-0-13:28765] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28765] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f3010f94cdd] >>>> [csclprd3-0-13:28765] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28765] *** End of error message *** >>>> >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f9b7bae5ad6] >>>> [csclprd3-0-13:28767] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f9b7bb04c60] >>>> [csclprd3-0-13:28767] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28767] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9b7b4e5cdd] >>>> [csclprd3-0-13:28767] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28767] *** End of error message *** >>>> >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7fa946c8b715] >>>> [csclprd3-0-13:28764] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fa946b61ad6] >>>> [csclprd3-0-13:28764] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f0df77efc60] >>>> [csclprd3-0-13:28770] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28770] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0df71d0cdd] >>>> [csclprd3-0-13:28770] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28770] *** End of error message *** >>>> >>>> 
/hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7fe146d09c60] >>>> [csclprd3-0-13:28766] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28766] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fe1466eacdd] >>>> [csclprd3-0-13:28767] *** End of error message *** >>>> >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(mca_pml_ob1_add_procs+0xd5)[0x7fa946c8b715] >>>> [csclprd3-0-13:28764] [ 6] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fa946b61ad6] >>>> [csclprd3-0-13:28764] [ 7] >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7f0df77efc60] >>>> [csclprd3-0-13:28770] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28770] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0df71d0cdd] >>>> [csclprd3-0-13:28770] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28770] *** End of error message *** >>>> >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7fe146d09c60] >>>> [csclprd3-0-13:28766] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28766] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fe1466eacdd] >>>> [csclprd3-0-13:28766] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28766] *** End of error message *** >>>> >>>> /hpc/apps/mpi/openmpi/1.8.6/lib/libmpi.so.1(MPI_Init+0x170)[0x7fa946b80c60] >>>> [csclprd3-0-13:28764] [ 8] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400ad0] >>>> [csclprd3-0-13:28764] [ 9] >>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa946561cdd] >>>> [csclprd3-0-13:28764] [10] >>>> /hpc/home/lanew/mpi/openmpi/ProcessColors3[0x400999] >>>> [csclprd3-0-13:28764] *** End of error message *** >>>> >>>> -------------------------------------------------------------------------- >>>> mpirun noticed that process rank 126 with PID 0 on node csclprd3-0-13 >>>> exited on signal 7 (Bus 
error). >>>> >>>> Could a lack of the necessary NUMA libraries or the wrong version of >>>> NUMA >>>> libraries be contributing to this? >>>> IMPORTANT WARNING: This message is intended for the use of the person >>>> or entity to which it is addressed and may contain information that is >>>> privileged and confidential, the disclosure of which is governed by >>>> applicable law. If the reader of this message is not the intended >>>> recipient, or the employee or agent responsible for delivering it to the >>>> intended recipient, you are hereby notified that any dissemination, >>>> distribution or copying of this information is strictly prohibited. Thank >>>> you for your cooperation. _______________________________________________ >>>> users mailing list >>>> us...@open-mpi.org >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>> Link to this post: >>>> http://www.open-mpi.org/community/lists/users/2015/06/27159.php >>>> >>>> 
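For anyone following the thread and trying to work out which node hosts which ranks (for example, why csclprd3-0-13 ends up with the last few ranks of the job), here is a rough sketch in Python. It is not how Open MPI's mapper actually works; it assumes a simplified model in which nodes are filled in hostfile order, one rank per slot, and it only reads the slots= value from lines of the form "csclprd3-0-0 slots=6 max-slots=12". The function names and the hosts node-b and node-c are made up for illustration. The output of --display-devel-map remains the authoritative answer.

```python
def parse_hostfile(text):
    """Parse lines such as 'csclprd3-0-0 slots=6 max-slots=12'.

    Returns a list of (hostname, slots) pairs in file order.
    A host with no slots= entry is assumed to have 1 slot (an assumption
    of this sketch, not a statement about Open MPI's defaults).
    """
    nodes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        opts = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        nodes.append((host, int(opts.get("slots", 1))))
    return nodes


def ranks_by_node(nodes, nprocs):
    """Simplified by-slot placement: fill each node's slots in hostfile order.

    Returns {hostname: (first_rank, last_rank)} for nodes that receive ranks.
    """
    placement = {}
    rank = 0
    for host, slots in nodes:
        if rank >= nprocs:
            break
        count = min(slots, nprocs - rank)
        if count > 0:
            placement[host] = (rank, rank + count - 1)
            rank += count
    return placement


if __name__ == "__main__":
    # Hypothetical three-node hostfile; only the first line's format
    # is taken from the thread.
    hostfile = """
    csclprd3-0-0 slots=6 max-slots=12
    node-b slots=4 max-slots=8
    node-c slots=2
    """
    for host, (first, last) in ranks_by_node(parse_hostfile(hostfile), 11).items():
        print(f"{host}: ranks {first}-{last}")
```

Running this against the real hostfile with -np 131 and -np 132 would show, under the stated assumptions, whether the extra rank lands on the last node or spills somewhere unexpected.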