Hi All,

We are running Open MPI 1.3.2 with OFED 1.5 on an 8-node cluster with 10Gb 
iWARP Ethernet cards. 

The node names are n130, n131, n132, n133, n134, n135, n136, and n137. The 
corresponding 10GbE hostnames are n130x, n131x, ..., n137x. 
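
(For illustration: each node resolves the *x names to the address of its 
10GbE iWARP interface through /etc/hosts entries along the following lines. 
The addresses here are hypothetical placeholders, not our real subnet.)

# hypothetical /etc/hosts entries; the real cluster uses its own 10GbE subnet
192.168.10.130  n130x
192.168.10.131  n131x
...
192.168.10.137  n137x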

Our /root/mpd.hosts file contains the following entries: 

n130x
n131x
n134x
n135x
n136x
n132x
n133x
n137x
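
(Name resolution and basic reachability over the 10GbE interfaces can be 
sanity-checked with a small loop like the sketch below. It assumes 
password-less ssh between the nodes; getent and ping are standard tools.)

# sketch: verify that every node resolves and can ping every 10GbE hostname
for src in n130 n131 n132 n133 n134 n135 n136 n137; do
  for dst in n130x n131x n132x n133x n134x n135x n136x n137x; do
    ssh $src "getent hosts $dst >/dev/null && ping -c 1 -W 2 $dst >/dev/null" \
      || echo "FAIL: $src -> $dst"
  done
done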

We are not able to run Open MPI across all 8 nodes: 

mpirun -n 8 -np 8 -hostfile /root/mpd.hosts -mca btl openib,self,sm --mca 
orte_base_help_aggregate 0 --mca btl_base_verbose 10 --mca btl_openib_verbose 
100 /usr/mpi/gcc/openmpi-1.3.2/tests/IMB-3.1/IMB-MPI1 Barrier

Output: 
=================================================================================

At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],0]) is on host: n130
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],2]) is on host: n134
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],5]) is on host: n132
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],7]) is on host: n137
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],3]) is on host: n135
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],6]) is on host: n133
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],1]) is on host: n131
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33322,1],4]) is on host: n136
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
[... the same "MPI_INIT failed ... PML add procs failed / Returned
"Unreachable" (-12)" block is printed by the six remaining ranks ...]
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n134:4888] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
[... the same MPI_Init_thread error is printed by six more ranks ...]
[n137:4890] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
[n135:4883] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
[n133:4850] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
[n136:4866] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
[n131:4866] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
[n132:4855] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 4883 on
node n135x exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n130:4885] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!

=================================================================================

We are able to run the same command across all 8 nodes when the btl is set 
to tcp: 

mpirun -n 8 -np 8 -hostfile /root/mpd.hosts  -mca btl tcp,self,sm --mca 
orte_base_help_aggregate 0 --mca btl_base_verbose 10 --mca btl_openib_verbose 
100 /usr/mpi/gcc/openmpi-1.3.2/tests/IMB-3.1/IMB-MPI1 Barrier


If we remove n132, n133, and n137 from the mpd.hosts file, we are able to 
run Open MPI on all 5 remaining nodes with btl openib,sm,self.

So the problem seems limited to the n132, n133, and n137 nodes. We are able 
to run Open MPI among just those 3 nodes, but whenever we combine them with 
the other 5 nodes, or with any one of them (n130, n131, n134, n135, n136), 
we get the error below (a loop for reproducing this pair by pair is 
sketched after the output): 

Output:
===============
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33304,1],1]) is on host: n132
  Process 2 ([[33304,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33304,1],0]) is on host: n130
  Process 2 ([[33304,1],1]) is on host: 100
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n130:4929] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n132:4963] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4929 on
node n130 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
-----------------------------------------------------------
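
(For reference, a minimal sketch for exercising the failing pairs one at a 
time; -H passes mpirun a per-run host list, and rping from librdmacm tests 
the same RDMA connection path that the rdmacm CPC shown in the verbose log 
uses. Paths and BTL options are the ones from the commands above.)

# sketch: 2-rank Barrier run for each good/bad host pair
for good in n130x n131x n134x n135x n136x; do
  for bad in n132x n133x n137x; do
    echo "=== $good <-> $bad ==="
    mpirun -np 2 -H $good,$bad -mca btl openib,self,sm \
      /usr/mpi/gcc/openmpi-1.3.2/tests/IMB-3.1/IMB-MPI1 Barrier
  done
done

# raw RDMA-CM connectivity check for one suspect pair (rping ships with librdmacm):
#   on n132:  rping -s -a n132x -v
#   on n130:  rping -c -a n132x -v -C 5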

We are able to run Intel MPI and MVAPICH2 on all 8 nodes; the problem 
appears only with Open MPI. Can anyone help us figure out what the real 
issue is with those 3 nodes?

The detailed verbose log is included below.


Thanks,
Hardik 
[root@n130 scripts]# mpirun -n 8 -np 8 -hostfile /root/mpd.hosts  -mca btl 
openib,self,sm --mca orte_base_help_aggregate 0 --mca btl_base_verbose 10 --mca 
btl_openib_verbose 100 /opt/openmpi-1.3.2/NetEffect/test_bin/IMB_3.2/IMB-MPI1 
Barrier
[n130:04885] mca: base: components_open: Looking for btl components
[n130:04885] mca: base: components_open: opening btl components
[n130:04885] mca: base: components_open: found loaded component openib
[n130:04885] mca: base: components_open: component openib has no register 
function
[n130:04885] mca: base: components_open: component openib open function 
successful
[n130:04885] mca: base: components_open: found loaded component self
[n130:04885] mca: base: components_open: component self has no register function
[n130:04885] mca: base: components_open: component self open function successful
[n130:04885] mca: base: components_open: found loaded component sm
[n130:04885] mca: base: components_open: component sm has no register function
[n130:04885] mca: base: components_open: component sm open function successful
[n134:04888] mca: base: components_open: Looking for btl components
[n136:04866] mca: base: components_open: Looking for btl components
[n130:04885] select: initializing btl component openib
[n131:04866] mca: base: components_open: Looking for btl components
[n134:04888] mca: base: components_open: opening btl components
[n134:04888] mca: base: components_open: found loaded component openib
[n134:04888] mca: base: components_open: component openib has no register 
function
[n136:04866] mca: base: components_open: opening btl components
[n136:04866] mca: base: components_open: found loaded component openib
[n136:04866] mca: base: components_open: component openib has no register 
function
[n134:04888] mca: base: components_open: component openib open function 
successful
[n134:04888] mca: base: components_open: found loaded component self
[n134:04888] mca: base: components_open: component self has no register function
[n134:04888] mca: base: components_open: component self open function successful
[n134:04888] mca: base: components_open: found loaded component sm
[n134:04888] mca: base: components_open: component sm has no register function
[n134:04888] mca: base: components_open: component sm open function successful
[n136:04866] mca: base: components_open: component openib open function 
successful
[n136:04866] mca: base: components_open: found loaded component self
[n136:04866] mca: base: components_open: component self has no register function
[n136:04866] mca: base: components_open: component self open function successful
[n136:04866] mca: base: components_open: found loaded component sm
[n136:04866] mca: base: components_open: component sm has no register function
[n136:04866] mca: base: components_open: component sm open function successful
[n132:04855] mca: base: components_open: Looking for btl components
[n133:04850] mca: base: components_open: Looking for btl components
[n130][[33322,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n130][[33322,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n130][[33322,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n130][[33322,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n130:04885] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n130:04885] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n130:04885] openib BTL: rdmacm CPC available for use on nes0
[n130:04885] select: init of component openib returned success
[n130:04885] select: initializing btl component self
[n130:04885] select: init of component self returned success
[n130:04885] select: initializing btl component sm
[n130:04885] select: init of component sm returned success
[n135:04883] mca: base: components_open: Looking for btl components
[n131:04866] mca: base: components_open: opening btl components
[n131:04866] mca: base: components_open: found loaded component openib
[n131:04866] mca: base: components_open: component openib has no register 
function
[n131:04866] mca: base: components_open: component openib open function 
successful
[n131:04866] mca: base: components_open: found loaded component self
[n131:04866] mca: base: components_open: component self has no register function
[n131:04866] mca: base: components_open: component self open function successful
[n131:04866] mca: base: components_open: found loaded component sm
[n131:04866] mca: base: components_open: component sm has no register function
[n131:04866] mca: base: components_open: component sm open function successful
[n134:04888] select: initializing btl component openib
[n136:04866] select: initializing btl component openib
[n131:04866] select: initializing btl component openib
[n132:04855] mca: base: components_open: opening btl components
[n132:04855] mca: base: components_open: found loaded component openib
[n132:04855] mca: base: components_open: component openib has no register 
function
[n132:04855] mca: base: components_open: component openib open function 
successful
[n132:04855] mca: base: components_open: found loaded component self
[n132:04855] mca: base: components_open: component self has no register function
[n132:04855] mca: base: components_open: component self open function successful
[n132:04855] mca: base: components_open: found loaded component sm
[n132:04855] mca: base: components_open: component sm has no register function
[n132:04855] mca: base: components_open: component sm open function successful
[n133:04850] mca: base: components_open: opening btl components
[n133:04850] mca: base: components_open: found loaded component openib
[n133:04850] mca: base: components_open: component openib has no register 
function
[n133:04850] mca: base: components_open: component openib open function 
successful
[n133:04850] mca: base: components_open: found loaded component self
[n133:04850] mca: base: components_open: component self has no register function
[n133:04850] mca: base: components_open: component self open function successful
[n133:04850] mca: base: components_open: found loaded component sm
[n133:04850] mca: base: components_open: component sm has no register function
[n133:04850] mca: base: components_open: component sm open function successful
[n136][[33322,1],4][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n136][[33322,1],4][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n136][[33322,1],4][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n136][[33322,1],4][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n136:04866] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n136:04866] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n136:04866] openib BTL: rdmacm CPC available for use on nes0
[n136:04866] select: init of component openib returned success
[n136:04866] select: initializing btl component self
[n136:04866] select: init of component self returned success
[n136:04866] select: initializing btl component sm
[n136:04866] select: init of component sm returned success
[n135:04883] mca: base: components_open: opening btl components
[n135:04883] mca: base: components_open: found loaded component openib
[n135:04883] mca: base: components_open: component openib has no register 
function
[n135:04883] mca: base: components_open: component openib open function 
successful
[n135:04883] mca: base: components_open: found loaded component self
[n135:04883] mca: base: components_open: component self has no register function
[n135:04883] mca: base: components_open: component self open function successful
[n135:04883] mca: base: components_open: found loaded component sm
[n135:04883] mca: base: components_open: component sm has no register function
[n135:04883] mca: base: components_open: component sm open function successful
[n137:04890] mca: base: components_open: Looking for btl components
[n134][[33322,1],2][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n134][[33322,1],2][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n134][[33322,1],2][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n134][[33322,1],2][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n134:04888] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n134:04888] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n134:04888] openib BTL: rdmacm CPC available for use on nes0
[n134:04888] select: init of component openib returned success
[n134:04888] select: initializing btl component self
[n134:04888] select: init of component self returned success
[n134:04888] select: initializing btl component sm
[n134:04888] select: init of component sm returned success
[n132:04855] select: initializing btl component openib
[n131][[33322,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n131][[33322,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n131][[33322,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n131][[33322,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n131:04866] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n131:04866] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n131:04866] openib BTL: rdmacm CPC available for use on nes0
[n131:04866] select: init of component openib returned success
[n131:04866] select: initializing btl component self
[n131:04866] select: init of component self returned success
[n131:04866] select: initializing btl component sm
[n131:04866] select: init of component sm returned success
[n135:04883] select: initializing btl component openib
[n133:04850] select: initializing btl component openib
[n132][[33322,1],5][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n132][[33322,1],5][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n132][[33322,1],5][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n132][[33322,1],5][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n132:04855] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n132:04855] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n132:04855] openib BTL: rdmacm CPC available for use on nes0
[n132:04855] select: init of component openib returned success
[n132:04855] select: initializing btl component self
[n132:04855] select: init of component self returned success
[n132:04855] select: initializing btl component sm
[n132:04855] select: init of component sm returned success
[n137:04890] mca: base: components_open: opening btl components
[n137:04890] mca: base: components_open: found loaded component openib
[n137:04890] mca: base: components_open: component openib has no register 
function
[n137:04890] mca: base: components_open: component openib open function 
successful
[n137:04890] mca: base: components_open: found loaded component self
[n137:04890] mca: base: components_open: component self has no register function
[n137:04890] mca: base: components_open: component self open function successful
[n137:04890] mca: base: components_open: found loaded component sm
[n137:04890] mca: base: components_open: component sm has no register function
[n137:04890] mca: base: components_open: component sm open function successful
[n135][[33322,1],3][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n135][[33322,1],3][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n135][[33322,1],3][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n135][[33322,1],3][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n133][[33322,1],6][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n133][[33322,1],6][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n133][[33322,1],6][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n133][[33322,1],6][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n135:04883] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n135:04883] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n135:04883] openib BTL: rdmacm CPC available for use on nes0
[n135:04883] select: init of component openib returned success
[n135:04883] select: initializing btl component self
[n135:04883] select: init of component self returned success
[n135:04883] select: initializing btl component sm
[n135:04883] select: init of component sm returned success
[n133:04850] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n133:04850] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n133:04850] openib BTL: rdmacm CPC available for use on nes0
[n133:04850] select: init of component openib returned success
[n133:04850] select: initializing btl component self
[n133:04850] select: init of component self returned success
[n133:04850] select: initializing btl component sm
[n133:04850] select: init of component sm returned success
[n137:04890] select: initializing btl component openib
[n137][[33322,1],7][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x1255, part ID 256
[n137][[33322,1],7][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: NetEffect NE020
[n137][[33322,1],7][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x0000, part ID 0
[n137][[33322,1],7][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[n137:04890] openib BTL: oob CPC only supported on InfiniBand; skipped on 
device nes0
[n137:04890] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on device nes0
[n137:04890] openib BTL: rdmacm CPC available for use on nes0
[n137:04890] select: init of component openib returned success
[n137:04890] select: initializing btl component self
[n137:04890] select: init of component self returned success
[n137:04890] select: initializing btl component sm
[n137:04890] select: init of component sm returned success
[... the rest of the log repeats the error output quoted above verbatim ...]
[root@n130 scripts]#
