Hi All,

We are running Open MPI 1.3.2 with OFED 1.5 on an 8-node cluster with 10Gb iWARP Ethernet cards. The nodes are named n130 through n137; the corresponding 10GbE hostnames are n130x through n137x.
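For reference, each node resolves its peers' 10GbE interfaces through the "x" hostnames; a minimal sketch of the /etc/hosts entries involved (the 192.168.10.x addresses below are placeholders, not our actual subnet):

  # /etc/hosts on every node (illustrative only; addresses are placeholders)
  192.168.10.130  n130x
  192.168.10.131  n131x
  192.168.10.132  n132x
  192.168.10.133  n133x
  192.168.10.134  n134x
  192.168.10.135  n135x
  192.168.10.136  n136x
  192.168.10.137  n137x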
Our /root/mpd.hosts file has the following entries:

n130x
n131x
n134x
n135x
n136x
n132x
n133x
n137x

We are not able to run Open MPI across all 8 nodes:

mpirun -n 8 -np 8 -hostfile /root/mpd.hosts -mca btl openib,self,sm --mca orte_base_help_aggregate 0 --mca btl_base_verbose 10 --mca btl_openib_verbose 100 /usr/mpi/gcc/openmpi-1.3.2/tests/IMB-3.1/IMB-MPI1 Barrier

Output:
=================================================================================
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],0]) is on host: n130
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],2]) is on host: n134
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],5]) is on host: n132
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],7]) is on host: n137
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],3]) is on host: n135
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],6]) is on host: n133
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],1]) is on host: n131
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],4]) is on host: n136
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n134:4888] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n137:4890] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n135:4883] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n133:4850] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n136:4866] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n131:4866] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n132:4855] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 4883 on node n135x exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n130:4885] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
=================================================================================

The same command runs on all 8 nodes when we use the tcp BTL instead:

mpirun -n 8 -np 8 -hostfile /root/mpd.hosts -mca btl tcp,self,sm --mca orte_base_help_aggregate 0 --mca btl_base_verbose 10 --mca btl_openib_verbose 100 /usr/mpi/gcc/openmpi-1.3.2/tests/IMB-3.1/IMB-MPI1 Barrier

If we remove n132, n133, and n137 from the mpd.hosts file, we can run Open MPI on the remaining 5 nodes with btl openib,sm,self. So there is some problem with only the n132, n133, and n137 nodes. We can also run Open MPI on just those 3 nodes, but when we try to run them together with the other 5 nodes, or with any one of them (n130, n131, n134, n135, n136), we get the error below:

Output:
===============
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33304,1],1]) is on host: n132
  Process 2 ([[33304,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33304,1],0]) is on host: n130
  Process 2 ([[33304,1],1]) is on host: 100
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n130:4929] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n132:4963] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4929 on node n130 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
-----------------------------------------------------------

We are able to run Intel MPI and MVAPICH2 on all 8 nodes; only Open MPI gives us this problem. Can anyone help us figure out what the real issue is with those 3 nodes? The full log follows below for detail.

Thanks,
Hardik
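P.S. One extra data point we can collect: the verbose output below shows that on these iWARP adapters only the rdmacm CPC is usable (oob is InfiniBand-only and xoob needs XRC receive queues), so openib connections are set up through the RDMA CM. A quick way to sanity-check raw RDMA CM connectivity between a working node and a failing node is rping from librdmacm-utils; the address below is only a placeholder for n132's 10GbE IP:

  # on n132x (one of the failing nodes): start an rping server
  rping -s -v

  # on n130x (a working node): connect over the 10GbE address (placeholder IP)
  rping -c -v -a 192.168.10.132 -C 4

If rping works within each group of nodes but fails between the groups, that would point to a problem below Open MPI, at the RDMA CM / iWARP level.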
[root@n130 scripts]# mpirun -n 8 -np 8 -hostfile /root/mpd.hosts -mca btl openib,self,sm --mca orte_base_help_aggregate 0 --mca btl_base_verbose 10 --mca btl_openib_verbose 100 /opt/openmpi-1.3.2/NetEffect/test_bin/IMB_3.2/IMB-MPI1 Barrier
[n130:04885] mca: base: components_open: Looking for btl components
[n130:04885] mca: base: components_open: opening btl components
[n130:04885] mca: base: components_open: found loaded component openib
[n130:04885] mca: base: components_open: component openib has no register function
[n130:04885] mca: base: components_open: component openib open function successful
[n130:04885] mca: base: components_open: found loaded component self
[n130:04885] mca: base: components_open: component self has no register function
[n130:04885] mca: base: components_open: component self open function successful
[n130:04885] mca: base: components_open: found loaded component sm
[n130:04885] mca: base: components_open: component sm has no register function
[n130:04885] mca: base: components_open: component sm open function successful
[n134:04888] mca: base: components_open: Looking for btl components
[n136:04866] mca: base: components_open: Looking for btl components
[n130:04885] select: initializing btl component openib
[n131:04866] mca: base: components_open: Looking for btl components
[n134:04888] mca: base: components_open: opening btl components
[n134:04888] mca: base: components_open: found loaded component openib
[n134:04888] mca: base: components_open: component openib has no register function
[n136:04866] mca: base: components_open: opening btl components
[n136:04866] mca: base: components_open: found loaded component openib
[n136:04866] mca: base: components_open: component openib has no register function
[n134:04888] mca: base: components_open: component openib open function successful
[n134:04888] mca: base: components_open: found loaded component self
[n134:04888] mca: base: components_open: component self has no register function
[n134:04888] mca: base: components_open: component self open function successful
[n134:04888] mca: base: components_open: found loaded component sm
[n134:04888] mca: base: components_open: component sm has no register function
[n134:04888] mca: base: components_open: component sm open function successful
[n136:04866] mca: base: components_open: component openib open function successful
[n136:04866] mca: base: components_open: found loaded component self
[n136:04866] mca: base: components_open: component self has no register function
[n136:04866] mca: base: components_open: component self open function successful
[n136:04866] mca: base: components_open: found loaded component sm
[n136:04866] mca: base: components_open: component sm has no register function
[n136:04866] mca: base: components_open: component sm open function successful
[n132:04855] mca: base: components_open: Looking for btl components
[n133:04850] mca: base: components_open: Looking for btl components
[n130][[33322,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n130][[33322,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n130][[33322,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n130][[33322,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n130:04885] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n130:04885] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n130:04885] openib BTL: rdmacm CPC available for use on nes0
[n130:04885] select: init of component openib returned success
[n130:04885] select: initializing btl component self
[n130:04885] select: init of component self returned success
[n130:04885] select: initializing btl component sm
[n130:04885] select: init of component sm returned success
[n135:04883] mca: base: components_open: Looking for btl components
[n131:04866] mca: base: components_open: opening btl components
[n131:04866] mca: base: components_open: found loaded component openib
[n131:04866] mca: base: components_open: component openib has no register function
[n131:04866] mca: base: components_open: component openib open function successful
[n131:04866] mca: base: components_open: found loaded component self
[n131:04866] mca: base: components_open: component self has no register function
[n131:04866] mca: base: components_open: component self open function successful
[n131:04866] mca: base: components_open: found loaded component sm
[n131:04866] mca: base: components_open: component sm has no register function
[n131:04866] mca: base: components_open: component sm open function successful
[n134:04888] select: initializing btl component openib
[n136:04866] select: initializing btl component openib
[n131:04866] select: initializing btl component openib
[n132:04855] mca: base: components_open: opening btl components
[n132:04855] mca: base: components_open: found loaded component openib
[n132:04855] mca: base: components_open: component openib has no register function
[n132:04855] mca: base: components_open: component openib open function successful
[n132:04855] mca: base: components_open: found loaded component self
[n132:04855] mca: base: components_open: component self has no register function
[n132:04855] mca: base: components_open: component self open function successful
[n132:04855] mca: base: components_open: found loaded component sm
[n132:04855] mca: base: components_open: component sm has no register function
[n132:04855] mca: base: components_open: component sm open function successful
[n133:04850] mca: base: components_open: opening btl components
[n133:04850] mca: base: components_open: found loaded component openib
[n133:04850] mca: base: components_open: component openib has no register function
[n133:04850] mca: base: components_open: component openib open function successful
[n133:04850] mca: base: components_open: found loaded component self
[n133:04850] mca: base: components_open: component self has no register function
[n133:04850] mca: base: components_open: component self open function successful
[n133:04850] mca: base: components_open: found loaded component sm
[n133:04850] mca: base: components_open: component sm has no register function
[n133:04850] mca: base: components_open: component sm open function successful
[n136][[33322,1],4][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n136][[33322,1],4][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n136][[33322,1],4][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n136][[33322,1],4][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n136:04866] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n136:04866] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n136:04866] openib BTL: rdmacm CPC available for use on nes0
[n136:04866] select: init of component openib returned success
[n136:04866] select: initializing btl component self
[n136:04866] select: init of component self returned success
[n136:04866] select: initializing btl component sm
[n136:04866] select: init of component sm returned success
[n135:04883] mca: base: components_open: opening btl components
[n135:04883] mca: base: components_open: found loaded component openib
[n135:04883] mca: base: components_open: component openib has no register function
[n135:04883] mca: base: components_open: component openib open function successful
[n135:04883] mca: base: components_open: found loaded component self
[n135:04883] mca: base: components_open: component self has no register function
[n135:04883] mca: base: components_open: component self open function successful
[n135:04883] mca: base: components_open: found loaded component sm
[n135:04883] mca: base: components_open: component sm has no register function
[n135:04883] mca: base: components_open: component sm open function successful
[n137:04890] mca: base: components_open: Looking for btl components
[n134][[33322,1],2][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n134][[33322,1],2][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n134][[33322,1],2][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n134][[33322,1],2][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n134:04888] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n134:04888] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n134:04888] openib BTL: rdmacm CPC available for use on nes0
[n134:04888] select: init of component openib returned success
[n134:04888] select: initializing btl component self
[n134:04888] select: init of component self returned success
[n134:04888] select: initializing btl component sm
[n134:04888] select: init of component sm returned success
[n132:04855] select: initializing btl component openib
[n131][[33322,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n131][[33322,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n131][[33322,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n131][[33322,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n131:04866] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n131:04866] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n131:04866] openib BTL: rdmacm CPC available for use on nes0
[n131:04866] select: init of component openib returned success
[n131:04866] select: initializing btl component self
[n131:04866] select: init of component self returned success
[n131:04866] select: initializing btl component sm
[n131:04866] select: init of component sm returned success
[n135:04883] select: initializing btl component openib
[n133:04850] select: initializing btl component openib
[n132][[33322,1],5][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n132][[33322,1],5][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n132][[33322,1],5][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n132][[33322,1],5][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n132:04855] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n132:04855] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n132:04855] openib BTL: rdmacm CPC available for use on nes0
[n132:04855] select: init of component openib returned success
[n132:04855] select: initializing btl component self
[n132:04855] select: init of component self returned success
[n132:04855] select: initializing btl component sm
[n132:04855] select: init of component sm returned success
[n137:04890] mca: base: components_open: opening btl components
[n137:04890] mca: base: components_open: found loaded component openib
[n137:04890] mca: base: components_open: component openib has no register function
[n137:04890] mca: base: components_open: component openib open function successful
[n137:04890] mca: base: components_open: found loaded component self
[n137:04890] mca: base: components_open: component self has no register function
[n137:04890] mca: base: components_open: component self open function successful
[n137:04890] mca: base: components_open: found loaded component sm
[n137:04890] mca: base: components_open: component sm has no register function
[n137:04890] mca: base: components_open: component sm open function successful
[n135][[33322,1],3][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n135][[33322,1],3][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n135][[33322,1],3][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n135][[33322,1],3][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n133][[33322,1],6][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n133][[33322,1],6][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n133][[33322,1],6][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n133][[33322,1],6][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n135:04883] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n135:04883] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n135:04883] openib BTL: rdmacm CPC available for use on nes0
[n135:04883] select: init of component openib returned success
[n135:04883] select: initializing btl component self
[n135:04883] select: init of component self returned success
[n135:04883] select: initializing btl component sm
[n135:04883] select: init of component sm returned success
[n133:04850] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n133:04850] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n133:04850] openib BTL: rdmacm CPC available for use on nes0
[n133:04850] select: init of component openib returned success
[n133:04850] select: initializing btl component self
[n133:04850] select: init of component self returned success
[n133:04850] select: initializing btl component sm
[n133:04850] select: init of component sm returned success
[n137:04890] select: initializing btl component openib
[n137][[33322,1],7][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x1255, part ID 256
[n137][[33322,1],7][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: NetEffect NE020
[n137][[33322,1],7][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x0000, part ID 0
[n137][[33322,1],7][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[n137:04890] openib BTL: oob CPC only supported on InfiniBand; skipped on device nes0
[n137:04890] openib BTL: xoob CPC only supported with XRC receive queues; skipped on device nes0
[n137:04890] openib BTL: rdmacm CPC available for use on nes0
[n137:04890] select: init of component openib returned success
[n137:04890] select: initializing btl component self
[n137:04890] select: init of component self returned success
[n137:04890] select: initializing btl component sm
[n137:04890] select: init of component sm returned success
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],0]) is on host: n130
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],2]) is on host: n134
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],5]) is on host: n132
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],7]) is on host: n137
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],3]) is on host: n135
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],6]) is on host: n133
  Process 2 ([[33322,1],0]) is on host: n130
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],1]) is on host: n131
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[33322,1],4]) is on host: n136
  Process 2 ([[33322,1],5]) is on host: n132x
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n134:4888] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n137:4890] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n135:4883] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n133:4850] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n136:4866] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n131:4866] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[n132:4855] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 4883 on node n135x exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n130:4885] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[root@n130 scripts]#