[OMPI users] init of component openib returned failure
Hello,

trying to run Intel MPI Benchmarks with OpenMPI 1.4.1 fails in initializing the component openib. System is Debian GNU/Linux 5.0.4. The command to start the job (under Torque 2.4.7) was:

mpirun.openmpi-1.4.1 --mca btl_base_verbose 50 --mca btl self,openib -n 2 ./IMB-MPI1 -npmin 2 PingPong

and results in these messages:

8<--
[beo-15:20933] mca: base: components_open: Looking for btl components
[beo-16:20605] mca: base: components_open: Looking for btl components
[beo-15:20933] mca: base: components_open: opening btl components
[beo-15:20933] mca: base: components_open: found loaded component openib
[beo-15:20933] mca: base: components_open: component openib has no register function
[beo-15:20933] mca: base: components_open: component openib open function successful
[beo-15:20933] mca: base: components_open: found loaded component self
[beo-15:20933] mca: base: components_open: component self has no register function
[beo-15:20933] mca: base: components_open: component self open function successful
[beo-16:20605] mca: base: components_open: opening btl components
[beo-16:20605] mca: base: components_open: found loaded component openib
[beo-16:20605] mca: base: components_open: component openib has no register function
[beo-16:20605] mca: base: components_open: component openib open function successful
[beo-16:20605] mca: base: components_open: found loaded component self
[beo-16:20605] mca: base: components_open: component self has no register function
[beo-16:20605] mca: base: components_open: component self open function successful
[beo-15:20933] select: initializing btl component openib
[beo-15:20933] select: init of component openib returned failure
[beo-15:20933] select: module openib unloaded
[beo-15:20933] select: initializing btl component self
[beo-15:20933] select: init of component self returned success
[beo-16:20605] select: initializing btl component openib
[beo-16:20605] select: init of component openib returned failure
[beo-16:20605] select: module openib unloaded
[beo-16:20605] select: initializing btl component self
[beo-16:20605] select: init of component self returned success
--
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Process 1 ([[4887,1],0]) is on host: beo-15
Process 2 ([[4887,1],1]) is on host: beo-16
BTLs attempted: self

Your MPI job is now going to abort; sorry.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[beo-15:20933] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--
orterun has exited due to process rank 0 with PID 20933 on
node beo-15 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by orterun (as reported here).
--
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[beo-16:20605] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[beo-15:20930] 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
[beo-15:20930] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[beo-15:20930] 1 more process has sent help message help-mpi-runtime / mpi_init:startup:internal-failure
8<--

Running another benchmark (OSU) succeeds in loading the openib component.

"ibstat |grep -i state" on both nodes gives:

8<--
State: Active
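When comparing a failing run against a working one, it can help to reduce the btl_base_verbose output to one line per host showing which components failed to initialize. The following is only a rough sketch: the function name summarize_btl_init and the regular expression are made up for illustration, and the pattern assumes the "select: init of component ... returned ..." wording seen in the log above (other Open MPI versions may print it differently).

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the log in this thread:
#   [beo-15:20933] select: init of component openib returned failure
INIT_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] select: "
    r"init of component (?P<component>\S+) returned (?P<result>\w+)"
)


def summarize_btl_init(log_text):
    """Return {host: {component: result}} for every init line found."""
    summary = defaultdict(dict)
    # finditer also copes with logs whose line breaks were lost.
    for match in INIT_RE.finditer(log_text):
        summary[match["host"]][match["component"]] = match["result"]
    return dict(summary)


# Excerpt of the failing run quoted above.
SAMPLE = """\
[beo-15:20933] select: initializing btl component openib
[beo-15:20933] select: init of component openib returned failure
[beo-15:20933] select: init of component self returned success
[beo-16:20605] select: init of component openib returned failure
[beo-16:20605] select: init of component self returned success
"""

if __name__ == "__main__":
    for host, results in sorted(summarize_btl_init(SAMPLE).items()):
        failed = [name for name, res in results.items() if res != "success"]
        print(host, "failed:", ", ".join(failed) or "none")
```

Feeding it the saved stderr of both the IMB run and the OSU run should make the difference (openib failing in one, succeeding in the other) visible at a glance, though it says nothing about why the init failed.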
Re: [OMPI users] init of component openib returned failure
y] Querying INI files for vendor 0x, part ID 0
[beo-15][[12785,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[beo-15:29479] openib BTL: oob CPC available for use on mthca0:1
[beo-15:29479] openib BTL: xoob CPC only supported with XRC receive queues; skipped on mthca0:1
[beo-15:29479] openib BTL: rdmacm CPC available for use on mthca0:1
[beo-15:29479] select: init of component openib returned success
[beo-15:29479] select: initializing btl component self
[beo-15:29479] select: init of component self returned success
[beo-16][[12785,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x02c9, part ID 25204
[beo-16][[12785,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: Mellanox Sinai Infinihost III
[beo-16][[12785,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying INI files for vendor 0x, part ID 0
[beo-16][[12785,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found corresponding INI values: default
[beo-16:29063] openib BTL: oob CPC available for use on mthca0:1
[beo-16:29063] openib BTL: xoob CPC only supported with XRC receive queues; skipped on mthca0:1
[beo-16:29063] openib BTL: rdmacm CPC available for use on mthca0:1
[beo-16:29063] select: init of component openib returned success
[beo-16:29063] select: initializing btl component self
[beo-16:29063] select: init of component self returned success
# OSU MPI Latency Test (Version 2.2)
# Size          Latency (us)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set MTU to IBV value 4 (2048 bytes)
0               3.57
1               3.65
2               3.63
4               3.64
8               3.68
16              3.72
32              3.77
64              3.95
128             4.95
256             5.36
512             6.03
1024            7.64
2048            9.95
4096            12.78
8192            18.22
16384           25.48
32768           37.03
65536           60.21
131072          107.90
262144          201.18
524288          389.08
1048576         762.38
2097152         1510.91
4194304         3005.72
[beo-15:29479] mca: base: close: component openib closed
[beo-16:29063] mca: base: close: component openib closed
[beo-16:29063] mca: base: close: unloading component openib
[beo-15:29479] mca: base: close: unloading component openib
[beo-16:29063] mca: base: close: component self closed
[beo-16:29063] mca: base: close: unloading component self
[beo-15:29479] mca: base: close: component self closed
[beo-15:29479] mca: base: close: unloading component self
8<------

really weird.

Peter

On May 18, 2010, at 6:18 AM, Peter Kruse wrote:

> Hello,
>
> trying to run Intel MPI Benchmarks with OpenMPI 1.4.1 fails in initializing the component openib. System is Debian GNU/Linux 5.0.4. The command to start the job (under Torque 2.4.7) was:
>
> mpirun.openmpi-1.4.1 --mca btl_base_verbose 50 --mca btl self,openib -n 2 ./IMB-MPI1 -npmin 2 PingPong
>
> and results in these messages:
>
> 8<--
> [beo-15:20933] mca: base: components_open: Looking for btl components
> [beo-16:20605] mca: base: components_open: Looking for btl components
> [beo-15:20933] mca: base: components_open: opening btl components
> [beo-15:20933] mca: base: components_open: found loaded component openib
> [beo-15:20933] mca: base: components_open: component openib has no register function
> [beo-15:20933] mca: base: components_open: component openib open function successful
> [beo-15:20933] mca: base: components_open: found loaded component self
> [beo-15:20933] mca: base: components_open: component self has no register function
> [beo-15:20933] mca: base: components_open: component self open function successful
> [beo-16:20605] mca: base: components_open: opening btl components
> [beo-16:20605] mca: base: components_open: found loaded component openib
> [beo-16:20605] mca: base: components_open: component openib has no register function
> [beo-16:20605] mca: base: components_open: component openib open function succes