Check your PATH and LD_LIBRARY_PATH: it looks like you are picking up a stale orted binary and/or stale libraries (perhaps the default OMPI install instead of 1.4.2) on the machine where it fails.
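For example, you can confirm which orted the shell resolves first on each machine. The sketch below is hypothetical (the stub scripts and directory names are made up for the demonstration, they are not real Open MPI installs); it just shows how PATH ordering makes the wrong binary win, and how putting the 1.4.2 install first fixes it:

```shell
#!/bin/sh
# Sketch: PATH order decides which orted is found first. The two stub
# scripts stand in for a default system install and the 1.4.2 install;
# the directories are invented for this demonstration.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/system/bin" "$tmp/ompi-1.4.2/bin"
printf '#!/bin/sh\necho default-orted\n' > "$tmp/system/bin/orted"
printf '#!/bin/sh\necho orted-1.4.2\n'   > "$tmp/ompi-1.4.2/bin/orted"
chmod +x "$tmp/system/bin/orted" "$tmp/ompi-1.4.2/bin/orted"

# Stale setup: the default install comes first on PATH, so its orted wins.
( PATH="$tmp/system/bin:$tmp/ompi-1.4.2/bin"; command -v orted; orted )

# Fix: put the 1.4.2 install first (and do the analogous thing for
# LD_LIBRARY_PATH so the matching libraries are loaded).
( PATH="$tmp/ompi-1.4.2/bin:$tmp/system/bin"; command -v orted; orted )

rm -rf "$tmp"
```

On the real machines the equivalent checks would be `which orted`, `ompi_info | grep "Open MPI:"`, and `ldd ./test`, run on both the working and the failing host and compared.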
On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote:

> Hi,
> I was trying to run some MPI processes as singletons. On some of the
> machines they crash on MPI_Init. I use exactly the same binaries of my
> application and the same installation of openmpi 1.4.2 on two machines,
> and it works on one of them and fails on the other one. This is the
> command and its output (test is a simple application calling only
> MPI_Init and MPI_Finalize):
>
> LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_plm_base_select failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> ../../orte/runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_set_name failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> ../../orte/orted/orted_main.c at line 323
> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
> daemon on the local node in file
> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
> 381
> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
> daemon on the local node in file
> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
> 143
> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
> daemon on the local node in file ../../orte/runtime/orte_init.c at
> line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_set_name failed
>   --> Returned value Unable to start a daemon on the local node (-128)
>   instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: orte_init failed
>   --> Returned "Unable to start a daemon on the local node" (-128)
>   instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [host01:21865] Abort before MPI_INIT completed successfully; not able
> to guarantee that all other processes were killed!
>
>
> Any ideas on this?
>
> Thanks,
> Grzegorz Maj
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users