On Jul 7, 2010, at 10:12 AM, Grzegorz Maj wrote:

> The problem was that orted couldn't find ssh or rsh on that machine.
> I've added my installation to PATH and it now works.
> So one question: I will definitely not use MPI_Comm_spawn or any
> related functionality. Do I need this ssh? If not, is there any way to
> tell orted that it shouldn't look for ssh, since it won't need it?
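The fix described above amounts to something like the following sketch (the /home/gmaj/openmpi prefix is taken from the command shown later in this thread; substitute your own installation path):

```shell
#!/bin/sh
# Put the Open MPI installation on PATH (so orted can be found, and so it
# can in turn find ssh/rsh) and on LD_LIBRARY_PATH (for the MPI libraries).
# /home/gmaj/openmpi is the prefix used in this thread; adjust as needed.
OMPI_PREFIX=/home/gmaj/openmpi
PATH="$OMPI_PREFIX/bin:$PATH"
LD_LIBRARY_PATH="$OMPI_PREFIX/lib:$LD_LIBRARY_PATH"
export PATH LD_LIBRARY_PATH
```

Setting these in the shell profile (rather than per-command) matters for non-interactive logins too, since orted inherits the environment of whatever shell launches it.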
That's an interesting question - I've never faced that situation before. At
the moment, the answer is "no". However, I could conjure up a patch that
lets the orted skip selecting a plm module...

> Regards,
> Grzegorz Maj
>
> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>> Check your PATH and LD_LIBRARY_PATH - it looks like you are picking up a
>> stale binary for orted and/or stale libraries (perhaps getting the
>> default OMPI instead of 1.4.2) on the machine where it fails.
>>
>> On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote:
>>
>>> Hi,
>>> I was trying to run some MPI processes as singletons. On some of the
>>> machines they crash in MPI_Init. I use exactly the same binaries of my
>>> application and the same installation of Open MPI 1.4.2 on two
>>> machines, and it works on one of them but fails on the other. This is
>>> the command and its output (test is a simple application calling only
>>> MPI_Init and MPI_Finalize):
>>>
>>> LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_plm_base_select failed
>>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../orte/runtime/orte_init.c at line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_set_name failed
>>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../orte/orted/orted_main.c at line 323
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file
>>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 381
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file
>>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 143
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file ../../orte/runtime/orte_init.c at
>>> line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_set_name failed
>>> --> Returned value Unable to start a daemon on the local node (-128)
>>> instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> ompi_mpi_init: orte_init failed
>>> --> Returned "Unable to start a daemon on the local node" (-128)
>>> instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> [host01:21865] Abort before MPI_INIT completed successfully; not able
>>> to guarantee that all other processes were killed!
>>>
>>>
>>> Any ideas on this?
>>>
>>> Thanks,
>>> Grzegorz Maj
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
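The stale-binary check Ralph suggests, and the ssh/rsh lookup that tripped the orted here, can be sketched as follows (check_launcher is a made-up helper name for this sketch, not an Open MPI tool):

```shell
#!/bin/sh
# Report where the shell resolves each binary from, to spot either a stale
# orted picked up from a default install, or a missing ssh/rsh that makes
# plm (process launch module) selection fail with "Not found (-13)".
check_launcher() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 -> $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

check_launcher sh      # always present; shows the output shape
check_launcher orted   # should resolve under the intended 1.4.2 prefix
check_launcher ssh     # the ssh plm module needs this (or rsh) to launch
check_launcher rsh
```

Running this on both the working and the failing machine and comparing the output should show whether orted resolves to a different install, or whether ssh/rsh is simply absent from PATH, which was the cause in this thread.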