On Jul 7, 2010, at 10:12 AM, Grzegorz Maj wrote:

> The problem was that orted couldn't find either ssh or rsh on that machine.
> I've added my installation to PATH and it now works.
> So one question: I will definitely not be using MPI_Comm_spawn or any
> related functionality. Do I need this ssh at all? If not, is there any way
> to tell orted that it shouldn't look for ssh, since it won't need it?
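> 
> In case it helps, the fix was essentially just making ssh visible before
> starting the singleton, e.g. (the directory below is only a placeholder
> for wherever ssh actually lives on that machine):
> 
>   export PATH=/path/to/dir/with/ssh:$PATH
>   which ssh    # should now print the full path to ssh
>   LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test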

That's an interesting question - I've never faced that situation before. At the 
moment, the answer is "no": the orted always tries to select a plm module, and 
that is what sends it looking for ssh/rsh, even for a singleton. However, I 
could conjure up a patch that lets the orted skip the plm selection entirely...
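
In the meantime, if you want to see what the launcher selection is actually 
doing, you can bump up the plm verbosity through the environment when you 
start the singleton - something like the following should print the selection 
logic (I'm going from memory on the exact parameter name, so double-check it 
with ompi_info on your install):

  OMPI_MCA_plm_base_verbose=10 LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test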

> 
> Regards,
> Grzegorz Maj
> 
> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>> Check your PATH and LD_LIBRARY_PATH - it looks like you are picking up a 
>> stale binary for orted and/or stale libraries (perhaps getting the default 
>> OMPI instead of 1.4.2) on the machine where it fails.
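>> 
>> For example, running something like this from the shell you use to launch 
>> the test on the failing machine should show which orted and which MPI 
>> libraries are actually being picked up:
>> 
>>   which orted
>>   ldd ./test | grep -i mpi
>> 
>> Both should point at your 1.4.2 install rather than a system-default one.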
>> 
>> On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote:
>> 
>>> Hi,
>>> I was trying to run some MPI processes as a singletons. On some of the
>>> machines they crash on MPI_Init. I use exactly the same binaries of my
>>> application and the same installation of openmpi 1.4.2 on two machines
>>> and it works on one of them and fails on the other one. This is the
>>> command and its output (test is a simple application calling only
>>> MPI_Init and MPI_Finalize):
>>> 
>>> LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>>  orte_plm_base_select failed
>>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../orte/runtime/orte_init.c at line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>>  orte_ess_set_name failed
>>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../orte/orted/orted_main.c at line 323
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file
>>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
>>> 381
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file
>>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
>>> 143
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file ../../orte/runtime/orte_init.c at
>>> line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>>  orte_ess_set_name failed
>>>  --> Returned value Unable to start a daemon on the local node (-128)
>>> instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>> 
>>>  ompi_mpi_init: orte_init failed
>>>  --> Returned "Unable to start a daemon on the local node" (-128)
>>> instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> [host01:21865] Abort before MPI_INIT completed successfully; not able
>>> to guarantee that all other processes were killed!
>>> 
>>> 
>>> Any ideas on this?
>>> 
>>> Thanks,
>>> Grzegorz Maj

