I've got some more information (after rebuilding Open MPI and the
application a few times). I put

-mca mpi_show_mca_params enviro

in my mpirun line to dump the MCA parameters that were set in the
environment. I got the following output (warning - it's long).
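
For reference, the full launch line was along these lines (the process
count and executable name are placeholders, not my exact invocation):

# -np and ./cfl3d are placeholders; the MCA flag is the relevant part
mpirun -np 8 -mca mpi_show_mca_params enviro ./cfl3d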

--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 199
[... the same "A requested component was not found" block and
ORTE_ERROR_LOG line repeat for PIDs 01559, 01560, 01562, 01563,
01565, and 01566 ...]
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ras_base_open failed
 --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 132
[... the same "orte_ras_base_open failed" block and ORTE_ERROR_LOG
line repeat for PIDs 01559, 01560, 01562, 01563, 01564, and 01566 ...]
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ess_set_name failed
 --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 323
[... the same "orte_ess_set_name failed" block and ORTE_ERROR_LOG
line repeat for PIDs 01559, 01560, 01562, 01563, 01564, and 01565 ...]
[compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ess_set_name failed
--> Returned value Unable to start a daemon on the local node (-128) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
[... the same "orte_ess_set_name failed" / "Unable to start a daemon
on the local node (-128)" block repeats ...]
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 ompi_mpi_init: orte_init failed
--> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[compute-0-0.local:1556] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 132
[... the same "orte_ess_set_name failed" / -128 block repeats twice
more, the last occurrence cut off ...]

(and on and on).

Does anyone have any ideas? Google let me down on this one.

TIA!

Jeff


Good morning,

I just built Open MPI 1.3.2 on a ROCKS 4.something system. I built my
code (CFL3D) with the Intel 10.1 compilers. I also linked in the
Open MPI libs and the Intel libraries explicitly to make sure I had
the paths correct. When I try to run my code, I get the following:


error: executing task of job 2951 failed: execution daemon on host "compute-2-3.local" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 12015) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished


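If the LD_LIBRARY_PATH hint in that last box turns out to be the
issue, the fix would presumably be something like this in the shell
startup files on the nodes (the prefixes below are placeholders, not
my actual install locations):

# placeholder prefixes - substitute the real Open MPI and Intel lib dirs
export LD_LIBRARY_PATH=/opt/openmpi/lib:/opt/intel/fce/10.1/lib:$LD_LIBRARY_PATH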

Everything seems correct. I checked that the right mpirun is being
picked up and that the binary resolves the correct libraries (checked
using ldd); the checks are sketched below.
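
The checks were along these lines (./cfl3d is a stand-in for the
actual executable name):

# ./cfl3d is a placeholder for the real executable
which mpirun
ldd ./cfl3d | grep -iE 'mpi|intel'
echo $LD_LIBRARY_PATH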

Can anyone tell me what the "status 1" means? Any tips on debugging
the problem?

Thanks!

Jeff

