You have version confusion somewhere - the error messages show Open MPI looking for a component (the "ras proxy" component) that only exists in the 1.2.x series, not in 1.3.x. Check that your PATH and LD_LIBRARY_PATH are pointing to the 1.3.2 location - on the compute nodes as well as the head node.
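A quick way to verify is to compare what an interactive shell on the head node sees with what a non-interactive shell on a compute node sees, since the latter is the environment the daemons inherit. A minimal sketch (the /opt/openmpi-1.3.2 prefix below is hypothetical - substitute wherever you actually installed 1.3.2):

   # Head node: confirm the 1.3.2 install is first in both paths
   which mpirun
   mpirun --version              # or: ompi_info | grep "Open MPI:"
   echo $PATH
   echo $LD_LIBRARY_PATH

   # Compute node: a non-interactive shell must resolve the same install,
   # since that is what orted gets when it is launched remotely
   ssh compute-0-0 'which orted; echo $LD_LIBRARY_PATH'

   # If either still resolves to a 1.2.x tree, prepend the 1.3.2 one
   # (hypothetical prefix):
   export PATH=/opt/openmpi-1.3.2/bin:$PATH
   export LD_LIBRARY_PATH=/opt/openmpi-1.3.2/lib:$LD_LIBRARY_PATH

If the paths all look right, it is also worth ruling out a 1.3.2 install made over the top of an old 1.2.x tree without removing it first - stale components left in $prefix/lib/openmpi can produce exactly this kind of mismatch.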
On Fri, May 29, 2009 at 12:52 PM, Jeff Layton <layto...@att.net> wrote:
> I've got some more information (after rebuilding Open MPI and the
> application a few times). I put
>
> -mca mpi_show_mca_params enviro
>
> in my mpirun line to get some of the parameter information back.
> I get the following information back (warning - it's long).
>
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host:      compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Unable to start a daemon on the local node (-128)
> instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Unable to start a daemon on the local node" (-128) instead
> of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [compute-0-0.local:1556] Abort before MPI_INIT completed successfully; not
> able to guarantee that all other processes were killed!
> [compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
>
> (and on and on).
>
> Does anyone have any ideas? Google let me down on this one.
>
> TIA!
>
> Jeff
>
>> Good morning,
>>
>> I just built 1.3.2 on a ROCKS 4.something system. I built my code
>> (CFL3D) with the Intel 10.1 compilers. I also linked in the
>> OpenMPI libs and the Intel libraries to make sure I had the paths
>> correct. When I try running my code, I get the following,
>>
>> error: executing task of job 2951 failed: execution daemon on host
>> "compute-2-3.local" didn't accept task
>> --------------------------------------------------------------------------
>> A daemon (pid 12015) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> Everything seems correct. I checked that the mpirun was correct
>> and the binary has the correct libraries (checked using ldd).
>>
>> Can anyone tell me what the "status 1" means? Any tips on debugging
>> the problem?
>>
>> Thanks!
>>
>> Jeff