Hello All,
I'm trying to run the latest Open MPI code on Jaguar
(cloned from the Open MPI Mercurial mirror of the Subversion repository).
Open MPI configured and compiled without problems, and the benchmark
also compiled successfully. I tried to launch my program using mpirun
within an interactive job, but it failed immediately.
The core dump file gave me the following information:
====================Error Msg=========================
[jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local
node in file ess_singleton_module.c at line 220
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
ompi_mpi_init: orte_init failed
--> Returned value Unable to start a daemon on the local node (-127)
instead of ORTE_SUCCESS
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: orte_init failed
--> Returned "Unable to start a daemon on the local node" (-127) instead
of "Success" (0)
--------------------------------------------------------------------------
[jaguarpf-login2:15370] *** An error occurred in MPI_Init
[jaguarpf-login2:15370] *** reported by process [4294967295,42949
[jaguarpf-login2:15370] *** on a NULL communicator
[jaguarpf-login2:15370] *** Unknown error
[jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,
[jaguarpf-login2:15370] *** and potentially your MPI job)
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.
Reason: Before MPI_INIT completed
Local host: jaguarpf-login2
PID: 15370
--------------------------------------------------------------------------
Program exited with code 01.
====================Error Msg Over=====================
There are several components under ess, but I don't know why or how the
singleton component was chosen.
I hope someone can help me compile and run Open MPI successfully on
Jaguar.
Any comments and suggestions will be appreciated.
Thanks,
--Bin