Dear all,

I am having a problem running mpirun with Open MPI 1.5. I compiled
Open MPI 1.5 with BLCR 0.8.2 and OFED 1.4.1 as follows:

./configure \
--with-ft=cr \
--enable-mpi-threads \
--with-blcr=/home/nguyen/opt/blcr \
--with-blcr-libdir=/home/nguyen/opt/blcr/lib \
--prefix=/home/nguyen/opt/openmpi-1.5 \
--with-openib \
--enable-mpirun-prefix-by-default

The mpirun tests with the programs in the "openmpi-1.5/examples" folder
succeeded. However, mpirun aborts immediately when running an MPI CUDA
program that ran successfully under Open MPI 1.4.3. The error message is
below.

Can anyone give me an idea of what causes this error?
Thank you.

Best Regards,
Toan
----------------------


[rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 371
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_build_nidmap failed
  --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/ess_base_nidmap.c at line 62
[rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 173
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[rc002.local:17727] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 17727 on
node rc002 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
