That could mean you didn't recompile the code against the new version of
Open MPI. The 1.4 and 1.5 series are not binary compatible - you have to
recompile your code.
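
For example, a rebuild against the 1.5 install would look roughly like this
(the prefix below is taken from your configure line; the make targets are just
placeholders for however you normally build your application):

  export PATH=/home/nguyen/opt/openmpi-1.5/bin:$PATH
  which mpicc          # should report .../openmpi-1.5/bin/mpicc
  make clean && make   # rebuild the application with the 1.5 compiler wrappers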

If you did recompile, you may be getting version confusion on the backend nodes
- check your LD_LIBRARY_PATH and make sure it points to the 1.5 series
install.
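
A quick sanity check on each backend node would be something along these lines
(ompi_info and "mpirun --version" are standard Open MPI commands; the lib path
is just the one implied by your prefix):

  echo $LD_LIBRARY_PATH
  mpirun --version               # should report Open MPI 1.5
  ompi_info | grep "Open MPI:"
  export LD_LIBRARY_PATH=/home/nguyen/opt/openmpi-1.5/lib:$LD_LIBRARY_PATH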

On Dec 8, 2010, at 8:02 AM, Nguyen Toan wrote:

> Dear all,
> 
> I am having a problem running mpirun with Open MPI 1.5. I compiled 
> Open MPI 1.5 with BLCR 0.8.2 and OFED 1.4.1 as follows:
> 
> ./configure \
> --with-ft=cr \
> --enable-mpi-threads \
> --with-blcr=/home/nguyen/opt/blcr \
> --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> --prefix=/home/nguyen/opt/openmpi-1.5 \
> --with-openib \
> --enable-mpirun-prefix-by-default
> 
> For the programs under the "openmpi-1.5/examples" folder, the mpirun tests 
> were successful. But mpirun aborted immediately when running an MPI CUDA 
> program that had been tested successfully with Open MPI 1.4.3. Below is the 
> error message.
> 
> Can anyone give me an idea about this error?
> Thank you.
> 
> Best Regards,
> Toan
> ----------------------
> 
> 
> [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past 
> end of buffer in file util/nidmap.c at line 371
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   orte_ess_base_build_nidmap failed
>   --> Returned value Data unpack would read past end of buffer (-26) instead 
> of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past 
> end of buffer in file base/ess_base_nidmap.c at line 62
> [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past 
> end of buffer in file ess_env_module.c at line 173
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   orte_ess_set_name failed
>   --> Returned value Data unpack would read past end of buffer (-26) instead 
> of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past 
> end of buffer in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: orte_init failed
>   --> Returned "Data unpack would read past end of buffer" (-26) instead of 
> "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [rc002.local:17727] Abort before MPI_INIT completed successfully; not able to 
> guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 17727 on
> node rc002 exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> 

