Dear Ralph,

Thank you for your reply. I checked LD_LIBRARY_PATH, recompiled with the new version, and it worked perfectly. Thank you again.
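For the archives, this is roughly what I did. The prefix below is from my own install and the build command is only a placeholder, so adjust both for your setup:

    export PATH=/home/nguyen/opt/openmpi-1.5/bin:$PATH
    export LD_LIBRARY_PATH=/home/nguyen/opt/openmpi-1.5/lib:$LD_LIBRARY_PATH

    # confirm the 1.5 install is the one being picked up
    which mpicc mpirun
    ompi_info | grep "Open MPI:"

    # recompile the application against 1.5 (the 1.4 and 1.5 series are not
    # binary compatible); "my_app" stands in for the real MPI+CUDA build
    mpicc -o my_app my_app.c

After that, running with the 1.5 mpirun worked for me.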
Best Regards,
Toan

On Thu, Dec 9, 2010 at 12:30 AM, Ralph Castain <r...@open-mpi.org> wrote:
> That could mean you didn't recompile the code using the new version of
> OMPI. The 1.4 and 1.5 series are not binary compatible - you have to
> recompile your code.
>
> If you did recompile, you may be getting version confusion on the backend
> nodes - you should check your LD_LIBRARY_PATH and ensure it is pointing to
> the 1.5 series install.
>
> On Dec 8, 2010, at 8:02 AM, Nguyen Toan wrote:
>
> > Dear all,
> >
> > I am having a problem running mpirun with OpenMPI 1.5. I compiled
> > OpenMPI 1.5 with BLCR 0.8.2 and OFED 1.4.1 as follows:
> >
> > ./configure \
> >   --with-ft=cr \
> >   --enable-mpi-threads \
> >   --with-blcr=/home/nguyen/opt/blcr \
> >   --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> >   --prefix=/home/nguyen/opt/openmpi-1.5 \
> >   --with-openib \
> >   --enable-mpirun-prefix-by-default
> >
> > For the programs under the "openmpi-1.5/examples" folder, the mpirun
> > tests were successful, but mpirun aborted immediately when running a
> > program written in MPI + CUDA, which had been tested successfully with
> > OpenMPI 1.4.3. Below is the error message.
> >
> > Can anyone give me an idea about this error?
> > Thank you.
> >
> > Best Regards,
> > Toan
> > ----------------------
> >
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 371
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   orte_ess_base_build_nidmap failed
> >   --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/ess_base_nidmap.c at line 62
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 173
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   orte_ess_set_name failed
> >   --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 132
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems.
> > This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > [rc002.local:17727] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 1 with PID 17727 on
> > node rc002 exiting improperly. There are two reasons this could occur:
> >
> > 1. this process did not call "init" before exiting, but others in
> > the job did. This can cause a job to hang indefinitely while it waits
> > for all processes to call "init". By rule, if one process calls "init",
> > then ALL processes must call "init" prior to termination.
> >
> > 2. this process called "init", but exited without calling "finalize".
> > By rule, all processes that call "init" MUST call "finalize" prior to
> > exiting or it will be considered an "abnormal termination"
> >
> > This may have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
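P.S. In case someone else hits the same "Data unpack would read past end of buffer" error: for me it was resolved by recompiling against 1.5 and pointing LD_LIBRARY_PATH at the 1.5 install, as Ralph suggested in the quoted mail. A quick way to spot this kind of version confusion on the backend nodes is something like the following; rc002 is just one of my nodes and /path/to/my_app is a placeholder, so substitute your own names:

    # see what the non-interactive remote environment actually contains
    ssh rc002 'echo $LD_LIBRARY_PATH; which mpirun; ompi_info | grep "Open MPI:"'

    # see which libmpi the application binary resolves to on that node
    ssh rc002 'ldd /path/to/my_app | grep libmpi'

If mpirun reports the 1.5 series but the binary still resolves to the 1.4 libraries (or the other way around), orte_init can fail with exactly the kind of errors shown above.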