Dear OMPI users, I'm a new OpenMPI user. I've configured openmpi-1.3.3 with those options "./configure --prefix=/home/mab/openmpi-1.3.3 --with-sge --enable-ft-thread --with-ft=cr --enable-mpi-threads --enable-static --disable-shared --with-blcr=/home/mab/blcr-0.8.2/" then compiled and installed it successfully. Now I'm trying to use the checkpoint/restart protocol. I run a program with the options "mpirun -n 2 -am ft-enable-cr -H localhost prime/checkpoint-restart-test" but I receive the following error:
*** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [madel:28896] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed! -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_cr_init() failed failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- [madel:28896] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 77 -------------------------------------------------------------------------- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): ompi_mpi_init: orte_init failed --> Returned "Error" (-1) instead of "Success" (0) -------------------------------------------------------------------------- I can't find the files mentioned in this post "http://www.open-mpi.org/community/lists/users/2009/09/10641.php" (mca_crs_blcr.so, mca_crs_blcr.la). Could you please help me with that error? Thanks in advance Mohamed Adel