Hello Ralph,

Thanks for your reply.
To start my job, I tried the following two ways:

(1) Configured and compiled Open MPI and compiled the benchmark on the head node, then submitted a PBS job.
(2) Submitted an interactive job and redid the configure/compile on a compute node.

I then used "/path/to/mpicc -o hello hello_world.c" to compile the benchmark and "/path/to/mpirun -np 2 /path/to/hello" to run the job. I also tried "/path/to/mpirun -np 2 hostname", but got the same error. (hello_world.c is sketched below.)
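For completeness, hello_world.c is nothing special; a minimal sketch along these lines (representative of what I am compiling, not the exact file) is:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        /* The failure reported below occurs inside MPI_Init. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }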
The configure line is pretty long:

$SRCDIR/configure \
    --prefix=$PREFIX \
    --enable-static --disable-shared --disable-dlopen --disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio --enable-contrib-no-build=libnbc,vt --enable-debug \
    --with-memory-manager=none --with-threads \
    --without-tm \
    --with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
    --with-wrapper-libs="-lnsl -lpthread -lm" \
    --with-platform=optimized \
    --with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
    --with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64 \
    --with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include \
    --with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
    --with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
    --enable-mem-debug --enable-mem-profile --enable-debug-symbols --enable-binaries \
    --enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx --enable-mpi-cxx-seek \
    --without-slurm --with-memory-manager=ptmalloc2 \
    --with-pmi=/opt/cray/pmi/2.1.4-1.0000.8596.8.9.gem --with-cray-pmi-ext \
    --enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr \
    ${ADD_COMPILER} \
    CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
    FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
    FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
    CFLAGS="-I/usr/include -I${gniheaders}" \
    LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
    LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log

Any idea?

Bin WANG

On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain <rhc.open...@gmail.com> wrote:
> How did you attempt to start your job, and what does your configure line
> look like?
>
> Sent from my iPad
>
> On Mar 5, 2012, at 2:11 PM, bin Wang <bighead...@gmail.com> wrote:
>
> > Hello All,
> >
> > I'm trying to run the latest OpenMPI code on Jaguar
> > (cloned from the Open MPI Mercurial mirror of the Subversion repository).
> > The configuration and compilation of OpenMPI were fine, and the benchmark
> > was also successfully compiled. I tried to launch my program using mpirun
> > within an interactive job, but it failed immediately.
> >
> > The core dump file gave me the following information.
> > ====================Error Msg=========================
> > [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local
> > node in file ess_singleton_module.c at line 220
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> > [jaguarpf-login2:15370] *** reported by process [4294967295,4294967295]
> > [jaguarpf-login2:15370] *** on a NULL communicator
> > [jaguarpf-login2:15370] *** Unknown error
> > [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > [jaguarpf-login2:15370] *** and potentially your MPI job)
> > --------------------------------------------------------------------------
> > An MPI process is aborting at a time when it cannot guarantee that all
> > of its peer processes in the job will be killed properly. You should
> > double check that everything has shut down cleanly.
> >
> >   Reason:     Before MPI_INIT completed
> >   Local host: jaguarpf-login2
> >   PID:        15370
> > --------------------------------------------------------------------------
> > Program exited with code 01.
> > ====================Error Msg Over=====================
> >
> > There are several components under ess, but I don't know why and how the
> > singleton component was chosen.
> >
> > I hope someone could help me to compile and run OpenMPI successfully on Jaguar.
> >
> > Any comment and suggestion will be appreciated.
> >
> > Thanks,
> >
> > --Bin
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>