Hello Jorge,

On Monday 29 October 2007 18:27, Jorge Parra wrote:
> When running openMPI my system freezes when initializing MPI (function
> MPI_Init). This happens only when I try to run the process in multiple
> nodes in my cluster. Running multiple instances of the testing code
> locally (i.e ./mpirun -np 2 greetings) is successful.

Would it be possible to repeat the tests with the latest Open MPI 1.2.4 version?
That said, nothing in Open MPI should make your system freeze. Could you check the logs on the nodes, and possibly capture the dmesg output just before MPI_Init?

> - rsh runs well, and is configured to full access. (i.e. rsh
> "192.168.1.103 date" is successful, and so are "rsh AFRLMPPBM2 date" and
> "rsh AFRLMPPBM2.MPPdomain.com"). Security is not an issue in this system.
>
> - uname -n and hostname return a valid hostname
>
> - The testing code (attached to this email) is run (and fails) as:
> ./mpirun --hostfile /root/hostfile -np 2 greetings . The hostfile has the
> names of the localnode (first entry:AFRLMPPBM1) and the remote node
> (second entry: AFRLMPPBM2). This file is also attached to this email.
>
> - The environment variables seem to be properly set (see env.log attached
> file). Local mpi programs (i.e. ./mpirun -np 2 greetings) run well.
>
> - .profile has the path information for both the executables and the
> libraries
>
> - orted runs in the remote node, however it does not print anything in
> console. The only output in the remote node is:
>
> pam_rhosts_auth[235]: user root has a `+' user entry
> pam_rhosts_auth[235]: allowed to r...@afrlmppbm1.mppdomain.com as root
> PAM_unix[235]: (rsh) session opened for user root by (uid=0)
> in.rshd[236]: r...@afrlmppbm1.mppdomain.com as root: cmd='( ! [ -e
> ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3

You're running as root? Why is that?

> Then the remote process returns command prompt. However orted is in the
> background. The local process is frozen, and just prints: "Calling init",
> which is just before MPI_Init (see greetings.c).
>
> I believe the COMM WORLD cannot be correctly initialized. However I can't
> see which part of my configuration is wrong.
>
> Any help is greatly appreciated.

With best regards,
Rainer
--
----------------------------------------------------------------
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
HLRS                       Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19            Fax: ++49 (0)711-685 6 5832
70550 Stuttgart            email: kel...@hlrs.de
Germany                    AIM/Skype:rusraink
"Emails save time, not printing them saves trees!"