Hello Jorge,
On Monday 29 October 2007 18:27, Jorge Parra wrote:
> When running Open MPI, my system freezes when initializing MPI (function
> MPI_Init). This happens only when I try to run the process on multiple
> nodes in my cluster. Running multiple instances of the testing code
> locally (i.e. ./mpirun -np 2 greetings) is successful.
Would it be possible to repeat the tests with the latest Open MPI version,
1.2.4?
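
In case it helps to confirm which version each node actually picks up, here is a minimal sketch. It assumes that `ompi_info` is on the PATH of the non-interactive rsh shell and that the hostfile has one host name as the first field of each line. The function name `report_ompi_versions` and the `REMOTE_SH` variable are only illustrative, not part of Open MPI.

```shell
#!/bin/sh
# Sketch: print the Open MPI version reported by ompi_info on every host.
# Assumption: "rsh <host> ompi_info" works like "rsh <host> date" does.
# REMOTE_SH is a hypothetical override, e.g. REMOTE_SH=ssh.
REMOTE_SH=${REMOTE_SH:-rsh}

report_ompi_versions() {
    hostfile=$1
    # First field of each line is the host; skip blank lines and comments.
    while read -r host _rest; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the remote shell from eating the hostfile on stdin.
        version=$($REMOTE_SH "$host" ompi_info < /dev/null 2>/dev/null |
                  sed -n 's/^ *Open MPI: *//p')
        echo "$host: ${version:-unknown}"
    done < "$hostfile"
}
```

Calling `report_ompi_versions /root/hostfile` should print one line per node. A node showing "unknown" or a different version than the others would be worth a closer look before repeating the test.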

That said, nothing in Open MPI should make your system freeze.
Could you check the logs on the nodes and, if possible, capture the dmesg
output just before the call to MPI_Init?
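
One way to take such a snapshot on all nodes at once is sketched below. It is only a sketch under assumptions: that `rsh <host> dmesg` works the same way your `rsh <host> date` test does (including for the local node listed in the hostfile), and that the hostfile has the host name as the first field of each line. The function name `collect_dmesg`, the `REMOTE_SH` variable and the output file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: save the kernel log of every node listed in a hostfile.
# Assumption: "rsh <host> dmesg" is permitted on each node.
# REMOTE_SH is a hypothetical override, e.g. REMOTE_SH=ssh.
REMOTE_SH=${REMOTE_SH:-rsh}

collect_dmesg() {
    hostfile=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # First field of each line is the host; skip blank lines and comments.
    while read -r host _rest; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the remote shell from eating the hostfile on stdin.
        $REMOTE_SH "$host" dmesg > "$outdir/dmesg.$host.log" 2>&1 < /dev/null
    done < "$hostfile"
}
```

Running `collect_dmesg /root/hostfile /tmp/dmesg-before` right before the failing mpirun gives one log file per node, which can then be compared with whatever the system logs contain after the freeze.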

> - rsh runs well, and is configured for full access (i.e. "rsh
> 192.168.1.103 date" is successful, and so are "rsh AFRLMPPBM2 date" and
> "rsh AFRLMPPBM2.MPPdomain.com"). Security is not an issue in this system.
>
> - uname -n and hostname return a valid hostname
>
> - The testing code (attached to this email) is run (and fails) as:
> ./mpirun --hostfile /root/hostfile -np 2 greetings . The hostfile has the
> names of the local node (first entry: AFRLMPPBM1) and the remote node
> (second entry: AFRLMPPBM2). This file is also attached to this email.
>
> - The environment variables seem to be properly set (see env.log attached
> file). Local mpi programs (i.e. ./mpirun -np 2 greetings) run well.
>
> - .profile has the path information for both the executables and the
> libraries
>
> - orted runs in the remote node, however it does not print anything in
> console. The only output in the remote node is:
>
> pam_rhosts_auth[235]: user root has a `+' user entry
> pam_rhosts_auth[235]: allowed to r...@afrlmppbm1.mppdomain.com as root
> PAM_unix[235]: (rsh) session opened for user root by (uid=0)
> in.rshd[236]: r...@afrlmppbm1.mppdomain.com as root: cmd='( ! [ -e
> ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3
You're running as root? Why is that?

> Then the remote process returns command prompt. However orted is in the
> background. The local process is frozen, and just prints: "Calling init",
> which is just before MPI_Init (see greetings.c).
>
> I believe the COMM WORLD cannot be correctly initialized. However I can't
> see which part of my configuration is wrong.
>
> Any help is greatly appreciated.

With best regards,
Rainer
-- 
----------------------------------------------------------------
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
 HLRS                          Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19                  Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart                    email: kel...@hlrs.de     
 Germany                             AIM/Skype:rusraink

"Emails save time, not printing them saves trees!"
