Hi Jeff,


Jeff Squyres wrote:
On Mar 13, 2009, at 6:17 AM, Raymond Wan wrote:

What doesn't work is:

[On Y] mpirun --host Y,Z --np 2 uname -a
[On Y] mpirun --host X,Y,Z --np 3 uname -a

...and similarly for machine Z. I can confirm that from any of the 3

Do you see "rsh" or "ssh" in the output of "ps -eadf" when mpirun is hanging, perchance? If so, what happens if you copy-n-paste those command lines and run them manually?



No, I don't see either rsh or ssh when mpirun is hanging. Is that odd? Am I doing something wrong?

I only see an mpirun command and an orted command.


rwan     22800 22761  0 09:52 pts/2    00:00:00 mpirun --host X,Y,Z --np 3 sleep 1000
rwan     22804     1  0 09:52 ?        00:00:00 orted --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 --nodename Y --universe rwan@Y:default-universe-22800 --nsreplica "0.0.0;tcp://Y:36889" --gprreplica "0.0.0;tcp://Y:36889" --set-sid


Actually, when I run the above mpirun command, I don't see "sleep" running locally on machine Y, either. However, if I do this:

mpirun --host Y --np 3 sleep 1000

I see 3 instances of "sleep" in the output of ps -eadf. Does mpirun try to ssh to all of the networked machines before it starts the program, even if one of those instances will run locally?

Perhaps unrelated, but when I am on Y and do an rsh to Z, I get "No route to host". I asked the sysadmin about it (I'm not the sysadmin of Y or Z) and he doesn't know why, but since we should be using ssh anyway, he isn't going to address the problem (unless it turns out to be related to my mpirun problem). I can only presume rsh hasn't been set up properly; ssh works fine, though.
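
In case it is useful, here is a minimal sketch of how the non-interactive logins from Y could be checked one host at a time. It rests on my assumption that mpirun relies on password-less rsh/ssh to start processes on the remote nodes. X, Y and Z are the placeholder host names used above; the check_hosts function and the SSH_CMD variable are names I made up for illustration and are not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): try a non-interactive login
# to each host given as an argument and report whether it succeeded.
# SSH_CMD is an illustrative override so the login command can be swapped
# (for example, for rsh). BatchMode=yes makes ssh fail rather than prompt
# for a password; ConnectTimeout keeps an unreachable host from hanging.
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

check_hosts() {
    for host in "$@"; do
        # Run the trivial command "true" on the remote host; only the
        # exit status matters, so all output is discarded.
        if $SSH_CMD "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

Calling "check_hosts X Y Z" on machine Y (and then again on Z) should show whether every host accepts a login without prompting. A FAILED line would point at the login setup rather than at mpirun itself.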

Thank you!

Ray

