Hi,

I am trying to reproduce what I was able to show last Friday on Amazon EC2 instances, but I am having a problem. What I was able to show last Friday, as root, was with this command:

  mpirun -app app.ac

with app.ac being:

  -H dns-entry-A -np 1 (linux command)
  -H dns-entry-A -np 1 (linux command)
  -H dns-entry-B -np 1 (linux command)
  -H dns-entry-B -np 1 (linux command)
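For reference, a file in this format can be generated instead of typed by hand. The following is only a minimal sketch, assuming a POSIX shell; the function name make_appfile and its argument order are hypothetical and not part of mpirun, and dns-entry-A / dns-entry-B are the same placeholders used above.

```shell
#!/bin/sh
# make_appfile: hypothetical helper, not part of mpirun.
# Usage: make_appfile COPIES COMMAND HOST...
# Prints COPIES lines of "-H HOST -np 1 COMMAND" for each HOST,
# matching the app.ac layout shown above.
make_appfile() {
    copies=$1
    cmd=$2
    shift 2
    for host in "$@"; do
        i=0
        while [ "$i" -lt "$copies" ]; do
            # '%s\n' keeps the leading "-H" from being parsed as a printf option
            printf '%s\n' "-H $host -np 1 $cmd"
            i=$((i + 1))
        done
    done
}

# Example: two processes on each of two placeholder hosts.
make_appfile 2 /bin/hostname dns-entry-A dns-entry-B
```

Redirecting the output to a file (for example, make_appfile 2 /bin/hostname hostA hostB > app.ac) should give something that can be passed to mpirun -app.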
Here’s the config file in root’s .ssh directory:

  Host *
    IdentityFile /root/.ssh/.derobee/.kagi
    IdentitiesOnly yes
    BatchMode yes

Yesterday and today I can’t get this to work. I made the last part of the app.ac file simpler (it now says /bin/hostname). Below is the session:

-bash-3.2#
-bash-3.2# # I am on instance A, host name for inst A is:
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# nslookup domU-12-31-39-09-CD-C2
Server: 172.16.0.23
Address: 172.16.0.23#53

Non-authoritative answer:
Name: domU-12-31-39-09-CD-C2.compute-1.internal
Address: 10.210.210.48

-bash-3.2# cd .ssh
-bash-3.2#
-bash-3.2# cat config
Host *
  IdentityFile /root/.ssh/.derobee/.kagi
  IdentitiesOnly yes
  BatchMode yes
-bash-3.2#
-bash-3.2# ll config
-rw-r--r-- 1 root root 103 Feb 15 17:18 config
-bash-3.2#
-bash-3.2# chmod 600 config
-bash-3.2#
-bash-3.2# # show I can go to inst B without password/passphrase
-bash-3.2#
-bash-3.2# ssh domU-12-31-39-09-E6-71.compute-1.internal
Last login: Tue Feb 15 17:18:46 2011 from 10.210.210.48
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-E6-71
-bash-3.2#
-bash-3.2# nslookup `hostname`
Server: 172.16.0.23
Address: 172.16.0.23#53

Non-authoritative answer:
Name: domU-12-31-39-09-E6-71.compute-1.internal
Address: 10.210.233.123

-bash-3.2# # and back to inst A is also no problem
-bash-3.2#
-bash-3.2# ssh domU-12-31-39-09-CD-C2.compute-1.internal
Last login: Tue Feb 15 17:36:19 2011 from 63.193.205.1
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# # log out twice to go back to inst A
-bash-3.2# exit
logout
Connection to domU-12-31-39-09-CD-C2.compute-1.internal closed.
-bash-3.2#
-bash-3.2# exit
logout
Connection to domU-12-31-39-09-E6-71.compute-1.internal closed.
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# cd ..
-bash-3.2#
-bash-3.2# pwd
/root
-bash-3.2#
-bash-3.2# ll
total 8
-rw-r--r-- 1 root root 260 Feb 15 17:24 app.ac
-rw-r--r-- 1 root root 130 Feb 15 17:34 app.ac2
-bash-3.2#
-bash-3.2# cat app.ac
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
-bash-3.2#
-bash-3.2# # when there is a remote machine (bottom 2 lines) it hangs
-bash-3.2# mpirun -app app.ac
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
  domU-12-31-39-09-E6-71.compute-1.internal - daemon did not report back when launched
-bash-3.2#
-bash-3.2# cat app.ac2
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-bash-3.2#
-bash-3.2# # when there is no remote machine, then mpirun works:
-bash-3.2# mpirun -app app.ac2
domU-12-31-39-09-CD-C2
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# # this gotta be ssh problem....
-bash-3.2#
-bash-3.2# # show no firewall is used
-bash-3.2# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
-bash-3.2#
-bash-3.2# exit
logout
[tsakai@vixen ec2]$

Would someone please point out what I am doing wrong?

Thank you.

Regards,

Tena