Tena Sakai wrote:
Hi,
I am trying to reproduce what I was able to show last Friday on Amazon
EC2 instances, but I am having a problem. What I was able to show last
Friday as root was with this command:
mpirun -app app.ac
with app.ac being:
-H dns-entry-A -np 1 (linux command)
-H dns-entry-A -np 1 (linux command)
-H dns-entry-B -np 1 (linux command)
-H dns-entry-B -np 1 (linux command)
Here’s the config file in root’s .ssh directory:
Host *
IdentityFile /root/.ssh/.derobee/.kagi
IdentitiesOnly yes
BatchMode yes
Yesterday and today I can’t get this to work. I made the last part of the
app.ac file simpler (it now says /bin/hostname). Below is the session:
-bash-3.2#
-bash-3.2# # I am on instance A, host name for inst A is:
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# nslookup domU-12-31-39-09-CD-C2
Server: 172.16.0.23
Address: 172.16.0.23#53
Non-authoritative answer:
Name: domU-12-31-39-09-CD-C2.compute-1.internal
Address: 10.210.210.48
-bash-3.2# cd .ssh
-bash-3.2#
-bash-3.2# cat config
Host *
IdentityFile /root/.ssh/.derobee/.kagi
IdentitiesOnly yes
BatchMode yes
-bash-3.2#
-bash-3.2# ll config
-rw-r--r-- 1 root root 103 Feb 15 17:18 config
-bash-3.2#
-bash-3.2# chmod 600 config
-bash-3.2#
-bash-3.2# # show I can go to inst B without password/passphrase
-bash-3.2#
-bash-3.2# ssh domU-12-31-39-09-E6-71.compute-1.internal
Last login: Tue Feb 15 17:18:46 2011 from 10.210.210.48
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-E6-71
-bash-3.2#
-bash-3.2# nslookup `hostname`
Server: 172.16.0.23
Address: 172.16.0.23#53
Non-authoritative answer:
Name: domU-12-31-39-09-E6-71.compute-1.internal
Address: 10.210.233.123
-bash-3.2# # and back to inst A is also no problem
-bash-3.2#
-bash-3.2# ssh domU-12-31-39-09-CD-C2.compute-1.internal
Last login: Tue Feb 15 17:36:19 2011 from 63.193.205.1
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# # log out twice to go back to inst A
-bash-3.2# exit
logout
Connection to domU-12-31-39-09-CD-C2.compute-1.internal closed.
-bash-3.2#
-bash-3.2# exit
logout
Connection to domU-12-31-39-09-E6-71.compute-1.internal closed.
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# cd ..
-bash-3.2#
-bash-3.2# pwd
/root
-bash-3.2#
-bash-3.2# ll
total 8
-rw-r--r-- 1 root root 260 Feb 15 17:24 app.ac
-rw-r--r-- 1 root root 130 Feb 15 17:34 app.ac2
-bash-3.2#
-bash-3.2# cat app.ac
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
-bash-3.2#
-bash-3.2# # when there is a remote machine (bottom 2 lines) it hangs
-bash-3.2# mpirun -app app.ac
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
domU-12-31-39-09-E6-71.compute-1.internal - daemon did not report back when launched
-bash-3.2#
-bash-3.2# cat app.ac2
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-bash-3.2#
-bash-3.2# # when there is no remote machine, then mpirun works:
-bash-3.2# mpirun -app app.ac2
domU-12-31-39-09-CD-C2
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# # this gotta be ssh problem....
-bash-3.2#
-bash-3.2# # show no firewall is used
-bash-3.2# iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
-bash-3.2#
-bash-3.2# exit
logout
[tsakai@vixen ec2]$
Would someone please point out what I am doing wrong?
Thank you.
Regards,
Tena
Hi Tena
Nothing wrong that I can see.
Just another couple of suggestions,
based on somewhat vague possibilities.
A slight difference is that on vixen and dashen you ran the
MPI hostname tests as a regular user, not as root, right?
Not sure if this will make much of a difference,
but it may be worth trying to run it as a regular user in EC2 also.
In general, most people avoid running user applications (MPI programs
included) as root.
Mostly for safety, but I wonder if there are any
implications in the 'rootly powers'
regarding the under-the-hood processes that OpenMPI
launches along with the actual user programs.
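Before rerunning the test as a regular user, it may help to confirm that
this user can reach every host named in the app file over ssh without any
prompt, since the logins shown in the session were done as root. Below is a
minimal sketch, assuming mpirun starts its remote daemons over ssh (which
this thread supposes but does not confirm). The function name app_hosts and
the APP_FILE variable are made up for illustration.

```shell
#!/bin/sh
# Print the unique host names that follow -H in an mpirun app file.
# (app_hosts is a hypothetical helper, not part of Open MPI.)
app_hosts() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-H") print $(i + 1) }' "$1" | sort -u
}

# Only contact the hosts when APP_FILE is set, e.g. APP_FILE=app.ac
if [ -n "${APP_FILE:-}" ]; then
    for host in $(app_hosts "$APP_FILE"); do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$host" /bin/hostname; then
            echo "ok: $host"
        else
            echo "FAILED: $host (non-interactive ssh did not succeed)"
        fi
    done
fi
```

If the script is saved as, say, check_hosts.sh (a hypothetical name), it
could be run as 'APP_FILE=app.ac sh check_hosts.sh' by the same user that
will run mpirun. A host reported as FAILED would point at ssh rather than
at MPI itself.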
This may make no difference either,
but you could do a 'service iptables status',
to see if the service is running, even though there are
no explicit iptables rules (as per your email).
If the service is not running you get
'Firewall is stopped.' (in CentOS).
I *think* 'iptables --list' loads the iptables module into the
kernel, as a side effect, whereas the service command does not.
So, it may be cleaner (safer?) to use the service version
instead of 'iptables --list'.
I don't know if it will make any difference,
but just in case, if the service is running,
why not do 'service iptables stop',
and perhaps also 'chkconfig iptables off' to be completely
free of iptables?
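The check above could be scripted roughly as follows. This is only a
sketch: it assumes a CentOS-style 'service' command and the exact
'Firewall is stopped.' message quoted above, and it would have to be run
as root on each instance. The function name and the RUN_CHECK variable
are made up for illustration.

```shell
#!/bin/sh
# Classify the text printed by 'service iptables status'.
# Assumption: a stopped firewall prints 'Firewall is stopped.' (CentOS).
classify_iptables_status() {
    case "$1" in
        *"Firewall is stopped."*) echo "stopped" ;;
        *) echo "running-or-unknown" ;;
    esac
}

# Only query the real service when RUN_CHECK=1, since it needs root.
if [ "${RUN_CHECK:-0}" = "1" ]; then
    status_text=$(service iptables status 2>&1)
    state=$(classify_iptables_status "$status_text")
    echo "iptables service: $state"
    if [ "$state" != "stopped" ]; then
        # Print the commands rather than running them automatically.
        echo "to disable: service iptables stop; chkconfig iptables off"
    fi
fi
```

The script only prints the stop commands instead of executing them, so
nothing changes on the instance until you decide to run them yourself.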
Gus Correa