Tena Sakai wrote:
Hi,

I am trying to reproduce what I was able to show last Friday on Amazon
EC2 instances, but I am having a problem.  What I was able to show last
Friday as root was with this command:
  mpirun -app app.ac
with app.ac being:
  -H dns-entry-A -np 1 (linux command)
  -H dns-entry-A -np 1 (linux command)
  -H dns-entry-B -np 1 (linux command)
  -H dns-entry-B -np 1 (linux command)

Here’s the config file in root’s .ssh directory:
  Host *
        IdentityFile /root/.ssh/.derobee/.kagi
        IdentitiesOnly yes
        BatchMode yes

Yesterday and today I can’t get this to work. I made the last part of the
app.ac file simpler (it now just runs /bin/hostname). Below is the session:

  -bash-3.2#
  -bash-3.2# # I am on instance A, host name for inst A is:
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# nslookup domU-12-31-39-09-CD-C2
  Server:         172.16.0.23
  Address:        172.16.0.23#53
  Non-authoritative answer:
  Name:    domU-12-31-39-09-CD-C2.compute-1.internal
  Address: 10.210.210.48
  -bash-3.2# cd .ssh
  -bash-3.2#
  -bash-3.2# cat config
  Host *
          IdentityFile /root/.ssh/.derobee/.kagi
          IdentitiesOnly yes
          BatchMode yes
  -bash-3.2#
  -bash-3.2# ll config
  -rw-r--r-- 1 root root 103 Feb 15 17:18 config
  -bash-3.2#
  -bash-3.2# chmod 600 config
  -bash-3.2#
  -bash-3.2# # show I can go to inst B without password/passphrase
  -bash-3.2#
  -bash-3.2# ssh domU-12-31-39-09-E6-71.compute-1.internal
  Last login: Tue Feb 15 17:18:46 2011 from 10.210.210.48
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-E6-71
  -bash-3.2#
  -bash-3.2# nslookup `hostname`
  Server:         172.16.0.23
  Address:        172.16.0.23#53
  Non-authoritative answer:
  Name:    domU-12-31-39-09-E6-71.compute-1.internal
  Address: 10.210.233.123
  -bash-3.2# # and back to inst A is also no problem
  -bash-3.2#
  -bash-3.2# ssh domU-12-31-39-09-CD-C2.compute-1.internal
  Last login: Tue Feb 15 17:36:19 2011 from 63.193.205.1
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# # log out twice to go back to inst A
  -bash-3.2# exit
  logout
  Connection to domU-12-31-39-09-CD-C2.compute-1.internal closed.
  -bash-3.2#
  -bash-3.2# exit
  logout
  Connection to domU-12-31-39-09-E6-71.compute-1.internal closed.
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# cd ..
  -bash-3.2#
  -bash-3.2# pwd
  /root
  -bash-3.2#
  -bash-3.2# ll
  total 8
  -rw-r--r-- 1 root root 260 Feb 15 17:24 app.ac
  -rw-r--r-- 1 root root 130 Feb 15 17:34 app.ac2
  -bash-3.2#
  -bash-3.2# cat app.ac
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
  -bash-3.2#
  -bash-3.2# # when there is a remote machine (bottom 2 lines) it hangs
  -bash-3.2# mpirun -app app.ac
  mpirun: killing job...
  --------------------------------------------------------------------------
  mpirun noticed that the job aborted, but has no info as to the process
  that caused that situation.
  --------------------------------------------------------------------------
  --------------------------------------------------------------------------
  mpirun was unable to cleanly terminate the daemons on the nodes shown
  below. Additional manual cleanup may be required - please refer to
  the "orte-clean" tool for assistance.
  --------------------------------------------------------------------------
  domU-12-31-39-09-E6-71.compute-1.internal - daemon did not report back when launched
  -bash-3.2#
  -bash-3.2# cat app.ac2
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -bash-3.2#
  -bash-3.2# # when there is no remote machine, then mpirun works:
  -bash-3.2# mpirun -app app.ac2
  domU-12-31-39-09-CD-C2
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# # this has got to be an ssh problem....
  -bash-3.2#
  -bash-3.2# # show no firewall is used
  -bash-3.2# iptables --list
  Chain INPUT (policy ACCEPT)
  target     prot opt source               destination

  Chain FORWARD (policy ACCEPT)
  target     prot opt source               destination

  Chain OUTPUT (policy ACCEPT)
  target     prot opt source               destination
  -bash-3.2#
  -bash-3.2# exit
  logout
  [tsakai@vixen ec2]$
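
A sketch of something that might pull out more detail (these commands
are not from the session above): Open MPI can be asked to report what
its remote launcher is doing, and, since mpirun starts its orted
daemon over ssh, it is worth confirming that orted is on the PATH a
non-interactive shell gets on the remote node:

  mpirun -mca plm_base_verbose 5 -app app.ac
  ssh domU-12-31-39-09-E6-71.compute-1.internal which orted

If the second command finds nothing, orted is either not installed on
instance B or not on the PATH that non-interactive ssh sessions see.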

Would someone please point out what I am doing wrong?

Thank you.

Regards,

Tena

Hi Tena

Nothing wrong that I can see.
Just another couple of suggestions,
based on somewhat vague possibilities.

A slight difference is that on vixen and dashen you ran the
MPI hostname tests as a regular user, not as root, right?
Not sure if this will make much of a difference,
but it may be worth trying to run it as a regular user in EC2 also.
In general, most people avoid running user applications
(MPI programs included) as root.
Mostly this is for safety, but I also wonder whether 'rootly powers'
have any implications for the under-the-hood processes that Open MPI
launches along with the actual user programs.
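
A minimal sketch of that test, assuming a made-up user name (mpiuser)
and that the key setup is repeated on both instances:

  useradd mpiuser                            # on both instances
  su - mpiuser
  ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa   # passphrase-less key
  # copy ~/.ssh/id_rsa.pub into mpiuser's authorized_keys on the
  # other instance, then rerun the test as mpiuser:
  mpirun -app app.ac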

This may make no difference either,
but you could run 'service iptables status'
to see whether the service is running, even though there are
no explicit iptables rules (as per your email).
If the service is not running you get
'Firewall is stopped.' (in CentOS).
I *think* 'iptables --list' loads the iptables module into the
kernel as a side effect, whereas the service command does not.
So it may be cleaner (safer?) to use the service command
instead of 'iptables --list'.
I don't know if it will make any difference,
but just in case, if the service is running,
why not do 'service iptables stop',
and perhaps also 'chkconfig iptables off', to be completely
free of iptables?
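
That is, putting the commands named above together (standard
CentOS/RHEL service management, nothing Open MPI specific):

  service iptables status   # prints 'Firewall is stopped.' if not running
  service iptables stop     # stop it for the current boot
  chkconfig iptables off    # keep it from starting on reboot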

Gus Correa
