[tsakai@vixen ec2]$
[tsakai@vixen ec2]$ ssh -i $MYKEY tsa...@ec2-50-17-24-195.compute-1.amazonaws.com
Last login: Wed Feb 16 06:50:08 2011 from 63.193.205.1
__| __|_ ) Amazon Linux AMI
_| ( / Beta
___|\___|___|
See /usr/share/doc/amzn-ami/image-release-notes for latest release notes.
:-)
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # show firewall is off
[tsakai@domU-12-31-39-16-75-1E ~]$ service iptables status
-bash: service: command not found
[tsakai@domU-12-31-39-16-75-1E ~]$ sudo service iptables status
iptables: Firewall is not running.
[tsakai@domU-12-31-39-16-75-1E ~]$ # show I can go to inst B with no password authentication
[tsakai@domU-12-31-39-16-75-1E ~]$ ssh domU-12-31-39-16-4E-4C.compute-1.internal
Last login: Wed Feb 16 06:53:14 2011 from domu-12-31-39-16-75-1e.compute-1.internal
__| __|_ ) Amazon Linux AMI
_| ( / Beta
___|\___|___|
See /usr/share/doc/amzn-ami/image-release-notes for latest release notes.
:-)
[tsakai@domU-12-31-39-16-4E-4C ~]$
[tsakai@domU-12-31-39-16-4E-4C ~]$ # also back to inst A
[tsakai@domU-12-31-39-16-4E-4C ~]$
[tsakai@domU-12-31-39-16-4E-4C ~]$ ssh domU-12-31-39-16-75-1E.compute-1.internal
Last login: Wed Feb 16 06:58:33 2011 from 63.193.205.1
__| __|_ ) Amazon Linux AMI
_| ( / Beta
___|\___|___|
See /usr/share/doc/amzn-ami/image-release-notes for latest release notes.
:-)
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # OK
[tsakai@domU-12-31-39-16-75-1E ~]$ # back to inst B
[tsakai@domU-12-31-39-16-75-1E ~]$ exit
logout
Connection to domU-12-31-39-16-75-1E.compute-1.internal closed.
[tsakai@domU-12-31-39-16-4E-4C ~]$
[tsakai@domU-12-31-39-16-4E-4C ~]$ env | grep LD_LIB
LD_LIBRARY_PATH=:/usr/local/lib
[tsakai@domU-12-31-39-16-4E-4C ~]$ # show no firewall on inst B
[tsakai@domU-12-31-39-16-4E-4C ~]$ sudo service iptables status
iptables: Firewall is not running.
[tsakai@domU-12-31-39-16-4E-4C ~]$
[tsakai@domU-12-31-39-16-4E-4C ~]$ # go back to inst A
[tsakai@domU-12-31-39-16-4E-4C ~]$ exit
logout
Connection to domU-12-31-39-16-4E-4C.compute-1.internal closed.
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ env | grep LD_LIB
LD_LIBRARY_PATH=:/usr/local/lib
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ cat app.ac
-H domU-12-31-39-16-75-1E.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-16-75-1E.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-16-4E-4C.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-16-4E-4C.compute-1.internal -np 1 /bin/hostname
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # top 2 are inst A (this machine); bottom 2 are remote inst (inst B)
[tsakai@domU-12-31-39-16-75-1E ~]$
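For reference, the same four-rank layout can be expressed without an appfile. This is a sketch assuming Open MPI's `-H`/`-np` command-line options, with the hostnames copied from app.ac above:

```shell
# Sketch: equivalent to app.ac without an appfile. Listing a host twice
# in -H gives it two slots; all four ranks run /bin/hostname.
mpirun -np 4 \
  -H domU-12-31-39-16-75-1E.compute-1.internal,domU-12-31-39-16-75-1E.compute-1.internal,domU-12-31-39-16-4E-4C.compute-1.internal,domU-12-31-39-16-4E-4C.compute-1.internal \
  /bin/hostname
```

This requires the two instances and an MPI installation, so it is a command fragment, not something runnable standalone.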
[tsakai@domU-12-31-39-16-75-1E ~]$ mpirun -app app.ac
^Cmpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
domU-12-31-39-16-4E-4C.compute-1.internal - daemon did not report back when launched
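When mpirun exits this way, stale `orted` daemons can be left behind on the remote node, as the error message warns. A minimal manual cleanup, assuming Open MPI's `orte-clean` tool is on the remote PATH (hostname copied from the log above):

```shell
# Sketch: clean up stale Open MPI daemons and session directories on the
# remote instance (inst B) after an unclean mpirun shutdown.
ssh domU-12-31-39-16-4E-4C.compute-1.internal orte-clean
# Belt and braces: verify no orted process survived.
ssh domU-12-31-39-16-4E-4C.compute-1.internal 'pgrep -u tsakai orted || echo clean'
```

This needs the remote instance and an Open MPI installation, so treat it as a command fragment rather than a runnable script.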
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # *** daemon did not report back when launched ***
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ cat app.ac2
-H domU-12-31-39-16-75-1E.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-16-75-1E.compute-1.internal -np 1 /bin/hostname
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # they refer to this instance (inst A)
[tsakai@domU-12-31-39-16-75-1E ~]$ mpirun -app app.ac2
domU-12-31-39-16-75-1E
domU-12-31-39-16-75-1E
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # that's no problem
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ cd .ssh
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ cat config
Host *
IdentityFile /home/tsakai/.ssh/tsakai
IdentitiesOnly yes
BatchMode yes
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ mv config config.svd
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ cat config.svd > config
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ ll config
-rw-rw-r-- 1 tsakai tsakai 81 Feb 16 07:06 config
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ chmod 600 config
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ cat config
Host *
IdentityFile /home/tsakai/.ssh/tsakai
IdentitiesOnly yes
BatchMode yes
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ cat - >> config
LogLevel DEBUG3
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ cat config
Host *
IdentityFile /home/tsakai/.ssh/tsakai
IdentitiesOnly yes
BatchMode yes
LogLevel DEBUG3
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ ll config
-rw------- 1 tsakai tsakai 98 Feb 16 07:07 config
[tsakai@domU-12-31-39-16-75-1E .ssh]$
[tsakai@domU-12-31-39-16-75-1E .ssh]$ cd ..
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ mpirun -app app.ac
debug2: ssh_connect: needpriv 0
debug1: Connecting to domU-12-31-39-16-4E-4C.compute-1.internal [10.96.77.182] port 22.
debug1: Connection established.
debug3: Not a RSA1 key file /home/tsakai/.ssh/tsakai.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /home/tsakai/.ssh/tsakai type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug3: Wrote 792 bytes for a total of 813
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: found hmac-md5
debug1: kex: server->client aes128-ctr hmac-md5 none
debug2: mac_setup: found hmac-md5
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug3: Wrote 24 bytes for a total of 837
debug2: dh_gen_key: priv key bits set: 125/256
debug2: bits set: 489/1024
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug3: Wrote 144 bytes for a total of 981
debug3: check_host_in_hostfile: filename /home/tsakai/.ssh/known_hosts
debug3: check_host_in_hostfile: match line 1
debug3: check_host_in_hostfile: filename /home/tsakai/.ssh/known_hosts
debug3: check_host_in_hostfile: match line 1
debug1: Host 'domu-12-31-39-16-4e-4c.compute-1.internal' is known and matches the RSA host key.
debug1: Found key in /home/tsakai/.ssh/known_hosts:1
debug2: bits set: 491/1024
debug1: ssh_rsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: Wrote 16 bytes for a total of 997
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug3: Wrote 48 bytes for a total of 1045
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /home/tsakai/.ssh/tsakai ((nil))
debug3: Wrote 64 bytes for a total of 1109
debug1: Authentications that can continue: publickey
debug3: start over, passed a different list publickey
debug3: preferred gssapi-with-mic,publickey
debug3: authmethod_lookup publickey
debug3: remaining preferred: ,publickey
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Trying private key: /home/tsakai/.ssh/tsakai
debug1: read PEM private key done: type RSA
debug3: sign_and_send_pubkey
debug2: we sent a publickey packet, wait for reply
debug3: Wrote 384 bytes for a total of 1493
debug1: Authentication succeeded (publickey).
debug2: fd 4 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug3: Wrote 128 bytes for a total of 1621
debug2: callback start
debug2: client_session2_setup: id 0
debug1: Sending environment.
debug3: Ignored env HOSTNAME
debug3: Ignored env TERM
debug3: Ignored env SHELL
debug3: Ignored env HISTSIZE
debug3: Ignored env EC2_AMITOOL_HOME
debug3: Ignored env SSH_CLIENT
debug3: Ignored env SSH_TTY
debug3: Ignored env USER
debug3: Ignored env LD_LIBRARY_PATH
debug3: Ignored env LS_COLORS
debug3: Ignored env EC2_HOME
debug3: Ignored env MAIL
debug3: Ignored env PATH
debug3: Ignored env INPUTRC
debug3: Ignored env PWD
debug3: Ignored env JAVA_HOME
debug1: Sending env LANG = en_US.UTF-8
debug2: channel 0: request env confirm 0
debug3: Ignored env AWS_CLOUDWATCH_HOME
debug3: Ignored env AWS_IAM_HOME
debug3: Ignored env SHLVL
debug3: Ignored env HOME
debug3: Ignored env AWS_PATH
debug3: Ignored env AWS_AUTO_SCALING_HOME
debug3: Ignored env LOGNAME
debug3: Ignored env AWS_ELB_HOME
debug3: Ignored env SSH_CONNECTION
debug3: Ignored env LESSOPEN
debug3: Ignored env AWS_RDS_HOME
debug3: Ignored env G_BROKEN_FILENAMES
debug3: Ignored env _
debug3: Ignored env OLDPWD
debug3: Ignored env OMPI_MCA_plm
debug1: Sending command: orted --daemonize -mca ess env -mca orte_ess_jobid 125566976 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "125566976.0;tcp://10.96.118.236:56064"
debug2: channel 0: request exec confirm 1
debug2: fd 3 setting TCP_NODELAY
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug3: Wrote 272 bytes for a total of 1893
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
debug2: channel 0: read<=0 rfd 4 len 0
debug2: channel 0: read failed
debug2: channel 0: close_read
debug2: channel 0: input open -> drain
debug2: channel 0: ibuf empty
debug2: channel 0: send eof
debug2: channel 0: input drain -> closed
debug3: Wrote 32 bytes for a total of 1925
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: rcvd close
debug3: channel 0: will not send data after close
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cfd -1)
debug3: channel 0: close_fds r -1 w -1 e 6 c -1
debug3: Wrote 32 bytes for a total of 1957
debug3: Wrote 64 bytes for a total of 2021
debug1: fd 0 clearing O_NONBLOCK
Transferred: sent 1840, received 1896 bytes, in 0.1 seconds
Bytes per second: sent 18384.8, received 18944.3
debug1: Exit status 0
# it is hanging; I am about to issue control-C
^Cmpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
domU-12-31-39-16-4E-4C.compute-1.internal - daemon did not report back when launched
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # it says the same thing, i.e.,
[tsakai@domU-12-31-39-16-75-1E ~]$ # daemon did not report back when launched
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ # what does that mean?
[tsakai@domU-12-31-39-16-75-1E ~]$ # ssh doesn't say anything alarming...
[tsakai@domU-12-31-39-16-75-1E ~]$
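One thing the ssh trace cannot show: after `orted` starts on instance B, it must open a new TCP connection back to mpirun at the `--hnp-uri` address seen in the log above (tcp://10.96.118.236:56064). ssh itself succeeds (exit status 0), so if the EC2 security group does not allow arbitrary TCP between the instances, the daemon launches fine yet never reports back, which matches the hang. A quick probe to run from instance B, sketched with bash built-ins only; the address and port are copied from this log, and the port is ephemeral (it changes on every mpirun invocation, so the whole ephemeral range must be open between the instances):

```shell
# Sketch: test TCP reachability of the HNP callback port using /dev/tcp.
port_open () {
    # returns 0 if a TCP connection to host $1, port $2 succeeds within 3s
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if port_open 10.96.118.236 56064; then
    echo "HNP port reachable"
else
    echo "HNP port blocked -- check the EC2 security group"
fi
```

If this prints "blocked" while an mpirun is waiting on instance A, the security group (not ssh) is the culprit.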
[tsakai@domU-12-31-39-16-75-1E ~]$ # I give up
[tsakai@domU-12-31-39-16-75-1E ~]$
[tsakai@domU-12-31-39-16-75-1E ~]$ exit
logout
[tsakai@vixen ec2]$
[tsakai@vixen ec2]$
Do you see anything strange?
One final question: the ssh man page mentions a few environment
variables: SSH_ASKPASS, SSH_AUTH_SOCK, SSH_CONNECTION, etc. Do
any of these matter as far as Open MPI is concerned?
Thank you, Gus.
Regards,
Tena
On 2/15/11 5:09 PM, "Gus Correa" <g...@ldeo.columbia.edu> wrote:
Tena Sakai wrote:
Hi,
I am trying to reproduce what I was able to show last Friday on Amazon
EC2 instances, but I am having a problem. What I was able to show last
Friday as root was with this command:
mpirun -app app.ac
with app.ac being:
-H dns-entry-A -np 1 (linux command)
-H dns-entry-A -np 1 (linux command)
-H dns-entry-B -np 1 (linux command)
-H dns-entry-B -np 1 (linux command)
Here's the config file in root's .ssh directory:
Host *
IdentityFile /root/.ssh/.derobee/.kagi
IdentitiesOnly yes
BatchMode yes
Yesterday and today I can't get this to work. I made the last part of the
app.ac file simpler (it now says /bin/hostname). Below is the session:
-bash-3.2#
-bash-3.2# # I am on instance A, host name for inst A is:
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# nslookup domU-12-31-39-09-CD-C2
Server: 172.16.0.23
Address: 172.16.0.23#53
Non-authoritative answer:
Name: domU-12-31-39-09-CD-C2.compute-1.internal
Address: 10.210.210.48
-bash-3.2# cd .ssh
-bash-3.2#
-bash-3.2# cat config
Host *
IdentityFile /root/.ssh/.derobee/.kagi
IdentitiesOnly yes
BatchMode yes
-bash-3.2#
-bash-3.2# ll config
-rw-r--r-- 1 root root 103 Feb 15 17:18 config
-bash-3.2#
-bash-3.2# chmod 600 config
-bash-3.2#
-bash-3.2# # show I can go to inst B without password/passphrase
-bash-3.2#
-bash-3.2# ssh domU-12-31-39-09-E6-71.compute-1.internal
Last login: Tue Feb 15 17:18:46 2011 from 10.210.210.48
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-E6-71
-bash-3.2#
-bash-3.2# nslookup `hostname`
Server: 172.16.0.23
Address: 172.16.0.23#53
Non-authoritative answer:
Name: domU-12-31-39-09-E6-71.compute-1.internal
Address: 10.210.233.123
-bash-3.2# # and back to inst A is also no problem
-bash-3.2#
-bash-3.2# ssh domU-12-31-39-09-CD-C2.compute-1.internal
Last login: Tue Feb 15 17:36:19 2011 from 63.193.205.1
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# # log out twice to go back to inst A
-bash-3.2# exit
logout
Connection to domU-12-31-39-09-CD-C2.compute-1.internal closed.
-bash-3.2#
-bash-3.2# exit
logout
Connection to domU-12-31-39-09-E6-71.compute-1.internal closed.
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# cd ..
-bash-3.2#
-bash-3.2# pwd
/root
-bash-3.2#
-bash-3.2# ll
total 8
-rw-r--r-- 1 root root 260 Feb 15 17:24 app.ac
-rw-r--r-- 1 root root 130 Feb 15 17:34 app.ac2
-bash-3.2#
-bash-3.2# cat app.ac
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
-bash-3.2#
-bash-3.2# # when there is a remote machine (bottom 2 lines) it hangs
-bash-3.2# mpirun -app app.ac
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
domU-12-31-39-09-E6-71.compute-1.internal - daemon did not report back when launched
-bash-3.2#
-bash-3.2# cat app.ac2
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
-bash-3.2#
-bash-3.2# # when there is no remote machine, then mpirun works:
-bash-3.2# mpirun -app app.ac2
domU-12-31-39-09-CD-C2
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# hostname
domU-12-31-39-09-CD-C2
-bash-3.2#
-bash-3.2# # this has gotta be an ssh problem...
-bash-3.2#
-bash-3.2# # show no firewall is used
-bash-3.2# iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
-bash-3.2#
-bash-3.2# exit
logout
[tsakai@vixen ec2]$
Would someone please point out what I am doing wrong?
Thank you.
Regards,
Tena
Hi Tena
Nothing wrong that I can see.
Just another couple of suggestions,
based on somewhat vague possibilities.
A slight difference is that on vixen and dashen you ran the
MPI hostname tests as a regular user, not as root, right?
Not sure if this will make much of a difference,
but it may be worth trying to run it as a regular user in EC2 also.
In general, most people avoid running user applications (MPI programs
included) as root.
Mostly for safety, but I wonder if there are any
implications in the 'rootly powers'
regarding the under-the-hood processes that OpenMPI
launches along with the actual user programs.
This may make no difference either,
but you could do a 'service iptables status',
to see if the service is running, even though there are
no explicit iptable rules (as per your email).
If the service is not running you get
'Firewall is stopped.' (in CentOS).
I *think* 'iptables --list' loads the iptables module into the
kernel, as a side effect, whereas the service command does not.
So, it may be cleaner (safer?) to use the service version
instead of 'iptables --list'.
I don't know if it will make any difference,
but just in case, if the service is running,
why not do 'service iptables stop',
and perhaps also 'chkconfig iptables off' to be completely
free of iptables?
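Gus's suggestion, collected in one place (CentOS-style service management assumed, as in his mail; the exact status message varies by distro):

```shell
# Check, stop, and disable the firewall (run on both instances).
sudo service iptables status    # e.g. "Firewall is stopped." when off
sudo service iptables stop      # stop it for the current boot
sudo chkconfig iptables off     # keep it off across reboots
```

These are host-administration commands, so they are shown as a fragment rather than a runnable script.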
Gus Correa
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users