Hi Tena

Since root can run it but you can't,
could it be a directory permission problem?
Check the permissions on the execution directory (on both machines,
if this is not an NFS-mounted dir).
I am not sure, but IIRC Open MPI also uses /tmp for
under-the-hood stuff, so it is worth checking permissions there as well.
Just a naive guess.
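
If it helps, here is roughly what I would compare between the root and
tsakai environments (just a sketch; the paths are the ones from your
session log):

  # run as tsakai, on both machine A and machine B
  ls -ld /home/tsakai    # the directory you launch from: tsakai needs rwx here
  ls -ld /tmp            # normally drwxrwxrwt (world-writable, sticky bit set)
  df -h /tmp             # a full filesystem can cause odd failures too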

Congrats on all the progress with the cloud MPI!

Gus Correa

Tena Sakai wrote:
Hi,

I have made a bit more progress.  I think I can say the ssh
authentication problem is behind me now.  I am still having a problem
running mpirun, but the latest discovery, which I can reproduce, is that
I can run mpirun as root.  Here's the session log:

  [tsakai@vixen ec2]$ 2ec2 ec2-184-73-104-242.compute-1.amazonaws.com
  Last login: Fri Feb 11 00:41:11 2011 from 10.100.243.195
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ ll
  total 8
  -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
  -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ ll .ssh
  total 16
  -rw------- 1 tsakai tsakai  232 Feb  5 23:19 authorized_keys
  -rw------- 1 tsakai tsakai  102 Feb 11 00:34 config
  -rw-r--r-- 1 tsakai tsakai 1302 Feb 11 00:36 known_hosts
  -rw------- 1 tsakai tsakai  887 Feb  8 22:03 tsakai
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ ssh ip-10-100-243-195.ec2.internal
  Last login: Fri Feb 11 00:36:20 2011 from 10.195.198.31
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ # I am on machine B
  [tsakai@ip-10-100-243-195 ~]$ hostname
  ip-10-100-243-195
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ ll
  total 8
  -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:44 app.ac
  -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:47 fib.R
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ cat app.ac
  -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 5
  -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 6
  -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 7
  -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 8
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ # go back to machine A
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ exit
  logout
  Connection to ip-10-100-243-195.ec2.internal closed.
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ hostname
  ip-10-195-198-31
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # Execute mpirun
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ mpirun -app app.ac
  --------------------------------------------------------------------------
  mpirun was unable to launch the specified application as it encountered an
  error:

  Error: pipe function call failed when setting up I/O forwarding subsystem
  Node: ip-10-195-198-31

  while attempting to start process rank 0.
  --------------------------------------------------------------------------
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # try it as root
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ sudo su
  bash-3.2#
  bash-3.2# pwd
  /home/tsakai
  bash-3.2#
  bash-3.2# ls -l /root/.ssh/config
  -rw------- 1 root root 103 Feb 11 00:56 /root/.ssh/config
  bash-3.2#
  bash-3.2# cat /root/.ssh/config
  Host *
          IdentityFile /root/.ssh/.derobee/.kagi
          IdentitiesOnly yes
          BatchMode yes
  bash-3.2#
  bash-3.2# pwd
  /home/tsakai
  bash-3.2#
  bash-3.2# ls -l
  total 8
  -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
  -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
  bash-3.2#
  bash-3.2# # now is the time for mpirun
  bash-3.2#
  bash-3.2# mpirun --app ./app.ac
  13 ip-10-100-243-195
  21 ip-10-100-243-195
  5 ip-10-195-198-31
  8 ip-10-195-198-31
  bash-3.2#
  bash-3.2# # It works (being root)!
  bash-3.2#
  bash-3.2# exit
  exit
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # try it one more time as tsakai
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ mpirun --app app.ac
  --------------------------------------------------------------------------
  mpirun was unable to launch the specified application as it encountered an
  error:

  Error: pipe function call failed when setting up I/O forwarding subsystem
  Node: ip-10-195-198-31

  while attempting to start process rank 0.
  --------------------------------------------------------------------------
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # I don't get it.
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ exit
  logout
  [tsakai@vixen ec2]$

So, why does it say "pipe function call failed when setting up
I/O forwarding subsystem  Node: ip-10-195-198-31"?
The node it is referring to is not the remote machine; it is
what I call machine A.  I first thought this might be a problem
with the PATH variable, but I don't think so.  I compared root's
PATH to tsakai's, made them identical, and retried.
I got the same behavior.
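
A few checks that might separate the root and tsakai cases, since the
failing call is pipe() and it only fails for the non-root user.  This is
only a sketch based on the error text; the openmpi-sessions-* name is what
Open MPI 1.x typically uses for its session directories under /tmp:

  ulimit -n                                     # open-file-descriptor limit; pipe() fails when it is exhausted
  ulimit -u                                     # max user processes
  ls -ld /tmp                                   # should be world-writable with the sticky bit
  ls -ld /tmp/openmpi-sessions-* 2>/dev/null    # stale session directories left behind by the root run?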

If you could enlighten me as to why this is happening, I would really
appreciate it.

Thank you.

Tena


On 2/10/11 4:12 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:

Hi jeff,

Thanks for the firewall tip.  I tried it while allowing all TCP traffic
and got an interesting and perplexing result.  Here's what's interesting
(BTW, I got rid of "LogLevel DEBUG3" from ~/.ssh/config for this run):

   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
   Host key verification failed.

--------------------------------------------------------------------------
   A daemon (pid 2743) died unexpectedly with status 255 while attempting
   to launch so we are aborting.

   There may be more information reported by the environment (see above).

   This may be because the daemon was unable to find all the needed shared
   libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
   location of the shared libraries on the remote nodes and this will
   automatically be forwarded to the remote nodes.

--------------------------------------------------------------------------

--------------------------------------------------------------------------
   mpirun noticed that the job aborted, but has no info as to the process
   that caused that situation.

--------------------------------------------------------------------------
   mpirun: clean termination accomplished

   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ env | grep LD_LIB
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ # Let's set LD_LIBRARY_PATH to /usr/local/lib
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ # I'd better do this on machine B as well
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ ssh -i tsakai ip-10-195-171-159
   Warning: Identity file tsakai not accessible: No such file or directory.
   Last login: Thu Feb 10 18:31:20 2011 from 10.203.21.132
   [tsakai@ip-10-195-171-159 ~]$
   [tsakai@ip-10-195-171-159 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
   [tsakai@ip-10-195-171-159 ~]$
   [tsakai@ip-10-195-171-159 ~]$ env | grep LD_LIB
   LD_LIBRARY_PATH=/usr/local/lib
   [tsakai@ip-10-195-171-159 ~]$
   [tsakai@ip-10-195-171-159 ~]$ # OK, now go back to machine A
   [tsakai@ip-10-195-171-159 ~]$ exit
   logout
   Connection to ip-10-195-171-159 closed.
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ hostname
   ip-10-203-21-132
   [tsakai@ip-10-203-21-132 ~]$ # try mpirun again
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
   Host key verification failed.

--------------------------------------------------------------------------
   A daemon (pid 2789) died unexpectedly with status 255 while attempting
   to launch so we are aborting.

   There may be more information reported by the environment (see above).

   This may be because the daemon was unable to find all the needed shared
   libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
   location of the shared libraries on the remote nodes and this will
   automatically be forwarded to the remote nodes.

--------------------------------------------------------------------------

--------------------------------------------------------------------------
   mpirun noticed that the job aborted, but has no info as to the process
   that caused that situation.

--------------------------------------------------------------------------
   mpirun: clean termination accomplished

   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ # I thought the openmpi library was in /usr/local/lib...
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ ll -t /usr/local/lib | less
   total 16604
   lrwxrwxrwx 1 root root      16 Feb  8 23:06 libfuse.so -> libfuse.so.2.8.5
   lrwxrwxrwx 1 root root      16 Feb  8 23:06 libfuse.so.2 -> libfuse.so.2.8.5
   lrwxrwxrwx 1 root root      25 Feb  8 23:06 libmca_common_sm.so -> libmca_common_sm.so.1.0.0
   lrwxrwxrwx 1 root root      25 Feb  8 23:06 libmca_common_sm.so.1 -> libmca_common_sm.so.1.0.0
   lrwxrwxrwx 1 root root      15 Feb  8 23:06 libmpi.so -> libmpi.so.0.0.2
   lrwxrwxrwx 1 root root      15 Feb  8 23:06 libmpi.so.0 -> libmpi.so.0.0.2
   lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_cxx.so -> libmpi_cxx.so.0.0.1
   lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.1
   lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f77.so -> libmpi_f77.so.0.0.1
   lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f77.so.0 -> libmpi_f77.so.0.0.1
   lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f90.so -> libmpi_f90.so.0.0.1
   lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f90.so.0 -> libmpi_f90.so.0.0.1
   lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-pal.so -> libopen-pal.so.0.0.0
   lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-pal.so.0 -> libopen-pal.so.0.0.0
   lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-rte.so -> libopen-rte.so.0.0.0
   lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-rte.so.0 -> libopen-rte.so.0.0.0
   lrwxrwxrwx 1 root root      26 Feb  8 23:06 libopenmpi_malloc.so -> libopenmpi_malloc.so.0.0.0
   lrwxrwxrwx 1 root root      26 Feb  8 23:06 libopenmpi_malloc.so.0 -> libopenmpi_malloc.so.0.0.0
   lrwxrwxrwx 1 root root      20 Feb  8 23:06 libulockmgr.so -> libulockmgr.so.1.0.1
   lrwxrwxrwx 1 root root      20 Feb  8 23:06 libulockmgr.so.1 -> libulockmgr.so.1.0.1
   lrwxrwxrwx 1 root root      16 Feb  8 23:06 libxml2.so -> libxml2.so.2.7.2
   lrwxrwxrwx 1 root root      16 Feb  8 23:06 libxml2.so.2 -> libxml2.so.2.7.2
   -rw-r--r-- 1 root root  385912 Jan 26 01:00 libvt.a
   [tsakai@ip-10-203-21-132 ~]$
   [tsakai@ip-10-203-21-132 ~]$ # Now, I am really confused...
   [tsakai@ip-10-203-21-132 ~]$

Do you know why it's complaining about shared libraries?
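
For reference, Open MPI's mpirun can also carry the variable to the remote
side itself; a sketch using its standard -x and --prefix options (app.ac2 is
the app file above):

   mpirun -x LD_LIBRARY_PATH --app app.ac2    # export the variable to the launched processes
   mpirun --prefix /usr/local --app app.ac2   # or let mpirun set PATH/LD_LIBRARY_PATH from its install prefix

That said, the "Host key verification failed." line appears before the
library warning, so ssh's host-key check may be what is actually stopping
the remote daemon.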

Thank you.

Tena


On 2/10/11 1:05 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

Your prior mails were about ssh issues, but this one sounds like you might
have firewall issues.

That is, the "orted" command attempts to open a TCP socket back to mpirun for
various command and control reasons.  If it is blocked from doing so by a
firewall, Open MPI won't run.  In general, you can either disable your
firewall or you can setup a trust relationship for TCP connections within
your
cluster.
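
For concreteness, a rough sketch of what that can look like on EC2-style
Linux nodes (the iptables commands are standard; the security-group change
happens outside the instances):

  sudo iptables -L -n            # is a host firewall filtering anything?
  sudo service iptables stop     # quick test: disable it temporarily (RHEL/CentOS-style)
  # in EC2, also allow all TCP traffic between instances in the same security
  # group (configured via the EC2 console/API, not on the nodes themselves)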



On Feb 10, 2011, at 1:03 PM, Tena Sakai wrote:

Hi Reuti,

Thanks for suggesting "LogLevel DEBUG3."  I did so, and the complete
session is captured in the attached file.

What I did is very similar to what I have done before: verify
that ssh works and then run the mpirun command.  In my somewhat lengthy
session log, there are two responses from "LogLevel DEBUG3": first
from an scp invocation and then from the mpirun invocation.  They both
say
   debug1: Authentication succeeded (publickey).

From the mpirun invocation, I see a line:
   debug1: Sending command:  orted --daemonize -mca ess env -mca orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
The IP address at the end of the line is indeed that of machine B.
After that it hung and I control-C'd out of it, which
gave me more lines.  But the lines after
   debug1: Sending command:  orted bla bla bla
don't look good to me.  But, in truth, I have no idea what they
mean.
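
For reference, the daemon launch can also be tried by hand over ssh; a
sketch, where <machine-B> is a placeholder for the other node's private
name (orted is the Open MPI daemon that mpirun starts remotely):

   ssh <machine-B> hostname      # should return immediately, with no prompt and no hang
   ssh <machine-B> which orted   # does a non-interactive shell on B find orted on its PATH?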

If you could shed some light, I would appreciate it very much.

Regards,

Tena


On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:

Hi,

Am 10.02.2011 um 19:11 schrieb Tena Sakai:

your local machine is Linux-like, but the execution hosts
are Macs? I saw the /Users/tsakai/... in your output.
No, my environment is entirely Linux.  The path to my home
directory on one host (blitzen) has been known as /Users/tsakai,
even though it is an NFS mount from vixen (which knows it
as /home/tsakai).  For historical reasons, I have
chosen to make /Users a symbolic link to vixen's /home,
so that I can use a consistent path on both vixen and blitzen.
okay. Sometimes the protection of the home directory must be adjusted too, but
as you can do it from the command line this shouldn't be an issue.


Is this a private cluster (or at least private interfaces)?
It would also be an option to use hostbased authentication,
which will avoid setting any known_hosts file or passphraseless
ssh-keys for each user.
No, it is not a private cluster.  It is Amazon EC2.  When I
ssh from my local machine (vixen) I use its public interface,
but to address one Amazon cluster node from the other I
use the nodes' private dns names: domU-12-31-39-07-35-21 and
domU-12-31-39-06-74-E2.  Both public and private dns names
change from one launch to another.  I am using passphraseless
ssh-keys for authentication in all cases, i.e., from vixen to
Amazon node A, from Amazon node A to Amazon node B, and from
Amazon node B back to A.  (Please see my initial post.  There
is a session dialogue for this.)  They all work without an
authentication dialogue, except a brief initial exchange:
  The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
  can't be established.
  RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
  Are you sure you want to continue connecting (yes/no)?
to which I say "yes."
But I am unclear on what you mean by "hostbased authentication".
Doesn't that mean with a password?  If so, it is not an option.
No. It's convenient inside a private cluster, as it won't fill each user's
known_hosts file and you don't need to create any ssh-keys.  But when the
hostname changes every time, it might also create new hostkeys.  It uses
hostkeys (private and public), so it works for all users.  Just for
reference:

http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html

You could look into it later.
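
Roughly, the pieces involved are the standard OpenSSH options below; the
details are in the howto above, so treat this only as an outline:

  # on each execution host, in /etc/ssh/sshd_config:
  HostbasedAuthentication yes

  # on each client, in /etc/ssh/ssh_config:
  HostbasedAuthentication yes
  EnableSSHKeysign yes

  # plus: list the cluster hosts in /etc/ssh/shosts.equiv and put their
  # public hostkeys into /etc/ssh/ssh_known_hosts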

==

- Can you try to use a command when connecting from A to B? E.g. `ssh domU-12-31-39-06-74-E2 ls`. Is this working too?

- What about putting:

LogLevel DEBUG3

in your ~/.ssh/config?  Maybe we can see what it's trying to negotiate before
it fails in verbose mode.


-- Reuti



Regards,

Tena


On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:

Hi,

your local machine is Linux-like, but the execution hosts are Macs? I saw the
/Users/tsakai/... in your output.

a) executing a command on them is also working, e.g.: ssh domU-12-31-39-07-35-21 ls

Am 10.02.2011 um 07:08 schrieb Tena Sakai:

Hi,

I have made a bit of progress(?)...
I made a config file in my .ssh directory on the cloud.  It looks like:
  # machine A
  Host domU-12-31-39-07-35-21.compute-1.internal
This is just an abbreviation or nickname above. To use the specified settings,
it's necessary to specify exactly this name. When the settings are the same
anyway for all machines, you can use:

Host *
  IdentityFile /home/tsakai/.ssh/tsakai
  IdentitiesOnly yes
  BatchMode yes

instead.
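
To check that this Host * block is picked up, something like the following
(hostname taken from the config above) should run without any prompt; with
BatchMode yes it fails instead of prompting, which is what mpirun needs:

  ssh domU-12-31-39-07-35-21.compute-1.internal hostname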

Is this a private cluster (or at least private interfaces)? It would also be
an option to use hostbased authentication, which will avoid setting any
known_hosts file or passphraseless ssh-keys for each user.

-- Reuti


  HostName domU-12-31-39-07-35-21
  BatchMode yes
  IdentityFile /home/tsakai/.ssh/tsakai
  ChallengeResponseAuthentication no
  IdentitiesOnly yes

  # machine B
  Host domU-12-31-39-06-74-E2.compute-1.internal
  HostName domU-12-31-39-06-74-E2
  BatchMode yes
  IdentityFile /home/tsakai/.ssh/tsakai
  ChallengeResponseAuthentication no
  IdentitiesOnly yes

This file exists on both machine A and machine B.

Now when I issue the mpirun command as below:
  [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2

It hangs.  I control-C out of it and I get:
  mpirun: killing job...



--------------------------------------------------------------------------
  mpirun noticed that the job aborted, but has no info as to the process
  that caused that situation.


--------------------------------------------------------------------------

--------------------------------------------------------------------------
  mpirun was unable to cleanly terminate the daemons on the nodes shown
  below. Additional manual cleanup may be required - please refer to
  the "orte-clean" tool for assistance.


--------------------------------------------------------------------------
      domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
      back when launched

Am I making progress?

Does this mean I am past authentication and something else is the problem?
Does someone have an example .ssh/config file I can look at?  There are so
many keyword-argument pairs for this config file, and I would like to look at
some very basic one that works.
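
For reference, a minimal sketch of such a file, built from the options
discussed elsewhere in this thread; the StrictHostKeyChecking line is an
extra that suppresses the interactive yes/no host-key prompt, so use it
only if that trade-off is acceptable:

  # ~/.ssh/config  (mode 600)
  Host *
      IdentityFile ~/.ssh/tsakai
      IdentitiesOnly yes
      BatchMode yes
      StrictHostKeyChecking no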

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu

On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:

Hi

I have an app.ac1 file like below:
  [tsakai@vixen local]$ cat app.ac1
  -H vixen.egcrc.org   -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
  -H vixen.egcrc.org   -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
  -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
  -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8

The program I run is
  Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
where x is [5..8].  The machines vixen and blitzen each handle 2 of the runs.

Here's the program fib.R:
  [tsakai@vixen local]$ cat fib.R
      # fib() computes, given index n, the fibonacci number iteratively
      # here's the first dozen of the sequence (indexed from 0..11)
      # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

  fib <- function( n ) {
          a <- 0
          b <- 1
          for ( i in 1:n ) {
               t <- b
               b <- a
               a <- a + t
          }
          a
  }

  arg <- commandArgs( TRUE )
  myHost <- system( 'hostname', intern=TRUE )
  cat( fib(arg), myHost, '\n' )

It reads an argument from the command line and produces the fibonacci number
that corresponds to that index, followed by the machine name.  Pretty simple
stuff.
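
As a quick standalone check (assuming Rscript is on the PATH), running it by
hand prints the fibonacci number followed by the hostname, e.g. something like:

  [tsakai@vixen local]$ Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
  13 vixen.egcrc.org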

Here's the run output:
  [tsakai@vixen local]$ mpirun -app app.ac1
  5 vixen.egcrc.org
  8 vixen.egcrc.org
  13 blitzen.egcrc.org
  21 blitzen.egcrc.org

Which is exactly what I expect.  So far so good.

Now I want to run the same thing on the cloud.  I launch 2 instances of the
same virtual machine, which I get to by:
  [tsakai@vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns

Now I am on machine A:
  [tsakai@domU-12-31-39-00-D1-F2 ~]$
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
  [tsakai@domU-12-31-39-00-D1-F2 ~]$
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
  domU-12-31-39-00-D1-F2
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
  Last login: Wed Feb  9 20:51:48 2011 from 10.254.214.4
  [tsakai@domU-12-31-39-0C-C8-01 ~]$
  [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
  [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
  domU-12-31-39-0C-C8-01
  [tsakai@domU-12-31-39-0C-C8-01 ~]$
  [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
  [tsakai@domU-12-31-39-0C-C8-01 ~]$
  [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
  The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
  RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
  Last login: Wed Feb  9 20:49:34 2011 from 10.215.203.239
  [tsakai@domU-12-31-39-00-D1-F2 ~]$
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
  domU-12-31-39-00-D1-F2
  [tsakai@domU-12-31-39-00-D1-F2 ~]$
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
  logout
  Connection to domU-12-31-39-00-D1-F2 closed.
  [tsakai@domU-12-31-39-0C-C8-01 ~]$
  [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
  logout
  Connection to domU-12-31-39-0C-C8-01 closed.
  [tsakai@domU-12-31-39-00-D1-F2 ~]$
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
  domU-12-31-39-00-D1-F2

As you can see, neither machine uses a password for authentication; they use
public/private key pairs.  There is no problem (that I can see) with ssh
invocation from one machine to the other.  This is so because I have a copy
of the public key and a copy of the private key on each instance.

The app.ac file is identical, except the node names:
  [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
  -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
  -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
  -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
  -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8

Here's what happens with mpirun:

  [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
  tsakai@domu-12-31-39-0c-c8-01's password:
  Permission denied, please try again.
  tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...



--------------------------------------------------------------------------
  mpirun noticed that the job aborted, but has no info as to the process
  that caused that situation.


--------------------------------------------------------------------------

  mpirun: clean termination accomplished

  [tsakai@domU-12-31-39-00-D1-F2 ~]$

Mpirun (or somebody else?) asks me for a password, which I don't have.
I end up typing control-C.

Here's my question:
How can I get past authentication by mpirun where there is no password?

I would appreciate your help/insight greatly.

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu





<session4Reuti.text>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

