Hi Reuti,

> your local machine is Linux like, but the execution hosts
> are Macs? I saw the /Users/tsakai/... in your output.

No, my environment is entirely Linux. The path to my home
directory on one host (blitzen) is known as /Users/tsakai,
even though it is an NFS mount from vixen (which knows it as
/home/tsakai). For historical reasons, I created a symbolic
link named /Users that points to vixen's /home, so that I can
use a consistent path on both vixen and blitzen.

> Is this a private cluster (or at least private interfaces)?
> It would also be an option to use hostbased authentication,
> which will avoid setting any known_hosts file or passphraseless
> ssh-keys for each user.

No, it is not a private cluster. It is Amazon EC2. When I ssh
from my local machine (vixen) I use its public interface, but
to address one Amazon cluster node from the other I use the
nodes' private DNS names: domU-12-31-39-07-35-21 and
domU-12-31-39-06-74-E2. Both the public and private DNS names
change from one launch to the next.

I am using passphraseless ssh keys for authentication in all
cases, i.e., from vixen to Amazon node A, from Amazon node A
to Amazon node B, and from Amazon node B back to A. (Please
see my initial post. There is a session dialogue for this.)
They all work without an authentication dialogue, except for
a brief initial dialogue:
  The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
  can't be established.
  RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
  Are you sure you want to continue connecting (yes/no)?
to which I say "yes."

But I am unclear about what you mean by "hostbased authentication".
Doesn't that mean with a password? If so, it is not an option.

Regards,

Tena


On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> your local machine is Linux like, but the execution hosts are Macs? I saw the
> /Users/tsakai/... in your output.
>
> a) executing a command on them is also working, e.g.: ssh
> domU-12-31-39-07-35-21 ls
>
> On 10.02.2011 at 07:08, Tena Sakai wrote:
>
>> Hi,
>>
>> I have made a bit of progress(?)...
>> I made a config file in my .ssh directory on the cloud. It looks like:
>>   # machine A
>>   Host domU-12-31-39-07-35-21.compute-1.internal
>
> This is just an abbreviation or nickname above. To use the specified settings,
> it's necessary to specify exactly this name. When the settings are the same
> anyway for all machines, you can use:
>
> Host *
>   IdentityFile /home/tsakai/.ssh/tsakai
>   IdentitiesOnly yes
>   BatchMode yes
>
> instead.
>
> Is this a private cluster (or at least private interfaces)? It would also be
> an option to use hostbased authentication, which will avoid setting any
> known_hosts file or passphraseless ssh-keys for each user.
>
> -- Reuti
>
>
>>   HostName domU-12-31-39-07-35-21
>>   BatchMode yes
>>   IdentityFile /home/tsakai/.ssh/tsakai
>>   ChallengeResponseAuthentication no
>>   IdentitiesOnly yes
>>
>>   # machine B
>>   Host domU-12-31-39-06-74-E2.compute-1.internal
>>   HostName domU-12-31-39-06-74-E2
>>   BatchMode yes
>>   IdentityFile /home/tsakai/.ssh/tsakai
>>   ChallengeResponseAuthentication no
>>   IdentitiesOnly yes
>>
>> This file exists on both machine A and machine B.
>>
>> Now when I issue the mpirun command as below:
>>   [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>
>> It hangs. I control-C out of it and I get:
>>   mpirun: killing job...
>>
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>>
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>> below. Additional manual cleanup may be required - please refer to
>> the "orte-clean" tool for assistance.
>>
>> --------------------------------------------------------------------------
>>   domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>   back when launched
>>
>> Am I making progress?
>>
>> Does this mean I am past authentication and something else is the problem?
>> Does someone have an example .ssh/config file I can look at? There are so
>> many keyword-argument pairs for this config file and I would like to look at
>> a very basic one that works.
>>
>> Thank you.
>>
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>>
>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>
>>> Hi
>>>
>>> I have an app.ac1 file like below:
>>>   [tsakai@vixen local]$ cat app.ac1
>>>   -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>   -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>   -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>   -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>
>>> The program I run is
>>>   Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>> where x is in [5..8]. The machines vixen and blitzen each do 2 runs.
>>>
>>> Here's the program fib.R:
>>>   [tsakai@vixen local]$ cat fib.R
>>>   # fib() computes, given index n, fibonacci number iteratively
>>>   # here's the first dozen sequence (indexed from 0..11)
>>>   # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>
>>>   fib <- function( n ) {
>>>       a <- 0
>>>       b <- 1
>>>       for ( i in 1:n ) {
>>>           t <- b
>>>           b <- a
>>>           a <- a + t
>>>       }
>>>       a
>>>   }
>>>
>>>   arg <- commandArgs( TRUE )
>>>   myHost <- system( 'hostname', intern=TRUE )
>>>   cat( fib(arg), myHost, '\n' )
>>>
>>> It reads an argument from the command line and produces the fibonacci number
>>> that corresponds to that index, followed by the machine name. Pretty simple
>>> stuff.
>>>
>>> Here's the run output:
>>>   [tsakai@vixen local]$ mpirun -app app.ac1
>>>   5 vixen.egcrc.org
>>>   8 vixen.egcrc.org
>>>   13 blitzen.egcrc.org
>>>   21 blitzen.egcrc.org
>>>
>>> Which is exactly what I expect. So far so good.
>>>
>>> Now I want to run the same thing on cloud. I launch 2 instances of the same
>>> virtual machine, to which I get to by:
>>>   [tsakai@vixen local]$ ssh -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>
>>> Now I am on machine A:
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>   domU-12-31-39-00-D1-F2
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>   Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>   domU-12-31-39-0C-C8-01
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>   The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't
>>>   be established.
>>>   RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>   Are you sure you want to continue connecting (yes/no)? yes
>>>   Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of
>>>   known hosts.
>>>   Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>   domU-12-31-39-00-D1-F2
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>   logout
>>>   Connection to domU-12-31-39-00-D1-F2 closed.
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>   logout
>>>   Connection to domU-12-31-39-0C-C8-01 closed.
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>   domU-12-31-39-00-D1-F2
>>>
>>> As you can see, neither machine uses a password for authentication; each uses
>>> public/private key pairs. There is no problem (that I can see) with ssh
>>> invocation from one machine to the other. This is so because I have a copy
>>> of the public key and a copy of the private key on each instance.
>>>
>>> The app.ac file is identical, except for the node names:
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>
>>> Here's what happens with mpirun:
>>>
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>   tsakai@domu-12-31-39-0c-c8-01's password:
>>>   Permission denied, please try again.
>>>   tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>>
>>> --------------------------------------------------------------------------
>>>
>>>   mpirun: clean termination accomplished
>>>
>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>
>>> mpirun (or somebody else?) asks me for a password, which I don't have.
>>> I end up typing control-C.
>>>
>>> Here's my question:
>>> How can I get past authentication by mpirun when there is no password?
>>>
>>> I would appreciate your help/insight greatly.
>>>
>>> Thank you.
>>>
>>> Tena Sakai
>>> tsa...@gallo.ucsf.edu
>>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
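
For reference, the single `Host *` block that Reuti suggests in the thread could be put in place roughly as follows. This is only a sketch under assumptions: `SSH_DIR` and `CONFIG` are hypothetical names introduced here, the script writes to a scratch directory unless `SSH_DIR` is set, and the key path is the one quoted in the thread. Whether this alone resolves the mpirun hang is not established by the thread.

```shell
#!/bin/sh
# Sketch only: write a catch-all ssh client config like the one suggested above.
# SSH_DIR is a hypothetical variable; it defaults to a scratch directory so the
# script does not touch a real ~/.ssh unless you point it there yourself.
set -eu

SSH_DIR="${SSH_DIR:-$(mktemp -d)}"
CONFIG="$SSH_DIR/config"

mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# A single "Host *" block matches every host name, so the settings apply
# whether ssh is given the short node name or the .compute-1.internal form.
cat > "$CONFIG" <<'EOF'
Host *
    IdentityFile /home/tsakai/.ssh/tsakai
    IdentitiesOnly yes
    BatchMode yes
EOF

# ssh refuses to use a per-user config file that other users can write to.
chmod 600 "$CONFIG"

echo "wrote $CONFIG"
cat "$CONFIG"
```

Running it with `SSH_DIR="$HOME/.ssh"` on each node would overwrite any existing config there, so back that file up first. `BatchMode yes` makes ssh fail rather than stop to ask for a password, which is easier to diagnose under mpirun than a silent hang.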