Hi,

On 10.02.2011 at 22:03, Tena Sakai wrote:

> Hi Reuti,
>
> Thanks for suggesting "LogLevel DEBUG3." I did so and complete
> session is captured in the attached file.
>
> What I did is much similar to what I have done before: verify
> that ssh works and then run mpirun command. In my a bit lengthy
> session log, there are two responses from "LogLevel DEBUG3." First
> from an scp invocation and then from mpirun invocation. They both
> say
>   debug1: Authentication succeeded (publickey).

Yes. I hoped to see the point where "Permission denied." is output, but now that I reread your post, it could mean that this was solved already. I agree with Jeff that right now it looks like a firewall issue.

-- Reuti


> From mpirun invocation, I see a line:
>
>   debug1: Sending command: orted --daemonize -mca ess env -mca
>   orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
>   2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
>
> The IP address at the end of the line is indeed that of machine B.
> After that there was hanging and I controlled-C out of it, which
> gave me more lines. But the lines after
>   debug1: Sending command: orted bla bla bla
> doesn't look good to me. But, in truth, I have no idea what they
> mean.
>
> If you could shed some light, I would appreciate it very much.
>
> Regards,
>
> Tena
>
>
> On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>
>> Hi,
>>
>> On 10.02.2011 at 19:11, Tena Sakai wrote:
>>
>>>> your local machine is Linux like, but the execution hosts
>>>> are Macs? I saw the /Users/tsakai/... in your output.
>>>
>>> No, my environment is entirely linux. The path to my home
>>> directory on one host (blitzen) has been known as /Users/tsakai,
>>> despite it is an nfs mount from vixen (which is known to
>>> itself as /home/tsakai). For historical reasons, I have
>>> chosen to give a symbolic link named /Users to vixen's /Home,
>>> so that I can use consistent path for both vixen and blitzen.
>>
>> okay.
>> Sometimes the protection of the home directory must be adjusted too,
>> but as you can do it from the command line this shouldn't be an issue.
>>
>>
>>>> Is this a private cluster (or at least private interfaces)?
>>>> It would also be an option to use hostbased authentication,
>>>> which will avoid setting any known_hosts file or passphraseless
>>>> ssh-keys for each user.
>>>
>>> No, it is not a private cluster. It is Amazon EC2. When I
>>> ssh from my local machine (vixen) I use its public interface,
>>> but to address from one amazon cluster node to the other I
>>> use nodes' private dns names: domU-12-31-39-07-35-21 and
>>> domU-12-31-39-06-74-E2. Both public and private dns names
>>> change from a launch to another. I am using passphraseless
>>> ssh-keys for authentication in all cases, i.e., from vixen to
>>> Amazon node A, from amazon node A to amazon node B, and from
>>> Amazon node B back to A. (Please see my initial post. There
>>> is a session dialogue for this.) They all work without
>>> authentication dialogue, except a brief initial dialogue:
>>>   The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>>>   can't be established.
>>>   RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>   Are you sure you want to continue connecting (yes/no)?
>>> to which I say "yes."
>>> But I am unclear with what you mean by "hostbased authentication"?
>>> Doesn't that mean with password? If so, it is not an option.
>>
>> No. It's convenient inside a private cluster as it won't fill each user's
>> known_hosts file and you don't need to create any ssh-keys. But when the
>> hostname changes every time it might also create new hostkeys. It uses
>> hostkeys (private and public), this way it works for all users. Just for
>> reference:
>>
>> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>>
>> You could look into it later.
>>
>> ==
>>
>> - Can you try to use a command when connecting from A to B? E.g.
>> `ssh domU-12-31-39-06-74-E2 ls`.
>> Is this working too?
>>
>> - What about putting:
>>
>> LogLevel DEBUG3
>>
>> in your ~/.ssh/config? Maybe we can see what it's trying to negotiate before
>> it fails in verbose mode.
>>
>>
>> -- Reuti
>>
>>
>>
>>> Regards,
>>>
>>> Tena
>>>
>>>
>>> On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> your local machine is Linux like, but the execution hosts are Macs? I saw
>>>> the /Users/tsakai/... in your output.
>>>>
>>>> a) executing a command on them is also working, e.g.: ssh
>>>> domU-12-31-39-07-35-21 ls
>>>>
>>>> On 10.02.2011 at 07:08, Tena Sakai wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have made a bit of progress(?)...
>>>>> I made a config file in my .ssh directory on the cloud. It looks like:
>>>>> # machine A
>>>>> Host domU-12-31-39-07-35-21.compute-1.internal
>>>>
>>>> This is just an abbreviation or nickname above. To use the specified
>>>> settings, it's necessary to specify exactly this name. When the settings
>>>> are the same anyway for all machines, you can use:
>>>>
>>>> Host *
>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>   IdentitiesOnly yes
>>>>   BatchMode yes
>>>>
>>>> instead.
>>>>
>>>> Is this a private cluster (or at least private interfaces)? It would also
>>>> be an option to use hostbased authentication, which will avoid setting any
>>>> known_hosts file or passphraseless ssh-keys for each user.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>   HostName domU-12-31-39-07-35-21
>>>>>   BatchMode yes
>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>   ChallengeResponseAuthentication no
>>>>>   IdentitiesOnly yes
>>>>>
>>>>> # machine B
>>>>> Host domU-12-31-39-06-74-E2.compute-1.internal
>>>>>   HostName domU-12-31-39-06-74-E2
>>>>>   BatchMode yes
>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>   ChallengeResponseAuthentication no
>>>>>   IdentitiesOnly yes
>>>>>
>>>>> This file exists on both machine A and machine B.
>>>>>
>>>>> Now when I issue mpirun command as below:
>>>>> [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>>
>>>>> It hangs. I control-C out of it and I get:
>>>>> mpirun: killing job...
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>> below. Additional manual cleanup may be required - please refer to
>>>>> the "orte-clean" tool for assistance.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>>>> back when launched
>>>>>
>>>>> Am I making progress?
>>>>>
>>>>> Does this mean I am past authentication and something else is the problem?
>>>>> Does someone have an example .ssh/config file I can look at? There are so
>>>>> many keyword-argument pairs for this config file and I would like to look
>>>>> at some very basic one that works.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Tena Sakai
>>>>> tsa...@gallo.ucsf.edu
>>>>>
>>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I have an app.ac1 file like below:
>>>>>> [tsakai@vixen local]$ cat app.ac1
>>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>>
>>>>>> The program I run is
>>>>>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>>> where x is [5..8]. The machines vixen and blitzen each run 2 runs.
>>>>>>
>>>>>> Here’s the program fib.R:
>>>>>> [tsakai@vixen local]$ cat fib.R
>>>>>> # fib() computes, given index n, fibonacci number iteratively
>>>>>> # here's the first dozen sequence (indexed from 0..11)
>>>>>> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>>
>>>>>> fib <- function( n ) {
>>>>>>     a <- 0
>>>>>>     b <- 1
>>>>>>     for ( i in 1:n ) {
>>>>>>         t <- b
>>>>>>         b <- a
>>>>>>         a <- a + t
>>>>>>     }
>>>>>>     a
>>>>>> }
>>>>>>
>>>>>> arg <- commandArgs( TRUE )
>>>>>> myHost <- system( 'hostname', intern=TRUE )
>>>>>> cat( fib(arg), myHost, '\n' )
>>>>>>
>>>>>> It reads an argument from command line and produces a fibonacci number
>>>>>> that corresponds to that index, followed by the machine name. Pretty
>>>>>> simple stuff.
>>>>>>
>>>>>> Here’s the run output:
>>>>>> [tsakai@vixen local]$ mpirun -app app.ac1
>>>>>> 5 vixen.egcrc.org
>>>>>> 8 vixen.egcrc.org
>>>>>> 13 blitzen.egcrc.org
>>>>>> 21 blitzen.egcrc.org
>>>>>>
>>>>>> Which is exactly what I expect. So far so good.
>>>>>>
>>>>>> Now I want to run the same thing on cloud.
>>>>>> I launch 2 instances of the same
>>>>>> virtual machine, to which I get to by:
>>>>>> [tsakai@vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>>>
>>>>>> Now I am on machine A:
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>> domU-12-31-39-00-D1-F2
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>>>> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>>> domU-12-31-39-0C-C8-01
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>>>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't
>>>>>> be established.
>>>>>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of
>>>>>> known hosts.
>>>>>> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>> domU-12-31-39-00-D1-F2
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>>>> logout
>>>>>> Connection to domU-12-31-39-00-D1-F2 closed.
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>>>> logout
>>>>>> Connection to domU-12-31-39-0C-C8-01 closed.
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>> domU-12-31-39-00-D1-F2
>>>>>>
>>>>>> As you can see, neither machine uses password for authentication; it uses
>>>>>> public/private key pairs. There is no problem (that I can see) for ssh
>>>>>> invocation from one machine to the other. This is so because I have a
>>>>>> copy of public key and a copy of private key on each instance.
>>>>>>
>>>>>> The app.ac file is identical, except the node names:
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>>
>>>>>> Here’s what happens with mpirun:
>>>>>>
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>>> tsakai@domu-12-31-39-0c-c8-01's password:
>>>>>> Permission denied, please try again.
>>>>>> tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>> that caused that situation.
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> mpirun: clean termination accomplished
>>>>>>
>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>
>>>>>> Mpirun (or somebody else?) asks me password, which I don’t have.
>>>>>> I end up typing control-C.
>>>>>>
>>>>>> Here’s my question:
>>>>>> How can I get past authentication by mpirun where there is no password?
>>>>>>
>>>>>> I would appreciate your help/insight greatly.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Tena Sakai
>>>>>> tsa...@gallo.ucsf.edu
>>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> <session4Reuti.text>
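
Regarding the firewall suspicion in this thread: the `--hnp-uri` in the debug output means the remote orted has to open a TCP connection back to the node where mpirun runs, so it may help to test plain TCP reachability between the two nodes independently of ssh. The following is only a minimal sketch, not part of Open MPI. `check_port` is a hypothetical helper name, and the sketch assumes bash (for its `/dev/tcp` redirection) and GNU coreutils `timeout` are available on both instances.

```shell
#!/usr/bin/env bash
# check_port HOST PORT  (hypothetical helper, not an Open MPI or ssh tool)
# Tries to open a TCP connection to HOST:PORT and gives up after 3 seconds.
# Assumes bash's /dev/tcp pseudo-device and GNU coreutils `timeout`.
check_port() {
    local host=$1 port=$2
    # The connection attempt runs in a child bash so that a hang
    # (packets silently dropped by a firewall) is cut off by `timeout`.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open   ${host}:${port}"
    else
        echo "closed ${host}:${port} (refused, filtered, or timed out)"
        return 1
    fi
}
```

One way to use it: start a listener on a high port on the node where mpirun is launched (for example with netcat, whose flags differ between implementations), then on the other node source the script and run `check_port 10.194.95.239 <that port>`. The port in `--hnp-uri` is picked anew on each run, so probing the old value 54256 directly is not meaningful. If port 22 reports open but high ports do not, traffic between the instances is probably being filtered; on EC2 that would point at the security group rules, though the log itself does not confirm this.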