Hi Barnet,

Thank you for your post.
It was the security group setting.  Here’s my entry:

   Connection Method    Protocol    From port    To port    Source (IP or group)
   All                  tcp         0            65535      intra

I didn’t want to use 0.0.0.0/0 for the source.  Here, intra is the very name of
this security group; setting the source field to the group's own name enables
all instances belonging to this security group, intra, to communicate with
each other.
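
To make the difference between the two kinds of source concrete, here is a
toy model in Python.  It is only a sketch of the matching idea described
above, not EC2's actual evaluation logic or any AWS API; the Rule class and
the allows() function are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One ingress rule; `source` is either a CIDR block or a group name."""
    protocol: str
    from_port: int
    to_port: int
    source: str


def allows(rule, protocol, port, peer_ip, peer_groups):
    """Return True if `rule` admits a connection from the given peer."""
    if protocol != rule.protocol or not rule.from_port <= port <= rule.to_port:
        return False
    if "/" in rule.source:
        # CIDR source: match on the peer's address.
        return ip_address(peer_ip) in ip_network(rule.source)
    # Group source: match on the peer's group membership, not its address.
    return rule.source in peer_groups


# The two rules that appear in this thread.
ssh_only = Rule("tcp", 22, 22, "0.0.0.0/0")
intra = Rule("tcp", 0, 65535, "intra")
```

With the SSH-only rule, a peer instance cannot reach a randomly chosen port
such as 60941; with the intra rule it can, while a host outside the group
still cannot.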

I certainly didn’t have the OMPI_MCA_plm_rsh_agent variable set, nor have I
touched the /etc/ssh/ssh_config file.  After just fixing the security group,
all the examples I have been using started working.

Obviously, I need to study and experiment more with security groups, not to
mention OpenMPI environment variables, but I am starting to see the light at
the other end of the tunnel.

Thank you for sharing tips.

Regards,

Tena


On 2/17/11 8:56 PM, "Barnet Wagman" <b...@norbl.com> wrote:

  Tena

 Earlier today I was able to successfully get a

     submission host[ec2 instance 0] <-> slave [ec2 instance 1]

 configuration to work.  I haven't fully digested your "this must be an ssh ... "
 thread, but here are a few things that I found necessary to do in order to
 get things working.

 (i) First and foremost is the ec2 security group.  The 'default' group will
 probably not work.  ompi randomly chooses ports.  I think that some ranges are
 excluded, but I was too lazy to find out, so I just opened everything up,
 creating a group that includes the line

 Connection Method    Protocol    From port    To port    Source (IP or group)
 All                  tcp         0            65535      0.0.0.0/0

 Of course this could be insecure, depending on how your instance is configured.
 Since I have no services running except ssh, I don't foresee any problems.

 (ii) Since you have ssh working, this probably is irrelevant: by default when
 ompi uses ssh, it attempts to log into the remote host using the local user
 name, and will use the rsa file $HOME/.ssh/id_rsa.  However, you can explicitly
 set these by specifying the ssh command in an MCA param, e.g.

  OMPI_MCA_plm_rsh_agent="ssh -i rsa_file -l ec2-user"; export OMPI_MCA_plm_rsh_agent

 And the rsa file must have mode 600.
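
 A minimal Python sketch of the same setup, in case it helps to check the key
 file before launching.  The function name rsh_agent_env is hypothetical, and
 the sketch assumes a POSIX system and arguments without whitespace; only the
 variable name OMPI_MCA_plm_rsh_agent and the ssh options come from the text
 above.

```python
import os
import stat


def rsh_agent_env(key_path, user):
    """Return the environment entry that selects the ssh command.

    Raises PermissionError if the key file is accessible by group or others.
    """
    if any(ch.isspace() for ch in key_path + user):
        raise ValueError("this sketch does not handle whitespace in arguments")
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{key_path} has mode {mode:o}; expected 600")
    return {"OMPI_MCA_plm_rsh_agent": f"ssh -i {key_path} -l {user}"}
```

 The returned dictionary can be merged into a copy of os.environ and passed as
 the env argument of subprocess.run when starting mpirun.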

 (iii) To suppress the ssh authenticity test, I added

    UserKnownHostsFile /dev/null
    StrictHostKeyChecking no
 to /etc/ssh/ssh_config

 Hope one of these helps.

 bw

 On 2/17/11 6:11 PM, Tena Sakai wrote:
 Re: [OMPI users] How are IP addresses determined?

 Hi Barnet,

 > If I understand you correctly, the configuration you're trying to use Is
 >  submission host[ec2 instance 0] <-> slave [ec2 instance 1]

 Correct.

 > but have you tried using the public/external uri?

 I just did.  It didn’t make a bit of difference.
 I also tried IP addresses and that didn’t get me anywhere either.

 Jeff earlier gave me steps to follow, which I am about to embark on.
 May I suggest you follow the thread with the heading “This must be ssh
 problem, but I can't figure out what it is...”

 Regards,

 Tena


 On 2/17/11 10:05 AM, "Barnet Wagman" <b...@norbl.com> wrote:


  Tena,

  If I understand you correctly, the configuration you're trying to use is


submission host[ec2 instance 0] <-> slave [ec2 instance 1]


 I haven't tried this yet (although I will in the next few days).

  I've tried


  (a)  submission host[non-ec2 system with static IP, direct net connection] <-> slave [ec2 instance 1]
  (b)  submission host[non-ec2 system with local static IP, connected to net via router] <-> slave [ec2 instance 1]


 (a) works, (b) does not, presumably because ompi does not support NAT (see
 Jeff Squyres' comments, later in the thread).


  I notice that you're using the 'internal' uri to specify hostnames.  This
  makes sense in principle, but have you tried using the public/external uri?
  Presumably ompi has to look up these hostnames.  I don't know how that's
  done, but trying to look up the internal uri might be a problem.

  If you try this (or anything else), I'd appreciate it if you'd post your
  results.

  bw


  On 2/17/11 4:08 AM, Tena Sakai wrote:

 Re: [OMPI users] How are IP addresses determined?

 Hi Barnet,

  Allow me to interject.
  Are you saying that you run the master on your local machine and launch
  openMPI processes on EC2?  You are saying that 1) tcp port
  tcp://192.168.1.101:35272 is on your local system and 2) the ec2 instance is
  trying to connect to your local machine’s port 35272, and hanging.  Is that
  correct?

  I have a slightly different situation.  I am running 2 ec2 instances and
  trying to run mpirun on both instances.  My ssh debug output looks quite
  similar to yours, and the mpirun behavior is also very similar.  Here’s what
  I captured:
    Sending command:  orted --daemonize -mca ess env -mca orte_ess_jobid 1025769472 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1025769472.0;tcp://10.118.23.4:60941"
  And here’s what I did on the instance from which I issued mpirun:
    [tsakai@ip-10-118-23-4 ~]$ nslookup `hostname`
    Server:         172.16.0.23
    Address:        172.16.0.23#53

    Non-authoritative answer:
    Name:   ip-10-118-23-4.ec2.internal
    Address: 10.118.23.4

  So that tcp port does belong to this instance.  Furthermore, nothing can
  connect into it.  No router (which may perform address translation?) is
  involved, and it appears that the same thing you describe is happening.
  Incidentally, here’s how I ran mpirun:
    [tsakai@ip-10-118-23-4 ~]$ mpirun -app app.ac
  With app.ac file:
    [tsakai@ip-10-118-23-4 ~]$ cat app.ac
    -H ip-10-118-23-4.ec2.internal -np 1 /bin/hostname
    -H ip-10-118-23-4.ec2.internal -np 1 /bin/hostname
    -H ip-10-118-18-172.ec2.internal -np 1 /bin/hostname
    -H ip-10-118-18-172.ec2.internal -np 1 /bin/hostname

  The first two lines spawn /bin/hostname on this instance
  (ip-10-118-23-4.ec2.internal) and the bottom 2 lines on the remote instance.
  Here’s the security group used for these instances:

    connection    protocol    from    to    source
    ----------    --------    ----    --    ---------
    SSH           tcp         22      22    0.0.0.0/0

  Am I making sense?

  Regards,

  Tena




  On 2/16/11 8:56 PM, "Barnet Wagman" <b...@norbl.com>  wrote:



  I've run into a problem involving accessing a remote host via a router and I
  think I need to understand how ompi determines ip addresses.  If there's
  anything posted on this subject, please point me to it.

   Here's the problem:

   I've installed ompi (1.4.3) on a remote system (an Amazon ec2 instance).
   If the local system I'm working on has a static ip address (and a direct
   connection to the internet), there's no problem.  But if the local system
   accesses the internet through a router (which itself gets its ip via dhcp),
   a call to the mpirun command hangs.

   This is not a firewall problem - I've disabled the firewalls on all the
   systems that are involved (and the router).

   It is also not an ssh problem.  The ssh connection is being made and it
   appears that the application has been launched on the remote system.  After
   the mpirun command has been launched locally, a ps on the remote system
   shows a process



   orted --daemonize -mca ess env -mca orte_ess_jobid 1187643392 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri 1187643392.0;tcp://192.168.1.101:35272




   While I don't really understand the orted process, I assume this indicates
   that a command to execute an app has been received and that ompi is trying
   to run it.

   I suspect that the problem is related to the '--hnp-uri ...
   tcp://192.168.1.101' argument.  192.168.1.101 is the address of my local
   system on my local network (attached to the router), which of course is not
   accessible over the net.  It appears that ompi is transmitting the local
   (static) ip address to the remote host.
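
   One quick way to check this suspicion is to pull the address out of the
   --hnp-uri value and ask whether it is a private one.  The sketch below
   assumes the value has the shape '<id>;tcp://<ip>:<port>', which is inferred
   only from the examples in this thread and not from any documentation; the
   function name hnp_endpoint is hypothetical.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def hnp_endpoint(hnp_uri):
    """Split '<id>;tcp://<ip>:<port>' into (ip, port)."""
    # Everything after the first ';' is treated as the contact URI.
    _, _, contact = hnp_uri.partition(";")
    parts = urlsplit(contact)
    return ip_address(parts.hostname), parts.port


ip, port = hnp_endpoint("1187643392.0;tcp://192.168.1.101:35272")
# is_private is True for addresses in ranges such as 192.168.0.0/16 and
# 10.0.0.0/8, which a host on another network cannot reach directly.
print(ip, port, ip.is_private)
```

   A private address here is only a hint.  It matters when the remote host sits
   on a different network; two hosts on the same private network can still
   reach each other at such an address.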

   It would help to know how ompi determines and distributes IP addresses,
   and whether there's any way to control this.

   Any thoughts on dealing with this would be greatly appreciated.

   Thanks,

   bw







 _______________________________________________
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users







