You can use the tcp_if_include/tcp_if_exclude parameters with address ranges
(CIDR notation) instead of interface names. ompi_info --param btl tcp gives you
some hints:

>                  MCA btl: parameter "btl_tcp_if_include" (current value: <none>, data
>                           source: default value)
>                           Comma-delimited list of devices or CIDR notation of networks
>                           to use for MPI communication (e.g., "eth0,eth1" or
>                           "192.168.0.0/16,10.1.4.0/24").  Mutually exclusive with
>                           btl_tcp_if_exclude.
>                  MCA btl: parameter "btl_tcp_if_exclude" (current value: <lo,sppp>, data
>                           source: default value)
>                           Comma-delimited list of devices or CIDR notation of networks
>                           to NOT use for MPI communication -- all devices not matching
>                           these specifications will be used (e.g., "eth0,eth1" or
>                           "192.168.0.0/16,10.1.4.0/24").  Mutually exclusive with
>                           btl_tcp_if_include.
> 
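Conceptually, an interface matches a CIDR entry such as "10.1.4.0/24" when its address, masked to the entry's prefix length, equals the masked network address. A minimal C sketch of that test, assuming plain IPv4 and dotted-quad strings; ipv4_in_cidr and ipv4_in_any_cidr are hypothetical names, and this is not Open MPI's own implementation:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if the dotted-quad IPv4 address `addr` lies
 * inside the network `cidr`, written as "a.b.c.d/prefix". */
static bool ipv4_in_cidr(const char *addr, const char *cidr)
{
    assert(addr != NULL && cidr != NULL);
    const char *slash = strchr(cidr, '/');
    char net[INET_ADDRSTRLEN];
    if (slash == NULL || (size_t)(slash - cidr) >= sizeof net)
        return false;
    memcpy(net, cidr, (size_t)(slash - cidr));
    net[slash - cidr] = '\0';

    char *end;
    long prefix = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || prefix < 0 || prefix > 32)
        return false;

    struct in_addr a, n;
    if (inet_pton(AF_INET, addr, &a) != 1 || inet_pton(AF_INET, net, &n) != 1)
        return false;

    /* Netmask in host byte order; shifting by 32 is undefined, so /0 is special-cased. */
    uint32_t mask = prefix == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - prefix));
    return (ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}

/* Hypothetical helper: true if `addr` matches any entry of a comma-delimited
 * list such as "192.168.0.0/16,10.1.4.0/24". */
static bool ipv4_in_any_cidr(const char *addr, const char *list)
{
    char buf[256];
    if (strlen(list) >= sizeof buf)
        return false;
    strcpy(buf, list);
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
        if (ipv4_in_cidr(addr, tok))
            return true;
    return false;
}
```

For example, ipv4_in_any_cidr("10.1.4.2", "192.168.0.0/16,10.1.4.0/24") is true, whereas 10.0.0.2 falls outside both networks.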

  george.


On Nov 18, 2010, at 05:19, Krzysztof Zarzycki wrote:

> We just discovered this ticket, which might describe the same problem that
> we have:
> 
> https://svn.open-mpi.org/trac/ompi/ticket/1505
> 
> It seems unresolved... do you have a workaround for it? I've seen the
> "-mca opal_net_private_ipv4" parameter, but I don't know exactly how to use
> it... At least my experiments with it had no effect.
> 
> I'll be very grateful for your help,
> Krzysztof
> 
> 
> 2010/11/17 Grzegorz Maj <ma...@wp.pl>
> 2010/11/11 Jeff Squyres <jsquy...@cisco.com>:
> > On Nov 11, 2010, at 3:23 PM, Krzysztof Zarzycki wrote:
> >
> >> No, unfortunately specification of interfaces is a little more 
> >> complicated...  eth0/1/2 is not common for both machines.
> >
> > Can you define "common"?  Do you mean that eth0 on one machine is on a
> > different network than eth0 on the other machine?
> >
> > Is there any way that you can make them the same?  It would certainly make 
> > things easier.
> 
> Yes, they are on different networks and unfortunately we are not
> allowed to play with this.
> 
> >
> >> I've tried to play with (oob/btl)_tcp_if_include, but actually... I don't
> >> know exactly how.
> >
> > See my other mail:
> >
> >    http://www.open-mpi.org/community/lists/users/2010/11/14737.php
> >
> >> Anyway, do you have any ideas how to further debug the communication 
> >> problem?
> >
> > The connect() is not getting through somehow.  Sadly, we don't have enough 
> > debug messages to show exactly what is going wrong when these kinds of 
> > things happen; I have a half-finished branch that has much better 
> > debug/error messages, but I've never had the time to finish it (indeed, I 
> > think there's a bug in that development branch right now, otherwise I'd 
> > recommend giving it a whirl).  :-\
> 
> Analyzing the strace of both processes shows that on both sides the
> call to 'poll' after connect/accept succeeds. As I understand it, they
> even exchange some information, which is always 8 bytes, like
> D\227\0\1\0\0\0\0. One of them sends this information and the other
> receives it. But after receiving, it does:
> 
> ----
> recv(8, "\5g\0\1\0\0\0\0", 8, 0)        = 8
> fcntl64(8, F_GETFL)                     = 0x2 (flags O_RDWR)
> fcntl64(8, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
> getpeername(8, {sa_family=AF_INET, sin_port=htons(57885), sin_addr=inet_addr("10.0.0.2")}, [16]) = 0
> close(8)
> ----
> 
> In a working scenario (on other machines), after receiving, these
> bytes are sent back and then the actual communication proceeds (my
> 'hello' message is sent).
> 
> The above address 10.0.0.2 is eth2 on the host machine, which indeed
> should be used in this communication.
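One plausible reading of this trace (an assumption; nothing in the thread confirms it) is that the accepting side reads the peer's 8-byte header, looks up the address the connection came from, and closes the socket when that address is not one it expected for that peer. The hypothetical C sketch below reproduces the same syscall sequence as the strace (recv, fcntl F_GETFL/F_SETFL with O_NONBLOCK, getpeername, close); accept_handshake and peer_is_expected are invented names, and the address check only stands in for whatever Open MPI actually does.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: is the connecting IPv4 address one of those we were
 * told to expect for this peer (dotted-quad strings in `known`)? */
static bool peer_is_expected(const struct sockaddr_in *peer,
                             const char *const *known, size_t nknown)
{
    for (size_t i = 0; i < nknown; i++) {
        struct in_addr k;
        if (inet_pton(AF_INET, known[i], &k) == 1 &&
            k.s_addr == peer->sin_addr.s_addr)
            return true;
    }
    return false;
}

/* Hypothetical receiver step mirroring the traced syscalls: recv() the 8-byte
 * header, set O_NONBLOCK, getpeername(), then close() the socket if the peer
 * is not an expected IPv4 address. Returns fd if kept, -1 if closed. */
static int accept_handshake(int fd, const char *const *known, size_t nknown)
{
    unsigned char hdr[8];
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;
    int flags;

    assert(fd >= 0);
    bool ok = recv(fd, hdr, sizeof hdr, 0) == (ssize_t)sizeof hdr
              && (flags = fcntl(fd, F_GETFL)) != -1
              && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1
              && getpeername(fd, (struct sockaddr *)&ss, &len) == 0
              && ss.ss_family == AF_INET
              && peer_is_expected((const struct sockaddr_in *)&ss, known, nknown);
    if (!ok) {
        close(fd);   /* corresponds to the close(8) seen in the trace */
        return -1;
    }
    return fd;       /* a real implementation would now reply and carry on */
}
```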
> 
> While playing with the network interfaces, it turned out that when we
> bring down one of the aliases (eth2:0), it starts working. How can we
> force mpirun not to use this alias while it is up? We tried using
> (oob/btl)_tcp_if_exclude and specifying eth2:0, but it doesn't seem to
> help.
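Because the include/exclude lists also accept CIDR networks, it helps to see which IPv4 address and prefix each interface label, including aliases such as eth2:0, actually carries. A minimal sketch using getifaddrs(); on Linux with glibc, IPv4 aliases are normally reported under their label (e.g. eth2:0). prefix_len and list_ipv4_interfaces are hypothetical names, and whether some CIDR cleanly separates eth2 from eth2:0 depends on the addresses actually assigned.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/* Count the leading one bits of an IPv4 netmask (network byte order). */
static int prefix_len(struct in_addr mask)
{
    uint32_t m = ntohl(mask.s_addr);
    int n = 0;
    while (n < 32 && (m & (UINT32_C(1) << (31 - n))))
        n++;
    return n;
}

/* Print each IPv4 address as "label address/prefix", one per line.
 * Returns 0 on success, -1 if getifaddrs() fails. */
static int list_ipv4_interfaces(FILE *out)
{
    struct ifaddrs *ifs;
    assert(out != NULL);
    if (getifaddrs(&ifs) != 0)
        return -1;
    for (struct ifaddrs *p = ifs; p != NULL; p = p->ifa_next) {
        if (p->ifa_addr == NULL || p->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *addr = (const struct sockaddr_in *)p->ifa_addr;
        int plen = 32;   /* assume a host route if no netmask is reported */
        if (p->ifa_netmask != NULL)
            plen = prefix_len(((const struct sockaddr_in *)p->ifa_netmask)->sin_addr);
        char buf[INET_ADDRSTRLEN];
        if (inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof buf) != NULL)
            fprintf(out, "%-12s %s/%d\n", p->ifa_name, buf, plen);
    }
    freeifaddrs(ifs);
    return 0;
}
```

Calling list_ipv4_interfaces(stdout) on each host shows which networks the real interfaces and the aliases occupy before choosing a value for btl_tcp_if_include or btl_tcp_if_exclude.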
> 
> Regards,
> Grzegorz
> 
> 
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> 
