Correct -- it doesn't make sense to specify both include *and* exclude: by 
specifying one, you're implicitly (but precisely) specifying the other.

My suggestion would be to use positive notation, not negative notation.  For 
example:

mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 ...

That way, you *know* you're only getting the TCP and self BTLs, and you *know* 
you're only getting eth0.  If that works, then spread out from there, e.g.:

mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0,eth1 ...

That is, also include the "sm" BTL (which is only used for shared-memory 
communication between procs on the same server, and is therefore useless for a 
2-proc-across-2-server run of osu_bw, but you get the idea), and also use eth0 
and eth1.  

And so on.
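Once TCP is working, the same positive style applies if you want to bring 
verbs back into the picture.  As a sketch (assuming your IB hardware is 
driven by the openib BTL in this 1.6.x series, and that osu_bw is in the 
current directory):

```shell
# Test only the IB path (plus self for loopback):
mpirun --mca btl openib,self ./osu_bw

# Then widen to IB + shared memory + TCP on a known-good interface:
mpirun --mca btl openib,sm,tcp,self --mca btl_tcp_if_include eth0 ./osu_bw
```

At each step you know exactly which transports are in play, so if a step 
hangs, the newly-added component is the prime suspect.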

The problem with using ^openib and/or btl_tcp_if_exclude is that you might end 
up using BTLs and/or TCP interfaces that you don't expect, and can therefore 
run into problems.
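On the run-time debugging question from the original mail: you can ask the BTL 
framework itself to report what it selects.  A sketch (the verbosity level is 
a judgment call; higher numbers print more):

```shell
# Print which BTL components are opened/selected and which TCP
# interfaces they bind to:
mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 \
       --mca btl_base_verbose 30 ./osu_bw

# List the TCP BTL's MCA parameters and their current defaults:
ompi_info --param btl tcp
```

That output usually makes it obvious when a node is trying to use an 
interface or subnet you didn't intend.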

Make sense?



On Sep 20, 2013, at 11:17 AM, Ralph Castain <r...@open-mpi.org> wrote:

> I don't think you are allowed to specify both include and exclude options at 
> the same time as they conflict - you should either exclude ib0 or include 
> eth0 (or whatever).
> 
> My guess is that the various nodes are trying to communicate across disjoint 
> networks. We've seen that before when, for example, eth0 on one node is on 
> one subnet, and eth0 on another node is on a different subnet. You might look 
> for that kind of arrangement.
> 
> 
> On Sep 20, 2013, at 8:05 AM, "Elken, Tom" <tom.el...@intel.com> wrote:
> 
>>> The trouble is when I try to add some "--mca" parameters to force it to
>>> use TCP/Ethernet, the program seems to hang.  I get the headers of the
>>> "osu_bw" output, but no results, even on the first case (1 byte payload
>>> per packet).  This is occurring on both the IB-enabled nodes, and on the
>>> Ethernet-only nodes.  The specific syntax I was using was:  "mpirun
>>> --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw"
>> 
>> When we want to run over TCP and IPoIB on an IB/PSM equipped cluster, we use:
>> --mca btl sm --mca btl tcp,self --mca btl_tcp_if_exclude eth0 --mca 
>> btl_tcp_if_include ib0 --mca mtl ^psm
>> 
>> based on this, it looks like the following might work for you:
>> --mca btl sm,tcp,self --mca btl_tcp_if_exclude ib0 --mca btl_tcp_if_include 
>> eth0 --mca btl ^openib
>> 
>> If you don't have ib0 ports configured on the IB nodes, probably you don't 
>> need the" --mca btl_tcp_if_exclude ib0."
>> 
>> -Tom
>> 
>>> 
>>> The problem occurs at least with OpenMPI 1.6.3 compiled with GNU 4.4
>>> compilers, with 1.6.3 compiled with Intel 13.0.1 compilers, and with
>>> 1.6.5 compiled with Intel 13.0.1 compilers.  I haven't tested any other
>>> combinations yet.
>>> 
>>> Any ideas here?  It's very possible this is a system configuration
>>> problem, but I don't know where to look.  At this point, any ideas would
>>> be welcome, either about the specific situation, or general pointers on
>>> mpirun debugging flags to use.  I can't find much in the docs yet on
>>> run-time debugging for OpenMPI, as opposed to debugging the application.
>>> Maybe I'm just looking in the wrong place.
>>> 
>>> 
>>> Thanks,
>>> 
>>> --
>>> Lloyd Brown
>>> Systems Administrator
>>> Fulton Supercomputing Lab
>>> Brigham Young University
>>> http://marylou.byu.edu
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
