Interesting.  I was taking the approach of "only exclude what you're
certain you don't want" (the native IB and TCP/IPoIB stuff), since I
wasn't confident enough in my knowledge of the OpenMPI internals to
know what I should explicitly include.

However, taking Jeff's suggestion, this does seem to work, and gives me
the expected Ethernet performance:

"mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include em1 ./osu_bw"

So, in short, I'm still not sure why my exclude syntax doesn't work,
but the include-driven syntax that Jeff suggested does seem to work.  I
admit I'm still curious to understand how to get OpenMPI to give me the
details of what's going on, but the immediate problem of getting the
numbers out of osu_bw and osu_latency seems to be solved.
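For the archives, one way to get those details is Open MPI's verbosity
MCA parameters.  An untested sketch of what I'd try (btl_base_verbose is
a standard MCA parameter; 30 is just a commonly suggested level, and em1
is my interface name):

```shell
# Sketch: raise the BTL framework's verbosity so mpirun reports which
# BTL components it opens/selects and which TCP interfaces it ends up
# using for the run.
mpirun --mca btl tcp,sm,self \
       --mca btl_tcp_if_include em1 \
       --mca btl_base_verbose 30 \
       ./osu_bw
```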

Thanks everyone.  I really appreciate it.


--
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 09/20/2013 09:33 AM, Jeff Squyres (jsquyres) wrote:
> Correct -- it doesn't make sense to specify both include *and* exclude: by 
> specifying one, you're implicitly (but exactly/precisely) specifying the 
> other.
> 
> My suggestion would be to use positive notation, not negative notation.  For 
> example:
> 
> mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 ...
> 
> That way, you *know* you're only getting the TCP and self BTLs, and you 
> *know* you're only getting eth0.  If that works, then spread out from there, 
> e.g.:
> 
> mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0,eth1 ...
> 
> I.e., also include the "sm" BTL (which is only used for shared-memory 
> communication between 2 procs on the same server, and is therefore useless 
> for a 2-proc-across-2-server run of osu_bw, but you get the idea), and also 
> use eth0 and eth1.  
> 
> And so on.
> 
> The problem with using ^openib and/or btl_tcp_if_exclude is that you might 
> end up using some BTLs and/or TCP interfaces that you don't expect, and 
> therefore can run into problems.
> 
> Make sense?
> 
> 
> 
> On Sep 20, 2013, at 11:17 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> I don't think you are allowed to specify both include and exclude options at 
>> the same time as they conflict - you should either exclude ib0 or include 
>> eth0 (or whatever).
>>
>> My guess is that the various nodes are trying to communicate across disjoint 
>> networks. We've seen that before when, for example, eth0 on one node is on 
>> one subnet, and eth0 on another node is on a different subnet. You might 
>> look for that kind of arrangement.
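A quick way to check for the disjoint-subnet arrangement described above
is to compare the network prefix of eth0 on each node.  A minimal sketch,
with made-up /24 addresses standing in for the real output of
`ip -o -4 addr show eth0` collected from each node:

```shell
# Placeholder addresses: substitute the real eth0 address from each node.
node_a="192.168.1.10/24"
node_b="192.168.2.10/24"

# Derive the /24 network prefix (first three octets) of each address.
net_a=$(echo "$node_a" | cut -d/ -f1 | cut -d. -f1-3)
net_b=$(echo "$node_b" | cut -d/ -f1 | cut -d. -f1-3)

if [ "$net_a" = "$net_b" ]; then
    echo "eth0 subnets match"
else
    echo "eth0 subnets differ: $net_a vs $net_b"
fi
```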
>>
>>
>> On Sep 20, 2013, at 8:05 AM, "Elken, Tom" <tom.el...@intel.com> wrote:
>>
>>>> The trouble is when I try to add some "--mca" parameters to force it to
>>>> use TCP/Ethernet, the program seems to hang.  I get the headers of the
>>>> "osu_bw" output, but no results, even on the first case (1 byte payload
>>>> per packet).  This is occurring on both the IB-enabled nodes, and on the
>>>> Ethernet-only nodes.  The specific syntax I was using was:  "mpirun
>>>> --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw"
>>>
>>> When we want to run over TCP and IPoIB on an IB/PSM equipped cluster, we 
>>> use:
>>> --mca btl sm,tcp,self --mca btl_tcp_if_exclude eth0 --mca 
>>> btl_tcp_if_include ib0 --mca mtl ^psm
>>>
>>> based on this, it looks like the following might work for you:
>>> --mca btl sm,tcp,self --mca btl_tcp_if_exclude ib0 --mca btl_tcp_if_include 
>>> eth0 --mca btl ^openib
>>>
>>> If you don't have ib0 ports configured on the IB nodes, you probably don't 
>>> need the "--mca btl_tcp_if_exclude ib0."
>>>
>>> -Tom
>>>
>>>>
>>>> The problem occurs at least with OpenMPI 1.6.3 compiled with GNU 4.4
>>>> compilers, with 1.6.3 compiled with Intel 13.0.1 compilers, and with
>>>> 1.6.5 compiled with Intel 13.0.1 compilers.  I haven't tested any other
>>>> combinations yet.
>>>>
>>>> Any ideas here?  It's very possible this is a system configuration
>>>> problem, but I don't know where to look.  At this point, any ideas would
>>>> be welcome, either about the specific situation, or general pointers on
>>>> mpirun debugging flags to use.  I can't find much in the docs yet on
>>>> run-time debugging for OpenMPI, as opposed to debugging the application.
>>>> Maybe I'm just looking in the wrong place.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Lloyd Brown
>>>> Systems Administrator
>>>> Fulton Supercomputing Lab
>>>> Brigham Young University
>>>> http://marylou.byu.edu
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
> 
> 
