Interesting. I was taking the approach of "only exclude what you're certain you don't want" (the native IB and TCP/IPoIB stuff) since I wasn't confident enough in my knowledge of the OpenMPI internals to know what I should explicitly include.
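For reference, the exclude-style invocation from my original post (quoted below) was:

  mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw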
However, taking Jeff's suggestion, this does seem to work, and gives me the expected Ethernet performance:

  mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include em1 ./osu_bw

So, in short, I'm still not sure why my exclude syntax doesn't work, but the include-driven syntax that Jeff suggested does. I admit I'm still curious to understand how to get OpenMPI to give me the details of what's going on (a few notes on what I plan to try are in the P.S. below), but the immediate problem of getting the numbers out of osu_bw and osu_latency seems to be solved.

Thanks everyone. I really appreciate it.

--
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 09/20/2013 09:33 AM, Jeff Squyres (jsquyres) wrote:
> Correct -- it doesn't make sense to specify both include *and* exclude: by specifying one, you're implicitly (but exactly/precisely) specifying the other.
>
> My suggestion would be to use positive notation, not negative notation. For example:
>
>   mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 ...
>
> That way, you *know* you're only getting the TCP and self BTLs, and you *know* you're only getting eth0. If that works, then spread out from there, e.g.:
>
>   mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0,eth1 ...
>
> E.g., also include the "sm" BTL (which is only used for shared memory communications between 2 procs on the same server, and is therefore useless for a 2-proc-across-2-server run of osu_bw, but you get the idea), but also use eth0 and eth1.
>
> And so on.
>
> The problem with using ^openib and/or btl_tcp_if_exclude is that you might end up using some BTLs and/or TCP interfaces that you don't expect, and therefore can run into problems.
>
> Make sense?
>
>
> On Sep 20, 2013, at 11:17 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> I don't think you are allowed to specify both include and exclude options at the same time as they conflict - you should either exclude ib0 or include eth0 (or whatever).
>>
>> My guess is that the various nodes are trying to communicate across disjoint networks. We've seen that before when, for example, eth0 on one node is on one subnet, and eth0 on another node is on a different subnet. You might look for that kind of arrangement.
>>
>>
>> On Sep 20, 2013, at 8:05 AM, "Elken, Tom" <tom.el...@intel.com> wrote:
>>
>>>> The trouble is when I try to add some "--mca" parameters to force it to use TCP/Ethernet, the program seems to hang. I get the headers of the "osu_bw" output, but no results, even on the first case (1 byte payload per packet). This is occurring on both the IB-enabled nodes, and on the Ethernet-only nodes. The specific syntax I was using was:
>>>>
>>>>   mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw
>>>
>>> When we want to run over TCP and IPoIB on an IB/PSM equipped cluster, we use:
>>>
>>>   --mca btl sm --mca btl tcp,self --mca btl_tcp_if_exclude eth0 --mca btl_tcp_if_include ib0 --mca mtl ^psm
>>>
>>> Based on this, it looks like the following might work for you:
>>>
>>>   --mca btl sm,tcp,self --mca btl_tcp_if_exclude ib0 --mca btl_tcp_if_include eth0 --mca btl ^openib
>>>
>>> If you don't have ib0 ports configured on the IB nodes, probably you don't need the "--mca btl_tcp_if_exclude ib0."
>>>
>>> -Tom
>>>
>>>> The problem occurs at least with OpenMPI 1.6.3 compiled with GNU 4.4 compilers, with 1.6.3 compiled with Intel 13.0.1 compilers, and with 1.6.5 compiled with Intel 13.0.1 compilers.
>>>> I haven't tested any other combinations yet.
>>>>
>>>> Any ideas here? It's very possible this is a system configuration problem, but I don't know where to look. At this point, any ideas would be welcome, either about the specific situation, or general pointers on mpirun debugging flags to use. I can't find much in the docs yet on run-time debugging for OpenMPI, as opposed to debugging the application. Maybe I'm just looking in the wrong place.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Lloyd Brown
>>>> Systems Administrator
>>>> Fulton Supercomputing Lab
>>>> Brigham Young University
>>>> http://marylou.byu.edu
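P.S. In case it helps anyone searching the archives later: for the run-time debugging I mentioned above, it looks like the MCA framework verbose parameters are the place to start. This is just a rough sketch of what I plan to try, not something I've verified yet (the verbosity level of 100 is a guess at a suitably chatty value):

  # show which BTLs are selected and what the TCP BTL is doing
  mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include em1 \
      --mca btl_base_verbose 100 ./osu_bw

  # list the TCP BTL's MCA parameters and their current values
  ompi_info --param btl tcp

I'll also follow up on Ralph's subnet suggestion by comparing the output of something like "ip addr show em1" (or "ifconfig em1") across a few of the nodes, to make sure the Ethernet interfaces really are all on the same subnet.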