Correct -- it doesn't make sense to specify both include *and* exclude: by specifying one, you are implicitly (but precisely) specifying the other.
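For illustration -- the interface names here are hypothetical -- on a node with eth0, eth1, ib0, and lo, these two invocations should select the same interface (modulo any default excludes your build applies, such as loopback):

    # positive notation: name exactly the interface you want
    mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 ./osu_bw

    # negative notation: name everything you *don't* want
    mpirun --mca btl tcp,self --mca btl_tcp_if_exclude eth1,ib0,lo ./osu_bw

Either form alone fully determines the interface list, so supplying both is redundant at best and conflicting at worst.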
My suggestion would be to use positive notation, not negative notation. For example:

    mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 ...

That way, you *know* you're only getting the TCP and self BTLs, and you *know* you're only getting eth0. If that works, then spread out from there, e.g.:

    mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0,eth1 ...

I.e., also include the "sm" BTL (which is only used for shared-memory communications between 2 procs on the same server, and is therefore useless for a 2-proc-across-2-server run of osu_bw, but you get the idea), and also use both eth0 and eth1. And so on.

The problem with using ^openib and/or btl_tcp_if_exclude is that you might end up using some BTLs and/or TCP interfaces that you don't expect, and can therefore run into problems.

Make sense?

On Sep 20, 2013, at 11:17 AM, Ralph Castain <r...@open-mpi.org> wrote:

> I don't think you are allowed to specify both include and exclude options at the same time, as they conflict - you should either exclude ib0 or include eth0 (or whatever).
>
> My guess is that the various nodes are trying to communicate across disjoint networks. We've seen that before when, for example, eth0 on one node is on one subnet, and eth0 on another node is on a different subnet. You might look for that kind of arrangement.
>
>
> On Sep 20, 2013, at 8:05 AM, "Elken, Tom" <tom.el...@intel.com> wrote:
>
>>> The trouble is when I try to add some "--mca" parameters to force it to
>>> use TCP/Ethernet, the program seems to hang. I get the headers of the
>>> "osu_bw" output, but no results, even on the first case (1-byte payload
>>> per packet). This is occurring on both the IB-enabled nodes and on the
>>> Ethernet-only nodes. The specific syntax I was using was:
>>>
>>>     mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw
>>
>> When we want to run over TCP and IPoIB on an IB/PSM-equipped cluster, we use:
>>
>>     --mca btl sm --mca btl tcp,self --mca btl_tcp_if_exclude eth0 --mca btl_tcp_if_include ib0 --mca mtl ^psm
>>
>> Based on this, it looks like the following might work for you:
>>
>>     --mca btl sm,tcp,self --mca btl_tcp_if_exclude ib0 --mca btl_tcp_if_include eth0 --mca btl ^openib
>>
>> If you don't have ib0 ports configured on the IB nodes, you probably don't need the "--mca btl_tcp_if_exclude ib0".
>>
>> -Tom
>>
>>> The problem occurs at least with Open MPI 1.6.3 compiled with GNU 4.4
>>> compilers, with 1.6.3 compiled with Intel 13.0.1 compilers, and with
>>> 1.6.5 compiled with Intel 13.0.1 compilers. I haven't tested any other
>>> combinations yet.
>>>
>>> Any ideas here? It's very possible this is a system configuration
>>> problem, but I don't know where to look. At this point, any ideas would
>>> be welcome, either about the specific situation, or general pointers on
>>> mpirun debugging flags to use. I can't find much in the docs yet on
>>> run-time debugging for Open MPI, as opposed to debugging the
>>> application. Maybe I'm just looking in the wrong place.
>>>
>>> Thanks,
>>>
>>> --
>>> Lloyd Brown
>>> Systems Administrator
>>> Fulton Supercomputing Lab
>>> Brigham Young University
>>> http://marylou.byu.edu

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/