1 - How do I check the BTLs available? Something like "ompi_info | grep -i btl"? If so, here's the list:
> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.3)

2 - The IP interfaces on all nodes are:
- em1 - Ethernet - IP in the 192.168.216.0/22 range
- ib0 - IPoIB (only on IB-enabled nodes) - IP in the 192.168.212.0/22 range
- lo - loopback - 127.0.0.1/8

And I think that Jeff is absolutely right.  This syntax did work:

> mpirun --mca btl ^openib --mca btl_tcp_if_exclude 192.168.212.0/22,127.0.0.1/8 ./osu_bw

And this one too, which is basically equivalent in this case:

> mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0,lo ./osu_bw

It is interesting to me, though, that I need to explicitly exclude
lo/127.0.0.1 in this case, but when I'm on an Ethernet-only node, and I
just do the plain "mpirun ./appname", I don't have to exclude anything,
and it figures out to use em1, and not lo.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 09/20/2013 10:31 AM, Jeff Squyres (jsquyres) wrote:
> On Sep 20, 2013, at 12:27 PM, Lloyd Brown <lloyd_br...@byu.edu> wrote:
>
>> Interesting.  I was taking the approach of "only exclude what you're
>> certain you don't want" (the native IB and TCP/IPoIB stuff) since I
>> wasn't confident enough in my knowledge of the OpenMPI internals, to
>> know what I should explicitly include.
>>
>> However, taking Jeff's suggestion, this does seem to work, and gives me
>> the expected Ethernet performance:
>>
>> "mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include em1 ./osu_bw"
>>
>> So, in short, I'm still not sure why my exclude syntax doesn't work.
>
> Check two things:
>
> 1. What BTLs are available?  Is there some other BTL that may be used instead
> of openib?
>
> 2. (this one is more likely) What IP interfaces are available on all nodes?
> The most obvious guess here is that you didn't exclude 127.0.0.1/8, and OMPI
> found this interface on all nodes, and therefore assumed that it was
> routable/usable on all nodes.  Hence, one quick experiment might be to try
> your exclude syntax again, but *also* exclude 127.0.0.1/8.
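
For anyone who wants to sanity-check the CIDR arithmetic behind the two exclude lists discussed above, here is a minimal Python sketch. It is not how Open MPI selects interfaces internally; it only assumes that a user-supplied exclude list is the sole filter applied, and it uses the three subnets reported in this thread. The INTERFACES table and the remaining_interfaces() helper are hypothetical names made up for illustration.

```python
import ipaddress

# Subnets as reported in this thread; the table itself is illustrative only.
INTERFACES = {
    "em1": ipaddress.ip_network("192.168.216.0/22"),  # Ethernet
    "ib0": ipaddress.ip_network("192.168.212.0/22"),  # IPoIB
    "lo": ipaddress.ip_network("127.0.0.0/8"),        # loopback
}


def remaining_interfaces(exclude_cidrs):
    """Return the interface names whose subnet overlaps none of the excluded ranges.

    Hypothetical helper: this models only the set arithmetic of an exclude
    list, not Open MPI's actual interface-selection logic.
    """
    # strict=False accepts host-style input such as "127.0.0.1/8".
    excluded = [ipaddress.ip_network(c, strict=False) for c in exclude_cidrs]
    return sorted(
        name
        for name, net in INTERFACES.items()
        if not any(net.overlaps(ex) for ex in excluded)
    )


# Excluding only the IPoIB range still leaves loopback as a candidate.
print(remaining_interfaces(["192.168.212.0/22"]))                 # ['em1', 'lo']
# Excluding loopback as well leaves only the Ethernet interface.
print(remaining_interfaces(["192.168.212.0/22", "127.0.0.1/8"]))  # ['em1']
```

Under that assumption, the two /22 ranges do not overlap (192.168.212.0 through 192.168.215.255 versus 192.168.216.0 through 192.168.219.255), so excluding the ib0 range alone leaves both em1 and lo in play, which is consistent with the behavior described above.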