Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Jeff Squyres (jsquyres)
On Sep 20, 2013, at 1:00 PM, Lloyd Brown wrote:
> It is interesting to me, though, that I need to explicitly exclude lo/127.0.0.1 in this case, but when I'm on an Ethernet-only node, and I just do the plain "mpirun ./appname", I don't have to exclude anything, and it figures out to use em1,

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Lloyd Brown
1 - How do I check the BTLs available? Something like "ompi_info | grep -i btl"? If so, here's the list:
> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: self (MCA v2.0, A

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Jeff Squyres (jsquyres)
On Sep 20, 2013, at 12:27 PM, Lloyd Brown wrote:
> Interesting. I was taking the approach of "only exclude what you're certain you don't want" (the native IB and TCP/IPoIB stuff) since I wasn't confident enough in my knowledge of the OpenMPI internals, to know what I should explicitly incl

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Lloyd Brown
Interesting. I was taking the approach of "only exclude what you're certain you don't want" (the native IB and TCP/IPoIB stuff) since I wasn't confident enough in my knowledge of the OpenMPI internals, to know what I should explicitly include. However, taking Jeff's suggestion, this does seem to

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Jeff Squyres (jsquyres)
Correct -- it doesn't make sense to specify both include *and* exclude: by specifying one, you're implicitly (but exactly/precisely) specifying the other. My suggestion would be to use positive notation, not negative notation. For example: mpirun --mca btl tcp,self --mca btl_tcp_if_include eth
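
For illustration only, here is a minimal sketch of the "one or the other" rule described above. The helper name tcp_only_args is hypothetical (it is not part of Open MPI); btl_tcp_if_include and btl_tcp_if_exclude are the MCA parameters under discussion, and the interface names eth0, ib0 and lo are assumptions that will differ per cluster. The script only prints the command line, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): prints the MCA options for a
# TCP-only run using EITHER an include list OR an exclude list, never both.
tcp_only_args() {
    mode=$1      # "include" or "exclude"
    ifaces=$2    # comma-separated interface names (assumed, e.g. "eth0")
    case "$mode" in
        include|exclude) ;;
        *) echo "usage: tcp_only_args include|exclude IFACES" >&2; return 1 ;;
    esac
    [ -n "$ifaces" ] || { echo "no interfaces given" >&2; return 1; }
    # Restrict the BTLs to tcp and self, then name the interfaces to use
    # (include) or to skip (exclude).
    printf '%s\n' "--mca btl tcp,self --mca btl_tcp_if_$mode $ifaces"
}

# Print the command instead of running it.
echo "mpirun $(tcp_only_args include eth0) ./appname"
```

If the negative form is preferred, something like "tcp_only_args exclude ib0,lo" would be the equivalent call; as noted elsewhere in this thread, lo then has to be listed explicitly.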

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Ralph Castain
I don't think you are allowed to specify both include and exclude options at the same time as they conflict - you should either exclude ib0 or include eth0 (or whatever). My guess is that the various nodes are trying to communicate across disjoint networks. We've seen that before when, for exam

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Elken, Tom
> The trouble is when I try to add some "--mca" parameters to force it to use TCP/Ethernet, the program seems to hang. I get the headers of the "osu_bw" output, but no results, even on the first case (1 byte payload per packet). This is occurring on both the IB-enabled nodes, and on the E