I was doing a test on our IB-based cluster, where I was disabling IB with --mca btl ^openib --mca mtl ^mxm.
I was sending very large messages (>1 GB) and I was surprised by the speed. I then noticed that all three of our IP interfaces were getting traffic: eth0 (1 Gig-E), ib0 (IP over IB, for Lustre configuration at vendor request), and eoib0 (Ethernet over IB, for an IB -> Ethernet gateway that gives some external storage support at >1 Gig speed).

We use Torque as our resource manager with TM support, and the hostnames given by Torque match the eth0 interfaces.

How does OMPI figure out that it can also talk over the other interfaces? How does it choose to load balance?

BTW, that is fine, but we will use if_exclude on one of the IB ones, as ib0 and eoib0 are the same physical device and may interfere with load balancing if anyone ever falls back to TCP.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
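
P.S. For reference, the exclusion I have in mind would look roughly like the sketch below. It assumes the TCP BTL's btl_tcp_if_exclude parameter is the right knob, and that ib0 (rather than eoib0) is the one to drop; ./my_app and -np 4 are just placeholders. As I understand it, setting this parameter replaces the default exclude list, so loopback (lo) has to be listed again explicitly.

```shell
# Sketch only: ./my_app and -np 4 are placeholders, and excluding ib0
# (instead of eoib0) is an arbitrary choice for illustration.
# "lo" is repeated because a user-supplied exclude list replaces the default one.
MCA_ARGS="--mca btl ^openib --mca mtl ^mxm --mca btl_tcp_if_exclude lo,ib0"

# Print the command for review; drop the leading "echo" to actually launch.
echo mpirun -np 4 $MCA_ARGS ./my_app
```

Setting the same parameter in an MCA parameter file instead of on the command line would apply it to everyone who ever falls back to TCP, which is probably what we want.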