On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote:

Thanks, the option --mca btl ^openib works fine!

Half of the cluster has InfiniBand/OpenFabrics (node49 to node96)
and the other half (nodes 01 to 48) doesn't.


Aaaaahhhhh... this explains things. I wonder if we haven't tested the "some nodes have OF, some do not" code paths well; my guess is that we're hitting a corner case during shutdown.

I just wanted to get Open MPI running over Ethernet/TCP first.

I will try to make it run using OpenFabrics, but I guess I need to
recompile another package to do so?


No. Open MPI isolates its dependencies on networking libraries such as OF inside its plugins, so you don't need to recompile your application; you just run with or without the ^openib switch.

If I mix nodes that have OpenFabrics with nodes that don't,
I should use the option "--mca btl ^openib", right?


For now, yes. We should fix this, though. The fix won't make it into 1.3.1, but possibly into 1.3.2.
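If typing the switch on every mpirun command line gets tedious, Open MPI can also read MCA parameters from a per-user parameter file or from OMPI_MCA_* environment variables (for example, OMPI_MCA_btl=^openib). A minimal sketch of the file form, assuming the default per-user location:

```
# $HOME/.openmpi/mca-params.conf
# Exclude the openib BTL for every job launched by this user, so jobs that
# span nodes with and without OpenFabrics hardware use the remaining BTLs
# (such as TCP) instead.
btl = ^openib
```

Note that the file applies to every job, including jobs that run only on the InfiniBand nodes (node49 to node96). Those jobs would then also skip InfiniBand, whereas the command-line switch keeps the choice per job.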

And if I use only one kind of node (either all non-OpenFabrics or all
OpenFabrics), I don't need the option anymore.


Correct. On the OpenFabrics nodes, OMPI will then automatically choose the openib BTL.

But over OpenFabrics, will Open MPI automatically use the InfiniBand
hardware?


Yes.

I'm guessing that there's only a problem when a job spans nodes with and without OF hardware, but all of which have the OF software stack installed. I'll file a bug about this and see what we can do.

--
Jeff Squyres
Cisco Systems
