On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote:
> Thanks, the option --mca btl ^openib works fine!
> Half of the cluster has InfiniBand/OpenFabrics (from node49 to node96)
> and the other half (nodes 01 to 48) doesn't.
Aaaaahhhhh... this explains things. I wonder if we have not tested
the "some have OF, some do not" code paths well; I'm guessing we're
hitting a corner case during the shutdown.
> I just wanted to make Open MPI run over ethernet/TCP first.
> I will try to make it run using OpenFabrics, but I guess I need to
> recompile another package to do so?
No. Open MPI hides the dependencies on networking libraries such as
OF in its plugins. So you don't need to recompile your application;
you just run with or without the ^openib switch.
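For example, something along these lines (the hostfile and application
names here are just placeholders):

  # force TCP by excluding the openib BTL
  mpirun --mca btl ^openib -np 16 --hostfile myhosts ./my_mpi_app

  # no switch: default BTL selection (fine once all nodes have, or all
  # lack, OpenFabrics)
  mpirun -np 16 --hostfile myhosts ./my_mpi_app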
> If I mix some nodes with OpenFabrics and some others that don't have
> OpenFabrics, I should use the option "--mca btl ^openib", right?
For now, yes. We should fix this, though. But the fix won't be in
1.3.1; possibly in 1.3.2.
> And if I use exclusively similar nodes (either only non-OpenFabrics
> nodes or only OpenFabrics nodes), I don't have to use the option
> anymore?
Correct. OMPI will then automatically choose to use the openib BTL.
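If you want to verify which BTL components actually get selected at run
time, bumping the BTL verbosity should show it (the verbosity level
below is just an example value):

  mpirun --mca btl_base_verbose 30 -np 16 ./my_mpi_app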
> But over OpenFabrics, will Open MPI automatically use the InfiniBand
> hardware?
Yes.
I'm guessing that there's only a problem when you have a job that
spans nodes with and without OF hardware, but all with the OF software
stack. I'll file a bug about this and see what we can do.
--
Jeff Squyres
Cisco Systems