On Jan 22, 2009, at 11:26 PM, Sangamesh B wrote:

   We have a cluster with 23 nodes connected to an IB switch and 8
nodes connected to an ethernet switch. The master node is also
connected to the IB switch. SGE (with tight integration, -pe orte) is
used for parallel/serial job submission.

Open MPI 1.3 is installed on the master node with IB support
(--with-openib=/usr). The same folder is copied to the remaining 23 IB
nodes.

Sounds good.
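
For reference, a build along those lines typically looks something like the following (the install prefix and node name are just placeholders; and if I remember right, --with-sge is also needed in the 1.3 series if you want the tight SGE integration):

    # example prefix; adjust to your site
    ./configure --prefix=/opt/openmpi-1.3 --with-openib=/usr --with-sge
    make all install
    # copy (or NFS-export) the resulting install tree to the IB nodes, e.g.:
    scp -r /opt/openmpi-1.3 node01:/opt/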

Now what shall I do for the remaining 8 ethernet nodes:
(1) Copy the same (IB-built) folder to these nodes.
(2) Install Open MPI on one of the 8 ethernet nodes and copy it to the
other 7 nodes.
(3) Install an ethernet-only build of Open MPI on the master node and
copy it to the 8 nodes.

Either 1 or 2 is your best bet.

Do you have OFED installed on all nodes (either explicitly, or included in your Linux distro)?

If so, I believe that at least some users with configurations like this install OMPI with OFED support (--with-openib=/usr, as you mentioned above) on all nodes. OMPI will notice that there is no OpenFabrics-capable hardware on the ethernet-only nodes and will simply not use the openib BTL plugin.

Note that OMPI v1.3 is better about staying silent when the openib BTL is present but no OpenFabrics devices are found (OMPI v1.2 issued a warning in this situation).
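
If you ever want to force the selection yourself rather than rely on the auto-detection, the usual MCA parameters apply (the process count and executable name below are just placeholders):

    # run over TCP only, e.g. on the ethernet nodes:
    mpirun --mca btl tcp,sm,self -np 16 ./my_mpi_app
    # or just exclude openib and let everything else be auto-selected:
    mpirun --mca btl ^openib -np 16 ./my_mpi_app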

How you intend to use this setup is up to you; you may want to restrict jobs to 100% IB or 100% ethernet via SGE, or you may want to let them mix, realizing that the overall parallel job may be slowed down to the speed of the slowest network (e.g., ethernet).
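
On the SGE side, one common approach (just a sketch; the queue names here are made up, so substitute whatever you actually define) is to put the IB and ethernet nodes in separate queues or host groups and submit against them:

    qsub -pe orte 16 -q ib.q  ib_job.sh
    qsub -pe orte 8  -q eth.q eth_job.sh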

Make sense?

--
Jeff Squyres
Cisco Systems
