We rely heavily on OpenMPI's ability to 'fall through' to the next best option on our cluster. For example, some of our nodes have IB (verbs), and most have only TCP.

Recently we added some QLogic IB hardware that uses PSM to get good performance. We built OpenMPI to include PSM in addition to verbs and TCP. We also added the needed libs (the infinipath package from QLogic) to our load so they are on all the nodes.

OpenMPI works fine over PSM when run on nodes that have the PSM (QLogic) hardware installed. The problem is that when you run OpenMPI on nodes that have only TCP networks, it does not fall back to TCP if the infinipath libs are found:

(using 1.4.1 with intel compilers)
mpirun -H nyx0818,nyx0819 /home/brockp/a.out

nyx0818.engin.umich.edu.28120ipath_wait_for_device: The /dev/ipath device failed to appear after 30.0 seconds: Connection timed out
nyx0818.engin.umich.edu.28120PSM Could not find an InfiniPath Unit on device /dev/ipath (30s elapsed) (err=21)
nyx0819.engin.umich.edu.7384ipath_wait_for_device: The /dev/ipath device failed to appear after 30.0 seconds: Connection timed out
nyx0819.engin.umich.edu.7384PSM Could not find an InfiniPath Unit on device /dev/ipath (30s elapsed) (err=21)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: PSM Could not find an InfiniPath Unit


Is there a way to make OpenMPI fall back to verbs or TCP should PSM fail?
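
To make clear what automatic fallback would replace, here is a rough sketch of the manual alternative: choosing the transport by hand depending on whether the PSM device is present. The helper name pick_mca_args is made up for illustration, and the sketch rests on two assumptions: that /dev/ipath (the path in the error above) is a reliable sign of PSM hardware, and that forcing the ob1 PML keeps the PSM MTL from being opened. It is untested on this cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print extra mpirun arguments
# depending on whether the PSM device node exists on this host.
# The default device path /dev/ipath is taken from the error output above.
pick_mca_args() {
    dev="${1:-/dev/ipath}"
    if [ -e "$dev" ]; then
        # Device present: add nothing and let Open MPI pick its default transport.
        echo ""
    else
        # No device: assume forcing the ob1 PML avoids PSM, and restrict the
        # BTLs to TCP, shared memory, and self.
        echo "--mca pml ob1 --mca btl tcp,sm,self"
    fi
}
```

It would be used as `mpirun $(pick_mca_args) -H nyx0818,nyx0819 /home/brockp/a.out`. Note that the check runs only on the host where mpirun is started, not on each compute node, so it is a stopgap and not real per-node fallback.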
Thanks

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


