We rely heavily on OpenMPI's ability to 'fall through' to the next best
option on our cluster. For example, some of our nodes have IB (verbs) and most
have only TCP.
Recently we added some QLogic IB hardware that uses PSM to get good
performance. We built OpenMPI to include PSM in addition to verbs
and TCP. We also added the needed libs (the infinipath package from
QLogic) to our node image, so they are on all the nodes.
So OpenMPI works fine over PSM when run on nodes that have the PSM
(QLogic) hardware installed.
The problem is that when you run OpenMPI on nodes that have only TCP
networks, it does not fall back to TCP if the infinipath libs are found
(using 1.4.1 with Intel compilers):
mpirun -H nyx0818,nyx0819 /home/brockp/a.out
nyx0818.engin.umich.edu.28120ipath_wait_for_device: The /dev/ipath device failed to appear after 30.0 seconds: Connection timed out
nyx0818.engin.umich.edu.28120PSM Could not find an InfiniPath Unit on device /dev/ipath (30s elapsed) (err=21)
nyx0819.engin.umich.edu.7384ipath_wait_for_device: The /dev/ipath device failed to appear after 30.0 seconds: Connection timed out
nyx0819.engin.umich.edu.7384PSM Could not find an InfiniPath Unit on device /dev/ipath (30s elapsed) (err=21)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network
link is active on the node and the hardware is functioning.
Error: PSM Could not find an InfiniPath Unit
Is there a way to make OpenMPI fall back to verbs or TCP should PSM fail?
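To make the behavior we are hoping for concrete, here is a minimal conceptual sketch of priority-ordered fall-through: each transport is tried from most to least preferred, and one whose initialization fails (as PSM does above on nodes without /dev/ipath) is skipped instead of aborting the job. The `Transport` class, the priority values, `psm_init`, and `select_transport` are all hypothetical illustrations; this is not OpenMPI's actual component-selection code or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transport:
    """Hypothetical stand-in for a network transport (not an OpenMPI type)."""
    name: str
    priority: int              # higher value = preferred
    init: Callable[[], bool]   # returns True if the hardware/library is usable


def select_transport(transports: List[Transport]) -> str:
    """Try transports from most to least preferred, 'falling through'
    to the next one whenever initialization returns False or raises."""
    for t in sorted(transports, key=lambda tr: tr.priority, reverse=True):
        try:
            if t.init():
                return t.name
        except OSError:
            # e.g. the library is installed but a device such as /dev/ipath is missing
            continue
    raise RuntimeError("no usable transport found")


def psm_init() -> bool:
    # Simulates a TCP-only node: the PSM libs are present but no InfiniPath unit exists.
    raise OSError("PSM Could not find an InfiniPath Unit")


if __name__ == "__main__":
    transports = [
        Transport("psm", 30, psm_init),
        Transport("verbs", 20, lambda: False),  # no IB verbs device on this node
        Transport("tcp", 10, lambda: True),
    ]
    print(select_transport(transports))  # tcp
```

On a TCP-only node this model skips PSM and verbs and selects TCP, which is the result we expected from OpenMPI instead of the hard failure shown above.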
Thanks
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985