Jeff Squyres wrote:
can OpenMPI also deal with one of the subnets failing?
ie. will OpenMPI automatically fall back to using the last remaining
working IB port out of a node, or even fallback to GigE if all the IB
fails?
Not in the 1.2 series.
The 1.3 series *may* include "APM" support (automatic path migration
-- a feature in IB). It looks positive that that'll make the 1.3 cut,
but I don't have definite information yet.
The current ompi-trunk has an APM implementation. If you enable APM,
ompi will use only the first port on the HCA for data transmission, and
the second one will be reserved for backup. On a network failure on the
first port, all connections will migrate to the second port. APM works
only at the HCA level: you cannot migrate between different HCAs, only
between the 2 ports of the same HCA.
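
As a rough illustration of the behaviour described above, here is a toy
Python model. It is not Open MPI code: the class and method names are made
up for this sketch, and it assumes a failed port never comes back up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Hca:
    """Toy model of one dual-port HCA with APM enabled (hypothetical names)."""
    name: str
    # Port number -> link is up. Each HCA gets its own dict.
    port_up: Dict[int, bool] = field(default_factory=lambda: {1: True, 2: True})
    # Port 1 carries data, port 2 is held in reserve as the backup.
    active_port: Optional[int] = 1

    def fail_port(self, port: int) -> None:
        """Mark a port as down; migrate if it was the one carrying data."""
        self.port_up[port] = False
        if port != self.active_port:
            return  # losing the idle port does not disturb traffic
        other = 2 if port == 1 else 1
        # Migration is only ever to the other port of the SAME HCA.
        self.active_port = other if self.port_up[other] else None


hca0 = Hca("hca0")
hca1 = Hca("hca1")

hca0.fail_port(1)   # connections on hca0 move to its port 2
hca0.fail_port(2)   # no port left on hca0; hca1 is never used as a fallback
```

After the first failure hca0 is sending on port 2; after the second it has
no usable port at all, even though both ports of hca1 are still up.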
--
Pavel Shamis (Pasha)
Mellanox Technologies