Hi,
I am interested in using OpenMPI to manage job distribution on a MicroZed
cluster. These MicroZed boards come with a Zynq device, which has a
dual-core ARM Cortex-A9. One of the objectives of the project I am working
on is resilience, so I am particularly interested in the fault tolerance
provided by OpenMPI.

What I want to know is whether there is any implementation of run-time
migration. For instance, if I have an eight-board MicroZed cluster running
an MPI job and I unplug the Ethernet cable of one board or reboot another,
is there any support in OpenMPI to detect these failures and migrate the
affected ranks to other processors at run time?

Thank you in advance,
Alberto.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
