If the only thing you really need is what you just described, then FT-MPI
is your best pick, at least until we finish moving the fault tolerance
features from FT-MPI into Open MPI. It is difficult to give you a time
frame; the only thing I can state is that this will not happen before
The kind of recovery I am after is simple, and the following example
illustrates the point:
I want to send a message to a different node. If it does not respond to
me, I do not want my application to crash. I want to continue using
the resources of other nodes.
I hate it when a node cra