On May 5, 2005, at 7:58 AM, atarpley wrote:
1) When will the final Open MPI be released (non development)?
As soon as everybody is happy with the stability and the features of
a given version. As with most HPC software, SC05 seems like a
reasonable deadline. Meanwhile, a beta version will be released soon
(no specific date is available at the moment).
2) What fault tolerance mechanisms will be included? Specifically,
if a node goes down, what happens? Will everything bomb?
Several models of fault tolerance will be included, perhaps not in the
first release, but several teams are already working on such
projects. A short list of the fault tolerance mechanisms follows:
1. coordinated checkpointing, Chandy-Lamport style (a la LAM)
2. uncoordinated checkpointing (a la MPICH-V)
3. a mechanism similar to FT-MPI.
In short: most of the usual fault-tolerance mechanisms will be
included.
The behavior of the application when a node goes down depends on the
user's choice (via parameters at initialization time). If the user
leaves the error handler on the MPI communicators set to fatal, then
of course everything will be torn down by the Open MPI runtime
environment. Otherwise, one of the fault tolerance mechanisms
(depending again on user parameters) will take care of the rest of
the execution.
3) There will be FULL multi-threading support, correct?
Correct, except that the FULL multi-threading support is already
in place. We are currently testing the multi-threaded support for all
of the drivers (so far only TCP is considered to be multi-thread
compliant). The next step will be to look at performance, as we
are using fine-grained locking mechanisms.
This feature will definitely be in the stable release.
Thanks,
george.
"We must accept finite disappointment, but we must never lose infinite
hope."
Martin Luther King