On Feb 16, 2005, at 2:19 PM, Jonathan Day wrote:

> First off, I noticed in a previous posting that bproc
> was supported. Will this also be true for Mosix and/or
> OpenMosix?

It depends on what you mean by "support".

Open MPI will not initially have any migration capabilities, so Open MPI applications running under *Mosix would fail if *Mosix tried to migrate them. Future versions of Open MPI will have checkpoint / restart / migration capabilities, which would open up the possibility of running in Condor and/or *Mosix kinds of environments.

Open MPI will support a bunch of different schedulers and back-end run-time launchers (e.g., rsh/ssh, bproc, PBS/TM, etc.). I'm not familiar with *Mosix at all, so I don't know what scheduler / launcher *Mosix uses (or if it has its own). It's quite possible that a *Mosix "plugin" would need to be written for Open MPI to support running on such clusters (all of Open MPI's interactions with the back-end run-time environment are component-ized; supporting new back-end RTEs is simply a matter of writing more plugins).
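To make the "plugin" idea a little more concrete, here's a purely illustrative sketch in C of what a launcher component amounts to -- a table of function pointers that the core queries and invokes. To be clear, this is *not* the actual Open MPI component interface; all of the names below are made up for the example.

/* Hypothetical sketch (not the real Open MPI interface): a run-time
   launcher expressed as a plugin, i.e., a struct of function pointers.
   Supporting a new RTE such as *Mosix would mean providing one more
   such struct, implemented on top of whatever that RTE offers. */
#include <stdio.h>

typedef struct launcher_component {
    const char *name;                           /* e.g., "rsh", "bproc", "tm"  */
    int (*available)(void);                     /* can this launcher run here? */
    int (*launch)(int nprocs, const char *cmd); /* start nprocs copies of cmd  */
} launcher_component_t;

/* A toy "rsh" launcher standing in for a real plugin. */
static int rsh_available(void) { return 1; }
static int rsh_launch(int nprocs, const char *cmd)
{
    printf("would rsh-launch %d copies of %s\n", nprocs, cmd);
    return 0;
}

static launcher_component_t rsh_launcher = { "rsh", rsh_available, rsh_launch };

int main(void)
{
    /* The core would scan all installed components and use one that
       reports itself as available on the current system. */
    if (rsh_launcher.available())
        rsh_launcher.launch(4, "./my_mpi_app");
    return 0;
}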

> Second, I've been monitoring the progress of various
> MPI projects for a while. I've seen no progress on
> IPMI for some time. Likewise, the very-high-speed
> MP-MPICH project (which provides optimized support for
> things like SCI) seems to be comatose at best.

(I assume you mean IMPI, not IPMI...?)

We do plan to eventually support IMPI. LAM/MPI was one of the first MPI implementations to support IMPI (parts of it, at least -- LAM had some architectural issues that made it quite difficult to support all of IMPI); our long-term plans include migrating all of that work into Open MPI and finishing the implementation.

IMPI simply didn't make the first cut for Open MPI; we had too much else to do to get a basic MPI implementation working, etc.

> What, if anything, will be utilized from these
> projects? Or will it be assumed that the failures
> indicate a flaw in the concept or design?

I'm not familiar with MP-MPICH, but I can say that our architecture is quite different from MPICH's. Open MPI will support a variety of high-speed interconnects. We currently don't have access to any SCI clusters, so an SCI device for MPI message/IO traffic is not currently planned (i.e., it's not a priority for any of the current members).

However, this is one of the main points of the Open MPI project: as with our RTE support, much of the MPI layer is component-ized. So supporting SCI interconnects is simply a matter of writing one or more plugins. Eventually, we'll have full documentation for all of our component frameworks and we'll be actively encouraging third parties to write independent components for Open MPI.
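To illustrate why that's "simply" a plugin matter: the MPI application itself doesn't change with the interconnect. A trivial program like the one below (standard MPI calls only, nothing Open MPI-specific) compiles once and runs unchanged whether the library ends up moving bytes over TCP, Myrinet, SCI, or anything else; only the component selected inside the library differs.

/* Minimal standard MPI program; the interconnect used underneath is an
   implementation detail of the library.  Run with at least 2 processes,
   e.g., "mpirun -np 2 ./a.out". */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", msg);
    }

    MPI_Finalize();
    return 0;
}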

Another strength is that Open MPI components can be distributed and installed independently of the main Open MPI distribution. Once you have a base installation of Open MPI, you can download and install 3rd party components (users and developers can even have their own user-specific components in a system-wide Open MPI installation).

> Third, there are several projects which utilize
> modified versions of MPICH - implying there may be a
> lack of some critical hooks in MPI implementations.
> The projects I know of that do this are Globus (a grid
> computing system) and Gamma (a very low level, low
> latency IPC system).

> What, if anything, has been learned from such projects
> and what sort of support will Open-MPI provide to
> cover such cases? One thing I would like to see is the
> ability to load/unload modules on-the-fly on systems
> that support a working dlopen(), in a similar manner
> to the Linux kernel.

As implied by my text above, this is exactly how we do components in Open MPI (dlopen(), etc.), so installing a new component is [usually] just a matter of putting a .so file in a specific directory.
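The underlying mechanism is just the usual dlopen()/dlsym() dance. Here's a rough sketch of the general idea (not our actual component code -- the path and symbol name below are invented for illustration; link with -ldl on Linux):

/* Generic plugin loading via dlopen()/dlsym(); the .so path and the
   "component_init" symbol are hypothetical. */
#include <stdio.h>
#include <dlfcn.h>

typedef int (*init_fn)(void);

int main(void)
{
    void *handle = dlopen("/opt/openmpi/components/example_component.so",
                          RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up a well-known entry point exported by the plugin. */
    init_fn init = (init_fn) dlsym(handle, "component_init");
    if (init == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    init();          /* hand control to the component */
    dlclose(handle);
    return 0;
}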

This component work was started in LAM/MPI -- the component architecture in LAM has four different "plugin" types: point-to-point devices, collective algorithms, the checkpoint/restart system, and the run-time environment.

We took the best of the component concepts from LAM/MPI, expanded on them, and used them extensively throughout Open MPI. I think we currently have 30+ component types in Open MPI, and are continually adding more.

We really, really hope to solve (or at least greatly alleviate) the "code forking" problems that have been common with MPI implementations (e.g., 20+ different MPI installations on a single cluster). One of our goals is to enable a single installation of one MPI implementation with lots and lots of plugins, and a flexible system for users to pick which components to use (or, in most cases, have a sane set of components automatically picked for them).

That's utopia, however, and not 100% realizable. For example, you'll still need multiple Open MPI installations -- one for each different set of compilers, etc. Different compiler name mangling schemes and data type layouts and sizes are problems that are beyond the scope of an MPI implementation to solve, unfortunately. :-\

> Finally, with regards to the development process - do
> you have a mechanism in place for external developers
> to track bugs, submit fixes/extensions, etc?

Not yet, but we very much plan to. Once we get over this initial hump of getting a basic MPI implementation working, we'll be opening our doors, so to speak, and strongly utilizing the open source model. Our release branches in Subversion will be open to the public.

I'll say this right up front, however: the main distribution of Open MPI needs to be production-quality code. So we'll likely be quite choosy about who is allowed to commit and which patches we'll take. That, too, I think falls quite in line with the Open Source philosophy (Darwinism of patches / source code donations, if you will).

Finally, Open MPI will be released under a BSD-like license. Any code that is contributed to the main Open MPI repository *must* be properly copyrighted and released under a compatible license.

> Engaging the interest of a significant number of Open
> Source developers is hard, but it appears to be true
> that the more transparent the process, the greater the
> success.
>
> I know plenty of "Open Source" scientific/academic
> projects where the administrators won't even permit
> the project to be listed on Open Source catalogues and
> databases. Needless to say, such projects evolve
> painfully slowly or fall apart entirely. The
> assumption that internal effort is enough often proves
> optimistic.
>
> What plans do you have to avoid the pitfalls of other
> projects?

I think that this is part of the strength that the LAM/MPI team brings to this project -- look at our project history and you'll see active engagement of the community, open access to our repository, active mailing lists, a full and comprehensive web site, etc. LAM/MPI is listed in multiple Open Source catalogues and is distributed in nearly every Linux distribution (and several BSD distributions). Indeed, the name of our parent organization here at Indiana University is the Open Systems Laboratory (www.osl.iu.edu).

So we're quite committed to making Open MPI follow the best practices of open source projects. As I mentioned above, this does not necessarily mean that we'll accept patches from just anyone, nor does it mean that we can provide 24/7 support to random users around the world (we will have our own deadlines and internal deliverables that need to be met, for example -- and those will inevitably sometimes take precedence over answering support e-mails). But we will do our best to provide access, engage third party developers, and give support to users.

I hope that this answers your questions. Please feel free to ping us with more!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
