On Feb 16, 2005, at 2:19 PM, Jonathan Day wrote:
> First off, I noticed in a previous posting that bproc
> was supported. Will this also be true for Mosix and/or
> OpenMosix?
It depends on what you mean by "support".
Open MPI will not initially have any migration capabilities, so Open
MPI applications running under *Mosix would fail if *Mosix tried to
migrate them. Future versions of Open MPI will have checkpoint /
restart / migration capabilities, which would open up the possibility
of running in Condor and/or *Mosix kinds of environments.
Open MPI will support a bunch of different schedulers and back-end
run-time launchers (e.g., rsh/ssh, bproc, PBS/TM, etc.). I'm not
familiar with *Mosix at all, so I don't know what scheduler / launcher
*Mosix uses (or if it has its own). It's quite possible that a *Mosix
"plugin" would need to be written for Open MPI to support running on
such clusters (all of Open MPI's interactions with the back-end
run-time environment are component-ized; supporting new back-end RTEs
is simply a matter of writing more plugins).
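To make that concrete, here is a minimal sketch of what a back-end
launcher "plugin" could look like. The names and signatures below
(rte_launcher_t, rte_rsh_component, and friends) are invented for
illustration -- they are not the actual Open MPI component API -- but
they show the general shape of the idea: a table of function pointers
that the framework discovers and invokes. A *Mosix launcher would
simply be another instance of the same kind of table.

    /* Illustrative sketch only: a made-up plugin interface for a
     * back-end run-time launcher.  The real Open MPI component API is
     * different; all names here are hypothetical. */
    typedef struct {
        const char *name;                        /* e.g. "rsh", "bproc", "tm" */
        int (*init)(void);                       /* is this RTE usable here? */
        int (*launch)(int nprocs, char **argv);  /* start nprocs copies of argv */
        int (*finalize)(void);                   /* clean up after the job */
    } rte_launcher_t;

    /* A hypothetical rsh/ssh launcher fills in the function pointers: */
    static int rsh_init(void)                      { return 0; }
    static int rsh_launch(int nprocs, char **argv) { (void) nprocs; (void) argv; return 0; }
    static int rsh_finalize(void)                  { return 0; }

    rte_launcher_t rte_rsh_component = {
        "rsh", rsh_init, rsh_launch, rsh_finalize
    };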
> Second, I've been monitoring the progress of various
> MPI projects for a while. I've seen no progress on
> IPMI for some time. Likewise, the very-high-speed
> MP-MPICH project (which provides optimized support for
> things like SCI) seems to be comatose at best.
(I assume you mean IMPI, not IPMI...?)
We do plan to eventually support IMPI. LAM/MPI was one of the first
MPI implementations to support IMPI (parts of it, at least -- LAM had
some architectural issues that made it quite difficult to support all
of IMPI); our long-term plans include migrating all that work into Open
MPI and finishing the implementation.
IMPI simply didn't make the first cut for Open MPI; we had too much
else to do to get a basic MPI implementation working, etc.
> What, if anything, will be utilized from these
> projects? Or will it be assumed that the failures
> indicate a flaw in the concept or design?
I'm not familiar with MP-MPICH, but I can say that our architecture is
quite different from MPICH's. Open MPI will support a variety
of high-speed interconnects. We currently don't have access to any SCI
clusters, and therefore an SCI device for MPI message/IO traffic is not
currently planned (i.e., it's not a priority for any of the current
members).
However, this is one of the main points of the Open MPI project: as
with our RTE support, much of the MPI layer is component-ized. So
supporting SCI interconnects is simply a matter of writing one or more
plugins. Eventually, we'll have full documentation for all of our
component frameworks and we'll be actively encouraging third parties to
write independent components for Open MPI.
Another strength is that Open MPI components can be distributed and
installed independently of the main Open MPI distribution. Once you
have a base installation of Open MPI, you can download and install 3rd
party components (users and developers can even have their own
user-specific components in a system-wide Open MPI installation).
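As a rough illustration of that installation model, the framework only
has to scan a couple of well-known directories for shared objects at
startup -- a system-wide one and a per-user one. This is a sketch under
assumed, hypothetical paths; the real Open MPI installation layout may
differ:

    /* Sketch only: look for plugin .so files in a system-wide directory
     * and in a per-user directory.  Both paths are hypothetical. */
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void scan(const char *dir)
    {
        DIR *d = opendir(dir);
        if (d == NULL)
            return;                   /* the directory may simply not exist */
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            const char *dot = strrchr(e->d_name, '.');
            if (dot != NULL && strcmp(dot, ".so") == 0)
                printf("candidate component: %s/%s\n", dir, e->d_name);
        }
        closedir(d);
    }

    int main(void)
    {
        char userdir[4096];
        const char *home = getenv("HOME");

        scan("/opt/openmpi/lib/components");        /* system-wide install */
        if (home != NULL) {
            snprintf(userdir, sizeof(userdir), "%s/.openmpi/components", home);
            scan(userdir);                          /* user-specific overlay */
        }
        return 0;
    }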
> Third, there are several projects which utilize
> modified versions of MPICH - implying there may be a
> lack of some critical hooks in MPI implementations.
> The projects I know of that do this are Globus (a grid
> computing system) and Gamma (a very low level, low
> latency IPC system).
> What, if anything, has been learned from such projects
> and what sort of support will Open-MPI provide to
> cover such cases? One thing I would like to see is the
> ability to load/unload modules on-the-fly on systems
> that support a working dlopen(), in a similar manner
> to the Linux kernel.
As implied by my text above, this is how we do components in Open MPI
(dlopen, etc.). So installing a new component is [usually] just a
matter of putting a .so in a specific directory.
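For the curious, the underlying mechanism is roughly the following.
This is a generic dlopen() sketch (link with -ldl), not Open MPI's
actual loader code, and the file and symbol names are made up:

    /* Generic dlopen()-based component loading; not Open MPI's actual
     * loader.  The path and symbol name below are hypothetical. */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/opt/openmpi/lib/components/example_component.so";

        void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        /* Look up a well-known symbol that describes the component,
         * then hand it to the framework. */
        void *descriptor = dlsym(handle, "component_descriptor");
        if (descriptor == NULL) {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            dlclose(handle);
            return 1;
        }

        printf("loaded component descriptor at %p\n", descriptor);
        dlclose(handle);
        return 0;
    }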
This work was started in LAM/MPI -- the component architecture in LAM
has 4 different "plugin" types: point-to-point device, collective
algorithms, checkpoint/restart system, and run-time environment.
We took the best of the component concepts from LAM/MPI, expanded on
them, and used them extensively throughout Open MPI. I think we
currently have 30+ component types in Open MPI, and are continually
adding more.
We really, really hope to solve (or at least greatly alleviate) the "code
forking" problems that have been common in MPI implementations (20+
different installations of MPI on a cluster). One of our goals is to
enable one MPI implementation installation with lots and lots of
plugins, and a flexible system for users to pick which components to
use (or, in most cases, have a sane set of components automatically
picked for them).
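A hypothetical sketch of what that selection amounts to: the framework
keeps a list of the plugins it found, an explicit user preference (read
here from a made-up environment variable) narrows the choice, and
otherwise each component decides for itself whether it is usable on the
current system:

    /* Hypothetical component selection; all names are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct component { const char *name; int (*usable)(void); };

    static int always(void) { return 1; }

    static struct component available[] = {
        { "rsh",   always },
        { "bproc", always },
        { "tm",    always },
    };

    int main(void)
    {
        const char *want = getenv("EXAMPLE_LAUNCHER");   /* made-up variable */
        size_t n = sizeof(available) / sizeof(available[0]);

        for (size_t i = 0; i < n; i++) {
            if (want != NULL && strcmp(want, available[i].name) != 0)
                continue;                  /* user asked for a different one */
            if (available[i].usable()) {
                printf("selected launcher: %s\n", available[i].name);
                return 0;
            }
        }
        fprintf(stderr, "no usable launcher found\n");
        return 1;
    }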
That's utopia, however, and not 100% realizable. For example, you'll
still need multiple Open MPI installations -- one for each different
set of compilers, etc. Different compiler name mangling schemes and
data type layouts and sizes are problems that are beyond the scope of
an MPI implementation to solve, unfortunately. :-\
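To illustrate the name mangling part of that: the same Fortran call
resolves to different C-level symbols depending on which Fortran
compiler was used, so an MPI library's Fortran wrappers have to be
built to match one specific convention. The conventions below are
typical examples, not an exhaustive list:

    /* The Fortran statement  CALL MPI_SEND(...)  may resolve to any of
     * these C-level symbols, depending on the compiler's convention:
     *
     *   mpi_send      -- no decoration
     *   mpi_send_     -- single trailing underscore (a very common default)
     *   mpi_send__    -- extra underscore on names that already contain
     *                    an underscore (the default for some compilers)
     *   MPI_SEND      -- all upper case (some vendor compilers)
     *
     * The library's wrapper must be compiled with exactly one of these
     * names, which is one reason a separate build per compiler family is
     * still needed.  Prototype shown for the single-underscore case: */
    void mpi_send_(void *buf, int *count, int *datatype, int *dest,
                   int *tag, int *comm, int *ierr);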
> Finally, with regards to the development process - do
> you have a mechanism in place for external developers
> to track bugs, submit fixes/extensions, etc?
Not yet, but we very much plan to. Once we get over this initial hump
of a basic MPI implementation, we'll be opening our doors, so to speak,
and strongly utilizing the open source model. Our release branches in
Subversion will be open to the public.
I'll say this right up front, however: the main distribution of Open
MPI needs to be production-quality code. So we'll likely be quite
choosy about who is allowed to commit and what patches we'll take.
That, too, I think falls quite in line with the Open Source philosophy
(Darwinism of patches / source code donations, if you will).
Finally, Open MPI will be released under a BSD-like license. Any code
that is contributed to the main Open MPI repository *must* be properly
copyrighted and released under a compatible license.
> Engaging the interest of a significant number of Open
> Source developers is hard, but it appears to be true
> that the more transparent the process, the greater the
> success.
> I know plenty of "Open Source" scientific/academic
> projects where the administrators won't even permit
> the project to be listed on Open Source catalogues and
> databases. Needless to say, such projects evolve
> painfully slowly or fall apart entirely. The
> assumption that internal effort is enough often proves
> optimistic.
> What plans do you have to avoid the pitfalls of other
> projects?
I think that this is part of the strength that the LAM/MPI team brings
to this project -- look at our project history and you'll see active
engagement of the community, open access to our repository, active
mailing lists, a full and comprehensive web site, etc. LAM/MPI is
listed in multiple Open Source catalogues and is distributed in nearly
every Linux distribution (and several BSD distributions). Indeed, the
name of our parent organization here at Indiana University is the Open
Systems Laboratory (www.osl.iu.edu).
So we're quite committed to making Open MPI follow the best practices
of open source projects. As I mentioned above, this does not
necessarily mean that we'll accept patches from just anyone, nor does
it mean that we can provide 24/7 support to random users around the
world (we will have our own deadlines and internal deliverables that
need to be met, for example -- and those will inevitably sometimes take
precedence over answering support e-mails). But we will do our best to
provide access, engage third party developers, and give support to
users.
I hope that this answers your questions. Please feel free to ping us
with more!
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/