On Jun 14, 2005, at 9:54 PM, Scott Feldman wrote:
We're a quiet bunch. :-)
Which is a bad thing for Open Source development. It seems Open MPI is
a closed-source development project with an open-source release model.
At this point in our development, I somewhat agree. But this will soon
change. See below.
The FAQ claims the future is in Open Source code, methodology, and
philosophy; so why is the development and testing of Open MPI closed?
It is "somewhat" closed. We have actually had a limited set of third
parties involved from the very beginning -- people outside the core
Open MPI development team who have tested the code, reported bugs, etc.
Such reports have been extremely useful and helpful.
I fully appreciate the benefit of the open source model. Indeed, my
parent organization's name is The Open Systems Laboratory
(www.osl.iu.edu), which develops and contributes to several well-known
open source projects. For example, we've been developing and
maintaining LAM/MPI under an open source model for several years, and
it works well. As of now, Open MPI is slightly different (see below
for why), but will be shifting to a more open source model (like LAM's)
in the near future.
Closed-source development doesn't scale. You're missing out on early
bug reports from users with environments and applications different
than yours. You're missing out on outside development help in finding
and fixing bugs.
There are several reasons that we have not yet released the code to the
public:
1. The first 8 months or so of our project were in "stealth" mode. We
honestly didn't know if the collaboration would eventually bear useful
fruit. As the project went on, we came to see that it was working
(really well, actually), and so we came out of stealth mode, announced
it to the world, created the public web site, etc.
2. Once the alpha was available, we didn't want novice users
downloading the code, thinking that it was a fully-functional MPI
implementation. Even with oodles of warnings on the web site and/or in
the tarball itself, some people would definitely try it and send
unhelpful "it doesn't work!" flame mails. Hence, it was released to a
closed set of 3rd parties who were known to be knowledgeable about MPI
and would be able to generate useful bug reports. Their assistance was
invaluable to us.
3. The HPC community is quite small, and the competition is quite
fierce. We have direct and distinct competition in this space; having
a bad release would negatively impact this project and greatly harm our
chances of ever having a good release (at least in the eyes of the
public). This is an unfortunate reality.
4. Adding more developers to a project does not make it release faster
(indeed, it usually slows it down). This is true for any committee
model -- the more people on a project, the more opinions need to get
discussed and more compromises need to be reached. This is not
necessarily a Bad Thing, of course -- independent, outside opinions can
shed unique insight into problems -- but it does mean that it takes
more time. As I indicated in a previous post, we're taking longer than
we expected in terms of the code (truth be told, we had really hoped to
be at beta quality by SC last year -- that didn't happen). Adding more
developers right now would inevitably make this project take yet more
time before releasing -- which we really don't have at this point. We
need to get to stability and release, if for no other reason than to
answer mails like this.
5. We really wanted to reach some level of stability before we opened
to the public. We felt that this would be the best way to make a
positive contribution to the HPC community -- present code that works,
and then move forward from there. For all the interesting,
research-worthy parts of an MPI implementation, there's 10 times that
amount of uninteresting, maintenance-heavy internal accounting code and
data structures that the MPI implementation has to maintain. Very,
very few people outside the core MPI development team
will ever look at or care about this kind of code (this has been our
experience in other MPI implementations). Specifically, what I mean is
that we anticipate that almost no one outside of us will look at 90% of
the Open MPI code base -- 3rd party vendors and researchers will be
focusing on the 10% of the code base which is performance critical.
Unfortunately, the other 90% is what takes a large portion of the time
to develop and debug, and is quite useless to 3rd parties if a) it
doesn't work, or b) changes quickly enough that it makes working in the
10%-performance-critical parts painful.
6. We're still working through the legal issues to get an Apache-like
structure in place to a) guarantee that the code will always be in open
source, and b) ensure that all contributions from 3rd parties are
"clean" in terms of intellectual property (can you say "SCO"?). This
has unfortunately taken *WAY* longer than we anticipated, and is perhaps
the biggest reason that we have not invited in 3rd party developers
yet.
Please adopt a release-early, release-often strategy.
Actually, this is something that we will desperately try to avoid.
Open source does not necessarily equal "release early, release often".
IMHO, that methodology tends to imply that at least one reason you have
to release often is that there are bugs that need to be fixed. For
production-quality software, you really want to release stable
software. There are always those who want to be out on the bleeding
edge of development with the latest / greatest software (despite the
fact that it may not be stable), but in our experience, the vast
majority of MPI users just want software that works and don't care
about many esoteric features. They just want to run their MPI codes
and get stable, repeatable answers.
My experience with LAM/MPI is specifically what I am citing -- indeed,
there are still many users who are using [extremely] old versions of
LAM/MPI simply because that's what they started developing / using
years ago and it still works for them ("it ain't broke, so why fix
it?"). I cannot speak for Los Alamos, but FWIW, I think that LA-MPI's
experience is similar (they work in a production-quality environment --
slow, production-quality release cycles).
However, to accommodate both kinds of users in LAM/MPI (those who want
stability and those who want bleeding edge), we adopted a dual-headed
strategy:
1. Slow formal release cycle. LAM/MPI typically has 1-3 releases a
year. Usually one major release with a small number of bug fix
releases following it.
2. Nightly tarball snapshots available. Anyone who wants to can grab
either a Subversion checkout or a nightly snapshot tarball, but no
guarantees are made about its stability (because it represents active
development).
I anticipate that something analogous will occur for Open MPI.
"Show us the code!"
I have a long, public track record of high-quality open source
software, and am firmly committed to putting Open MPI in the same
category.
We will show you the code soon, I promise. We've come too far to *not*
do so! :-)
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/