Re: [O-MPI users] Questions on status

Jeff Squyres Wed, 15 Jun 2005 11:19:24 -0500

On Jun 14, 2005, at 9:54 PM, Scott Feldman wrote:

We're a quiet bunch.  :-)


Which is a bad thing for Open Source development.  It seems Open MPI is
a closed-source development project with an open-source release model.

At this point in our development, I somewhat agree. But this will soon change. See below.

The FAQ claims the future is in Open Source code, methodology, and
philosophy; so why is the development and testing of Open MPI closed?

It is "somewhat" closed. We have actually had a limited set of third parties involved from the very beginning -- people outside the core Open MPI development team who have tested the code, reported bugs, etc. Such reports have been extremely useful and helpful.

I fully appreciate the benefit of the open source model. Indeed, my parent organization's name is The Open Systems Laboratory (www.osl.iu.edu), which develops and contributes to several well-known open source projects. For example, we've been developing and maintaining LAM/MPI under an open source model for several years, and it works well. As of now, Open MPI is slightly different (see below for why), but will be shifting to a more open source model (like LAM's) in the near future.

Closed-source development doesn't scale.  You're missing out on early
bug reports from users with environments and applications different
than yours.  You're missing out on outside development help in finding
and fixing bugs.

There are several reasons that we have not yet released the code to the public:

1. The first 8 months or so of our project were in "stealth" mode. We honestly didn't know if the collaboration would eventually bear useful fruit. As the project went on, we came to see that it was working (really well, actually), and so we came out of stealth mode, announced it to the world, created the public web site, etc.

2. Once the alpha was available, we didn't want novice users downloading the code, thinking that it was a fully-functional MPI implementation. Even with oodles of warnings on the web site and/or in the tarball itself, some people would definitely try it and send unhelpful "it doesn't work!" flame mails. Hence, it was released to a closed set of 3rd parties who were known to be knowledgeable about MPI and would be able to generate useful bug reports. Their assistance was invaluable to us.

3. The HPC community is quite small, and the competition is quite fierce. We have direct and distinct competition in this space; having a bad release would negatively impact this project and greatly harm our chances of ever having a good release (at least in the eyes of the public). This is an unfortunate reality.

4. Adding more developers to a project does not make it release faster (indeed, it usually slows it down). This is true for any committee model -- the more people on a project, the more opinions need to get discussed and more compromises need to be reached. This is not necessarily a Bad Thing, of course -- independent, outside opinions can shed unique insight into problems -- but it does mean that it takes more time. As I indicated in a previous post, we're taking longer than we expected in terms of the code (truth be told, we had really hoped to be at beta quality by SC last year -- that didn't happen). Adding more developers right now would inevitably make this project take yet more time before releasing -- which we really don't have at this point. We need to get to stability and release, if for no other reason than to answer mails like this.

5. We really wanted to reach some level of stability before we opened to the public. We felt that this would be the best way to make a positive contribution to the HPC community -- present code that works, and then move forward from there. For all the interesting / research-worthy parts of an MPI implementation, there's 10 times that amount of code that is totally uninteresting / maintenance-requiring / internal accounting code and data structures that the MPI has to maintain. Very, very few people outside the core MPI development team will ever look at or care about this kind of code (this has been our experience in other MPI implementations). Specifically, what I mean is that we anticipate that almost no one outside of us will look at 90% of the Open MPI code base -- 3rd party vendors and researchers will be focusing on the 10% of the code base which is performance critical. Unfortunately, the other 90% is what takes a large portion of the time to develop and debug, and is quite useless to 3rd parties if a) it doesn't work, or b) changes quickly enough that it makes working in the 10%-performance-critical parts painful.

6. We're still working through the legal issues to get an Apache-like structure in place to a) guarantee that the code will always be in open source, and b) ensure that all contributions from 3rd parties are "clean" in terms of intellectual property (can you say "SCO"?). This has unfortunately taken *WAY* longer than we anticipated, and is perhaps the biggest reason that we have not invited in 3rd party developers yet.

Please adopt a release-early, release-often strategy.

Actually, this is something that we will desperately try to avoid. Open source does not necessarily equal "release early, release often". IMHO, that methodology tends to imply that at least one reason you have to release often is that there are bugs that need to be fixed. For production-quality software, you really want to release stable software. There are always those who want to be out on the bleeding edge of development with the latest / greatest software (despite the fact that it may not be stable), but in our experience, the vast majority of MPI users just want software that works and don't care about many esoteric features. They just want to run their MPI codes and get stable, repeatable answers.

My experience with LAM/MPI is specifically what I am citing -- indeed, there are still many users who are using [extremely] old versions of LAM/MPI simply because that's what they started developing / using years ago and it still works for them ("it ain't broke, so why fix it?"). I cannot speak for Los Alamos, but FWIW, I think that LA-MPI's experience is similar (they work in a production-quality environment -- slow, production-quality release cycles).

However, to accommodate both kinds of users in LAM/MPI (those who want stability and those who want bleeding edge), we adopted a dual-headed strategy:

1. Slow formal release cycle. LAM/MPI typically has 1-3 releases a year. Usually one major release with a small number of bug fix releases following it.

2. Nightly tarball snapshots available. Anyone who wants to can grab either a Subversion checkout or a nightly snapshot tarball, but no guarantees are made about its stability (because it represents active development).


I anticipate that something analogous will occur for Open MPI.

"Show us the code!"

I have a long, public track record of high-quality open source software, and am firmly committed to making Open MPI be in the same category.

We will show you the code soon, I promise. We've come too far to *not* do so! :-)


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
