Hey all,

There are many forms of offlist development. One form of offlist development is working on large code drops in private and then contributing them all at once. Threshold size is probably arguable, and varies by project; put that aside for the moment. I've been working on an explanation of how large code drops damage community and code. I'd love to hear your feedback. I'm including my project and the dev@community list in the hopes that people from other projects also have a perspective. Here it goes:

Imagine you are an individual contributor on a project. You would like to contribute something. You see a feature you'd like to add or a bug you'd like to fix, a user you would like to support, or a release you'd like to test. You start on it. You submit your pull request, you answer the user's question, you test the release. You continue doing this at a low level for a few months. You see other people starting to contribute too. This is nice. You're working together with others towards a common goal.

Then, out of the blue, a company with multiple paid contributors shows up. Let's name them Acme. Acme drops a year of code on the project. They could do this many ways. For example:

A.) Acme could develop in the repository you were working in, or
B.) Acme could create a project-internal fork and create a new repository.
C.) Acme could even telegraph months in advance that they intend to do this, by posting to the dev list or contacting key contributors offlist, or just by having done it a few times already.

A.) First let's imagine that Acme made massive changes in the repository you were working in. Perhaps they already solved the problem you solved, but in a different way. Perhaps they deleted functions you made changes in. Perhaps they added significant functionality you would have liked to help with. What good were your efforts? Wouldn't you find this discouraging?

And now you want to continue to make changes, but the code you want to change has commit messages referencing tickets which you have no access to. Or it has no reference to tickets at all. You find an area that seems to be needlessly complex: can you remove the complexity? You have no way of knowing what you'd be breaking.

Perhaps you have a proprietary UI which depends on a behavior which was removed or changed. Now your UI is broken. Because the code drop is so large, you have no way to reasonably review it for incompatibilities. It is not possible to review a year of development all at once.
And if your review turns up problems? Do you accept the entire pull request or decline the whole thing? Putting all the code into one pull request is a form of blackmail (commonly used in the formulation of bills for Congress). If you want the good, you have to take the bad.

B.) Now let's imagine that Acme forked the code and created a new repository which they then added to the project. None of the work you did is in this new repository. If those features you implemented were important to you, you will have to re-introduce them into the new repository. You'll have to start from zero learning to work in the new repository. You also had no say in how that code was developed, so maybe the feature that you need is unnecessarily difficult to implement in that repository. You don't know why things are the way they are there, so you're walking through a minefield without a map when you're making changes. And anyways, why is Acme Corp so certain you had nothing of value to add?

Releasing this code also becomes contentious. Which of the two competing repositories gets released? Both of them? How does the project communicate to users about how these pieces fit together?

C.) Imagine Acme gave you lots of forewarning that this was coming. You still have no say in how the code is developed. You know that anything you might contribute could be obsoleted. You can't tell users whether the up-and-coming release will be compatible. And what's the point in testing that release? You don't know how to check that your needs are being considered in the architecture of the new code base. You have no sense of ownership over what comes out of that process. You see that nobody else outside of Acme is working on the project either, for the same reasons.

Most contributors would get discouraged and prefer not to participate if those were the conditions. If contributors didn't get discouraged, they would fairly quickly be taking orders from the employees of Acme Corp.
Acme Corp has all the inside information about what's coming in a year in the next code dump. Information is power.

Contributors who are also users may also choose to stop contributing and become free riders. Why not just depend on Acme Corp for all of the development?

What Acme seems to be getting out of this scenario is an Apache feather. It's a form of free-riding on Apache's reputation.

Now let's imagine that you are the CTO of another company, let's call them Kaushal. Kaushal is considering taking part in this open source project, but they are a competitor to Acme. As Kaushal's CTO, you can see, based on commit history and participation, that Acme is dominating the project. You would be smart to expect that Acme would take advantage of their dominant position in the project. Acme could deliberately sabotage Kaushal's use cases, or simply 'starve' them by convincing people not to help Kaushal.

Kaushal's CTO could respond to this threat in one of two ways:

1.) Simply not take part in the open source project. Create their own closed source thing, or their own open source project, and not engage. This is the most likely response.
2.) Try to dominate the project themselves. Kaushal has the same tools available that Acme has. Kaushal's CTO could tell his employees to do long-interval code drops just like Acme is doing. Now with two corporations doing long-interval code drops on a project, merging the code becomes very, very difficult. Fights about who gets to decide what could eventually cause a complete cessation of release activity.

So imagine that all competitors choose to remain just users, and Acme remains in control. Now imagine Acme loses interest in the project. Acme found something that will make them more money, or Acme's business fails. Or Acme gets tired of offering their development resources to the free riders. Acme stops contributing to the project. But the project has become so dependent on Acme that it cannot exist without Acme.
When Acme exits, project activity could end.

Open source projects require transparency, not just as a moral value, but as a pragmatic prerequisite for collaboration. Offlist development damages the community *and* the code.

Best Regards,
Myrle

P.S. Some very interesting research on the game-theoretical aspects of modularization in open source:
http://people.hbs.edu/cbaldwin/DR2/BaldwinClark.ArchOS.Jun03.pdf
"Does Code Architecture Mitigate Free Riding in the Open Source Development Model?"

I would argue that the information divisibility being applied here at the code modularity dimension also applies to the time dimension. So, it seems likely the argument against large code drops can be made mathematically. Which really tickles the geek in me. :o)