Why are large code drops damaging to a community?

Myrle Krantz Thu, 18 Oct 2018 04:18:24 -0700

Hey all,

There are many forms of offlist development.  One form of offlist
development is working on large code drops in private and then
contributing them all at once.  Threshold size is probably arguable,
and varies by project; put that aside for the moment.  I've been
working on an explanation of how large code drops damage community and
code.  I'd love to hear your feedback.  I'm including my project and
the dev@community list in the hopes that people from other projects
also have a perspective.  Here it goes:



Imagine you are an individual contributor on a project.  You would
like to contribute something.  You see a feature you'd like to add or
a bug you'd like to fix, a user you would like to support, or a
release you'd like to test.  You start on it.  You submit your pull
request, you answer the user's question, you test the release.  You
continue doing this at a low level for a few months.  You see other
people starting to contribute too.  This is nice.  You're working
together with others towards a common goal.  Then, out of the blue a
company with multiple paid contributors shows up.  Let's name them
Acme. Acme drops a year of code on the project.  They could do this
many ways.  For example:  A.) Acme could develop in the repository you
were working in, or B.) Acme could create a project-internal fork and
create a new repository. C.) Acme could even telegraph months in
advance that they intend to do this, by posting to the dev list or
contacting key contributors offlist, or just by having done it a few
times already.


A.) First let's imagine that Acme made massive changes in the
repository you were working in.  Perhaps they already solved the
problem you solved, but in a different way.  Perhaps they deleted
functions you made changes in.  Perhaps they added significant
functionality you would have liked to help with.  What good were your
efforts?  Wouldn't you find this discouraging?

And now you want to continue to make changes, but the code you want to
change has commit messages referencing tickets which you have no
access to.  Or it has no reference to tickets at all.  You find an
area that seems to be needlessly complex: can you remove the
complexity?  You have no way of knowing what you'd be breaking.

Perhaps you have a proprietary UI which depends on a behavior which
was removed or changed.  Now your UI is broken.  Because the code drop
is so large, you have no way to reasonably review it for
incompatibilities.  It is not possible to review a year of development
all at once.  And if your review turns up problems?  Do you accept the
entire pull request or decline the whole thing?  Putting all the code
into one pull request is a form of blackmail (commonly used in the
formulation of bills for Congress).  If you want the good you have to
take the bad.


B.) Now let's imagine that Acme forked the code and created a new
repository which they then added to the project.  None of the work you
did is in this new repository.  If those features you implemented were
important to you, you will have to re-introduce them into the new
repository.

You'll have to start from zero learning to work in the new repository.
You also had no say in how that code was developed, so maybe the
feature that you need is unnecessarily difficult to implement in that
repository.   You don't know why things are the way they are there, so
you're walking through a mine field without a map when you're making
changes.

And anyways, why is Acme Corp so certain you had nothing of value to add?

Releasing this code also becomes contentious. Which of the two
competing repositories gets released?  Both of them? How does the
project communicate to users about how these pieces fit together?


C.) Imagine Acme gave you lots of forewarning that this was coming.
You still have no say in how the code is developed.  You know that
anything you might contribute could be obsoleted.  You can't tell
users whether the up-and-coming release will be compatible.  And
what's the point in testing that release?  You don't know how to check
that your needs are being considered in the architecture of the new
code base.

You have no sense of ownership over what comes out of that process.

You see that nobody else outside of Acme is working on the project
either, for the same reasons.


Most contributors would get discouraged and prefer not to participate
if those were the conditions.  If contributors didn't get discouraged,
they would fairly quickly be taking orders from the employees of Acme
Corp.  Acme Corp has all the inside information about what's coming a
year from now in the next code dump.  Information is power.  Contributors who
are also users may choose to stop contributing and become free
riders.  Why not just depend on Acme Corp for all of the development?

What Acme seems to be getting out of this scenario is an Apache
feather.  It's a form of free-riding on Apache's reputation.


Now let's imagine that you are the CTO of another company, let's call
them Kaushal.  Kaushal is considering taking part in this open source
project, but they are a competitor to Acme.  As Kaushal's CTO, you can
see, based on commit history and participation, that Acme is
dominating the project.  You would be smart to expect that Acme would
take advantage of their dominant position in the project.  Acme could
deliberately sabotage Kaushal's use cases, or simply 'starve' them by
convincing people not to help Kaushal.  Kaushal's CTO could respond to
this threat in one of two ways: 1.) Simply not take part in the open
source project.  Create their own closed source thing, or their own
open source project, and not engage.  This is the most likely
response.  2.) Try to dominate the project themselves.  Kaushal has
the same tools available that Acme has. Kaushal's CTO could tell their
employees to do long-interval code drops just like Acme is doing.  Now
with two corporations doing long-interval code drops on a project,
merging the code becomes very very difficult.  Fights about who gets
to decide what could eventually cause a complete cessation of release
activity.


So imagine that all competitors choose to remain just users, and Acme
remains in control.  Now imagine Acme loses interest in the project.
Acme found something that will make them more money, or Acme's
business fails.  Or Acme gets tired of offering their development
resources to the free riders.  Acme stops contributing to the project.
But the project has become so dependent on Acme that it cannot exist
without Acme.  When Acme exits, project activity could end.


Open source projects require transparency, not just as a moral value,
but as a pragmatic prerequisite for collaboration.  Offlist
development damages the community *and* the code.

Best Regards,
Myrle

P.S.  Some very interesting research on the game-theoretical aspects
of modularization in open source:
http://people.hbs.edu/cbaldwin/DR2/BaldwinClark.ArchOS.Jun03.pdf
"Does Code Architecture Mitigate Free Riding in the Open Source
Development Model?"

I would argue that the information divisibility being applied here at
the code modularity dimension also applies to the time dimension.  So,
it seems likely the argument against large code drops can be made
mathematically. Which really tickles the geek in me. : o)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org
