> 1. People are not clearly contributing to Apache AsterixDB when
> submitting a patch via Gerrit at UCI.edu. Think about Section 5 of
> ASLv2.

Then what are they submitting a patch for review to, exactly?

> 2. The ASF has no record of any contributions that are happening on
> the Gerrit instance at UCI, until a committer decides to push code to
> the ASF repo.

I'm afraid I don't understand this point. How is this different from
any other distributed version control system? On GitHub, nobody is
aware of a contribution in a fork until a pull request is made. How is
that any different from what's going on here?

> And from a provenance perspective, we have no records of
> submission of contributions at all.

How is provenance lost? What is the way in which records are kept? Is
there some other information side-channel besides the push records
that have already been discussed?

> 3. Discussion and code review is happening at UCI, within their Gerrit
> instance, there is no record of those discussions at the ASF. (With
> reviews.a.o, Jira, GH Pull Requests, all of that information gets
> copied to one of the project's mailing list for posterity.)

We can make Gerrit CC the dev@ list for all changes.

> 4. And this is the real issue for me. Gerrit is possessive of git
> repos it manages by nature; it needs and wants control. The very
> nature of Gerrit demands that it be the canonical repo.

I would tend to disagree. For instance, with the way things are right
now, there's nothing stopping us from accepting GitHub pull requests.
We'd just push them to Gerrit's head.

> We can play
> word games and say that it isn't, or that the repo of record that
> releases are produced from is the ASF repo, but there are a number of
> realities that reflect that it isn't. First, when the mirroring goes
> wrong, the initial call is to rewrite history on the ASF repo [3].

I think there's a misunderstanding of what happened there, then. What
happened is as follows: A committer failed to follow what we, as a
community, had agreed upon as a procedure for committing changes. It
actually looked at first like they had committed a patch that not
only hadn't been code reviewed, but had also been superseded by another
version of the same high-level change that fixed issues. Hence, it really
looked like a change that we didn't want to keep in the git history if
it could be helped. That's the main reason we wanted to rewrite the
git history. The fact that Gerrit contained the correct version is
orthogonal.

> This suggests to me that the gerrit repo is the de facto repo for the
> project.

Nobody clones from Gerrit or sets it as their upstream, so again I disagree.

> Second, Gerrit is where everything is really happening:
> contributions, code review, testing (from a Jenkins instance at UCI).

What, per se, is unique about that? I could point at any number of
Apache projects where the activity is happening mostly in GitHub pull
requests, and the testing in Travis CI. These are all external
services that the community decided worked best for them. We have
external services that we like too, just different ones.


- Ian


On Tue, Jul 14, 2015 at 8:56 PM, David Nalley <da...@gnsa.us> wrote:
> On Tue, Jul 14, 2015 at 8:08 PM, Till Westmann <ti...@apache.org> wrote:
>>
>> On 14 Jul 2015, at 15:31, David Nalley wrote:
>>
>>> On Tue, Jul 14, 2015 at 1:14 AM, Ian Maxon <ima...@uci.edu> wrote:
>>>
>>>> We use Gerrit as
>>>> a tool to do code reviews and to organize the commits, as well as to
>>>> facilitate easy testing. However, that's all it's used for; we still
>>>> clone from repositories that come downstream from ASF, not the other
>>>> way around. I'd be interested to understand how this would be
>>>> considered any different than what is done with Github Pull Requests.
>>>>
>>>
>>> So GH PRs have a subtle distinction (at least in the way that they are
>>> handled at the ASF). Projects can't merge pull requests into the repo
>>> at GitHub. Non-committers see a workflow that is the GitHub workflow,
>>> because that's very familiar, and lowers the barrier to contribution.
>>> Committers, however, have a very different workflow than the folks who
>>> typically review and close pull requests on GitHub. They have to take
>>> the patch [1], and merge it into the canonical repository at the ASF,
>>> which then appears in the GitHub repository because of the mirror
>>> process. This stops the problem of diverging codebases that you are
>>> currently experiencing, calls to rewrite history to align the ASF repo
>>> with the external repo, etc.
>>
>>
>> As Ian indicated, AsterixDB's process also requires manual interaction of
>> a committer. The current steps are now documented on the website [2].
>>
>
> So, that's marginally better than some previous examples of similar behavior.
> But I think there are still multiple problems, and I'll try and be
> more explicit about them:
>
> 1. People are not clearly contributing to Apache AsterixDB when
> submitting a patch via Gerrit at UCI.edu. Think about Section 5 of
> ASLv2.
> 2. The ASF has no record of any contributions that are happening on
> the Gerrit instance at UCI, until a committer decides to push code to
> the ASF repo. And from a provenance perspective, we have no records of
> submission of contributions at all.
> 3. Discussion and code review is happening at UCI, within their Gerrit
> instance, there is no record of those discussions at the ASF. (With
> reviews.a.o, Jira, GH Pull Requests, all of that information gets
> copied to one of the project's mailing list for posterity.)
> 4. And this is the real issue for me. Gerrit is possessive of git
> repos it manages by nature; it needs and wants control. The very
> nature of Gerrit demands that it be the canonical repo. We can play
> word games and say that it isn't, or that the repo of record that
> releases are produced from is the ASF repo, but there are a number of
> realities that reflect that it isn't. First, when the mirroring goes
> wrong, the initial call is to rewrite history on the ASF repo [3].
> This suggests to me that the gerrit repo is the de facto repo for the
> project. Second, Gerrit is where everything is really happening:
> contributions, code review, testing (from a Jenkins instance at UCI).
>
>
>>> There are some other problems, that aren't necessarily as worrisome,
>>> but should be something to consider. First, you're relying on a third
>>> party to provide that resource. That's not inherently a problem, but
>>> we have a number of examples of projects using external tools and
>>> those being shut down or phased out which causes tremendous disruption
>>> to projects. It's also at the old project's home, which might cause
>>> some folks to question whether the project is truly independent, or
>>> not.
>>
>>
>> In my view Gerrit is "just" a tool that the AsterixDB community chose
>> to keep when starting the incubation process. It is non-essential and
>> has been used by developers from different organizations before the
>> incubation started. But I think that its use was and is very beneficial
>> to the project.
>>
>> When we started incubation, it seemed to us that keeping the existing
>> tool would be a good idea as it
>> a) allows for a smoother transition and
>> b) would not put additional requirements on the ASF infrastructure.
>>
>
> I personally like Gerrit. I think it's probably one of the more robust
> review tools in existence, and it's certainly the most extensible
> based on what I've seen. That said, its use in this case is not
> without problems.
>
>> However, I do agree that a shut down of the service (which seems very
>> unlikely at the current point in time) could be a disruption to the
>> project.
>
> We would have said the same thing about Codehaus not too many years ago.
>
>> So it might be better to run this tool on the ASF
>> infrastructure.
>> Should we pursue this?
>
> We've explored gerrit 2-3 times in the past 24 months. We have seen
> several projects request it over the years. As I've mentioned
> elsewhere in this thread, our most recent exploration was in December,
> and there are a number of issues that would make an ASF-wide instance
> of Gerrit impractically costly to deploy. I also think that, due
> to the provenance requirements that come with version control as I
> understand them, as well as some of the other issues that would come
> into play, infrastructure would not permit a project-specific
> instance of Gerrit to be run on ASF infrastructure.
>
>> Or is it acceptable to keep the tool on external hardware for now?
>> Or do you see fundamental issues with AsterixDB's use of Gerrit?
>>
>
> I do not think it's acceptable to use the tool on external hardware. I
> don't see inherent issues with the tool itself, but also don't think
> it's pragmatic to run it internally. I know that's a bad
> position that seems to be inflexible for the project itself, but with
> around 200 active projects, some flexibility is inevitably lost.
>
>
> --David
>
>
>>
>>> [1]
>>> https://patch-diff.githubusercontent.com/raw/apache/airavata/pull/18.patch
>>
>>
>> [2] https://asterixdb.incubator.apache.org/pushing.html
>
> [3] 
> http://mail-archives.apache.org/mod_mbox/incubator-asterixdb-dev/201507.mbox/%3cCAN_YF5ztLpaKLnnRSdTeSqB+mJ8Sk6aJ58p_NG9Scx=kbqj...@mail.gmail.com%3e
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
