Also +1. There are some drawbacks to using GitHub for reviews, e.g. lots of
emails for each review because they don't let you publish your entire
review in one go like RB does, but it drastically lowers the barrier to
contributing for most developers. Also, if you haven't tried it, hub
https://hub.github.com/ makes it really easy to check out and test PRs.
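For example, checking out and testing a PR locally is just (the PR URL
below is illustrative, not a real Kafka PR):

```
hub checkout https://github.com/apache/kafka/pull/123
./gradlew test
```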

One thing I noticed is that when you try to generate a PR it defaults to
the 0.8.2 branch. Can we fix that up to be trunk by default? That's the
most common use case; version branches are really only useful when a
release is being prepared. Do changes to the GitHub repo require tickets to
the Apache Infra team, or is this something committers have control over?


On a related note, which perhaps should be discussed on another thread:

The CI setup is a related issue that we might want to rethink. Apache also
supports Travis CI, and now pays for dedicated build slaves:
https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci I
think the setup should be pretty easy since building + running tests is
just a gradle command. Having tests run automatically on PRs (and
promptly!) makes it a lot easier to confidently commit a change, especially
if the build merges with trunk first. I started looking at this when I was
trying to sort out Confluent's build & test infrastructure. See
https://travis-ci.org/ewencp/kafka/builds/60802386 for an example, the
patch is about 10 lines in a .travis.yml file and the failure in that
example seems to be unrelated to the Travis config. I think the basic config
I created is all we'd need.
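To make that concrete, a minimal .travis.yml along those lines might look
something like this (a sketch on my part: the JDK version and the exact
Gradle invocation are assumptions, not copied from the actual patch):

```yaml
# Minimal Travis CI config sketch for a Gradle project like Kafka.
language: java
jdk:
  - oraclejdk7        # illustrative; use whichever JDKs Kafka targets
script:
  - ./gradlew test    # assumes the Gradle wrapper is checked in
```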

Unfortunately, I couldn't easily tell what the delay on builds is, i.e.
whether it would be an improvement over the delays with the current Jenkins
setup. But having them run on PR creation/update means the results will
usually be ready by the time a reviewer gets around to looking at the PR,
and they would be reported in the PR so the state is easy to evaluate. (I'm
also having
trouble telling exactly how the two ASF Jenkins builders differ since they
both seem to poll trunk, so I can't be certain whether Travis would be able
to completely replace the current setup. That said, it should be telling
that I have never paid attention to Jenkins output at all since it seems so
far removed from any of my actions as a contributor.)

As another alternative, Confluent would also be happy to provide build/test
infrastructure to help out. AMPLab does this for Spark, so it seems to be
acceptable to ASF. We already have PR builders and trunk/master builders
set up for other projects, so it wouldn't be hard to set up the same for
Kafka with the right access to the repos. Based on the current frequency of
builds (https://builds.apache.org/view/All/job/Kafka-trunk/ and
https://builds.apache.org/view/All/job/KafkaPreCommit/) I think it'll be
easy for even our current infrastructure to keep up.


On Thu, Apr 30, 2015 at 9:41 PM, Jaikiran Pai <jai.forums2...@gmail.com>
wrote:

> I think this will help a lot in contributions. Some of my local changes
> that I want to contribute back have been pending because I sometimes switch
> machines and I then have to go through setting up the Ruby/Python and other
> stuff for the current review process. Using just GitHub is going to help in
> quickly submitting the changes.
>
> -Jaikiran
>
> On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:
>
>> Hi all,
>>
>> Kafka currently uses a combination of Review Board and JIRA for
>> contributions and code review. In my opinion, this makes contribution and
>> code review a bit harder than it has to be.
>>
>> I think the approach used by Spark would improve the current situation:
>>
>> "Generally, Spark uses JIRA to track logical issues, including bugs and
>> improvements, and uses Github pull requests to manage the review and merge
>> of specific code changes. That is, JIRAs are used to describe what should
>> be fixed or changed, and high-level approaches, and pull requests describe
>> how to implement that change in the project's source code. For example,
>> major design decisions are discussed in JIRA."[1]
>>
>> It's worth reading the wiki page for all the details, but I will summarise
>> the suggested workflow for code changes:
>>
>>     1. Fork the Github repository at http://github.com/apache/kafka (if
>> you
>>     haven't already)
>>     2. git checkout -b kafka-XXX
>>     3. Make one or more commits (smaller commits can be easier to review
>>     and Review Board makes that hard)
>>     4. git push origin kafka-XXX
>>     5. Create PR against upstream/trunk (this will update JIRA
>>     automatically[2] and it will send an email to the dev mailing list
>> too)
>>     6. A CI build will be triggered[3]
>>     7. Review process happens on GitHub (it's quite handy to be able to
>>     comment at both the commit and PR level, unlike Review Board)
>>     8. Once all feedback has been addressed and the build is green, a
>>     variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
>>     push, close the PR and JIRA issue. The squashed commit generated by
>> the
>>     script includes a bunch of useful information including links to the
>>     original commits[5] (in the future, I think it's worth reconsidering
>> the
>>     squashing of commits, but retaining the information in the commit is
>>     already an improvement)
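>> To make steps 2-5 concrete, roughly (the JIRA number 1234 and the
>> `upstream`/`origin` remote names are illustrative):
>>
>> ```
>> git checkout -b kafka-1234 upstream/trunk   # step 2: topic branch
>> # make one or more commits (step 3), then publish the branch (step 4):
>> git push origin kafka-1234
>> # step 5: open the PR against apache/kafka trunk, e.g. in the web UI
>> # or with `hub pull-request -b apache:trunk`
>> ```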
>>
>> Neha merged a couple of commits via GitHub already, and it went smoothly,
>> although we are still missing a few of the pieces described above:
>>
>>     1. CI builds triggered by GitHub PRs (this is supported by Apache
>> Infra,
>>     we need to request it for Kafka and provide whatever configuration is
>>     needed)
>>     2. Adapting Spark's merge_spark_pr script and integrating it into the
>>     Kafka Git repository
>>     3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md
>> to
>>     the Git repository (this is shown when someone is creating a pull
>> request)
>>     4. Go through existing GitHub pull requests and close the ones that
>> are
>>     no longer relevant (there are quite a few as people have been opening
>> them
>>     over the years, but nothing was done about most of them)
>>     5. Other things I may be missing
>>
>> I am volunteering to help with the above if people agree that this is the
>> right direction for Kafka. Thoughts?
>>
>> Best.
>> Ismael
>>
>> P.S. I was told in the Apache Infra HipChat that it's not currently
>> possible (and there are no plans to change that in the near future) to use
>> the GitHub merge button to merge PRs. The merge script does quite a few
>> useful things that the merge button does not in any case.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
>> [2]
>>
>> https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
>> [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
>> [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
>> [5]
>>
>> https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
>>
>>
>


-- 
Thanks,
Ewen
