Hi Guozhang, Comments inline.
On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang <wangg...@gmail.com> wrote: > > I have a couple of comments on the wiki pages / merge scripts: > Thanks, it's important to discuss these things as the current version is based on what the Spark project does and may not match what we want to do exactly. 1. In the wiki page it mentions "If the change is new, then it usually > needs a new JIRA. However, trivial changes, where "what should change" is > virtually the same as "how it should change" do not require a JIRA. > Example: "Fix typos in Foo scaladoc"." > > So it sounds we are going to allow pull requests without a JIRA ticket, but > later we are also mentioning: > > "The PR title should be of the form [KAFKA-xxxx].." which is a bit > conflicting with the previous statement. Could you clarify it? Today our > commits are mostly with JIRAs except some trivial ones that are only done > by committers, I can either extend this to non-committers or not, just that > we need to make it clear. > I agree that it's not very clear and it needs to be fixed. What do we want to do though? It's a trade-off between consistency (always have a JIRA ticket) and avoiding redundancy (skip the JIRA where it doesn't add anything). The former is the more conservative option and I am happy to update the documentation if that's the preferred option. 2. The git commit message is a little verbose to me, for example: > ------ > > commit ee88dbb67f19b787e12ccef37982c9459b78c7b6 > > Author: Geoff Anderson <ge...@confluent.io> > > Date: Thu Jul 9 14:58:01 2015 -0700 > > KAFKA-2327; broker doesn't start if config defines advertised.host but > not advertised.port > > > > Added unit tests as well. These fail without the fix, but pass with the > fix. > > > > Author: Geoff Anderson <ge...@confluent.io> > > > > Closes #73 from granders/KAFKA-2327 and squashes the following commits: > > > > 52a2085 [Geoff Anderson] Cleaned up unecessary toString calls > > 23b3340 [Geoff Anderson] Fixes KAFKA-2327 > > ------ > > > The "Author" field is redundant, and we could probably also omit the github > commits. What do you think? > Is it harmful to have detailed information? None of it is actually redundant as far as I can see (but I could be missing something). There is the squashed commit author, the pull request message, the pull request author and the information about the individual commits. Even though Geoff worked on this PR by himself, multiple people can collaborate on a feature and then it's useful to credit correctly. The fact that we are keeping all the information encourages a style of development where a few small commits (each commit should must make sense on its own and pass the tests) are used in the pull request, which helps a lot during code review and when trying to understand changes after the fact. This is in contrast with the style where there is a single commit per feature even when the feature is quite large (we have a few of those ongoing at the moment). I don't know if you noticed, but GitHub actually links to the original commits, which is quite handy: https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6 This approach is a bit of a workaround, but it's easier as the script handles everything (credit to the Spark project once again, it's their code). The ideal scenario would be to keep the individual commits by merging the branch into trunk after a rebase (no fast-forward, so that a merge commit is maintained). This has a number of nice characteristics (history is linear, it's clear what commits were part of the merged branch, easy to bisect, easy to revert, etc.), but it requires a bit more work, more Git knowledge and ideally a PR builder that builds and tests every commit in the branch (Typesafe wrote a tool that does this for the Scala project, for what is worth). In other words, a step too far for us right now. :) What do you think? Best, Ismael