Hi,

I noticed something strange in our commits: the committer field does not
reflect the person who actually committed the change.

*1. First, I wanted to check Gergely's commits from the last month or so.
The result looked suspicious, as I expected to see a bunch of commits
from Sept / Oct of this year, but the most recent one is from August.*

*git log CLI output:*
➜ git --no-pager log --format=fuller --committer=shuzirra
commit 44bab51be44e31224dabbfa548eb27ea5fb2f916
Author:     Gergely Pollak <shuzi...@apache.org>
AuthorDate: Wed Aug 4 15:43:07 2021 +0200
Commit:     Gergely Pollak <shuzi...@apache.org>
CommitDate: Wed Aug 4 15:43:57 2021 +0200


    YARN-10849 Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails. Contributed by Szilard Nemeth


commit e9339aa3761295fe65bb786e01938c7c177cd6e7
Author:     Gergely Pollak <shuzi...@cloudera.com>
AuthorDate: Tue Jun 1 15:57:22 2021 +0200
Commit:     Gergely Pollak <shuzi...@cloudera.com>
CommitDate: Tue Jun 1 15:57:22 2021 +0200


    YARN-10797. Logging parameter issues in scheduler package. Contributed by Szilard Nemeth


*2. Another example of a merged PR; here I was the author and Adam Antal
was the committer:*
PR link: https://github.com/apache/hadoop/pull/3454

*git log CLI output:*
➜ git --no-pager log --format=fuller a9b2469a534 -1
commit a9b2469a534c5bc554c09aaf2d460a5a00922aca
Author:     Adam Antal <aada...@gmail.com>
AuthorDate: Sun Sep 19 14:42:02 2021 +0200
Commit:     GitHub <nore...@github.com>
CommitDate: Sun Sep 19 14:42:02 2021 +0200


    YARN-10950. Code cleanup in QueueCapacities (#3454)


*3. Let's see two more examples of PRs merged by Gergely and what the git
log CLI output looks like for these commits:*

*3.1.*
PR link: https://github.com/apache/hadoop/pull/3419
Commit:
https://github.com/apache/hadoop/commit/4df4389325254465b52557d6fa99bcd470d64409

*git log CLI output:*
➜ git --no-pager log --format=fuller 4df4389325254465b52557d6fa99bcd470d64409 -1
commit 4df4389325254465b52557d6fa99bcd470d64409
Author:     Szilard Nemeth <954799+szilard-nem...@users.noreply.github.com>
AuthorDate: Mon Sep 20 16:47:46 2021 +0200
Commit:     GitHub <nore...@github.com>
CommitDate: Mon Sep 20 16:47:46 2021 +0200


    YARN-10911. AbstractCSQueue: Create a separate class for usernames and weights that are travelling in a Map. Contributed by Szilard Nemeth


*3.2.  *
PR link: https://github.com/apache/hadoop/pull/3342
Commit:
https://github.com/apache/hadoop/commit/9f6430c9ed2bca5696e77bfe9eda5d4f10b0d280

*git log CLI output:*
➜ git --no-pager log --format=fuller 9f6430c9ed2bca5696e77bfe9eda5d4f10b0d280 -1
commit 9f6430c9ed2bca5696e77bfe9eda5d4f10b0d280
Author:     9uapaw <gyora...@gmail.com>
AuthorDate: Tue Sep 21 16:08:24 2021 +0200
Commit:     GitHub <nore...@github.com>
CommitDate: Tue Sep 21 16:08:24 2021 +0200


    YARN-10897. Introduce QueuePath class. Contributed by Andras Gyori


As you can see, the committer field contains: *"GitHub <nore...@github.com>".*
Is this something specific to Hadoop or our Gitbox commit environment?
Basically, any PR merged on the github.com UI will lose the committer
information in the commit, which is very bad.
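
To get a feeling for how many commits are affected, something like the
following could be run inside a clone. It is only a sketch: the function name
list_web_merged_commits is made up for illustration, and it assumes that
matching the committer name "GitHub" (as printed in the logs above) is enough
to identify commits merged through the web UI.

```shell
# List commits whose committer is the generic "GitHub" identity rather than
# a person. Hypothetical helper; run it inside a clone of the repository.
# $1 is an optional --since value (defaults to "2 months ago").
# Format placeholders: %h short hash, %an author name,
# %cn / %ce committer name / email, %s subject line.
list_web_merged_commits() {
  git --no-pager log --since="${1:-2 months ago}" \
    --committer='GitHub <' \
    --format='%h | author: %an | committer: %cn <%ce> | %s'
}

# Usage (inside the repository):
#   list_web_merged_commits "2 months ago"
```

The --committer pattern is matched against the whole "Name <email>" committer
line, so commits committed by a person from the CLI should not show up.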

As I think reviewing and discussing changes on GitHub's UI is way better
than in Jira, the only workaround that makes sense to me is to download
the patch from GitHub before the commit, then commit from the CLI, adding
the author info and optionally appending the standard "Contributed by
<name>" message to the commit message.
For example:

git commit -m "YARN-xxx. <Commit message> Contributed by <author's full name>" --author=<author's username>

This way, both the author and committer fields will be correct. One downside
is that the PR won't be marked as merged on GitHub: it will end up in the
closed state because the change was committed from the CLI, so the GitHub PR
will have a misleading status.
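
Put together, the workaround could look roughly like the sketch below. The
helper name commit_pr_diff and its arguments are made up for illustration,
and it assumes the whole PR should land as a single commit. The author is
passed in the "Full Name <email>" form, which git commit --author accepts
directly; a bare username only works if git can match it against an existing
commit by that author.

```shell
# Step 1 (manual, needs network): GitHub serves the combined diff of a PR
# when ".diff" is appended to the PR URL, for example:
#   curl -fsSL -o pr.diff "https://github.com/apache/hadoop/pull/<PR number>.diff"

# Step 2: apply the diff and commit it under the contributor's name.
# Hypothetical helper, arguments:
#   $1 path to the downloaded diff
#   $2 author as "Full Name <email>"
#   $3 commit message, e.g. "YARN-xxx. <Commit message> Contributed by <name>"
commit_pr_diff() {
  # --index applies the diff to the working tree and stages it in one go;
  # the commit only happens if the diff applied cleanly.
  git apply --index "$1" &&
    git commit -q --author="$2" -m "$3"
}

# Step 3: check the result; Author should be the contributor and
# Commit should be you:
#   git --no-pager log --format=fuller -1
```

The committer field is filled from your local user.name / user.email
settings, so it ends up being the person who ran the commit.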

What do you think?
What is your workflow for commits?


Thanks,
Szilard
