Hi,

I noticed something strange in our commits: the committer field does not reflect the person who actually committed the change.

*1. First, I wanted to check Gergely's commits from the last month or so. The result looked suspicious: I expected to see a bunch of commits from Sept / Oct of this year, but the newest commit listed is from August 4.*

*git log CLI output:*

➜ git --no-pager log --format=fuller --committer=shuzirra
commit 44bab51be44e31224dabbfa548eb27ea5fb2f916
Author:     Gergely Pollak <shuzi...@apache.org>
AuthorDate: Wed Aug 4 15:43:07 2021 +0200
Commit:     Gergely Pollak <shuzi...@apache.org>
CommitDate: Wed Aug 4 15:43:57 2021 +0200

    YARN-10849 Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails. Contributed by Szilard Nemeth

commit e9339aa3761295fe65bb786e01938c7c177cd6e7
Author:     Gergely Pollak <shuzi...@cloudera.com>
AuthorDate: Tue Jun 1 15:57:22 2021 +0200
Commit:     Gergely Pollak <shuzi...@cloudera.com>
CommitDate: Tue Jun 1 15:57:22 2021 +0200

    YARN-10797. Logging parameter issues in scheduler package. Contributed by Szilard Nemeth

*2. Another example of a merged PR; here I was the author and Adam Antal was the committer:*

PR link: https://github.com/apache/hadoop/pull/3454

*git log CLI output:*

➜ git --no-pager log --format=fuller a9b2469a534 -1
commit a9b2469a534c5bc554c09aaf2d460a5a00922aca
Author:     Adam Antal <aada...@gmail.com>
AuthorDate: Sun Sep 19 14:42:02 2021 +0200
Commit:     GitHub <nore...@github.com>
CommitDate: Sun Sep 19 14:42:02 2021 +0200

    YARN-10950. Code cleanup in QueueCapacities (#3454)

*3. Let's look at two more examples of PRs merged by Gergely and what the git log CLI output looks like for these commits:*

*3.1.*
PR link: https://github.com/apache/hadoop/pull/3419
Commit: https://github.com/apache/hadoop/commit/4df4389325254465b52557d6fa99bcd470d64409

*git log CLI output:*

➜ git --no-pager log --format=fuller 4df4389325254465b52557d6fa99bcd470d64409 -1
commit 4df4389325254465b52557d6fa99bcd470d64409
Author:     Szilard Nemeth <954799+szilard-nem...@users.noreply.github.com>
AuthorDate: Mon Sep 20 16:47:46 2021 +0200
Commit:     GitHub <nore...@github.com>
CommitDate: Mon Sep 20 16:47:46 2021 +0200

    YARN-10911. AbstractCSQueue: Create a separate class for usernames and weights that are travelling in a Map. Contributed by Szilard Nemeth

*3.2.*
PR link: https://github.com/apache/hadoop/pull/3342
Commit: https://github.com/apache/hadoop/commit/9f6430c9ed2bca5696e77bfe9eda5d4f10b0d280

*git log CLI output:*

➜ git --no-pager log --format=fuller 9f6430c9ed2bca5696e77bfe9eda5d4f10b0d280 -1
commit 9f6430c9ed2bca5696e77bfe9eda5d4f10b0d280
Author:     9uapaw <gyora...@gmail.com>
AuthorDate: Tue Sep 21 16:08:24 2021 +0200
Commit:     GitHub <nore...@github.com>
CommitDate: Tue Sep 21 16:08:24 2021 +0200

    YARN-10897. Introduce QueuePath class. Contributed by Andras Gyori

As you can see, the committer field contains: *"GitHub <nore...@github.com>".*

Is this something specific to Hadoop or our Gitbox commit environment? Basically, any PR merged on the github.com UI will lose the committer information in the commit, which is very bad.

Since I think reviewing and discussing on GitHub's UI is way better than in Jira, the only workaround that makes sense to me is to download the patch from GitHub before the commit, then commit from the CLI, adding the author info and optionally appending the standard "Contributed by <name>" message to the commit message.

For example:

git commit -m "YARN-xxx. <Commit message> Contributed by <author's full name>" --author=<author's username>

This way, both the author and committer fields will be correct.

One downside is that the PR won't show as merged on GitHub; it will end up in the closed state because the change is committed from the CLI, so the GitHub PR will have a misleading status.

What do you think? What is your workflow for commits?

Thanks,
Szilard
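
P.S. To make the workaround above concrete, here is a minimal sketch that can be run in a throwaway directory. Everything in it is an assumption made for illustration: the names, email addresses and file names are placeholders, and `git format-patch` stands in for downloading the patch from the GitHub PR. Note that `--author` expects the "Name <email>" form; if it is given anything else, git treats the value as a pattern and looks up an existing commit by a matching author.

```shell
#!/bin/sh
# Sketch only: all names, emails and file names are placeholders.
set -e
work=$(mktemp -d)

# 1. Stand-in for the contributor's PR: one base commit plus one change.
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Contributor Name"
git config user.email "contributor@example.org"
echo base > file.txt
git add file.txt
git commit -q -m "base"
echo change >> file.txt
git commit -q -am "work in progress"

# 2. Stand-in for downloading the patch from GitHub, then dropping the
#    contributor's commit so only the base commit remains.
git format-patch -1 --stdout > "$work/change.patch"
git reset -q --hard HEAD~1

# 3. The committer applies the patch and commits under their own identity,
#    crediting the contributor through --author.
git config user.name "Committer Name"
git config user.email "committer@example.org"
git apply --index "$work/change.patch"
git commit -q -m "YARN-xxx. <Commit message> Contributed by Contributor Name" \
    --author="Contributor Name <contributor@example.org>"

# 4. Inspect both fields the same way as in the examples above.
git --no-pager log -1 --format=fuller
```

In the output of the last command, the Author line should carry the contributor's identity and the Commit line the committer's, which is the result the workaround is after.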