Hi all, I think we all more or less follow some standard pattern when committing in Hive but with some small effort we could make things more uniform and hopefully better. I would like to start a discussion about creating some guidelines, which we could put to the wiki or in contributing.md, to improve the quality of our history (git log). I outline some suggestions below to kick off the discussion. Many things in the list are minor (and maybe even personal preferences) but one thing which is really missing from the project is B3 especially the *why* part. Why is the commit necessary? Why has the change been made?. In some cases the why part is also missing from the JIRA making the code harder to maintain.
Subject line: S1. Start with the Jira id capitalized and followed immediately (no space) by double colon (:) S2. Leave one space after the Jira id, and start the summary with a capital letter S3. Keep it concise (ideally less than 72 characters) and provide a useful description of the change S4. Do not include or end the line with period S5. Do not include the pull request id in the summary S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added a handler …”) or present tense (“Adds a handler …”) S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to describe the bug (“NullPointerException if user is unknown”) and people will correctly surmise that the purpose of your change is to fix the bug. S8. Do not add a contributor's name; the author tag is made exactly for this and can be explored/parsed much more efficiently by tools/people for stats or other purposes S9. Do not add reviewers name; information is present in multiple places (e.g., committer tag, PR, JIRA) Message body: (Trivial changes may not require a body) B1. Separate subject from body with a blank line B2. Wrap the body at 72 characters B3. Use the body to explain what and why vs. how Example "Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and Converter to avoid executing the fallback method which in many cases returns null and can cause NPE when this value propagates up the call stack." vs. "Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and Converter" B4. If multiple authors include them using the standard GitHub marker, "Co-authored-by:", followed by the name and email of the author (e.g., Co-authored-by: Marton Bod <m...@cloudera.com>) B5. If the reviewer is different from committer (or merge via GitHub UI) use "Reviewed-by:" followed by the name and email of the reviewer (e.g., Reviewed-by: Stamatis Zampetakis <zabe...@gmail.com>) B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many times as authors/reviewers. B7. Include the PR id at the end of the message (e.g., Closes #2514); someone can easily navigate back to the PR to check comments, reviewers, etc. A sample commit message following these guidelines is shown below: commit de7781f29f82083fe01274b4d436b52920a89173 Author: Soumyakanti Das <soumyakanti.das...@gmail.com> Commit: Stamatis Zampetakis <zabe...@gmail.com> HIVE-25354: NPE when estimating row count in external JDBC tables Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and Converter to avoid executing the fallback method which in many cases returns null and can cause NPE when this value propagates up the call stack. Co-authored-by: Krisztian Kasa <kk...@cloudera.com> Reviewed-by: Peter Vary <pv...@cloudera.com> Reviewed-by: Zoltan Haindrich <k...@rxd.hu> Closes #2514 Let me know your thoughts. Best, Stamatis