Hi all,

I think we all more or less follow some standard pattern when committing in
Hive but with some small effort we could make things more uniform and
hopefully better.
I would like to start a discussion about creating some guidelines, which we
could put to the wiki or in contributing.md, to improve the quality of our
history (git log).
I outline some suggestions below to kick off the discussion. Many things in
the list are minor (and maybe even personal preferences) but one thing
which is really missing from the project is B3 especially the *why* part.
Why is the commit necessary? Why has the change been made?.
In some cases the why part is also missing from the JIRA making the code
harder to maintain.

Subject line:
S1. Start with the Jira id capitalized and followed immediately (no space)
by double colon (:)
S2. Leave one space after the Jira id, and start the summary with a capital
letter
S3. Keep it concise (ideally less than 72 characters) and provide a useful
description of the change
S4. Do not include or end the line with period
S5. Do not include the pull request id in the summary
S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added
a handler …”) or present tense (“Adds a handler …”)
S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to
describe the bug (“NullPointerException if user is unknown”) and people
will correctly surmise that the purpose of your change is to fix the bug.
S8. Do not add a contributor's name; the author tag is made exactly for
this and can be explored/parsed much more efficiently by tools/people for
stats or other purposes
S9. Do not add reviewers name; information is present in multiple places
(e.g., committer tag, PR, JIRA)

Message body: (Trivial changes may not require a body)
B1. Separate subject from body with a blank line
B2. Wrap the body at 72 characters
B3. Use the body to explain what and why vs. how
Example
"Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and
Converter to avoid executing the fallback method which in many cases
returns null and can cause NPE when this value propagates up the call
stack."
vs.
"Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
and Converter"
B4. If multiple authors include them using the standard GitHub marker,
"Co-authored-by:", followed by the name and email of the author (e.g.,
Co-authored-by: Marton Bod <m...@cloudera.com>)
B5. If the reviewer is different from committer (or merge via GitHub UI)
use "Reviewed-by:" followed by the name and email of the reviewer (e.g.,
Reviewed-by: Stamatis Zampetakis <zabe...@gmail.com>)
B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many
times as authors/reviewers.
B7. Include the PR id at the end of the message (e.g., Closes #2514);
someone can easily navigate back to the PR to check comments, reviewers,
etc.

A sample commit message following these guidelines is shown below:

commit de7781f29f82083fe01274b4d436b52920a89173
Author: Soumyakanti Das <soumyakanti.das...@gmail.com>
Commit: Stamatis Zampetakis <zabe...@gmail.com>

    HIVE-25354: NPE when estimating row count in external JDBC tables

    Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
    and Converter to avoid executing the fallback method which in many
    cases returns null and can cause NPE when this value propagates up the
    call stack.

    Co-authored-by: Krisztian Kasa <kk...@cloudera.com>

    Reviewed-by: Peter Vary <pv...@cloudera.com>
    Reviewed-by: Zoltan Haindrich <k...@rxd.hu>

    Closes #2514

Let me know your thoughts.

Best,
Stamatis

Reply via email to