Hi everyone, I’ve noticed some questionable practices around commits going into master lately (and historically, to be honest) and I want to remind everyone about some best practices for commit and release quality.
- *Please don’t mix partial, unrelated changes into a commit.* This makes it very difficult to manage branches and to correctly address bugs when they are found because of conflicts when picking or reverting commits. If you need to revert a commit, but it made unrelated changes then you have to go resolve the conflict by hand. When backporting to a release branch, unrelated changes require you to pull in more commits to get patches to apply, or again resolve the conflicts by hand. Every conflict makes maintenance take longer and increases the risk of mistakes. When submitting and reviewing patches, please think about whether cherry-picking or reverting the commit would cause unintended changes, and remove them. - *Please don’t commit unfinished changes.* If you want to pull features from master to a release branch, it is much harder if you also need to track down all of the additional commits needed to actually make it work. I’m not talking about splitting work into reasonable chunks; I’m talking about getting commits in before they’re reasonably finished. This makes it hard to know where the remaining changes were merged, or if they were merged at all. Often, I see this combined with the first problem. For example, #19925 <https://github.com/apache/spark/pull/19925#issuecomment-351621232> was merged with a comment: “Merging to master to unblock #19926. If there are more comments, we can address them in #19926.” This change was “Add Structured Streaming APIs to DataSourceV2”, something that should clearly be finished before committing. Instead, parts of it might be lurking in “Split StreamExecution into MicroBatchExecution and StreamExecution.” - *Please don’t commit before reviews are finished.* This can be a tough call at times, but it is usually pretty clear. Are there people who should review a PR? Is there still ongoing discussion in the PR? Is there a +1 from a Spark committer? (I recently noticed both ignoring an ongoing discussion and not getting a committer +1 on the same issue!) This is also a community problem. Whether or not it is intentional, committing prematurely sends a negative message to the community that we need to avoid. A good skill for a committer is to know when you should get agreement or consensus, even if you have a +1. - *Please don’t rush commits.* Rushing is cutting corners, and that almost always causes the problems above or others. #20427 is a good example <https://github.com/apache/spark/pull/20427/files#diff-6d245f3b658c10a9d40d6e1772df3d24L28>, where admittedly rushing a commit led to trying to mix unrelated changes into the PR. #19925 <https://github.com/apache/spark/pull/19925#issuecomment-351621232> was committed unfinished to unblock another commit, despite adding a public API that should require a careful review. We all mess this up, myself included, and my intent is not to point fingers, but to show examples of where we can improve. It is just far easier to get a branch committed as-is than to adhere to these guidelines, but these are important for our releases and downstream users. Thanks for reading, rb -- Ryan Blue Software Engineer Netflix