I wasn't aware that we were standardizing on such a practice. I don't have a strong opinion on making changes one Spark version at a time or all at once. I think committers who do reviews regularly should decide. My only concern with making changes one version at a time is follow-through on the part of the contributor. We want to ensure that a change/fix applicable to multiple versions gets into all of them. Reviewer bandwidth is incurred either way. (For simple changes/fixes, perhaps all at once does save reviewer bandwidth, so we may want to be flexible.)
On Wed, Nov 3, 2021 at 5:52 PM Jack Ye <yezhao...@gmail.com> wrote: > Thanks for bringing this up Kyle! > > My personal view is the following: > 1. For new features, it should be very clear that we always implement them > against the latest version. At the same time, I suggest we create an issue > to track backport, so that if anyone is interested in backport he/she can > work on it separately. We can tag these issues based on title names (e.g. > "Backport: xxx" as title), and these are also good issues for new > contributors to work on because there is already reference implementation > in a newer version. > 2. For bug fixes, I understand sometimes it's just a one line fix and > people will try to just fix across versions. My take is that we should try > to advocate for fixing 1 version and open an issue for other versions > although it does not really need to be enforced as strictly. Sometimes even > a 1 line fix has serious implications for different versions and might > break stuff unintentionally. it's better that people that have production > dependency on the specific version carefully review and test changes before > merging. > > About enforcement strategy, I suggest we start to create a template for > Github Issues and PRs > <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>, > where we state the guidelines related to engine versions, as well as > Iceberg's preferred code style, naming convention, title convention, etc. > to make new contributors a bit easier to submit changes without too much > rewrite. Currently I observe that every time there is a new contributor, we > need to state all the guidelines through PR review, which causes quite a > lot of time spent on rewriting the code and also reduces the motivation for > people to continue work on the PR. > > Best, > Jack Ye > > > > > > On Wed, Nov 3, 2021 at 4:13 PM Kyle Bendickson <k...@tabular.io> wrote: > >> I submitted a PR to fix a Spark bug today, applying the same changes to >> all eligible Spark versions. >> >> Jack mentioned that he thought the practice going forward was to fix / >> apply changes on the latest Spark version in one PR, and then open a second >> PR to backport the fixes (presumably to minimize review overhead). >> >> Do we have a standard / preference on that? Jack mentioned he wasn't >> certain, so I thought I'd ask here. >> >> Seems like a good practice but hoping to get some clarification :) >> >> -- >> Best, >> Kyle Bendickson >> Github: @kbendick >> >