I wasn't aware that we were standardizing on such a practice. I don't have
a strong opinion on making changes one Spark version at a time or all at
once. I think committers who do reviews regularly should decide. My only
concern with making changes one version at a time is follow-through on the
part of the contributor. We want to ensure that a change/fix applicable to
multiple versions gets into all of them. Reviewer bandwidth is incurred
either way. (For simple changes/fixes, perhaps all at once does save
reviewer bandwidth, so we may want to be flexible.)

On Wed, Nov 3, 2021 at 5:52 PM Jack Ye <yezhao...@gmail.com> wrote:

> Thanks for bringing this up Kyle!
>
> My personal view is the following:
> 1. For new features, it should be very clear that we always implement them
> against the latest version. At the same time, I suggest we create an issue
> to track backport, so that if anyone is interested in backport he/she can
> work on it separately. We can tag these issues based on title names (e.g.
> "Backport: xxx" as title), and these are also good issues for new
> contributors to work on because there is already reference implementation
> in a newer version.
> 2. For bug fixes, I understand sometimes it's just a one line fix and
> people will try to just fix across versions. My take is that we should try
> to advocate for fixing 1 version and open an issue for other versions
> although it does not really need to be enforced as strictly. Sometimes even
> a 1 line fix has serious implications for different versions and might
> break stuff unintentionally. it's better that people that have production
> dependency on the specific version carefully review and test changes before
> merging.
>
> About enforcement strategy, I suggest we start to create a template for
> Github Issues and PRs
> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>,
> where we state the guidelines related to engine versions, as well as
> Iceberg's preferred code style, naming convention, title convention, etc.
> to make new contributors a bit easier to submit changes without too much
> rewrite. Currently I observe that every time there is a new contributor, we
> need to state all the guidelines through PR review, which causes quite a
> lot of time spent on rewriting the code and also reduces the motivation for
> people to continue work on the PR.
>
> Best,
> Jack Ye
>
>
>
>
>
> On Wed, Nov 3, 2021 at 4:13 PM Kyle Bendickson <k...@tabular.io> wrote:
>
>> I submitted a PR to fix a Spark bug today, applying the same changes to
>> all eligible Spark versions.
>>
>> Jack mentioned that he thought the practice going forward was to fix /
>> apply changes on the latest Spark version in one PR, and then open a second
>> PR to backport the fixes (presumably to minimize review overhead).
>>
>> Do we have a standard / preference on that? Jack mentioned he wasn't
>> certain, so I thought I'd ask here.
>>
>> Seems like a good practice but hoping to get some clarification :)
>>
>> --
>> Best,
>> Kyle Bendickson
>> Github: @kbendick
>>
>

Reply via email to