+1 as well. I too have found most places that have a bigger PR template,
its utility goes down as it gets larger.

And smaller patches are typically how we get new contributors. Or they want
to submit one patch in an area that some of the regular contributors might
not work with much, but they don’t otherwise have that much time to be
active contributors.

The template should be friendly for new people who possibly just want to
fix one thing that affects them - often those patches are very valuable.

One thing that might be nice in the future would be having a bot that opens
issues for the backport, possibly via slash commands. So the committer who
merged it in can possibly put `/icebergbot backport issue spark 3.1, 3.0`.

I know that’s likely more work than we have bandwidth for at present, but
putting the idea out as we work through the issues surrounding slicing our
tooling for many versions.

- Kyle

On Fri, Nov 5, 2021 at 2:59 PM Jack Ye <yezhao...@gmail.com> wrote:

> definitely +1 for having documentations to point to instead of laying
> everything out in the PR template!
>
> -Jack
>
> On Fri, Nov 5, 2021 at 2:55 PM Ryan Blue <b...@tabular.io> wrote:
>
>> I agree with Jack about keeping changes targeted at one Spark version.
>> That makes it possible to revert changes to one Spark version rather than
>> manually rolling back. And it also keeps PRs smaller and more focused. I
>> think it wastes more contributor time to make updates to a PR in 3 or 4
>> places and keep them in sync.
>>
>> For the template for issues and PRs, I'm okay with that, but let's not
>> get carried away. Let's document style, standards, and practices and link
>> to that doc rather than starting off with a wall of text. The templates for
>> other projects, like Parquet, have so much content that isn't useful that I
>> think they're encouraging people with small commits to abandon the effort.
>>
>> Ryan
>>
>> On Fri, Nov 5, 2021 at 1:06 PM Kyle Bendickson <k...@tabular.io> wrote:
>>
>>> I like Yufei's idea of adding a template.
>>>
>>> As per the concern about making changes separate and having things fall
>>> through the cracks, I do share that concern.
>>>
>>> On a related note though, I've already observed that a few times with
>>> simple things that maybe people were working on before the split up. These
>>> will go away with time, though could happen in the future when we add a new
>>> version.
>>>
>>> I agree a PR template would go a long way in ensuring that people are at
>>> least aware.
>>>
>>> It would be great if more folks who maintain forks could chime in as
>>> well, as this has ramifications for that as well.
>>>
>>> But overall, especially for larger PRs, I think it's a great idea.
>>>
>>> - Kyle
>>>
>>> On Fri, Nov 5, 2021 at 10:34 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> It is a good practice to create a template for Github Issues and PRs
>>>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>.
>>>>  It
>>>> will make a PR or issue easier to read overall. We can also enforce/suggest
>>>> properties like affected versions and fixed versions.
>>>>
>>>> Best,
>>>> Yufei
>>>>
>>>>
>>>> On Wed, Nov 3, 2021 at 6:02 PM Wing Yew Poon
>>>> <wyp...@cloudera.com.invalid> wrote:
>>>>
>>>>> I wasn't aware that we were standardizing on such a practice. I don't
>>>>> have a strong opinion on making changes one Spark version at a time or all
>>>>> at once. I think committers who do reviews regularly should decide. My 
>>>>> only
>>>>> concern with making changes one version at a time is follow-through on the
>>>>> part of the contributor. We want to ensure that a change/fix applicable to
>>>>> multiple versions gets into all of them. Reviewer bandwidth is incurred
>>>>> either way. (For simple changes/fixes, perhaps all at once does save
>>>>> reviewer bandwidth, so we may want to be flexible.)
>>>>>
>>>>> On Wed, Nov 3, 2021 at 5:52 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for bringing this up Kyle!
>>>>>>
>>>>>> My personal view is the following:
>>>>>> 1. For new features, it should be very clear that we always implement
>>>>>> them against the latest version. At the same time, I suggest we create an
>>>>>> issue to track backport, so that if anyone is interested in backport 
>>>>>> he/she
>>>>>> can work on it separately. We can tag these issues based on title names
>>>>>> (e.g. "Backport: xxx" as title), and these are also good issues for new
>>>>>> contributors to work on because there is already reference implementation
>>>>>> in a newer version.
>>>>>> 2. For bug fixes, I understand sometimes it's just a one line fix and
>>>>>> people will try to just fix across versions. My take is that we should 
>>>>>> try
>>>>>> to advocate for fixing 1 version and open an issue for other versions
>>>>>> although it does not really need to be enforced as strictly. Sometimes 
>>>>>> even
>>>>>> a 1 line fix has serious implications for different versions and might
>>>>>> break stuff unintentionally. it's better that people that have production
>>>>>> dependency on the specific version carefully review and test changes 
>>>>>> before
>>>>>> merging.
>>>>>>
>>>>>> About enforcement strategy, I suggest we start to create a template
>>>>>> for Github Issues and PRs
>>>>>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>,
>>>>>> where we state the guidelines related to engine versions, as well as
>>>>>> Iceberg's preferred code style, naming convention, title convention, etc.
>>>>>> to make new contributors a bit easier to submit changes without too much
>>>>>> rewrite. Currently I observe that every time there is a new contributor, 
>>>>>> we
>>>>>> need to state all the guidelines through PR review, which causes quite a
>>>>>> lot of time spent on rewriting the code and also reduces the motivation 
>>>>>> for
>>>>>> people to continue work on the PR.
>>>>>>
>>>>>> Best,
>>>>>> Jack Ye
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 3, 2021 at 4:13 PM Kyle Bendickson <k...@tabular.io>
>>>>>> wrote:
>>>>>>
>>>>>>> I submitted a PR to fix a Spark bug today, applying the same changes
>>>>>>> to all eligible Spark versions.
>>>>>>>
>>>>>>> Jack mentioned that he thought the practice going forward was to fix
>>>>>>> / apply changes on the latest Spark version in one PR, and then open a
>>>>>>> second PR to backport the fixes (presumably to minimize review 
>>>>>>> overhead).
>>>>>>>
>>>>>>> Do we have a standard / preference on that? Jack mentioned he wasn't
>>>>>>> certain, so I thought I'd ask here.
>>>>>>>
>>>>>>> Seems like a good practice but hoping to get some clarification :)
>>>>>>>
>>>>>>> --
>>>>>>> Best,
>>>>>>> Kyle Bendickson
>>>>>>> Github: @kbendick
>>>>>>>
>>>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

Reply via email to