definitely +1 for having documentations to point to instead of laying
everything out in the PR template!

-Jack

On Fri, Nov 5, 2021 at 2:55 PM Ryan Blue <b...@tabular.io> wrote:

> I agree with Jack about keeping changes targeted at one Spark version.
> That makes it possible to revert changes to one Spark version rather than
> manually rolling back. And it also keeps PRs smaller and more focused. I
> think it wastes more contributor time to make updates to a PR in 3 or 4
> places and keep them in sync.
>
> For the template for issues and PRs, I'm okay with that, but let's not get
> carried away. Let's document style, standards, and practices and link to
> that doc rather than starting off with a wall of text. The templates for
> other projects, like Parquet, have so much content that isn't useful that I
> think they're encouraging people with small commits to abandon the effort.
>
> Ryan
>
> On Fri, Nov 5, 2021 at 1:06 PM Kyle Bendickson <k...@tabular.io> wrote:
>
>> I like Yufei's idea of adding a template.
>>
>> As per the concern about making changes separate and having things fall
>> through the cracks, I do share that concern.
>>
>> On a related note though, I've already observed that a few times with
>> simple things that maybe people were working on before the split up. These
>> will go away with time, though could happen in the future when we add a new
>> version.
>>
>> I agree a PR template would go a long way in ensuring that people are at
>> least aware.
>>
>> It would be great if more folks who maintain forks could chime in as
>> well, as this has ramifications for that as well.
>>
>> But overall, especially for larger PRs, I think it's a great idea.
>>
>> - Kyle
>>
>> On Fri, Nov 5, 2021 at 10:34 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> It is a good practice to create a template for Github Issues and PRs
>>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>.
>>>  It
>>> will make a PR or issue easier to read overall. We can also enforce/suggest
>>> properties like affected versions and fixed versions.
>>>
>>> Best,
>>> Yufei
>>>
>>>
>>> On Wed, Nov 3, 2021 at 6:02 PM Wing Yew Poon <wyp...@cloudera.com.invalid>
>>> wrote:
>>>
>>>> I wasn't aware that we were standardizing on such a practice. I don't
>>>> have a strong opinion on making changes one Spark version at a time or all
>>>> at once. I think committers who do reviews regularly should decide. My only
>>>> concern with making changes one version at a time is follow-through on the
>>>> part of the contributor. We want to ensure that a change/fix applicable to
>>>> multiple versions gets into all of them. Reviewer bandwidth is incurred
>>>> either way. (For simple changes/fixes, perhaps all at once does save
>>>> reviewer bandwidth, so we may want to be flexible.)
>>>>
>>>> On Wed, Nov 3, 2021 at 5:52 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> Thanks for bringing this up Kyle!
>>>>>
>>>>> My personal view is the following:
>>>>> 1. For new features, it should be very clear that we always implement
>>>>> them against the latest version. At the same time, I suggest we create an
>>>>> issue to track backport, so that if anyone is interested in backport 
>>>>> he/she
>>>>> can work on it separately. We can tag these issues based on title names
>>>>> (e.g. "Backport: xxx" as title), and these are also good issues for new
>>>>> contributors to work on because there is already reference implementation
>>>>> in a newer version.
>>>>> 2. For bug fixes, I understand sometimes it's just a one line fix and
>>>>> people will try to just fix across versions. My take is that we should try
>>>>> to advocate for fixing 1 version and open an issue for other versions
>>>>> although it does not really need to be enforced as strictly. Sometimes 
>>>>> even
>>>>> a 1 line fix has serious implications for different versions and might
>>>>> break stuff unintentionally. it's better that people that have production
>>>>> dependency on the specific version carefully review and test changes 
>>>>> before
>>>>> merging.
>>>>>
>>>>> About enforcement strategy, I suggest we start to create a template
>>>>> for Github Issues and PRs
>>>>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>,
>>>>> where we state the guidelines related to engine versions, as well as
>>>>> Iceberg's preferred code style, naming convention, title convention, etc.
>>>>> to make new contributors a bit easier to submit changes without too much
>>>>> rewrite. Currently I observe that every time there is a new contributor, 
>>>>> we
>>>>> need to state all the guidelines through PR review, which causes quite a
>>>>> lot of time spent on rewriting the code and also reduces the motivation 
>>>>> for
>>>>> people to continue work on the PR.
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 3, 2021 at 4:13 PM Kyle Bendickson <k...@tabular.io>
>>>>> wrote:
>>>>>
>>>>>> I submitted a PR to fix a Spark bug today, applying the same changes
>>>>>> to all eligible Spark versions.
>>>>>>
>>>>>> Jack mentioned that he thought the practice going forward was to fix
>>>>>> / apply changes on the latest Spark version in one PR, and then open a
>>>>>> second PR to backport the fixes (presumably to minimize review overhead).
>>>>>>
>>>>>> Do we have a standard / preference on that? Jack mentioned he wasn't
>>>>>> certain, so I thought I'd ask here.
>>>>>>
>>>>>> Seems like a good practice but hoping to get some clarification :)
>>>>>>
>>>>>> --
>>>>>> Best,
>>>>>> Kyle Bendickson
>>>>>> Github: @kbendick
>>>>>>
>>>>>
>
> --
> Ryan Blue
> Tabular
>

Reply via email to