definitely +1 for having documentations to point to instead of laying everything out in the PR template!
-Jack On Fri, Nov 5, 2021 at 2:55 PM Ryan Blue <b...@tabular.io> wrote: > I agree with Jack about keeping changes targeted at one Spark version. > That makes it possible to revert changes to one Spark version rather than > manually rolling back. And it also keeps PRs smaller and more focused. I > think it wastes more contributor time to make updates to a PR in 3 or 4 > places and keep them in sync. > > For the template for issues and PRs, I'm okay with that, but let's not get > carried away. Let's document style, standards, and practices and link to > that doc rather than starting off with a wall of text. The templates for > other projects, like Parquet, have so much content that isn't useful that I > think they're encouraging people with small commits to abandon the effort. > > Ryan > > On Fri, Nov 5, 2021 at 1:06 PM Kyle Bendickson <k...@tabular.io> wrote: > >> I like Yufei's idea of adding a template. >> >> As per the concern about making changes separate and having things fall >> through the cracks, I do share that concern. >> >> On a related note though, I've already observed that a few times with >> simple things that maybe people were working on before the split up. These >> will go away with time, though could happen in the future when we add a new >> version. >> >> I agree a PR template would go a long way in ensuring that people are at >> least aware. >> >> It would be great if more folks who maintain forks could chime in as >> well, as this has ramifications for that as well. >> >> But overall, especially for larger PRs, I think it's a great idea. >> >> - Kyle >> >> On Fri, Nov 5, 2021 at 10:34 AM Yufei Gu <flyrain...@gmail.com> wrote: >> >>> It is a good practice to create a template for Github Issues and PRs >>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>. >>> It >>> will make a PR or issue easier to read overall. We can also enforce/suggest >>> properties like affected versions and fixed versions. >>> >>> Best, >>> Yufei >>> >>> >>> On Wed, Nov 3, 2021 at 6:02 PM Wing Yew Poon <wyp...@cloudera.com.invalid> >>> wrote: >>> >>>> I wasn't aware that we were standardizing on such a practice. I don't >>>> have a strong opinion on making changes one Spark version at a time or all >>>> at once. I think committers who do reviews regularly should decide. My only >>>> concern with making changes one version at a time is follow-through on the >>>> part of the contributor. We want to ensure that a change/fix applicable to >>>> multiple versions gets into all of them. Reviewer bandwidth is incurred >>>> either way. (For simple changes/fixes, perhaps all at once does save >>>> reviewer bandwidth, so we may want to be flexible.) >>>> >>>> On Wed, Nov 3, 2021 at 5:52 PM Jack Ye <yezhao...@gmail.com> wrote: >>>> >>>>> Thanks for bringing this up Kyle! >>>>> >>>>> My personal view is the following: >>>>> 1. For new features, it should be very clear that we always implement >>>>> them against the latest version. At the same time, I suggest we create an >>>>> issue to track backport, so that if anyone is interested in backport >>>>> he/she >>>>> can work on it separately. We can tag these issues based on title names >>>>> (e.g. "Backport: xxx" as title), and these are also good issues for new >>>>> contributors to work on because there is already reference implementation >>>>> in a newer version. >>>>> 2. For bug fixes, I understand sometimes it's just a one line fix and >>>>> people will try to just fix across versions. My take is that we should try >>>>> to advocate for fixing 1 version and open an issue for other versions >>>>> although it does not really need to be enforced as strictly. Sometimes >>>>> even >>>>> a 1 line fix has serious implications for different versions and might >>>>> break stuff unintentionally. it's better that people that have production >>>>> dependency on the specific version carefully review and test changes >>>>> before >>>>> merging. >>>>> >>>>> About enforcement strategy, I suggest we start to create a template >>>>> for Github Issues and PRs >>>>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>, >>>>> where we state the guidelines related to engine versions, as well as >>>>> Iceberg's preferred code style, naming convention, title convention, etc. >>>>> to make new contributors a bit easier to submit changes without too much >>>>> rewrite. Currently I observe that every time there is a new contributor, >>>>> we >>>>> need to state all the guidelines through PR review, which causes quite a >>>>> lot of time spent on rewriting the code and also reduces the motivation >>>>> for >>>>> people to continue work on the PR. >>>>> >>>>> Best, >>>>> Jack Ye >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Nov 3, 2021 at 4:13 PM Kyle Bendickson <k...@tabular.io> >>>>> wrote: >>>>> >>>>>> I submitted a PR to fix a Spark bug today, applying the same changes >>>>>> to all eligible Spark versions. >>>>>> >>>>>> Jack mentioned that he thought the practice going forward was to fix >>>>>> / apply changes on the latest Spark version in one PR, and then open a >>>>>> second PR to backport the fixes (presumably to minimize review overhead). >>>>>> >>>>>> Do we have a standard / preference on that? Jack mentioned he wasn't >>>>>> certain, so I thought I'd ask here. >>>>>> >>>>>> Seems like a good practice but hoping to get some clarification :) >>>>>> >>>>>> -- >>>>>> Best, >>>>>> Kyle Bendickson >>>>>> Github: @kbendick >>>>>> >>>>> > > -- > Ryan Blue > Tabular >