I like Yufei's idea of adding a template. As per the concern about making changes separate and having things fall through the cracks, I do share that concern.
On a related note though, I've already observed that a few times with simple things that maybe people were working on before the split up. These will go away with time, though could happen in the future when we add a new version. I agree a PR template would go a long way in ensuring that people are at least aware. It would be great if more folks who maintain forks could chime in as well, as this has ramifications for that as well. But overall, especially for larger PRs, I think it's a great idea. - Kyle On Fri, Nov 5, 2021 at 10:34 AM Yufei Gu <flyrain...@gmail.com> wrote: > It is a good practice to create a template for Github Issues and PRs > <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>. > It > will make a PR or issue easier to read overall. We can also enforce/suggest > properties like affected versions and fixed versions. > > Best, > Yufei > > > On Wed, Nov 3, 2021 at 6:02 PM Wing Yew Poon <wyp...@cloudera.com.invalid> > wrote: > >> I wasn't aware that we were standardizing on such a practice. I don't >> have a strong opinion on making changes one Spark version at a time or all >> at once. I think committers who do reviews regularly should decide. My only >> concern with making changes one version at a time is follow-through on the >> part of the contributor. We want to ensure that a change/fix applicable to >> multiple versions gets into all of them. Reviewer bandwidth is incurred >> either way. (For simple changes/fixes, perhaps all at once does save >> reviewer bandwidth, so we may want to be flexible.) >> >> On Wed, Nov 3, 2021 at 5:52 PM Jack Ye <yezhao...@gmail.com> wrote: >> >>> Thanks for bringing this up Kyle! >>> >>> My personal view is the following: >>> 1. For new features, it should be very clear that we always implement >>> them against the latest version. At the same time, I suggest we create an >>> issue to track backport, so that if anyone is interested in backport he/she >>> can work on it separately. We can tag these issues based on title names >>> (e.g. "Backport: xxx" as title), and these are also good issues for new >>> contributors to work on because there is already reference implementation >>> in a newer version. >>> 2. For bug fixes, I understand sometimes it's just a one line fix and >>> people will try to just fix across versions. My take is that we should try >>> to advocate for fixing 1 version and open an issue for other versions >>> although it does not really need to be enforced as strictly. Sometimes even >>> a 1 line fix has serious implications for different versions and might >>> break stuff unintentionally. it's better that people that have production >>> dependency on the specific version carefully review and test changes before >>> merging. >>> >>> About enforcement strategy, I suggest we start to create a template for >>> Github Issues and PRs >>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository>, >>> where we state the guidelines related to engine versions, as well as >>> Iceberg's preferred code style, naming convention, title convention, etc. >>> to make new contributors a bit easier to submit changes without too much >>> rewrite. Currently I observe that every time there is a new contributor, we >>> need to state all the guidelines through PR review, which causes quite a >>> lot of time spent on rewriting the code and also reduces the motivation for >>> people to continue work on the PR. >>> >>> Best, >>> Jack Ye >>> >>> >>> >>> >>> >>> On Wed, Nov 3, 2021 at 4:13 PM Kyle Bendickson <k...@tabular.io> wrote: >>> >>>> I submitted a PR to fix a Spark bug today, applying the same changes to >>>> all eligible Spark versions. >>>> >>>> Jack mentioned that he thought the practice going forward was to fix / >>>> apply changes on the latest Spark version in one PR, and then open a second >>>> PR to backport the fixes (presumably to minimize review overhead). >>>> >>>> Do we have a standard / preference on that? Jack mentioned he wasn't >>>> certain, so I thought I'd ask here. >>>> >>>> Seems like a good practice but hoping to get some clarification :) >>>> >>>> -- >>>> Best, >>>> Kyle Bendickson >>>> Github: @kbendick >>>> >>>