Hi Russell,

What do you mean by "keep these changes in master"? Can you elaborate? As for Iceberg, we back-port spark/v3.1 patches from the master branch.
On Sun, Apr 23, 2023 at 10:04 AM <russell.spit...@gmail.com> wrote:

> If you are on forked 0.13 is it important to keep these changes in master?
>
> On Apr 22, 2023, at 8:42 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>
> I'd like to share our maintenance strategy and history at eBay.
>
> We are now on forked versions of Iceberg 0.13.1 and Spark 3.1.1. For Spark, we started to evaluate upgrading to 3.1.1 from 2.3/2.4 in H2 2021, since it was the latest and most stable version then.
> After migrating internal changes and finishing tests, we rolled out to customers for our managed platforms (mainly SQL) or pushed them to upgrade on their own (mainly Scala and PySpark). At this time, fewer than 10% of customers still haven't upgraded. It's unlikely we will make another major upgrade soon. We've been back-porting bug fixes from Spark branch-3.1 but now we are on our own.
>
> For a company the size of eBay, I don't think it's unusual to spend more than 18 months on such a major upgrade. The 18-month maintenance period is too short, in my opinion. (BTW, Spark 3.2 just made its final release.)
> The benefit of a community-maintained branch is that we can always *be notified of critical bug fixes* and fix them proactively before they impact our customers. Can we at least open GitHub issues for back-porting bug fixes and see whether anyone who cares steps up? I'm more than willing to do it.
> If, after some time, no one wants to pick up the back-port tasks, maybe we can eventually announce it EOL. WDYT?
>
> Thanks,
> Manu
>
> On Sun, Apr 23, 2023 at 3:43 AM Ryan Blue <b...@tabular.io> wrote:
>
>> +1 for marking 3.1 deprecated.
>>
>> On Sat, Apr 22, 2023 at 10:20 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Here is the original guideline we came up with for the lifecycle of engine version support:
>>> https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status
>>>
>>> I think we can at least mark 3.1 support as deprecated, which matches the situation here that "People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity." But we could keep it around for some more time given there is still active usage of it.
>>>
>>> Jack
>>>
>>> On Fri, Apr 21, 2023 at 5:32 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> > without requiring authors to cherry-pick all applicable changes, like we agreed initially.
>>>>
>>>> Not trying to change what was agreed before. Just for my understanding. Let's say the latest Spark version is 3.3. Today, we don't require any backport to 3.2 and 3.1, correct?
>>>>
>>>> On Fri, Apr 21, 2023 at 5:19 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> I still agree with the idea that people interested in Spark 3.1 should be primarily responsible for keeping it updated. Backporting patches is up to the contributor.
>>>>>
>>>>> The only concern I have about keeping Spark 3.1 is whether there are important bugs or security issues that are not getting backported. That would signal that the branch is not maintained enough to continue releasing it. But if we are still seeing important problems getting fixed, I think it should be primarily up to the people maintaining the branch.
>>>>>
>>>>> On Fri, Apr 21, 2023 at 5:14 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>>>>>
>>>>>> We backported only a small number of changes to 3.1, compared to 3.2. At this point, they also diverged quite a bit so doing those backports is hard.
>>>>>>
>>>>>> When we discussed how to support multiple engine versions, the community initially agreed that it’s optional for authors to cherry-pick changes into older versions and that this should be done by other members of the community interested in those integrations. That’s what led us to where we are today. We may reconsider this approach but only if there is a small number of versions to support. I am also OK to keep older modules but only to provide folks a place to collaborate, without requiring authors to cherry-pick all applicable changes, like we agreed initially.
>>>>>>
>>>>>> - Anton
>>>>>>
>>>>>> On Apr 21, 2023, at 3:58 PM, Ryan Blue <b...@tabular.io> wrote:
>>>>>>
>>>>>> Good question about backports. Walaa and Edgar, are you backporting fixes to 3.1? It makes sense to have a place to collaborate, but only if people are actively keeping them updated.
>>>>>>
>>>>>> On Fri, Apr 21, 2023 at 3:54 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>
>>>>>>> For the 3.1 activities that Ryan linked, 3.1 is probably updated because of the backporting requirement (keeping 3.1, 3.2, 3.3 in sync). It is the adopted policy. Not sure if it is an indication that people are actively collaborating on 3.1.
>>>>>>>
>>>>>>> As Anton was saying, backporting/syncing 4 versions (3.1, 3.2, 3.3, 3.4) is a pretty high burden.
>>>>>>>
>>>>>>> On Fri, Apr 21, 2023 at 2:29 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>>>>>>>
>>>>>>>> If it is being used by folks in the community, let’s keep it for now. That said, let’s come up with a strategy on when to eventually drop it as the list cannot grow indefinitely. Our initial agreement was to keep the last 3 (except Spark LTS versions), which worked well for the 18 months of support promised by the Spark community.
>>>>>>>> At this point, Spark will not release any bug fixes for 3.1, even critical ones.
>>>>>>>>
>>>>>>>> Walaa, Edgar, can you tell us a little bit about the Spark 3.1 integration you depend on? Do you have your own Iceberg/Spark forks? Is an updated Iceberg core module the primary thing you are looking for? How do you deal with Spark bugs?
>>>>>>>>
>>>>>>>> My biggest worry is that our Spark 3.1 integration randomly gets some updates from time to time. By releasing those jars with each Iceberg version, we send a message that it is being actively maintained and worked on. That’s actually not true; we cherry-pick only some changes. Also, it is still part of our release cycle, so it must be checked and tested (our next release will have 3.1, 3.2, 3.3 and 3.4 integrations to test).
>>>>>>>>
>>>>>>>> I am going to close the PR for now but it would be great to find a good way to handle this in the future. At least, we have to document what kind of expectations our users should have. Do we promise that all bug fixes discovered in newer Spark versions will be cherry-picked to all older Spark versions? I am not sure that’s true at this point.
>>>>>>>>
>>>>>>>> - Anton
>>>>>>>>
>>>>>>>> On Apr 21, 2023, at 10:29 AM, Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>> According to Spark docs, a minor release will be supported for 18 months and 3.1 was released 2021-03-02, more than 2 years ago. I don't think we should expect any further updates from the Spark community for the 3.1 line.
>>>>>>>>
>>>>>>>> I'm also not sure that there is a problem continuing to release Iceberg's module for 3.1.
>>>>>>>> It is still being updated <https://github.com/apache/iceberg/commits/master/spark/v3.1> and I don't think it is preventing us from continuing work on the later versions. Makes sense to me to keep it if people are collaborating there. We should evaluate this again soon though.
>>>>>>>>
>>>>>>>> On Fri, Apr 21, 2023 at 8:27 AM Edgar Rodriguez <edgar.rodrig...@airbnb.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Airbnb is also still on Spark 3.1 and I echo some of Walaa's comments.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> On Thu, Apr 20, 2023 at 8:14 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> LinkedIn is still on Spark 3.1. I am guessing a number of other companies could be in the same boat. I feel the argument for Spark 2.4 is different from that of Spark 3.1 and it would be great if we can continue to support 3.1 for some time.
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 19, 2023 at 11:06 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> As we said in the 2.4 discussion, the format itself should provide forward compatibility with tables and it is more clear that we aren't adding new features if you have to use older versions for Spark 3.1.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 19, 2023 at 10:08 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>
>>>>>>>>>>>> What does everybody think about Spark 3.1 support after we add Spark 3.4 support? Our initial plan was to release jars for the last 3 versions. Are there any blockers for dropping 3.1?
>>>>>>>>>>>>
>>>>>>>>>>>> - Anton
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Tabular
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Edgar R