Hi Russell,

What do you mean by "keep these changes in master"? Can you elaborate?
As for Iceberg, we back-port spark/v3.1 patches from the master branch.

On Sun, Apr 23, 2023 at 10:04 AM <russell.spit...@gmail.com> wrote:

> If you are on forked 0.13 is it important to keep these changes in master?
>
> On Apr 22, 2023, at 8:42 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>
>
> I'd like to share our maintenance strategy and history at eBay.
>
> We are now on forked versions of Iceberg 0.13.1 and Spark 3.1.1. For
> Spark, we started to evaluate upgrading to 3.1.1 from 2.3/2.4 in H2 2021,
> since it was the latest and most stable version then.
> After migrating internal changes and finishing tests, we rolled out to
> customers on our managed platforms (mainly SQL) or pushed them to upgrade
> their own workloads (mainly Scala and PySpark). At this time, fewer than
> 10% of customers still haven't upgraded. It's unlikely we will make
> another major upgrade soon. We've been back-porting bug fixes from Spark
> branch-3.1, but now we are on our own.
>
> For a company the size of eBay, I don't think it's unusual to spend more
> than 18 months on such a major upgrade. The 18-month maintenance period
> is too short, in my opinion. (BTW, Spark 3.2 just made its final release.)
> The benefit of a community maintained branch is that we can always *be
> notified of critical bug fixes* and fix them proactively before they
> impact our customers. Can we at least open GitHub issues for back-porting
> bug fixes and see whether anyone who cares steps up? I'm more than willing
> to do it. If, after some time, no one wants to pick up the back-port tasks, maybe we
> can eventually announce it EOL. WDYT?
>
> Thanks,
> Manu
>
> On Sun, Apr 23, 2023 at 3:43 AM Ryan Blue <b...@tabular.io> wrote:
>
>> +1 for marking 3.1 deprecated.
>>
>> On Sat, Apr 22, 2023 at 10:20 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Here is the original engine version support lifecycle guideline we
>>> came up with:
>>> https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status
>>>
>>> I think we can at least mark 3.1 support as deprecated, which matches
>>> the situation here that "People who are still interested in the version can
>>> backport any necessary feature or bug fix from newer versions, but the
>>> community will not spend effort in achieving feature parity." But we could
>>> keep it around for some more time given there is still active usage of it.
>>>
>>> Jack
>>>
>>> On Fri, Apr 21, 2023 at 5:32 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> >  without requiring authors to cherry-pick all applicable changes,
>>>> like we agreed initially.
>>>>
>>>> Not trying to change what was agreed before. Just for my understanding:
>>>> Let's say the latest Spark version is 3.3. Today, we don't require any
>>>> backport to 3.2 and 3.1, correct?
>>>>
>>>> On Fri, Apr 21, 2023 at 5:19 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> I still agree with the idea that people interested in Spark 3.1 should
>>>>> be primarily responsible for keeping it updated. Backporting patches is up
>>>>> to the contributor.
>>>>>
>>>>> The only concern I have about keeping Spark 3.1 is whether there are
>>>>> important bugs or security issues that are not getting backported. That
>>>>> would signal that the branch is not maintained enough to continue releasing
>>>>> it. But if we are still seeing important problems getting fixed, I think it
>>>>> should be primarily up to the people maintaining the branch.
>>>>>
>>>>> On Fri, Apr 21, 2023 at 5:14 PM Anton Okolnychyi
>>>>> <aokolnyc...@apple.com.invalid> wrote:
>>>>>
>>>>>> We backported only a small number of changes to 3.1, compared to 3.2.
>>>>>> At this point, they also diverged quite a bit so doing those backports is
>>>>>> hard. When we discussed how to support multiple engine versions, the
>>>>>> community initially agreed that it’s optional for authors to cherry-pick
>>>>>> changes into older versions and should be done by other members of the
>>>>>> community interested in those integrations. That’s what led us to where we
>>>>>> are today. We may reconsider this approach, but only if there is a small
>>>>>> number of versions to support. I am also OK with keeping older modules, but
>>>>>> only to provide folks a place to collaborate, without requiring authors to
>>>>>> cherry-pick all applicable changes, like we agreed initially.
>>>>>>
>>>>>> - Anton
>>>>>>
>>>>>> On Apr 21, 2023, at 3:58 PM, Ryan Blue <b...@tabular.io> wrote:
>>>>>>
>>>>>> Good question about backports. Walaa and Edgar, are you backporting
>>>>>> fixes to 3.1? It makes sense to have a place to collaborate, but only if
>>>>>> people are actively keeping them updated.
>>>>>>
>>>>>> On Fri, Apr 21, 2023 at 3:54 PM Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> For the 3.1 activity that Ryan linked, 3.1 is probably being updated
>>>>>>> because of the backporting requirement (keeping 3.1, 3.2, and 3.3 in sync).
>>>>>>> That is the adopted policy. I'm not sure it indicates that people are
>>>>>>> actively collaborating on 3.1.
>>>>>>>
>>>>>>> As Anton was saying, backporting/syncing 4 versions (3.1, 3.2, 3.3,
>>>>>>> 3.4) is a pretty high burden.
>>>>>>>
>>>>>>> On Fri, Apr 21, 2023 at 2:29 PM Anton Okolnychyi <
>>>>>>> aokolnyc...@apple.com.invalid> wrote:
>>>>>>>
>>>>>>>> If it is being used by folks in the community, let’s keep it for
>>>>>>>> now. That said, let’s come up with a strategy for when to eventually drop
>>>>>>>> it, as the list cannot grow indefinitely. Our initial agreement was to keep
>>>>>>>> the last 3 versions (except Spark LTS versions), which worked well with the
>>>>>>>> 18 months of support promised by the Spark community. At this point, Spark
>>>>>>>> will not release any bug fixes for 3.1, even critical ones.
>>>>>>>>
>>>>>>>> Walaa, Edgar, can you tell us a little bit about the Spark 3.1
>>>>>>>> integration you depend on? Do you have your own Iceberg/Spark forks? Is an
>>>>>>>> updated Iceberg core module the primary thing you are looking for?
>>>>>>>> How do you deal with Spark bugs?
>>>>>>>>
>>>>>>>> My biggest worry is that our Spark 3.1 integration randomly gets
>>>>>>>> some updates from time to time. By releasing those jars with each Iceberg
>>>>>>>> version, we send a message that it is being actively maintained and worked
>>>>>>>> on. That’s actually not true; we cherry-pick only some changes. Also, it is
>>>>>>>> still part of our release cycle, so it must be checked and tested (our next
>>>>>>>> release will have 3.1, 3.2, 3.3, and 3.4 integrations to test).
>>>>>>>>
>>>>>>>> I am going to close the PR for now, but it would be great to find a
>>>>>>>> good way to handle this in the future. At least, we have to document what
>>>>>>>> kind of expectations our users should have. Do we promise that all bug
>>>>>>>> fixes discovered in newer Spark versions will be cherry-picked to all older
>>>>>>>> Spark versions? I am not sure that’s true at this point.
>>>>>>>>
>>>>>>>> - Anton
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 21, 2023, at 10:29 AM, Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>> According to Spark docs, a minor release will be supported for 18
>>>>>>>> months and 3.1 was released 2021-03-02, more than 2 years ago. I don't
>>>>>>>> think we should expect any further updates from the Spark community for
>>>>>>>> the 3.1 line.
>>>>>>>>
>>>>>>>> I'm also not sure that there is a problem continuing to release
>>>>>>>> Iceberg's module for 3.1. It is still being updated
>>>>>>>> <https://github.com/apache/iceberg/commits/master/spark/v3.1> and
>>>>>>>> I don't think it is preventing us from continuing work on the later
>>>>>>>> versions. Makes sense to me to keep it if people are collaborating there.
>>>>>>>> We should evaluate this again soon, though.
>>>>>>>>
>>>>>>>> On Fri, Apr 21, 2023 at 8:27 AM Edgar Rodriguez <
>>>>>>>> edgar.rodrig...@airbnb.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Airbnb is also still on Spark 3.1 and I echo some of Walaa's
>>>>>>>>> comments.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> On Thu, Apr 20, 2023 at 8:14 PM Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> LinkedIn is still on Spark 3.1. I am guessing a number of other
>>>>>>>>>> companies could be in the same boat. I feel the argument for Spark 2.4 is
>>>>>>>>>> different from that for Spark 3.1, and it would be great if we could
>>>>>>>>>> continue to support 3.1 for some time.
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 19, 2023 at 11:06 AM Ryan Blue <b...@tabular.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> As we said in the 2.4 discussion, the format itself should
>>>>>>>>>>> provide forward compatibility with tables, and it is clearer that we
>>>>>>>>>>> aren't adding new features if you have to use older versions for Spark 3.1.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 19, 2023 at 10:08 AM Anton Okolnychyi <
>>>>>>>>>>> aokolnyc...@apple.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>
>>>>>>>>>>>> What does everybody think about Spark 3.1 support after we add
>>>>>>>>>>>> Spark 3.4 support? Our initial plan was to release jars for the last 3
>>>>>>>>>>>> versions. Are there any blockers for dropping 3.1?
>>>>>>>>>>>>
>>>>>>>>>>>> - Anton
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Tabular
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Edgar R
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
