If you are on a forked 0.13, is it important to keep these changes in master?

On Apr 22, 2023, at 8:42 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:


I'd like to share our maintenance strategy and history at eBay.

We are now on forked versions of Iceberg 0.13.1 and Spark 3.1.1. For Spark, we started evaluating an upgrade from 2.3/2.4 to 3.1.1 in H2 2021, since it was the latest and most stable version at the time.
After migrating internal changes and finishing tests, we rolled out the upgrade to customers on our managed platforms (mainly SQL) or pushed them to upgrade on their own (mainly Scala and PySpark). At this time, fewer than 10% of customers haven't upgraded. It's unlikely we will make another major upgrade soon. We've been back-porting bug fixes from Spark branch-3.1, but now we are on our own.

For a company of eBay's size, I don't think it's unusual to spend more than 18 months on such a major upgrade. The 18-month maintenance period is too short, in my opinion. (BTW, Spark 3.2 just made its final release.)
The benefit of a community-maintained branch is that we can always be notified of critical bug fixes and apply them proactively before they impact our customers. Can we at least open GitHub issues for back-porting bug fixes and let whoever cares step up? I'm more than willing to do it. If after some time no one wants to pick up the back-port tasks, maybe we can eventually announce it EOL. WDYT?

Thanks,
Manu

On Sun, Apr 23, 2023 at 3:43 AM Ryan Blue <b...@tabular.io> wrote:
+1 for marking 3.1 deprecated.

On Sat, Apr 22, 2023 at 10:20 AM Jack Ye <yezhao...@gmail.com> wrote:
Here was the original lifecycle of engine version support guideline we came up with: https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status

I think we can at least mark 3.1 support as deprecated, which matches the situation here: "People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity." But we could keep it around for some more time, given that there is still active usage of it.

Jack

On Fri, Apr 21, 2023 at 5:32 PM Steven Wu <stevenz...@gmail.com> wrote:
>  without requiring authors to cherry-pick all applicable changes, like we agreed initially.

Not trying to change what was agreed before, just for my understanding. Let's say the latest Spark version is 3.3. Today, we don't require any backport to 3.2 or 3.1, correct?

On Fri, Apr 21, 2023 at 5:19 PM Ryan Blue <b...@tabular.io> wrote:
I still agree with the idea that people interested in Spark 3.1 should be primarily responsible for keeping it updated. Backporting patches is up to the contributor.

The only concern I have about keeping Spark 3.1 is whether there are important bugs or security issues that are not getting backported. That would signal that the branch is not maintained enough to continue releasing it. But if we are still seeing important problems getting fixed, I think it should be primarily up to the people maintaining the branch.

On Fri, Apr 21, 2023 at 5:14 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
We backported only a small number of changes to 3.1, compared to 3.2. At this point, they have also diverged quite a bit, so doing those backports is hard. When we discussed how to support multiple engine versions, the community initially agreed that it’s optional for authors to cherry-pick changes into older versions and that this should be done by other members of the community interested in those integrations. That’s what led us to where we are today. We may reconsider this approach, but only if there is a small number of versions to support. I am also OK with keeping older modules, but only to provide folks a place to collaborate, without requiring authors to cherry-pick all applicable changes, like we agreed initially.

- Anton

On Apr 21, 2023, at 3:58 PM, Ryan Blue <b...@tabular.io> wrote:

Good question about backports. Walaa and Edgar, are you backporting fixes to 3.1? It makes sense to have a place to collaborate, but only if people are actively keeping them updated.

On Fri, Apr 21, 2023 at 3:54 PM Steven Wu <stevenz...@gmail.com> wrote:
For the 3.1 activity that Ryan linked, 3.1 is probably updated because of the backporting requirement (keeping 3.1, 3.2, and 3.3 in sync). That is the adopted policy. I'm not sure it's an indication that people are actively collaborating on 3.1.

As Anton was saying, backporting/syncing across 4 versions (3.1, 3.2, 3.3, 3.4) is a pretty high burden.

On Fri, Apr 21, 2023 at 2:29 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
If it is being used by folks in the community, let’s keep it for now. That said, let’s come up with a strategy on when to eventually drop it as the list cannot grow indefinitely. Our initial agreement was to keep last 3 (except Spark LTS versions), which worked well for 18 months of support promised by the Spark community. At this point, Spark will not release any bug fixes for 3.1, even critical.

Walaa, Edgar, can you tell us a little bit about the Spark 3.1 integration you depend on? Do you have your own Iceberg/Spark forks? Is an updated Iceberg core module the primary thing you are looking for? How do you deal with Spark bugs?

My biggest worry is that our Spark 3.1 integration randomly gets some updates from time to time. By releasing those jars with each Iceberg version, we send a message that it is being actively maintained and worked on. That’s actually not true; we cherry-pick only some changes. Also, it is still part of our release cycle, so it must be checked and tested (our next release will have 3.1, 3.2, 3.3, and 3.4 integrations to test).

I am going to close the PR for now but it would be great to find a good way to handle this in the future. At least, we have to document what kind of expectations our users should have. Do we promise that all bug fixes discovered in newer Spark versions will be cherry-picked to all older Spark versions? I am not sure that’s true at this point.

- Anton


On Apr 21, 2023, at 10:29 AM, Ryan Blue <b...@tabular.io> wrote:

According to the Spark docs, a minor release is supported for 18 months, and 3.1 was released on 2021-03-02, more than 2 years ago. I don't think we should expect any further updates from the Spark community for the 3.1 line.

I'm also not sure that there is a problem continuing to release Iceberg's module for 3.1. It is still being updated and I don't think it is preventing us from continuing work on the later versions. Makes sense to me to keep it if people are collaborating there. We should evaluate this again soon though.

On Fri, Apr 21, 2023 at 8:27 AM Edgar Rodriguez <edgar.rodrig...@airbnb.com.invalid> wrote:
Airbnb is also still on Spark 3.1 and I echo some of Walaa's comments.

Cheers,

On Thu, Apr 20, 2023 at 8:14 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
LinkedIn is still on Spark 3.1. I am guessing a number of other companies could be in the same boat. I feel the argument for Spark 2.4 is different from that of Spark 3.1 and it would be great if we can continue to support 3.1 for some time.

On Wed, Apr 19, 2023 at 11:06 AM Ryan Blue <b...@tabular.io> wrote:
+1

As we said in the 2.4 discussion, the format itself should provide forward compatibility with tables, and it is clearer that we aren't adding new features if you have to use older releases for Spark 3.1.

On Wed, Apr 19, 2023 at 10:08 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
Hey folks,

What does everybody think about Spark 3.1 support after we add Spark 3.4 support? Our initial plan was to release jars for the last 3 versions. Are there any blockers for dropping 3.1?

- Anton


--
Ryan Blue
Tabular


--
Edgar R

