Re: Join the python iceberg project

2021-09-14 Thread Jun H.
Hi Mordechai, Thanks for your interest! In addition to what Jack mentioned, we also have a Slack channel, #python, in the apache-iceberg Slack workspace for the Iceberg Python library. As the Iceberg Python library is an implementation of the Iceberg spec, it would be great to get familiar with the spec

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Jack Ye
Hi Wing Yew, I think 2.4 is a different story. We will continue to support Spark 2.4, but as you can see it will continue to have very limited functionality compared to Spark 3. I believe we discussed option 3 when we were doing the Spark 3.0 to 3.1 upgrade. Recently we are seeing the same is

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Wing Yew Poon
I understand and sympathize with the desire to use new DSv2 features in Spark 3.2. I agree that Option 1 is the easiest for developers, but I don't think it considers the interests of users. I do not think that most users will upgrade to Spark 3.2 as soon as it is released. It is a "minor version"

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Yufei Gu
Option 1 sounds good to me. Here are my reasons: 1. Both 2 and 3 will slow down development. Considering the limited resources in the open source community, the upsides of options 2 and 3 are probably not worth it. 2. Both 2 and 3 assume the use cases may not exist. It's hard to predict anything,

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Anton Okolnychyi
To sum up what we have so far: Option 1 (support just the most recent minor Spark 3 version): The easiest option for us devs, but it forces users to upgrade to the most recent minor Spark version to consume any new Iceberg features. Option 2 (a separate project under Iceberg): Can support as many S

Re: Join the python iceberg project

2021-09-14 Thread Jack Ye
Hi Mordechai, Thank you very much for your interest! We are in the process of refactoring the Python codebase, so there are a lot of opportunities for contribution. We have had a few discussions so far; you can join future meetings by subscribing to this Google group: https://groups.google.com/g/

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Russell Spitzer
I think we should go for option 1. I am already not a big fan of having runtime errors for unsupported things based on versions, and I don't think minor version upgrades are a large issue for users. I'm especially not looking forward to supporting interfaces that only exist in Spark 3.2 in a mul

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Anton Okolnychyi
Hey Imran, I don’t know why I forgot to mention this option too. It is definitely a solution to consider. We used this approach to support Spark 2 and Spark 3. Right now, this would mean having iceberg-spark (common code for all versions), iceberg-spark2, iceberg-spark-3 (common code for all Spa

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Anton Okolnychyi
> First of all, is option 2 a viable option? We discussed separating the python module outside of the project a few weeks ago, and decided to not do that because it's beneficial for code cross reference and more intuitive for new developers to see everything in the same repository. I would

Join the python iceberg project

2021-09-14 Thread Mordechai Ben-Zecharia
Hi, I want to join the effort on the Iceberg Python package. I have several years of Python/Big data/Backend/ML experience and will be happy to contribute code to this project. Do you have any guidelines and some learning materials? Thanks, Mordechai Ben Zecharia Big Data Engineer | Data Engineering T