Seeking Input on Handling Ambiguity in Generating Changelogs

2023-04-20 Thread Yufei Gu
Hi folks, I am reaching out to request your insights on addressing the ambiguous behavior of generating changelogs in Iceberg. To provide some context, Iceberg does not enforce row uniqueness even when configured with identifier fields (a.k.a. primary keys in other database systems) during write

Re: [DISCUSS] Spark 3.1 support?

2023-04-20 Thread Walaa Eldin Moustafa
LinkedIn is still on Spark 3.1. I am guessing a number of other companies could be in the same boat. I feel the argument for Spark 2.4 is different from that of Spark 3.1 and it would be great if we can continue to support 3.1 for some time. On Wed, Apr 19, 2023 at 11:06 AM Ryan Blue wrote: > +1

Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-20 Thread Ryan Blue
Great question. I don't have a good idea of who is on JDK 8 still. Maybe we should start another thread? On Thu, Apr 20, 2023 at 1:05 PM Anton Okolnychyi wrote: > What about JDK 8? If I remember Spark 2 was holding us, do we want to > consider switching to JDK 11 for releases? > > - Anton > > On

Re: [DISCUSS] Spark 3.1 support?

2023-04-20 Thread Fokko Driesprong
Spring cleaning! I checked which versions of Spark the cloud vendors are supporting. Both AWS and GCP are already on 3.3. However, for Azure, Spark 3.3 is in preview and is still on 3.1.3. They are planning to upg

Re: [DISCUSS] Spark 3.1 support?

2023-04-20 Thread Anton Okolnychyi
Since there are no objections and it is in line with what we planned initially, I created a PR to drop 3.1. https://github.com/apache/iceberg/pull/7390 - Anton > On Apr 19, 2023, at 11:05 AM, Ryan Blue wrote: > > +1 > > As we said in the 2.4 dis

Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-20 Thread Anton Okolnychyi
What about JDK 8? If I remember Spark 2 was holding us, do we want to consider switching to JDK 11 for releases? - Anton > On Apr 20, 2023, at 2:10 AM, Driesprong, Fokko wrote: > > Thanks all for the response, much appreciated. > > That said, I'd love to hear from more people on this. I think

Re: Data incorrectness when bucket joining Iceberg table

2023-04-20 Thread Anton Okolnychyi
Iceberg and Spark hash functions are not compatible, just like Hive and Spark hash functions are not compatible. That’s why the new SPJ framework depends on the function catalog. - Anton > On Apr 18, 2023, at 7:09 PM, Manu Zhang wrote: > > Hi All, > > Since there had been no bucket join in S

Re: Welcome new PMC members!

2023-04-20 Thread Anton Okolnychyi
Well deserved! Congrats! - Anton > On Apr 19, 2023, at 10:46 PM, liwei li wrote: > > Congrats ! > > > On Thu, Apr 13, 2023 at 2:32 AM Prashant Singh > wrote: > Congratulations all ! > > Regards, > Prashant Singh > > On Wed, Apr 12, 2023 at 10:48 AM Jonas J

Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-20 Thread Driesprong, Fokko
Thanks all for the response, much appreciated. That said, I'd love to hear from more people on this. I think it would be great to drop support, but I don't know how many people still use it. Is upgrading Hadoop a good reason to drop support for an engine? Hadoop seems like a minor concern to