Re: [DISCUSS] Hive Support

2024-11-19 Thread Manu Zhang
Okay, let me add this as option D: drop Hive 2 & 3 support and suggest using the built-in Iceberg support of Hive 4. On Wed, Nov 20, 2024 at 2:00 PM Cheng Pan wrote: > Hive 4 brings built-in support for the Iceberg format, so duplicated > implementations on both sides look redundant. > > As Hive 2 and

Re: [DISCUSS] Hive Support

2024-11-19 Thread Cheng Pan
Hive 4 brings built-in support for the Iceberg format, so duplicated implementations on both sides look redundant. As Hive 2 and 3 do not support Java 11+, and Iceberg 1.8 requires Java 11+, the combination is invalid. How about simply dropping support for Hive 2&3 and suggesting the Hive user

Re: [DISCUSS] Spark 3.3 support?

2024-11-19 Thread Anton Okolnychyi
Here we go then: https://github.com/apache/iceberg/pull/11596 - Anton On Tue, Nov 19, 2024 at 02:00 roryqi wrote: > +1 to deprecate it and remove it. > > On Tue, Nov 19, 2024 at 15:32 Yufei Gu wrote: > > > > +1 to deprecate it and remove it. > > > > Yufei > > > > > > On Wed, Nov 13, 2024 at 9:17 AM Fokko Dr

[DISCUSS] Hive Support

2024-11-19 Thread Manu Zhang
Hi all, We previously reached consensus[1] to deprecate Hive 2 in 1.7 and drop it in 1.8. However, when working on the removal PR[2], multiple tests failed in Hive 3 because it does not support JDK 11[3]. The fix has been back-ported to branch-3.1[4] but not released yet. As announced on the Hive website, Hive

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Jean-Baptiste Onofré
I don’t think it’s a problem while an alternative is explored (the JDK itself does that pretty often). So it’s up to the community: of course I’m against removing it without a solid alternative, but deprecation is fine imho. Regards JB On Tue, Nov 19, 2024 at 12:19 Ajantha Bhat wrote: > - ok f

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Renjie Liu
+1, thanks Fokko! On Wed, Nov 20, 2024 at 8:45 AM Steve Zhang wrote: > +1 nb > > Thanks, > Steve Zhang > > > > On Nov 19, 2024, at 12:18 AM, Fokko Driesprong wrote: > > Hi everyone, > > Based on the positive feedback on the [DISCUSS] thread >

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Steve Zhang
+1 nb Thanks, Steve Zhang > On Nov 19, 2024, at 12:18 AM, Fokko Driesprong wrote: > > Hi everyone, > > Based on the positive feedback on the [DISCUSS] thread > and the > pull-request on GitHub

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Russell Spitzer
- What would the Delta table look like? The Delta table is just another Iceberg table with the exact same schema as the base table (it could possibly skip partitioning since we expect it to stay very small) - Would it just contain the whole new record? It could, but it doesn't have to. The key is that any val

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Micah Kornfield
> > The key here is that you only use position deletes on your delta table > which you keep small, say 1gb or less. Would this cause issues operationally for either a high enough sustained throughput of streamed data or if the maintenance process of moving the data out of the delta table has an o

Changing Ownership and Cadence for Catalog Community Sync

2024-11-19 Thread Jack Ye
Hi everyone, We have been doing the catalog community sync for quite a few months now and have made good progress on the REST catalog development front. I personally have some plans for travelling in the next few months and would likely not be able to host or join the meeting series. B

Re: Changing Ownership and Cadence for Catalog Community Sync

2024-11-19 Thread Kevin Liu
Thanks, Honah, I received the new GCal invite. I double-checked the links for Google Meet and the meeting notes; everything seems to be correct. Best, Kevin Liu On Tue, Nov 19, 2024 at 11:48 AM Honah J. wrote: > Thanks Jack! > > Hi everyone, I am very happy to help host the meeting series. Just a

Re: Changing Ownership and Cadence for Catalog Community Sync

2024-11-19 Thread Honah J.
Thanks Jack! Hi everyone, I am very happy to help host the meeting series. Just a quick heads up that tomorrow's Iceberg Catalog Community Sync (Nov 20 9:00 am - 10:00am PST) meeting will proceed as usual. I will create a new event series in the same calendar soon. Best regards, Honah On Tue, No

Re: [DISCUSS] Deprecate embedded manifests

2024-11-19 Thread Bryan Keller
+1 to deprecate > On Nov 19, 2024, at 3:32 AM, Fokko Driesprong wrote: > > Hi everyone, > > I would like to propose to deprecate embedded manifests. > This has been used before the > manifest-list was introduced, but I don't think they are used s

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Russell Spitzer
+1 On Tue, Nov 19, 2024 at 4:11 AM Fokko Driesprong wrote: > Hey Manu, > > That's an excellent question. I took the following rationale: > >- For the code, the iceberg-core module, a minor release deprecation >cycle is required >

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Péter Váry
Hi Team, I have a few questions about the Delta table: - What would the Delta table look like? - Would it just contain the whole new record? - How would the solution handle the double updates? - Would it just write a second version of the record to the Delta table? - How would the solution handl

Re: [DISCUSS] Deprecate embedded manifests

2024-11-19 Thread Kevin Liu
+1 On Tue, Nov 19, 2024 at 9:23 AM Bryan Keller wrote: > +1 to deprecate > > On Nov 19, 2024, at 3:32 AM, Fokko Driesprong wrote: > > Hi everyone, > > I would like to propose to deprecate embedded manifests. > This has been used before > the manife

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Honah J.
+1. Thanks Fokko. Best, Honah On Tue, Nov 19, 2024 at 9:11 AM Kevin Liu wrote: > +1 (non-binding). The spec and code deprecation schedule looks good to me. > > Best, > Kevin Liu > > On Tue, Nov 19, 2024 at 8:42 AM Christian Thiel > wrote: > >> +1 (non-binding) – looks like we are going in the

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Kevin Liu
+1 (non-binding). The spec and code deprecation schedule looks good to me. Best, Kevin Liu On Tue, Nov 19, 2024 at 8:42 AM Christian Thiel wrote: > +1 (non-binding) – looks like we are going in the right direction in Rust! > > > Christian > > > On 19. Nov 2024, at 16:13, Jack Ye wrote: > > +1

Re: [DISCUSS] Deprecate embedded manifests

2024-11-19 Thread Russell Spitzer
Deprecate On Tue, Nov 19, 2024 at 5:40 AM Jean-Baptiste Onofré wrote: > Hi Fokko > > As I don’t think it’s actually used, I think it’s fine to deprecate it. > > Regards > JB > > On Tue, Nov 19, 2024 at 12:32 Fokko Driesprong > wrote: > >> Hi everyone, >> >> I would like to propose to depreca

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Bryan Keller
I think it is great to explore alternatives, but I still feel we shouldn't deprecate equality deletes until we have a clear path forward. > On Nov 19, 2024, at 7:56 AM, Russell Spitzer > wrote: > > I'm strongly in favor of moving to the Delta + Base table approach discussed > in the cookbook

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Jack Ye
The proposal sounds similar to the Delta Lake CDC feature with CDC file type [1] and CDC action [2]. There was also the proposal I wrote a long time ago [3] to use a "cdc" branch rather than 2 private tables, which was inspired by the Delta Lake approach. The feedback was mixed at that time becaus

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Christian Thiel
+1 (non-binding) – looks like we are going in the right direction in Rust! Christian On 19. Nov 2024, at 16:13, Jack Ye wrote: +1 -Jack On Tue, Nov 19, 2024 at 7:45 AM Russell Spitzer wrote: +1 On Tue, Nov 19, 2024 at 4:11 AM Fokko Driesprong mailto:fo..

Re: [DISCUSS] Deprecate embedded manifests

2024-11-19 Thread Jean-Baptiste Onofré
Hi Fokko As I don’t think it’s actually used, I think it’s fine to deprecate it. Regards JB On Tue, Nov 19, 2024 at 12:32 Fokko Driesprong wrote: > Hi everyone, > > I would like to propose to deprecate embedded manifests. > This has been used b

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Russell Spitzer
I'm strongly in favor of moving to the Delta + Base table approach discussed in the cookbook above. I wonder if we should codify that into something more standardized but it seems to me to be a much better path forward. I'm not sure we need to support this at the spec level but it would be nice if w

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Manu Zhang
Hi Ajantha, I'm proposing exploring a view-based approach similar to the changelog-mirror table pattern[1] rather than supporting delta writers for the Kafka Connect Iceberg sink. 1. https://www.tabular.io/apache-iceberg-cookbook/data-engineering-cdc-table-mirroring/ On Tue, Nov 19, 2024 at 7:38 PM

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2024-11-19 Thread Gabor Kaszab
Hi, Thanks for sharing your view, Taeyun! I think there are many levels of representation here and we might not mean the same thing with our points. I think in general an interaction between a query engine and an Iceberg REST catalog has these different layers: 1) The engine (Impala, Spark, Trino, etc.)

[DISCUSS] Deprecate embedded manifests

2024-11-19 Thread Fokko Driesprong
Hi everyone, I would like to propose to deprecate embedded manifests. These were used before the manifest-list was introduced, but I don't think they have been used since the project was open-sourced, and it would be good to officially deprecate the

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Ajantha Bhat
> > - ok for deprecate equality deletes > - not ok to remove it @JB: I don't think it is a good idea to use deprecated functionality in the new feature development. Hence, my specific question was about kafka connect upsert operation. @Manu: I meant the delta writers for kafka connect Iceberg si

Re: [DISCUSS] Spark 3.3 support?

2024-11-19 Thread roryqi
+1 to deprecate it and remove it. On Tue, Nov 19, 2024 at 15:32 Yufei Gu wrote: > > +1 to deprecate it and remove it. > > Yufei > > > On Wed, Nov 13, 2024 at 9:17 AM Fokko Driesprong wrote: >> >> +1 to deprecating and removing it >> >> Kind regards, >> Fokko >> >> On Wed, Nov 13, 2024 at 18:02 Jean-Bapt

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Manu Zhang
I second Anton's proposal to standardize on a view-based approach to handle CDC cases. Actually, it's already been explored in detail[1] by Jack before. [1] Improving Change Data Capture Use Case for Apache Iceberg

[VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Fokko Driesprong
Hi everyone, Based on the positive feedback on the [DISCUSS] thread and the pull-request on GitHub, I would like to raise a vote to deprecate and remove the last-column-id field from

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Jean-Baptiste Onofré
My proposal is the following (already expressed): - OK to deprecate equality deletes - not OK to remove them - work on position deletes improvements to address streaming use cases. I think we should explore different approaches. Personally I think a possible approach would be to find index way to da