Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread Fokko Driesprong
Hey Ryan and others, Thanks for bringing this up. I would be in favor of removing the HadoopTableOperations, mostly because of the reasons that you already mentioned, but also because it is not fully in line with the first principles of Iceberg (being object store native) as it uses fi

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread Ajantha Bhat
+1 on deprecating the `File System Tables` from spec and `HadoopCatalog`, `HadoopTableOperations` in code for now and removing them permanently during 2.0 release. For testing we can use `InMemoryCatalog` as others mentioned. I am not sure about moving to test or keeping them only for HDFS. Becau

Re: Building with JDK 21

2024-07-18 Thread Denys Kuzmenko
Hi All, Let me chime in here and add some Hive perspective on that. The only reason Iceberg support has moved into Hive is that we didn't get enough support from the existing community. Our PRs got stuck pending review, and even if we got some +1s, those were not binding. Don't get me wrong, it

Re: Building with JDK 21

2024-07-18 Thread Péter Váry
Hi Team, I was there when we decided to move the Hive development from the Iceberg repo to the Hive repo. Apart from the previously mentioned reasons there was another big blocker which prevented us moving forward with the development in the Iceberg repo - the lack of Hive releases. We needed sever

[DISCUSS] Flink unaligned checkpoints

2024-07-18 Thread Péter Váry
Hi Team, Qishang Zhong found a bug with Flink Sink [1]. In a nutshell: - Failure during checkpoints could cause duplicate data with Flink sink with CDC writes. In more detail: - If there is a failure after the `prepareSnapshotPreBarrier` but before the `snapshotState` for CHK1, then the data/delete

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread Eduard Tudenhöfner
+1 on deprecating now and removing them from the codebase with Iceberg 2.0

Re: Building with JDK 21

2024-07-18 Thread Denys Kuzmenko
In the following 1-2 months we plan to release HIVE-4.0.1 which includes bug fixes and then focus on HIVE-4.1.0 release with jdk17.

Re: [DISCUSS] Flink unaligned checkpoints

2024-07-18 Thread Steven Wu
Regarding unaligned checkpoints, Flink savepoint is always aligned and recommended for Flink version upgrade. We can potentially recommend users to use Flink savepoint to pick up this fix. I will take a closer look at the PR.

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread Jack Ye
Thank you for bringing this up Ryan. I have been also in the camp of saying HadoopCatalog is not recommended, but after thinking about this more deeply last night, I now have mixed feelings about this topic. Just to comment on the reasons you listed first: * For reason 1 & 2, it looks like the roo

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-18 Thread Russell Spitzer
I'm aligned with point 1. For point 2 I think we should choose quickly, I honestly do think this would be fine as part of the Iceberg Spec directly but understand it may be better for the more broad community if it was a sub project. As a sub-project I would still prefer it being an Iceberg Subpro

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread John Zhuge
Appreciate the thoughtful comments!

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-18 Thread Ryan Blue
Similarly, I'm aligned with point 1 and I'd choose to support only variant for point 3. We'll need to work with the Spark community to find a good place for the library and spec, since it touches many different projects. I'd also prefer Iceberg as the home. I also think it's a good idea to get su

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread Steven Wu
Thanks Jack for the thoughtful comments. I am not fully sold that object storage issues have been solved. S3 directory bucket is not a general purpose bucket and lives in a single zone. The data durability guarantee may not work for many use cases. We don't know when S3 will add the atomic renamin
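Several replies in this thread turn on atomic rename. As background, the Iceberg spec's File System Tables section describes a commit as writing the new table metadata to a uniquely named file and then renaming it to the well-known `v<V+1>.metadata.json`, where the rename must fail if another writer already created that version. The Python sketch below illustrates that protocol on a local POSIX filesystem under stated assumptions; it is not Iceberg's `HadoopTableOperations` code, `commit_version` is a hypothetical name, and it substitutes a hard link (`os.link`) for rename because `os.rename` silently replaces an existing target on POSIX.

```python
import os
import tempfile
import uuid


def commit_version(table_dir: str, current_version: int, metadata_json: str) -> bool:
    """Hypothetical helper: try to publish v<current_version + 1>.metadata.json.

    Returns True if this writer won the commit, False if another writer
    already published that version (the caller must refresh and retry).
    """
    target = os.path.join(table_dir, f"v{current_version + 1}.metadata.json")
    # Step 1: write the new metadata to a uniquely named temporary file.
    temp = os.path.join(table_dir, f"{uuid.uuid4()}.metadata.json")
    with open(temp, "w") as f:
        f.write(metadata_json)
    try:
        # Step 2: publish under the well-known name. os.link raises
        # FileExistsError if the target already exists, so a losing
        # writer can never overwrite a committed version.
        os.link(temp, target)
        return True
    except FileExistsError:
        return False
    finally:
        # The temporary name is no longer needed in either case.
        os.remove(temp)


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    # Two writers both read version 1 and race to create version 2.
    print(commit_version(table, 1, '{"writer": "a"}'))  # True
    print(commit_version(table, 1, '{"writer": "b"}'))  # False: must retry
    print(sorted(os.listdir(table)))  # ['v2.metadata.json']
```

The whole scheme rests on that single non-overwriting publish step. Plain S3 buckets offer no such rename, so the scheme cannot be made safe there without extra coordination, which is the gap the replies here are debating.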

Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Jonathan Leang
Hey all, I have a question about shared administration of Apache Iceberg social/marketing channels, such as YouTube in this specific instance. Kevin Liu and I have organized the first Seattle Iceberg meetups and it looks like there will be a steady stream of technical talk

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Jack Ye
> posting/content management privileges to the Iceberg YouTube channel for these meetup recordings This sounds reasonable to me, when the meetup is recurring. So I am good with doing this for the Seattle meetup series. We have technically already done so for Bits for the community sync meeting ser

Re: Building with JDK 21

2024-07-18 Thread Ryan Blue
Thanks for the context, Denys and Peter. Sounds like there's a good question here about where the Hive integration should live and the most recent decision was to maintain that support in Hive. I definitely hear the point about Hive 3 users depending on the Iceberg modules. I'm also glad to hear th

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Ryan Blue
I strongly prefer not posting meetup videos on an official Apache Iceberg channel. That's why we have up until now used the channel only for videos from sanctioned Iceberg events, like community syncs or the Apache Iceberg Summit. I wasn't aware that

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Brian Olsen
Hey Ryan, I will take that down, apologies for that confusion. I remember having some discussion around this before and my recollection was this was more of a concern about vendor content but after I read your reply I remembered this was a more general ASF sentiment. So with that, I still think di

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Brian Olsen
Update: I moved the video and playlist to a Private status until this is discussed. In the meantime, we'll wait to hear some guidance from the PMC and community to advise. Otherwise, I suggest that Jonathan and Kevin create an unofficial community Apache Iceberg Meetups channel that follows the tra

Re: [DISCUSS] Flink unaligned checkpoints

2024-07-18 Thread Ryan Blue
This sounds like a reasonable solution to me. Thanks, Peter!

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread Ryan Blue
I am not fully sold that object storage issues have been solved. S3 directory bucket is not a general purpose bucket and lives in a single zone. The data durability guarantee may not work for many use cases. We don’t know when S3 will add the atomic renaming support. I agree with Steven here that

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Jack Ye
Looking at https://www.youtube.com/@ApacheIceberg/playlists, there are also many other talks and even tutorials, what about those? -Jack

Re: [VOTE] Merge table spec clarifications on time travel and equality deletes

2024-07-18 Thread Ryan Blue
+1 Thanks, Micah! On Tue, Jul 16, 2024 at 7:04 AM Jean-Baptiste Onofré wrote: > +1 (non binding) > > Thanks ! > Regards > JB > > On Mon, Jul 15, 2024 at 10:35 PM Micah Kornfield > wrote: > > > > I'd like to raise on modifying the table specification with > clarifications on time travel and equ

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Ryan Blue
I didn't realize that the account was curating playlists. I'm trying to draw a parallel with the blog post example. Do we consider playlists links? I guess the question is whether it looks official. Would inclusion in a playlist be mistaken for an endorsement? Or would we get into unnecessary debat

Re: [VOTE] Merge table spec clarifications on time travel and equality deletes

2024-07-18 Thread Steven Wu
I am +1 for the spec clarifications. I have left some comments for the time travel PR. We can discuss the details in the PR itself before merging. In particular, I am wondering if the time travel clarification can be added to the existing `snapshots` section of the spec (instead of adding a new `imp

Re: write distribution change when setting local order to a partition table

2024-07-18 Thread Anton Okolnychyi
The command currently strictly follows the provided instructions and defaults the distribution and sort order if not provided. This means it actually unsets the sort order if it was not provided and sets the distribution to none if the distribution clause is missing. I guess it would be more natura

Re: Administration of Apache Iceberg Social/Marketing Channels

2024-07-18 Thread Jack Ye
Can people share what other projects do? And are there any related ASF guidelines out there about this? -Jack

Re: Building with JDK 21

2024-07-18 Thread Cheng Pan
A basic question: is iceberg-hive-metastore considered part of the "Hive module"? I suppose that HMS 2.x is still widely used. AFAIK, the current iceberg-hive-metastore is compatible with HMS 2.1+; based on Iceberg and Spark CI, I also suppose it works well with Java 8 to 21. Thanks, Cheng Pan

Re: [VOTE] Merge table spec clarifications on time travel and equality deletes

2024-07-18 Thread Amogh Jahagirdar
+1 (non-binding) on these spec clarifications. Thanks, Amogh Jahagirdar

Re:Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-18 Thread lisoda
Hi team. I am not a PMC member, just a regular user. Instead of discussing whether HadoopCatalog needs to continue to exist, I'd like to share a more practical issue. We currently serve over 30,000 customers, all of whom use Iceberg to store their foundational data, and all business