Re: [DISCUSS] Guidelines for committing PRs

2024-08-12 Walaa Eldin Moustafa
I think the issue with the first paragraph is about: 1- The perceived contradiction between a) trusting committers to act in the best interest of the project and b) simultaneously providing specific guidelines on how to act (e.g., by avoiding conflicts of interest). 2- The specific examples given

Re: [DISCUSS] Filesystem in PyIceberg

2024-08-12 Xuanwo
Hi André, Thanks a lot for starting this thread. List operations on storage services are expensive and slow. That's why Iceberg is designed to store metadata in files and avoid using list operations in FileIO. However, `orphan file removal` or `garbage cleanup` are special tasks that do requir
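Orphan file removal is one of the few tasks that cannot avoid the expensive listing step. The minimal Python sketch below shows what happens once a listing is in hand. It assumes the storage listing (location mapped to last-modified time) and the set of locations referenced by table metadata are already available as plain Python values. The function name `find_orphan_files` is hypothetical and is not a PyIceberg API.

```python
from datetime import datetime, timedelta, timezone


def find_orphan_files(
    listed: dict[str, datetime],
    referenced: set[str],
    older_than: datetime,
) -> list[str]:
    """Return listed files that no metadata references and that predate the cutoff.

    `listed` maps file location -> last-modified time, as a storage listing
    would return it. `referenced` holds every location reachable from table
    metadata (metadata files, manifest lists, manifests, data/delete files).
    """
    orphans = [
        location
        for location, modified in listed.items()
        # Only files unknown to the metadata are candidates ...
        if location not in referenced
        # ... and recent files are skipped, because an in-flight commit may
        # have written them without having referenced them yet.
        and modified < older_than
    ]
    return sorted(orphans)


# Usage with made-up locations and timestamps.
now = datetime(2024, 8, 12, tzinfo=timezone.utc)
listed = {
    "s3://bucket/tbl/data/a.parquet": now - timedelta(days=10),
    "s3://bucket/tbl/data/b.parquet": now - timedelta(days=10),
    "s3://bucket/tbl/data/c.parquet": now - timedelta(hours=1),
}
referenced = {"s3://bucket/tbl/data/a.parquet"}
print(find_orphan_files(listed, referenced, now - timedelta(days=3)))
# ['s3://bucket/tbl/data/b.parquet']
```

Everything after the listing is a set difference plus an age filter. The listing itself is the part that FileIO deliberately does not provide.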

[DISCUSS] Variant Spec Location

2024-08-12 Russell Spitzer
Hi y’all, We’ve hit a bit of a roadblock with the Variant Proposal. While we were hoping to move the Variant and Shredding specifications from Spark into Iceberg, there doesn’t seem to be a lot of interest in that. Unfortunately, I think we have a number of issues with just linking to the Spark pro

Re: [DISCUSS] Filesystem in PyIceberg

2024-08-12 André Luis Anastácio
Thank you, Fokko, for the context! This blog post helped me a lot! I understand that in the Iceberg Java implementation the maintenance procedures are just [interfaces](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/actions/DeleteOrphanFiles.java#L34), and the
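The Java `DeleteOrphanFiles` interface linked above defines the action's contract and leaves the execution to engine-specific implementations. A Python rendering of that split might look like the sketch below. The fluent methods are loosely modeled on the Java action's `location`/`olderThan`/`execute`. All class names, the millisecond timestamps, and the in-memory backend are hypothetical illustrations, not PyIceberg APIs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeleteOrphanFilesResult:
    # Locations identified as orphans (and deleted by this toy implementation).
    orphan_file_locations: list[str] = field(default_factory=list)


class DeleteOrphanFilesAction(ABC):
    """Engine-agnostic contract: says *what* the action does, not *how*."""

    def __init__(self) -> None:
        self._older_than_ms: Optional[int] = None
        self._location: Optional[str] = None

    def older_than(self, timestamp_ms: int) -> "DeleteOrphanFilesAction":
        self._older_than_ms = timestamp_ms
        return self

    def location(self, location: str) -> "DeleteOrphanFilesAction":
        self._location = location
        return self

    @abstractmethod
    def execute(self) -> DeleteOrphanFilesResult:
        """Each engine (Spark, Flink, a local Python process, ...) supplies this."""


class InMemoryDeleteOrphanFiles(DeleteOrphanFilesAction):
    """Toy implementation over an in-memory 'storage' dict."""

    def __init__(self, files: dict[str, int], referenced: set[str]) -> None:
        super().__init__()
        self._files = files  # location -> last-modified time in ms
        self._referenced = referenced

    def execute(self) -> DeleteOrphanFilesResult:
        # With no cutoff set, nothing is old enough to delete (the safe default).
        cutoff = self._older_than_ms if self._older_than_ms is not None else 0
        prefix = self._location or ""
        orphans = sorted(
            loc
            for loc, mtime in self._files.items()
            if loc.startswith(prefix) and loc not in self._referenced and mtime < cutoff
        )
        for loc in orphans:
            del self._files[loc]
        return DeleteOrphanFilesResult(orphans)


# Usage: one referenced file, one old orphan, one recent unreferenced file.
storage = {"tbl/data/a.parquet": 100, "tbl/data/b.parquet": 100, "tbl/data/c.parquet": 900}
action = InMemoryDeleteOrphanFiles(storage, referenced={"tbl/data/a.parquet"})
result = action.location("tbl/").older_than(500).execute()
print(result.orphan_file_locations)  # ['tbl/data/b.parquet']
```

Keeping the contract separate from the implementation lets a PyIceberg-native backend and a distributed engine share one user-facing API.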

Re: [DISCUSS] Flink 1.20: make FLIP-27 default in SQL and mark the old FlinkSource as deprecated

2024-08-12 Ryan Blue
+1 from me. Good idea. On Mon, Aug 12, 2024 at 9:01 AM Péter Váry wrote: > Cool +1 from me then. > > On Mon, Aug 12, 2024 at 17:56, Steven Wu wrote: > >> > My only concern is doing this only for Flink 1.20. If this is only a >> single default value change, I'm fine with it. >> >> it i

Re: [DISCUSS] Filesystem in PyIceberg

2024-08-12 Fokko Driesprong
Hi André, First of all, thanks for raising this. Maintenance routines are a long-awaited feature in PyIceberg. The FileIO concept is not limited to PyIceberg; it is also present in Java
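The simplified sketch below illustrates the FileIO idea shared by the Java and Python implementations: a narrow, storage-agnostic interface for reading, writing, and deleting files by location. The method names here are simplified for brevity. PyIceberg's actual `pyiceberg.io.FileIO` exposes `new_input`, `new_output`, and `delete`, which work with input/output file objects rather than raw bytes. The in-memory backend is purely illustrative.

```python
from abc import ABC, abstractmethod


class FileIO(ABC):
    """Simplified FileIO-style abstraction.

    Note what is absent: there is no list operation. Iceberg readers and
    writers locate every file through metadata, so they never need one.
    """

    @abstractmethod
    def read(self, location: str) -> bytes: ...

    @abstractmethod
    def write(self, location: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, location: str) -> None: ...


class InMemoryFileIO(FileIO):
    """Toy backend; a real one would wrap S3, GCS, ADLS, HDFS, local disk, ..."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, location: str) -> bytes:
        return self._blobs[location]

    def write(self, location: str, data: bytes) -> None:
        self._blobs[location] = data

    def delete(self, location: str) -> None:
        # Deleting a missing location is a no-op in this sketch.
        self._blobs.pop(location, None)


io = InMemoryFileIO()
io.write("s3://bucket/tbl/metadata/v1.metadata.json", b"{}")
payload = io.read("s3://bucket/tbl/metadata/v1.metadata.json")
print(payload)  # b'{}'
io.delete("s3://bucket/tbl/metadata/v1.metadata.json")
```

Because the interface needs only point operations, it maps cleanly onto object stores. This is also why maintenance tasks that require listing do not fit naturally into it.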

Re: Iceberg-arrow vectorized read bug

2024-08-12 Lessard, Steve
Iceberg-arrow is a non-trivial chunk of code that appears to me to have no current owner. Based on the history I can find on GitHub, it appears the original author of the org.apache.iceberg.arrow.vectorized package, Mayur Srivastava, wrote this package as his one and only contribution to any

Re: [DISCUSS] Flink 1.20: make FLIP-27 default in SQL and mark the old FlinkSource as deprecated

2024-08-12 Péter Váry
Cool +1 from me then. On Mon, Aug 12, 2024 at 17:56, Steven Wu wrote: > > My only concern is doing this only for Flink 1.20. If this is only a > single default value change, I'm fine with it. > > it is one config change plus Java doc and @deprecated change. It is very > minimal. > > I do

Re: [DISCUSS] Flink 1.20: make FLIP-27 default in SQL and mark the old FlinkSource as deprecated

2024-08-12 Steven Wu
> My only concern is doing this only for Flink 1.20. If this is only a single default value change, I'm fine with it. It is one config change plus Javadoc and @deprecated changes. It is very minimal. I don't see the benefit outweighing the state incompatibility of the switch if we also make the c

Re: [VOTE] Release Apache PyIceberg 0.7.1rc1

2024-08-12 Sung Yun
Hi folks, happy Monday! First of all, thank you all for taking the time to download and verify the release candidate. Unfortunately, over the weekend a critical bug was reported where deletes were not correctly performed on the desired predicate for files that require partial rewrites. (th
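For readers unfamiliar with "partial rewrites", the sketch below shows the decision a copy-on-write delete makes for a single data file. If every row matches, the file is dropped. If no row matches, the file is kept. Otherwise, the file is rewritten with the surviving rows. This sketch does not reproduce the reported bug or PyIceberg's implementation. Rows are modeled as plain dicts, and the names `plan_delete` and `category_is_a` are hypothetical. It follows standard SQL semantics, under which rows where the predicate evaluates to unknown (NULL) are not deleted.

```python
from typing import Callable, Optional

Row = dict[str, object]
# A predicate returns True (matches), False, or None (unknown, e.g. NULL input).
Predicate = Callable[[Row], Optional[bool]]


def plan_delete(rows: list[Row], predicate: Predicate) -> tuple[str, list[Row]]:
    """Decide what a copy-on-write delete does to one data file.

    Returns ("drop", []) when every row matches, ("keep", rows) when none
    match, and ("rewrite", survivors) when only some rows match.
    """
    matches = [predicate(row) is True for row in rows]
    if all(matches):
        return "drop", []
    if not any(matches):
        return "keep", rows
    # Partial rewrite: keep every row the predicate does not definitely match,
    # including rows where it evaluates to None (unknown).
    survivors = [row for row, hit in zip(rows, matches) if not hit]
    return "rewrite", survivors


def category_is_a(row: Row) -> Optional[bool]:
    # NULL-aware comparison: a missing category yields "unknown".
    return None if row["category"] is None else row["category"] == "a"


rows = [{"id": 1, "category": "a"}, {"id": 2, "category": "b"}, {"id": 3, "category": None}]
action, survivors = plan_delete(rows, category_is_a)
print(action, [r["id"] for r in survivors])  # rewrite [2, 3]
```

The partial-rewrite branch is the only one where row-level predicate evaluation matters. This makes it the place where predicate-handling mistakes would surface as incorrectly deleted or retained rows.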

[DISCUSS] Filesystem in PyIceberg

2024-08-12 André Luis Anastácio
Hello everyone, I’ve been studying the Java implementation of orphan file removal to replicate it in PyIceberg. During this process, I noticed a key difference: in Java, we use the Hadoop Filesystem[1], while in PyIceberg, we use the Filesystem provided by FileIO[2][3]. Currently, we support t
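The difference André describes comes down to one capability: recursive listing. As a rough illustration, the hypothetical helper `list_files_older_than` below performs the listing-plus-age-filter step that the Java implementation delegates to the Hadoop FileSystem. It uses only the standard library and covers local disk only. It is not PyIceberg code, and the three-day cutoff in the usage is just an example value.

```python
import os
import tempfile
import time
from pathlib import Path


def list_files_older_than(root: str, older_than_epoch_s: float) -> list[str]:
    """Recursively list regular files under `root` modified before the cutoff."""
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < older_than_epoch_s:
                result.append(path)
    return sorted(result)


# Usage: build a tiny table directory with one old and one fresh file.
with tempfile.TemporaryDirectory() as root:
    data_dir = Path(root, "data")
    data_dir.mkdir()
    old_file = data_dir / "old.parquet"
    new_file = data_dir / "new.parquet"
    old_file.write_bytes(b"")
    new_file.write_bytes(b"")
    now = time.time()
    # Backdate the "old" file by four days (atime, mtime).
    os.utime(old_file, (now - 4 * 86400, now - 4 * 86400))
    found = list_files_older_than(root, now - 3 * 86400)
    print([os.path.basename(p) for p in found])  # ['old.parquet']
```

For object stores, the equivalent call would come from the library backing the FileIO. Examples include `pyarrow.fs.FileSystem.get_file_info` with a recursive `pyarrow.fs.FileSelector`, or fsspec's `find`. How PyIceberg should expose such a call is the design question raised here.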

Re: [DISCUSS] How about setting up an Iceberg meetup in Beijing?

2024-08-12 Xuanwo
Thank you, Kevin. The guide is really helpful; I will review and refine my proposal later. On Mon, Aug 12, 2024, at 02:17, Kevin Liu wrote: > Hi Xuanwo, > > Love the idea! We've been hosting the Seattle area meetup for the last couple > of months and are also helping to coordinate the Bay Area

Re: [DISCUSS] Flink 1.20: make FLIP-27 default in SQL and mark the old FlinkSource as deprecated

2024-08-12 Péter Váry
Thanks, Steven, for driving this! I'm all for deprecating FlinkSource in favor of IcebergSource. My only concern is doing this only for Flink 1.20. If this is only a single default value change, I'm fine with it. OTOH, having bigger differences between the source of the different Flink versions would c

Re: [DISCUSS] Flink 1.20: make FLIP-27 default in SQL and mark the old FlinkSource as deprecated

2024-08-12 Fokko Driesprong
Hey Steven, That sounds very exciting! I'm not a heavy Flink user, but I don't see any issues with enabling it on Flink 1.20. We should make it explicit in the changelog and, if possible, give some hints on how to drain the Flink jobs. Kind regards, Fokko On Mon, Aug 12, 2024 at 04:57, Steven Wu wrote: