Re: [DISCUSS] Introduce C FFI for iceberg rust

2025-02-17 Thread Gang Wu
Thanks Xuanwo! Looking forward to the possibility of iceberg-cpp integration with the C FFI! Best, Gang On Tue, Feb 18, 2025 at 3:21 PM Renjie Liu wrote: > Hi: > > Thanks Xuanwo for raising this. > > As xuanwo mentioned, rust implementation + c binding will provide a good > foundation for cros

Re: [DISCUSS] Introduce C FFI for iceberg rust

2025-02-17 Thread Renjie Liu
Hi: Thanks Xuanwo for raising this. As Xuanwo mentioned, a Rust implementation plus a C binding will provide a good foundation for cross-language implementations of the Iceberg spec, including the C++ implementation. Looking forward to seeing more opinions from the community! On Tue, Feb 18, 2025 at 2:52 PM Xuanwo wrote

[DISCUSS] Introduce C FFI for iceberg rust

2025-02-17 Thread Xuanwo
Hello everyone, I have started a PoC to introduce C FFI for Iceberg Rust. This will allow users to interact with Iceberg Rust through the C ABI, enabling them to integrate Iceberg support into their existing C or C++ codebase. You can view the PR here: https://github.com/apache/iceberg-rust/pull

Re: [DISCUSS] PyIceberg 0.9.0 release

2025-02-17 Thread Kevin Liu
Thanks for volunteering! I'm happy to assist in any way I can. Let's coordinate on Slack :) Quick follow-up on the commit hash pinned above: it's meant as a reference point, not the absolute cutoff. In fact, we merged a few PRs today that will be included in 0.9.0. Please chime in here if there

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Kevin Liu
+1, JSON with no whitespace sounds like a reasonable default. But if saving storage space and network bandwidth is the main goal, then setting `write.metadata.compression-codec` to `gzip` is far more impactful. Perhaps this is a good default on the catalog side when creating new metadata JSON. Best, Kevin L
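To make the comparison above concrete, here is a minimal sketch that measures the same document pretty-printed, compact, and gzip-compressed. The metadata document is a hypothetical, heavily simplified stand-in (real Iceberg metadata has many more fields), and Python's `json` and `gzip` modules stand in for the Java writer, so the absolute numbers are only illustrative.

```python
import gzip
import json


def sample_metadata(num_snapshots: int = 1000) -> dict:
    """Build a hypothetical, simplified metadata document for illustration."""
    return {
        "format-version": 2,
        "location": "s3://bucket/warehouse/db/table",
        "snapshots": [
            {
                "snapshot-id": i,
                "timestamp-ms": 1739750400000 + i,
                "summary": {"operation": "append"},
            }
            for i in range(num_snapshots)
        ],
    }


def encoded_sizes(doc: dict) -> dict:
    """Return the byte size of each encoding of the same document."""
    pretty = json.dumps(doc, indent=2).encode("utf-8")
    compact = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    return {
        "pretty": len(pretty),
        "compact": len(compact),
        "pretty+gzip": len(gzip.compress(pretty)),
        "compact+gzip": len(gzip.compress(compact)),
    }


sizes = encoded_sizes(sample_metadata())
for name, size in sizes.items():
    print(f"{name:>13}: {size} bytes")
```

On repetitive documents like this one, compression removes far more bytes than whitespace stripping does, which matches the point that the codec setting matters more than the printer.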

Re: Remove deprecated table properties

2025-02-17 Thread Steven Wu
I have some concerns about the silent behavior change that Steve Zhang raised in the PR comment. E.g., users may set the location based on the deprecated table property. With this change, it would silently switch to a new location. This can potentially mess up orphan file cleanup etc. Maybe
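The concern above can be sketched as follows. This is not the Iceberg implementation: the property keys, the `data_location` helper, and the `<table location>/data` default are all assumptions made for illustration. The point is only that dropping a fallback changes the resolved location without raising any error.

```python
from typing import Mapping

# Hypothetical property keys, used only for illustration; they are not
# the actual Iceberg table property names.
DEPRECATED_PATH_KEY = "example.deprecated.data-path"
CURRENT_PATH_KEY = "example.data-path"


def data_location(
    props: Mapping[str, str], table_location: str, honor_deprecated: bool
) -> str:
    """Resolve where new data files go (hypothetical helper)."""
    if CURRENT_PATH_KEY in props:
        return props[CURRENT_PATH_KEY]
    # The fallback that a cleanup would remove.
    if honor_deprecated and DEPRECATED_PATH_KEY in props:
        return props[DEPRECATED_PATH_KEY]
    # Assumed default: a "data" folder under the table location.
    return f"{table_location}/data"


# A table that was configured long ago with only the deprecated key.
props = {DEPRECATED_PATH_KEY: "s3://bucket/old-data"}
before = data_location(props, "s3://bucket/table", honor_deprecated=True)
after = data_location(props, "s3://bucket/table", honor_deprecated=False)
print(before)
print(after)
```

Files written before and after the change end up under different prefixes, which is why maintenance jobs that assume a single data location could misbehave.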

Re: Remove deprecated table properties

2025-02-17 Thread Kevin Liu
+1 for removing. Thanks for taking up the cleanup duty! I looked up the usage for the property and its string value with GitHub search, and confirmed that they are not used. Also, for reference, here are the previous related PRs: https://github.com/apache/iceberg/pull/3094 https://github.com/apac

Re: Remove deprecated table properties

2025-02-17 Thread Yufei Gu
+1 to remove them. Yufei On Mon, Feb 17, 2025 at 1:26 PM Steve Zhang wrote: > Thanks Fokko for removing deprecated properties! > > Just want to highlight the worst case for tables with old configuration > and not aware of this deprecation might experience silent behavior change. > But consideri

Re: Remove deprecated table properties

2025-02-17 Thread Steve Zhang
Thanks Fokko for removing deprecated properties! I just want to highlight the worst case: users of tables with an old configuration who are not aware of this deprecation might experience a silent behavior change. But considering this has been deprecated for the past 3 years, here’s my +1. Thanks, Steve Zhang > O

Re: Remove deprecated table properties

2025-02-17 Thread Jean-Baptiste Onofré
Hi Fokko, +1 to clean this. Thanks! Regards JB On Mon, Feb 17, 2025 at 11:18 AM Fokko Driesprong wrote: > > Hi everyone, > > While reviewing the LocationProvider equivalent of PyIceberg, I noticed some > old code in the Java codebase that I felt could be cleaned up. You can find > the PR over

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Ian Streeter
The numbers I shared were for uncompressed files. I am embarrassed to say I had not noticed there is an option `write.metadata.compression-codec`. I had it set to the default `none`, and I reckon many other Iceberg users will too. Here are some updated numbers for my example metadata file: - Un

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Russell Spitzer
+0 - I would be surprised if post compression sizes were that different but minifying json is a pretty standard practice for over the wire transfers On Mon, Feb 17, 2025 at 1:51 PM Steve Zhang wrote: > +1. Configure table property `write.metadata.compression-codec` to gzip is > usually suggested

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-17 Thread Russell Spitzer
It sounds like the argument here is that we should change the Spec for V1, V2, and V3 to mark current-snapshot-id as required. Then we should change all other implementations to follow this new standard. I'm not sure that is a good solution going forwards but I'm not sure of how we can support cata

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Steve Zhang
+1. Setting the table property `write.metadata.compression-codec` to `gzip` is usually suggested to reduce metadata size, but dropping whitespace can still help here. Thanks, Steve Zhang > On Feb 17, 2025, at 8:32 AM, Fokko Driesprong wrote: > > Hey Ian, > > Thanks for raising this. The numbers yo

Re: Remove deprecated table properties

2025-02-17 Thread Russell Spitzer
+1 to remove in 1.9 On Mon, Feb 17, 2025 at 4:20 AM Fokko Driesprong wrote: > Hi everyone, > > While reviewing the LocationProvider equivalent of PyIceberg, I noticed > some old code in the Java codebase that I felt could be cleaned up. You > can find the PR over here

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Steven Wu
+1. It seems reasonable to produce unpretty JSON by default. On Mon, Feb 17, 2025 at 8:35 AM Fokko Driesprong wrote: > Hey Ian, > > Thanks for raising this. The numbers you mention, do you know if this was > compressed or uncompressed? > > I have read other issues in github which mention gigabyt

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-17 Thread Fokko Driesprong
Hey Robert, > The thing is, that -1 cannot "go away". Yes, I agree, but that's also the case for null, as the field is optional in the spec. Therefore we support both in PyIceberg
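A reader that accepts both representations could look roughly like the sketch below. It is not PyIceberg's actual code: `current_snapshot_id` is a hypothetical helper that treats `-1`, `null`, and an absent field as "no current snapshot".

```python
import json
from typing import Optional


def current_snapshot_id(metadata: dict) -> Optional[int]:
    """Return the current snapshot ID, or None if the table has none.

    Hypothetical helper: -1, null, and a missing field are all mapped
    to None so callers only have one case to handle.
    """
    value = metadata.get("current-snapshot-id")
    if value is None or value == -1:
        return None
    return value


# The three ways "no current snapshot" can show up, plus a real ID.
legacy = json.loads('{"current-snapshot-id": -1}')
explicit_null = json.loads('{"current-snapshot-id": null}')
absent = json.loads("{}")
populated = json.loads('{"current-snapshot-id": 3051729675574597004}')

for doc in (legacy, explicit_null, absent, populated):
    print(current_snapshot_id(doc))
```

Normalizing on read keeps older metadata files readable regardless of which value a writer emits.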

Re: [DISCUSS] PyIceberg 0.9.0 release

2025-02-17 Thread Drew
Hey Kevin, Thanks for kicking this off. It’s exciting to see how much PyIceberg has been evolving! I’d be happy to take on the Release Manager role for this release! I think it would be a good opportunity to try out the new release process documentation. Thanks, Drew On Mon, Feb 17, 2025 at 8:5

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-17 Thread Robert Stupp
Hi Fokko, sure, in general "absent" or "null" would be cleaner. But now we have two representations for the same case - I suspect most went with the "reference behavior". The thing is, that -1 cannot "go away". I'd prefer to keep the previous behavior - otherwise implementations may fall ba

[DISCUSS] PyIceberg 0.9.0 release

2025-02-17 Thread Kevin Liu
Hi everyone, It's been a while since we released a new version of PyIceberg! The last minor release (0.8.0) was on November 18, and the most recent patch release (0.8.1) was on December 6. Time flies! There have been >200 commits since 0.8.0

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Fokko Driesprong
Hey Ian, Thanks for raising this. Do you know whether the numbers you mention were compressed or uncompressed? > I have read other issues in github which mention gigabyte-scale metadata > files. This sounds like a bad practice, and that table probably needs some maintenance. I don't have the his

[Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Ian Streeter
Currently, metadata files are pretty-printed, with lots of newlines and whitespace indentation. This is the relevant line of code, which uses the Jackson default pretty printer: https://github.com/apache/iceberg/blob/abb47830e7df7dc2ae93c74b0ad97f06cdd37aad/core/src/main/java/org/apache/iceberg
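The difference between the two output styles can be sketched with Python's `json` module standing in for Jackson. The `to_json` helper and the small metadata document are hypothetical; the sketch only shows that both forms parse to the same data while the compact form carries no newlines or indentation.

```python
import json

# Hypothetical, simplified metadata document used only for illustration.
metadata = {
    "format-version": 2,
    "location": "s3://bucket/warehouse/db/table",
    "schemas": [
        {
            "schema-id": 0,
            "fields": [
                {"id": 1, "name": "id", "type": "long", "required": True},
                {"id": 2, "name": "payload", "type": "string", "required": False},
            ],
        }
    ],
}


def to_json(doc: dict, pretty: bool) -> str:
    """Serialize with indentation (pretty) or with no optional whitespace."""
    if pretty:
        return json.dumps(doc, indent=2)
    return json.dumps(doc, separators=(",", ":"))


pretty_text = to_json(metadata, pretty=True)
compact_text = to_json(metadata, pretty=False)

print(len(pretty_text), "characters pretty-printed")
print(len(compact_text), "characters compact")
```

Because whitespace between JSON tokens is insignificant, any conforming reader should accept either form unchanged.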

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-17 Thread Fokko Driesprong
Hey Robert, Thanks for raising this. > snapshot-ID -1 isn't per-se invalid, because the valid values are not > defined in the spec. For me, this is invalid, since there is no snapshot with -1 in the snapshots property. In the tests with the PR, you can see that there are no snapshots

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-17 Thread Robert Stupp
Feels like https://github.com/apache/iceberg/pull/11560 introduced a behavior change. snapshot-ID -1 isn't per-se invalid, because the valid values are not defined in the spec. Previous Iceberg-Java versions always produced -1 if there's no current snapshot - 1.8 produces `null` in that case

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-17 Thread Fokko Driesprong
I've left some comments on the PR, thanks for cleaning this up Manu. Manish, regarding your question. First, I think this is a separate discussion, but a valid one. This was also suggested in the past, but got closed because I was changing diapers. I t

Remove deprecated table properties

2025-02-17 Thread Fokko Driesprong
Hi everyone, While reviewing the LocationProvider equivalent of PyIceberg, I noticed some old code in the Java codebase that I felt could be cleaned up. You can find the PR over here. This one removes the deprecated properties: OBJECT_STORE_PATH = "w