Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-03-20 Thread Micah Kornfield
This reminds me that gzipped metadata files are not covered in the spec. I opened https://github.com/apache/iceberg/pull/12598 to try to document them (feedback welcome). On Mon, Feb 17, 2025 at 2:35 PM Kevin Liu wrote: > +1, JSON with no whitespace sounds like a reasonable default. But if > sa

Re: [DISCUSS] Row lineage required for v3

2025-03-20 Thread Péter Váry
I agree with most of what Ryan said, with a single important exception: > If an engine doesn’t implement row ID preservation (which, by the way, is not hard!) [..] For streaming applications (Flink, Kafka Connect) this is a non-trivial task. The engine either has to keep everything in memory or do co

Re: [DISCUSS] Row lineage required for v3

2025-03-20 Thread Eduard Tudenhöfner
I'm convinced that always having lineage metadata is the right call, so I'm +1 here. On Thu, Mar 20, 2025 at 11:15 PM Ryan Blue wrote: > Now, if we make it required for V3 tables, what if users don’t need the > row lineage feature? There is a bit of overhead (although low) for row > lineage. E.g.,

Re: [DISCUSS] Row lineage required for v3

2025-03-20 Thread Ryan Blue
Now, if we make it required for V3 tables, what if users don’t need the row lineage feature? There is a bit of overhead (although low) for row lineage. E.g., extra metadata columns in data files during rewrite/compaction. The majority of the work for row lineage is done behind the Iceberg API. The on

[VOTE][Go] Release Apache Iceberg Go v0.2.0 RC1

2025-03-20 Thread Matt Topol
Hi, I would like to propose the following release candidate (RC1) of Apache Iceberg Go version v0.2.0. This release candidate is based on commit: cfd2c3ba2b61106bbbfdd1c0d045cc467c42c4e0 [1] The source release rc1 is hosted at [2]. Please download, verify checksums and signatures, run the unit

Re: [VOTE][Go] Release Apache Iceberg Go v0.2.0 RC0

2025-03-20 Thread Matt Topol
The requested updates have been merged with the help of Kevin Liu, and I've cut a new RC. Starting a new vote now! On Thu, Mar 20, 2025 at 1:10 PM Matt Topol wrote: > Thanks Fokko! > > The linked PR has been merged, and I've filed a series of small PRs to fix > the metadata inconsistencies that you

Re: [DISCUSS] Row lineage required for v3

2025-03-20 Thread Ryan Blue
+1 for the PR and always having the lineage metadata. I think that is going to make the feature much more reliable. We don't gain anything from allowing the feature to be turned off for compatibility, when we have reasonable ways to interpret data written by any engine. Ryan On Wed, Mar 19, 2025

Re: [DISCUSS] Row lineage required for v3

2025-03-20 Thread Russell Spitzer
I think I'm in favor of this, but I would like some way of knowing whether a snapshot was produced while preserving row_ids, so that we can make it clear on read what the row-lineage behavior of the writer was without knowing what system wrote the data. On Thu, Mar 20, 2025 at 10:43 A

Re: [DISCUSS] FileFormat API proposal

2025-03-20 Thread Péter Váry
Hi Team, Thanks everyone for the reviews on https://github.com/apache/iceberg/pull/12298! I have addressed most of the comments, but a few questions remain that might merit a wider audience: 1. We should decide on the expected filtering behavior when the filters are pushed down to the

[DISCUSS] Events Endpoint for IRC

2025-03-20 Thread Christian Thiel
Dear all, We have recently discussed in the Iceberg Catalog Community Sync [1] and on the Mailing List [2] different ways in which federation between catalogs could be standardized. This proposal introduces a /events endpoint to the IRC specification. The endpoint provides events of modifications to o

Re: [VOTE] Minor simplifications for Geo Spec

2025-03-20 Thread Jean-Baptiste Onofré
+1 (non-binding) Regards JB On Wed, Mar 19, 2025 at 1:01 AM Szehon Ho wrote: > > Hi everyone, > > While working on the reference implementation for Geometry/Geography spec, we > noticed some parts that can be simplified for this first version: > > Default values should always be null (requires

Re: [VOTE][Go] Release Apache Iceberg Go v0.2.0 RC0

2025-03-20 Thread Fokko Driesprong
-1 (binding) Since this version includes write support, I did some testing yesterday and found an issue with the V1 metadata. Someone who's working with ClickHouse ran into the same issue and posted a PR