Re: [Discuss] Iceberg 1.9.1 Release

2025-05-12 Thread Jean-Baptiste Onofré
Hi, I made a fix/improvement in Avro and will propose new Avro releases. It may be worth including in Iceberg 1.9.1 if the timing is OK. Regards, JB On Mon, May 12, 2025 at 20:03, Russell Spitzer wrote: > I'd rather we didn't get any "feature" sorts of things in like > * Enable HTTP proxy supp

Re: [VOTE] Add commit timestamp to CommitReport

2025-05-12 Thread Manu Zhang
Hi all, The background is that we schedule maintenance jobs based on commit reports for Iceberg tables, and we want to know *when commits happen*. Adding a timestamp to the commit report would save us from loading the metadata of every table from the filesystem. Please take a look at the PR and cast yo

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-12 Thread Steven Wu
I agree with Peter that a 1:1 mapping of data files and inverted indexes is not as useful. With a columnar format like Parquet, this can also be achieved equivalently by reading the data file with projection on the identifier columns. On Mon, May 12, 2025 at 4:20 AM Péter Váry wrote: > Hi Xiaoxuan,

Re: Should DDL operations always create new snapshots?

2025-05-12 Thread Brian Hulette
I've also been a bit confused by this edge case. Is {a: 1, b: NULL} the correct result for querying at "time" v3? If so I agree it's a bit finicky for engines to produce that result. It can't be determined from the snapshot-log alone, IIUC they'll need to look at the metadata-log to find the acti

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Brian Hulette
+1 (non-binding) On Mon, May 12, 2025 at 1:25 PM Steven Wu wrote: > +1 (binding) > > On Mon, May 12, 2025 at 1:10 PM Ryan Blue wrote: > >> +1 (binding) >> >> On Mon, May 12, 2025 at 10:50 AM Szehon Ho >> wrote: >> >>> +1 (binding) >>> >>> Thanks >>> Szehon >>> >>> On Mon, May 12, 2025 at 9:19 

Re: Should DDL operations always create new snapshots?

2025-05-12 Thread Ryan Blue
Snapshots are created when data changes and there is no change to the data tree at “time” v3. If you want to create new snapshots when the schema changes it is alright to do it, but I don’t think that we need to require it in the spec. Also, it isn’t clear to me why the time travel query would res

Re: [DISCUSS] [REST SPEC] Add first-row-id in the data files for Row Lineage

2025-05-12 Thread Ryan Blue
I thought for sure I had a PR that added this, but I can't find it. +1 to adding `first_row_id`. Thanks, Prashant! On Mon, May 12, 2025 at 9:22 AM Russell Spitzer wrote: > Makes sense to me. Perhaps we should also add a test that checks that > the DataFile API object and the REST spec are always

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Steven Wu
+1 (binding) On Mon, May 12, 2025 at 1:10 PM Ryan Blue wrote: > +1 (binding) > > On Mon, May 12, 2025 at 10:50 AM Szehon Ho > wrote: > >> +1 (binding) >> >> Thanks >> Szehon >> >> On Mon, May 12, 2025 at 9:19 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +1 (binding) >>> >>>

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Ryan Blue
+1 (binding) On Mon, May 12, 2025 at 10:50 AM Szehon Ho wrote: > +1 (binding) > > Thanks > Szehon > > On Mon, May 12, 2025 at 9:19 AM Russell Spitzer > wrote: > >> +1 (binding) >> >> On Mon, May 12, 2025 at 5:32 AM Eduard Tudenhöfner < >> etudenhoef...@apache.org> wrote: >> >>> +1 (binding) >>>

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Fokko Driesprong
+1 On Mon, May 12, 2025 at 20:47, Steve Zhang wrote: > +1 (non-binding) > > Thanks, > Steve Zhang > > > > On May 11, 2025, at 6:45 PM, Gang Wu wrote: > > +1 (non-binding) > > >

[RESULT] [VOTE] Add encryption key updates to REST spec

2025-05-12 Thread Ryan Blue
With 5 +1 votes and no -1 or +0 votes, this passes. Thanks, everyone! On Fri, May 9, 2025 at 3:00 PM Denny Lee wrote: > +1 (non-binding) > > On Fri, May 9, 2025 at 14:11 Ryan Blue wrote: > >> +1 (binding) >> >> On Thu, May 8, 2025 at 10:33 AM Russell Spitzer < >> russell.spit...@gmail.com> wrot

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Steve Zhang
+1 (non-binding) Thanks, Steve Zhang > On May 11, 2025, at 6:45 PM, Gang Wu wrote: > > +1 (non-binding)

Re: [Discuss] Iceberg 1.9.1 Release

2025-05-12 Thread Russell Spitzer
I'd rather we didn't get any "feature" sorts of things in like * Enable HTTP proxy support for the client used by REST Catalog #12406 * GCP: Support multiple storage credential prefixes #12881 These seem

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Szehon Ho
+1 (binding) Thanks Szehon On Mon, May 12, 2025 at 9:19 AM Russell Spitzer wrote: > +1 (binding) > > On Mon, May 12, 2025 at 5:32 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> +1 (binding) >> >> On Mon, May 12, 2025 at 3:45 AM Gang Wu wrote: >> >>> +1 (non-binding) >>> >>> On

Re: [Discuss] Iceberg 1.9.1 Release

2025-05-12 Thread Yufei Gu
Thanks Kevin for the list! That looks good to me. Looking forward to getting these fixes out! Yufei On Mon, May 12, 2025 at 10:19 AM Kevin Liu wrote: > Hi Russell, > > I went through the commits since 1.9.x release, > https://github.com/apache/iceberg/compare/1.9.x...main > > Here are some pos

Re: [Discuss] Iceberg 1.9.1 Release

2025-05-12 Thread Kevin Liu
Hi Russell, I went through the commits since the 1.9.x release: https://github.com/apache/iceberg/compare/1.9.x...main Here are some possible candidates for the 1.9.1 patch release: * Core: Fix Kryo ser/de with StorageCredential config #12882 * Core: Ensure

Re: [Discuss] Iceberg 1.9.1 Release

2025-05-12 Thread Russell Spitzer
I haven't gotten any other issues for 1.9.1 on the milestone and no one has responded here. I think it's important that we get a version of Iceberg out with a working Version function so I'll start a release today or tomorrow for a vote. On Sat, May 3, 2025 at 1:22 AM Jean-Baptiste Onofré wrote:

Re: [DISCUSS] [REST SPEC] Add first-row-id in the data files for Row Lineage

2025-05-12 Thread Russell Spitzer
Makes sense to me. Perhaps we should also add a test that checks that the DataFile API object and the REST spec are always in sync? On Mon, May 12, 2025 at 10:52 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > Thanks Prashant, I definitely agree the first_row_id will need to be added > to the

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Russell Spitzer
+1 (binding) On Mon, May 12, 2025 at 5:32 AM Eduard Tudenhöfner wrote: > +1 (binding) > > On Mon, May 12, 2025 at 3:45 AM Gang Wu wrote: > >> +1 (non-binding) >> >> On Mon, May 12, 2025 at 3:27 AM Kevin Liu wrote: >> >>> +1 (non-binding) >>> >>> Thanks for starting a vote. >>> >>> There's extr

Re: [DISCUSS] [REST SPEC] Add first-row-id in the data files for Row Lineage

2025-05-12 Thread Amogh Jahagirdar
Thanks Prashant, I definitely agree the first_row_id will need to be added to the REST Spec. I commented on the PR; I also think we'll need to make sure the first-row-id for Snapshots is also added as part of this. Thanks, Amogh Jahagirdar On Fri, May 9, 2025 at 6:12 PM Prashant Singh wrote: > H

Re: cleanExpiredMetadata in RemoveSnapshots

2025-05-12 Thread Pucheng Yang
Thanks all for the discussion. I also agree that we should turn this behavior off by default, and I would love to see this flag added to the Spark/Flink procedure. I think having this feature available on the client side seems more achievable in the short run and designing a server

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-12 Thread Péter Váry
Hi Xiaoxuan, Do we plan to store the indexes in a separate file alongside the data files? If so, then I have the following thoughts: - I agree that the 1-on-1 mapping of data files and index files is easy to maintain; OTOH, it is less useful as an index. - The writer (which is looking for a column w

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Eduard Tudenhöfner
+1 (binding) On Mon, May 12, 2025 at 3:45 AM Gang Wu wrote: > +1 (non-binding) > > On Mon, May 12, 2025 at 3:27 AM Kevin Liu wrote: > >> +1 (non-binding) >> >> Thanks for starting a vote. >> >> There's extra context in the PR description. As a summary, >> `gz.metadata.json` is the current namin