Re: [DISCUSS] v4 - Improved column statistics

2025-06-02 Thread Szehon Ho
+1, excited for this one too. We've seen the current metrics maps blow up memory and hope this can improve that. On the Geo front, this could allow us to add supplementary metrics that don't conform to the geo type, like S2 Cell Ids. Thanks, Szehon On Mon, Jun 2, 2025 at 6:14 AM Eduard Tudenhöfne

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-02 Thread Xiaoxuan Li
Thanks, Peter, for bringing these ideas forward! You also raised a great point about clarifying the goal of indexing. I’ve been considering it with the intention of eventually enabling fast upserts through DVs. To support that, we need an index that maps primary keys to both the data file and t

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread Huang-Hsiang Cheng
I am interested in this idea and looking forward to collaboration. Thanks, Huang-Hsiang > On Jun 2, 2025, at 10:14 AM, namratha mk wrote: > > Hello, > > I am interested in contributing to this effort. > > Thanks, > Namratha > > On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread namratha mk
Hello, I am interested in contributing to this effort. Thanks, Namratha On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > Thanks for kicking this thread off Ryan, I'm interested in helping out > here! I've been working on a proposal in this area and it would be great

Re: [DISCUSS] Apache Iceberg 1.10.0 release

2025-06-02 Thread Alex Dutra
Hi Steven, Could you please include the following PRs, all related to authentication: https://github.com/apache/iceberg/pull/13215 https://github.com/apache/iceberg/pull/12562 https://github.com/apache/iceberg/pull/12563 The first one is a fix for a performance degradation in request signing and

Re: [DISCUSS] Apache Iceberg 1.10.0 release

2025-06-02 Thread Talat Uyarer
Hi Steven, I would like to get these PRs in too: https://github.com/apache/iceberg/pull/13111 https://github.com/apache/iceberg/pull/13212 Thanks, Talat On Thu, May 29, 2025 at 7:26 PM Prashant Singh wrote: > Thank you so much for driving this release ! > It will be really helpful in getting th

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread Steve Loughran
So this'll cut down on the number of manifest files read, won't it, and so improve query planning? Does anyone have an estimate of what benefit this is likely to have in production deployments? On Thu, 29 May 2025 at 21:25, Ryan Blue wrote: > Hi everyone, > > Like Russell’s recent note, I’m starting a threa

[DISCUSS] v4 - Improved column statistics

2025-06-02 Thread Eduard Tudenhöfner
Hey everyone, I'm starting a thread to connect folks interested in improving the existing way of collecting column-level statistics (often referred to as *metrics* in the code). I've already started a proposal, which can be found at https://s.apache.org/iceberg-column-stats. *Motivation* Column

Re: Wide tables in V4

2025-06-02 Thread Péter Váry
Hi Bart, Thanks for your answer! I’ve pulled out some text from your thorough and well-organized response to make it easier to highlight my comments. > It would be well possible to tune parquet writers to write very large row groups when a large string column dominates. [..] What would you do, i

Re: Wide tables in V4

2025-06-02 Thread Bart Samwel
On Fri, May 30, 2025 at 8:35 PM Péter Váry wrote: > Consider this example > Imagine a table with one large string column and many small numeric > columns. > > Scenario 1: Single File > >- All columns are written into a single file. >- The RowGroup size is small due to the large string col

Re: [VOTE] File Format API

2025-06-02 Thread Péter Váry
Hi everyone, I would like to encourage everybody who wants to participate in the discussion of the topic to share their thoughts either on the doc or on the PRs. I would like to finalize and merge the API in 1.10, so we can merge the implementation early in 1.11. This would allow more thorough testin