RE: Re: [DISCUSS] Events Endpoint for IRC

2025-06-13 Thread Adnan Hemani
Hi all, Wanted to see if there are any remaining action items or open discussions on this topic. I’m trying to gauge how close we are to getting this merged! Best, Adnan Hemani On 2025/05/28 17:33:20 Christian Thiel wrote: > Thank you all for the great discussion today! > I

Re: [DISCUSS] V4 - Parquet as Metadata File Format

2025-06-13 Thread MLHSModelUN
I'm interested in working on this change as well. I think it pairs nicely with the proposal for per column structs for statistics. Thanks, Harman On Thu, Jun 12, 2025 at 9:43 PM Russell Spitzer wrote: > It’s not required at compile time, only at test runtime. > > On Thu, Jun 12, 2025 at 8:37 PM

Re: [DISCUSS] v4 - One file commits

2025-06-13 Thread Jagdeep Sidhu
Hi everyone, I am new to the Iceberg community but would love to participate in these discussions to reduce the number of file writes, especially for small writes/commits. Thank you! -Jagdeep On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada wrote: > We have been hitting all the metadata pro

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-13 Thread Péter Váry
Hi Haizhou Zhao, Thanks for the detailed use-cases! Quick question about Use Case 1: Do you have a primary key to identify the updated rows? How wide and complex is this key? Is the ranking always recalculated for the full partition? If so, the discussion around the wide tables (https://lists.a

Re: Wide tables in V4

2025-06-13 Thread Péter Váry
Hi everyone, I did some experiments with splitting up wide Parquet files into multiple column families. You can check the PR here: https://github.com/apache/iceberg/pull/13306. What the test does: - Creates tables with 100/1000/1 columns, where the column type is double - Generates