Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-13 Thread Péter Váry
Hi Xiaoxuan, Let me describe how the Flink streaming writer uses equality deletes, and how it could use indexes. When the Flink streaming writer receives a new insert, it appends the data to a data file. When it receives a delete, it appends the primary key to an equality delete file. When
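The write path described in this message can be sketched roughly as follows. This is a toy in-memory model, not Flink or Iceberg code: the names `WriterSketch`, `read`, and `key_field` are hypothetical, commits are modeled as simple sequence numbers, and the reader assumes that an equality delete only applies to data files committed with a strictly lower sequence number. Rows inserted and deleted within the same commit are not handled here.

```python
class WriterSketch:
    """Toy model of a streaming writer (hypothetical, for illustration only)."""

    def __init__(self):
        self.data_files = []    # list of (sequence number, [row dicts])
        self.delete_files = []  # list of (sequence number, [primary keys])
        self._seq = 0
        self._rows = []         # rows buffered for the open data file
        self._keys = []         # keys buffered for the open equality delete file

    def insert(self, row):
        # A new insert is appended to the current data file.
        self._rows.append(row)

    def delete(self, key):
        # A delete only records the primary key; no data file is looked up.
        self._keys.append(key)

    def commit(self):
        # Close the buffered files under one new sequence number.
        self._seq += 1
        if self._rows:
            self.data_files.append((self._seq, self._rows))
        if self._keys:
            self.delete_files.append((self._seq, self._keys))
        self._rows, self._keys = [], []


def read(writer, key_field="id"):
    """Merge-on-read: filter each data file by the later equality deletes."""
    live = []
    for data_seq, rows in writer.data_files:
        deleted = {
            key
            for del_seq, keys in writer.delete_files
            if del_seq > data_seq  # assumption: only later deletes apply
            for key in keys
        }
        live.extend(row for row in rows if row[key_field] not in deleted)
    return live
```

In this model the writer never has to find the row it deletes, so writes stay cheap, while every read has to match each data file against the accumulated delete keys.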

Re: [Discuss] Iceberg 1.9.1 Release

2025-05-13 Thread Jean-Baptiste Onofré
The new Avro release will contain security improvements (and only those). So even if not strictly required (as Iceberg is not impacted), it would be good to keep the security scanners happy ;) Regards JB On Tue, May 13, 2025 at 23:24, Péter Váry wrote: > Do we really want to include a new lib

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-13 Thread Renjie Liu
+1 (binding) On Tue, May 13, 2025 at 7:12 AM Brian Hulette wrote: > +1 (non-binding) > > On Mon, May 12, 2025 at 1:25 PM Steven Wu wrote: > >> +1 (binding) >> >> On Mon, May 12, 2025 at 1:10 PM Ryan Blue wrote: >> >>> +1 (binding) >>> >>> On Mon, May 12, 2025 at 10:50 AM Szehon Ho >>> wrote:

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-13 Thread Xiaoxuan Li
Hi Peter, Thanks for the detailed illustration. I understand your concern. I believe the core question here is whether the index is used during job planning or at the scan task. This depends on how index files are referenced, at the file level or partition level. In my view, both approaches ulti

Re: [ANNOUNCE] Apache PyIceberg release 0.9.1

2025-05-13 Thread Renjie Liu
Thanks Fokko for running the release! BTW, it would be better to update the status page to reflect the latest changes. On Wed, Apr 30, 2025 at 11:45 PM Kevin Liu wrote: > Thanks for running the release, Fokko! It's great to see the number of > fixes in this patc

Re: [DISCUSS] Finalizing the v3 spec

2025-05-13 Thread Anton Okolnychyi
I went ahead and created https://github.com/apache/iceberg/pull/13042 to include the discussed requirement for DVs. On Wed, May 7, 2025 at 20:31, Anton Okolnychyi wrote: > Steven, that may be a good point to add to ensure the metadata is properly > maintained. If I remember correctly, the Spark imp

Re: [Discuss] Iceberg 1.9.1 Release

2025-05-13 Thread Péter Váry
Do we really want to include a new lib version in a maintenance release? In the past, we have seen issues when upgrading libs. Avro is very important, as it is used for metadata files. I would rather not include a new version, unless it is absolutely necessary. On Tue, May 13, 2025, 06:42 Jean-Bap

Re: [VOTE] Add commit timestamp to CommitReport

2025-05-13 Thread Yufei Gu
+1 I'm OK to add it as long as it's optional. Yufei On Mon, May 12, 2025 at 8:47 PM Manu Zhang wrote: > Hi all, > > The background is that we schedule maintenance jobs based on commit > reports for Iceberg tables, and we want to know *when commits happen*. > Adding timestamp to the commit repo

Re: Should DDL operations always create new snapshots?

2025-05-13 Thread Vladimir Ozerov
In my example, the expected results for "SELECT * FROM t" are:
v1: CREATE TABLE t (a) {}
v2: INSERT INTO t VALUES (1) {a=1}
v3: ALTER TABLE t ADD COLUMN b {a=1, b=null}
v4: UPDATE t SET b = 2 {a=1, b=2}
The problem is that the state v3 is reachable only if it is the last operation on the table
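The four expected states can be reproduced with a small model. This is only an illustration of the example, not Iceberg behavior: `VERSIONS` and `select_all` are hypothetical names, each version is modeled as a (schema, stored rows) pair, and a column missing from a stored row is assumed to read as null.

```python
# Each version pairs the table schema with the rows as they were written.
VERSIONS = {
    "v1": (["a"], []),                       # CREATE TABLE t (a)
    "v2": (["a"], [{"a": 1}]),               # INSERT INTO t VALUES (1)
    "v3": (["a", "b"], [{"a": 1}]),          # ALTER TABLE t ADD COLUMN b
    "v4": (["a", "b"], [{"a": 1, "b": 2}]),  # UPDATE t SET b = 2
}


def select_all(version):
    """SELECT * FROM t at the given version; missing columns read as None."""
    schema, rows = VERSIONS[version]
    return [{col: row.get(col) for col in schema} for row in rows]
```

Note that v2 and v3 store the same rows and differ only in schema, so v3 is a distinct state only when the schema change is recorded as its own version.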