Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-30 Thread Gyula Fóra
Hi Peter! We have quite a few use cases that require option C and currently have no workaround. So this is something we will have to do in either case, and we feel it would also be a good addition for other users. Cheers, Gyula On Mon, Jun 30, 2025 at 12:39 PM Péter Váry wrot

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-30 Thread Péter Váry
I like the first 2 points of your proposal, A (with warning) + B, and changing the default with 2.0, but I would suggest avoiding C. I see a very limited use case for deduplication (only if there are no deletes in the table, just updates), and it would cause more confusion and configuration clutter. Maximili

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-30 Thread Maximilian Michels
That's essentially what I propose, minus the WARN message, which is a great addition. The flag to skip overwrite snapshots should probably be there forever, similar to Spark's flags to skip overwrite / delete snapshots. Here's the proposal again: For Flink incremental / streaming reads: A. By def

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-30 Thread Péter Váry
At a minimum, a LOG.warn message about the deprecation. Maybe also a "hidden" flag that could switch back to skipping overwrite snapshots. This flag could be deprecated immediately and removed in the next release. Maybe wait until 2.0, where we can introduce breaking changes? Maximilian Michels wrote (on 20

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-30 Thread Maximilian Michels
What would such a grace period look like? Even if we defer this by X releases, some users will probably be surprised by it. When users upgrade their Iceberg version, they would generally expect slightly different (improved) behavior. Right now, some users are discovering that they are not r

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-27 Thread Péter Váry
I would try to avoid breaking the current behaviour. Maybe after some grace period it could be OK, but not "suddenly". wrote (on Fri, Jun 27, 2025, 15:28): > Sounds good to me! > > Gyula > Sent from my iPhone > > On 27 Jun 2025, at 13:48, Maximilian Michels wrote: > > > In my understa

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-27 Thread gyula . fora
Sounds good to me! Gyula Sent from my iPhone On 27 Jun 2025, at 13:48, Maximilian Michels wrote: In my understanding, overwrite snapshots are specifically there to overwrite data, i.e. there is always an append for a delete. That is also the case for the Flink Iceberg sink. There may be other writers

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-27 Thread Maximilian Michels
In my understanding, overwrite snapshots are specifically there to overwrite data, i.e. there is always an append for a delete. That is also the case for the Flink Iceberg sink. There may be other writers, which use overwrite snapshots differently. But point taken, we may need to also opt-in to ski

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-26 Thread Péter Váry
> Consequently, we must throw on DELETE snapshots, even if users opt-in to reading appends of OVERWRITE snapshots. OVERWRITE snapshots themselves could still contain deletes. So in this regard, I don't see a difference between the DELETE and the OVERWRITE snapshots. Maximilian Michels wrote (

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-26 Thread Gyula Fóra
I agree with Peter, it would be weird to get an error on a DELETE snapshot if you have already explicitly opted in to reading the appends of OVERWRITE snapshots. Users may not be able to control the type of snapshot that is created, so this would otherwise render the feature useless. Gyula On Thu, Jun

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-26 Thread Maximilian Michels
Thank you for your feedback Steven and Gyula! @Steven > the Flink streaming read only consumes `append` only commits. This is a > snapshot commit `DataOperation` type. You were talking about row-level > appends, delete etc. Yes, there is no doubt that Flink streaming reads only proces

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-26 Thread Gyula Fóra
Hi Max! I like this proposal, especially since proper streaming reads of deletes seem to be quite a bit of work based on recent efforts. Giving an option to include the append parts of OVERWRITE snapshots (2) is a great quick improvement that will unblock use cases where the Iceberg table is used t

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-25 Thread Steven Wu
The Flink streaming read consumes only `append` commits. This is a snapshot commit `DataOperation` type. You were talking about row-level appends, deletes, etc. > 2. Add an option to read appended data of overwrite snapshots to allow users to de-duplicate downstream (opt-in config) For update

Append-only table scans in the presence of OVERWRITE snapshots

2025-06-25 Thread Maximilian Michels
Hi, It is well known that Flink and other Iceberg engines do not support "merge-on-read" in streaming/incremental read mode. There are plans to change that, see the "Improving Merge-On-Read Query Performance" thread, but this is not what this message is about. I used to think that when we increme