Hi Peter!
We have quite a few use-cases where option C would be required that
currently have no workarounds.
So this is something that we will have to do in either case, and we feel
that it would also be a good addition for other users.
Cheers,
Gyula
On Mon, Jun 30, 2025 at 12:39 PM Péter Váry
wrote:
I like the first 2 points of your proposal A (with warning) + B and
changing the default with 2.0, but I would suggest avoiding C. I see very
limited use for deduplication (only if there are no deletes in the table,
just updates), and it would cause more confusion and configuration clutter.
Maximili
That's essentially what I propose, minus the WARN message which is a great
addition. The flag to skip overwrite snapshots should probably be there
forever, similarly to Spark's flags to skip overwrite / delete snapshots.
Here's again the proposal:
For Flink incremental / streaming reads:
A. By def
Minimally LOG.warn message about deprecation.
Maybe a "hidden" flag which could switch back to skipping overwrite snapshots.
This flag could be deprecated immediately and removed in the next release.
Maybe wait until 2.0, where we can introduce breaking changes?
Maximilian Michels wrote (on: 20
What would such a grace period look like? Even if we defer this X amount of
releases, some users will probably be surprised by this. When users upgrade
their Iceberg version, they would generally expect slightly different
(improved) behavior. Right now, some users are discovering that they are
not r
I would try to avoid breaking the current behaviour.
Maybe after some grace period it could be ok, but not "suddenly"
wrote (on Fri, Jun 27, 2025, 15:28):
> Sounds good to me!
>
> Gyula
> Sent from my iPhone
>
> On 27 Jun 2025, at 13:48, Maximilian Michels wrote:
>
>
> In my understa
In my understanding, overwrite snapshots are specifically there to
overwrite data, i.e. there is always an append for a delete. That is also
the case for the Flink Iceberg sink. There may be other writers, which use
overwrite snapshots differently. But point taken, we may need to also
opt-in to ski
> Consequently, we must throw on DELETE snapshots, even if users opt-in to
> reading appends of OVERWRITE snapshots.
OVERWRITE snapshots themselves could still contain deletes. So in this
regard, I don't see a difference between the DELETE and the OVERWRITE
snapshots.
Maximilian Michels wrote (
I agree with Peter, it would be weird to get an error on a DELETE snapshot
if you already explicitly opted in for reading the appends of OVERWRITE
snapshots.
Users may not be able to control which type of snapshot gets created, so
throwing on DELETE snapshots would render this feature useless.
Gyula
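
To make the position in the two replies above concrete, here is a minimal
sketch of the per-snapshot decision being discussed. It is not Iceberg or
Flink code: the class, the enums, the `readAppendsOptIn` parameter and the
`withoutOptIn` parameter are all hypothetical names chosen for illustration.
The sketch assumes that, once a user opts in, DELETE and OVERWRITE snapshots
are treated the same way, and it leaves the behaviour without the opt-in
(skip or fail) as an input, since that default is what the thread is still
debating.

```java
final class SnapshotPolicySketch {
    /** Snapshot operation types named in the thread (hypothetical enum, not Iceberg's API). */
    enum Operation { APPEND, OVERWRITE, DELETE }

    /** What a streaming reader could do with one snapshot. */
    enum Action { READ_APPENDED_FILES, SKIP, FAIL }

    /**
     * @param op               operation type of the snapshot being considered
     * @param readAppendsOptIn hypothetical opt-in: read appended files of non-append snapshots
     * @param withoutOptIn     action for non-append snapshots when the opt-in is off
     *                         (SKIP or FAIL); the right default is the open question
     */
    static Action decide(Operation op, boolean readAppendsOptIn, Action withoutOptIn) {
        if (op == Operation.APPEND) {
            return Action.READ_APPENDED_FILES;
        }
        // OVERWRITE and DELETE are handled identically: an OVERWRITE may itself carry
        // deletes, and users cannot always choose which of the two a writer produces.
        return readAppendsOptIn ? Action.READ_APPENDED_FILES : withoutOptIn;
    }

    public static void main(String[] args) {
        // Print the decision for every operation, with and without the opt-in.
        for (Operation op : Operation.values()) {
            System.out.println(op + " opt-in=true  -> " + decide(op, true, Action.FAIL));
            System.out.println(op + " opt-in=false -> " + decide(op, false, Action.FAIL));
        }
    }
}
```

With the opt-in enabled, a DELETE snapshot simply contributes whatever
appended files it has (possibly none) instead of raising an error, which is
the behaviour argued for above.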
On Thu, Jun
Thank you for your feedback Steven and Gyula!
@Steven
> the Flink streaming read only consumes `append` only commits. This is a
> snapshot commit `DataOperation` type. You were talking about row-level
> appends, delete etc.
Yes, there is no doubt that Flink streaming reads only proces
Hi Max!
I like this proposal, especially since proper streaming reads of deletes
seem to be quite a bit of work based on recent efforts.
Giving an option to include the append parts of OVERWRITE snapshots (2) is
a great quick improvement that will unblock use-cases where the Iceberg
table is used t
the Flink streaming read only consumes `append` only commits. This is a
snapshot commit `DataOperation` type. You were talking about row-level
appends, delete etc.
> 2. Add an option to read appended data of overwrite snapshots to allow
> users to de-duplicate downstream (opt-in config)
For update