If it's possible, I would probably just extend the expiration interval to
fix the issue, since the interval is basically functioning as a watermark
for reader state at the moment.

Is our underlying issue here that we cannot determine the lineage of a
snapshot that has been expired? I.e., we know all the files and which
snapshots added them, but we cannot determine where our "from" snapshot
sits in the history once it has been expired?
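
To make the question concrete, here is a minimal Python sketch of a toy
lineage model. It is not Iceberg's API: `Snapshot`, `snapshots_to_read`, and
the ids are hypothetical names for illustration. It assumes each surviving
snapshot still records its parent's id after the parent has been expired, and
it walks back from the "to" snapshot until it reaches the child of "from":

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a table snapshot (hypothetical, not Iceberg's class)."""
    snapshot_id: int
    parent_id: Optional[int]


def snapshots_to_read(
    snapshots: Iterable[Snapshot], from_id: int, to_id: int
) -> List[int]:
    """Return ids in the range (from_id, to_id], oldest first.

    `snapshots` holds only the snapshots that survived expiration.
    `from_id` may refer to an expired snapshot; it is located through
    the parent_id recorded on its surviving child.
    """
    by_id = {s.snapshot_id: s for s in snapshots}
    if to_id not in by_id:
        raise LookupError(f"'to' snapshot {to_id} does not exist")
    if from_id == to_id:
        return []  # nothing new since the checkpoint

    chain: List[int] = []
    current = by_id[to_id]
    while True:
        chain.append(current.snapshot_id)
        if current.parent_id == from_id:
            # `current` is the child of "from": include it and stop.
            # This works whether or not "from" itself still exists.
            return chain[::-1]
        parent = by_id.get(current.parent_id)
        if parent is None:
            # History ran out (or hit a different expired snapshot)
            # before reaching "from": its position is unknown.
            raise LookupError(
                f"cannot place 'from' snapshot {from_id} in history"
            )
        current = parent


# History was 1 <- 2 <- 3 <- 4; snapshots 1 and 2 have been expired.
remaining = [Snapshot(3, 2), Snapshot(4, 3)]
print(snapshots_to_read(remaining, from_id=2, to_id=4))
```

Under that assumption, an expired "from" can still be placed in history as
long as its direct child survives. If the child was expired as well (from_id=1
in the example), the walk raises, because nothing left in the table points
back at the checkpoint.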

On Mon, Jan 11, 2021 at 11:07 AM Filip <filip....@gmail.com> wrote:

> Hi team,
>
> We've recently bumped into an issue with a particular edge case that
> messes with our implementation of leveraging the incremental read and the
> expire snapshot features combined.
>
> With incremental read we're relying on the client to preserve the snapshot
> that was last used for reading data as a checkpoint. Every time the client
> does an incremental read it gets new data (if available) along with the
> current snapshot, which the client will store as its new checkpoint.
>
> Expire snapshot is scheduled to kick in and wipe snapshots based on
> recency (say older than N days).
> But in the edge case where two consecutive write operations are further
> apart than the expiration interval (*), if the incremental read process
> doesn't run before the snapshot expiration, the client will be left in an
> inconsistent state, since the snapshot it has stored as its checkpoint can
> no longer be used.
>
> So we were looking at either extending the snapshot expiration feature or
> extending the implementation of incremental read.
>
> I'll just drop in some details on the option of extending incremental
> read: add fallback logic for when the provided snapshot is missing, and
> try to locate the snapshot whose parent is that particular snapshot
> instead.
> This would change the logic of the incremental read with respect to the
> inclusiveness of the snapshot range: if it currently treats the provided
> "from" snapshot as exclusive, then in the fallback case, where the child
> snapshot is used as "from", it would have to be inclusive.
>
> Let me know if you think this edge case should be supported by Iceberg,
> whether this idea of extending the incremental read logic makes sense, or
> whether folks in the community have a better solution for this.
>
> (*) We expire snapshots older than 10 days but we observe two
> consecutive write operations 11 days apart.
>
> --
> Filip Bocse
>
