I'm working on incremental ingest of Iceberg tables into SingleStore. I
know this is an active area of work in the Iceberg community, as it's very
similar to materialized views, only the "MV" in the case of ingesting into
another system is a trivial one. But for Iceberg v2, I have some questions
about when delete files can go away. In particular, I'm thinking about
overlapping equality deletes.

The case I'm worried about is that an equality delete file is removed,
representing some rows being restored. But perhaps there is some other
equality delete which overlaps with the dropped delete. This would happen
only for equality deletes on non-unique columns, and only if you have
equality deletes which are applied to different columns, both of which seem
like a bad idea but are allowed by the spec.
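
To make the overlap concrete, here is a minimal sketch in plain Python. It is a toy model, not the Iceberg API: the row layout, the `region` column, and the `visible_rows` helper are all made up for illustration, and sequence numbers are ignored (every delete is assumed to apply to every row).

```python
# Toy model only: a row is a dict, and an equality delete is a pair of
# (column names, set of value tuples). Sequence numbers are ignored.

def visible_rows(rows, equality_deletes):
    """Return the rows that are not matched by any equality delete."""
    def is_deleted(row):
        return any(
            tuple(row[col] for col in cols) in values
            for cols, values in equality_deletes
        )
    return [row for row in rows if not is_deleted(row)]

rows = [
    {"id": 1, "region": "east"},
    {"id": 2, "region": "west"},
    {"id": 3, "region": "east"},
]

# Two equality deletes on different columns that overlap on the row with id 1.
delete_by_id = (("id",), {(1,), (2,)})
delete_by_region = (("region",), {("east",)})

before = visible_rows(rows, [delete_by_id, delete_by_region])
after = visible_rows(rows, [delete_by_region])  # delete_by_id was dropped

restored_ids = sorted(r["id"] for r in after if r not in before)
print(restored_ids)
```

In this model, dropping the delete on `id` restores only id 2; id 1 stays deleted because the delete on `region` still covers it. So a reader that sees a delete file disappear cannot simply re-insert every row that file matched.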

As I understand it, a delete operation can *only* remove data files or
add delete files; it cannot remove delete files. And of course an append
can only add data files. Given that, it seems to me that there are two ways
that delete files could be in one snapshot and not appear in a subsequent
one:
1. During a replace operation, a delete file may be partly or fully
applied, and new data and delete files generated. For instance, if snapshot
N has a data file with records with ids = [1, 2, 3, 4] and an equality
delete file on the id column with values [1, 2], a new snapshot N+1 might
have a data file with ids [3, 4] and no delete files, or it might have a
data file with ids [2, 3, 4] and a delete file for ids [2]. The second case
is a partial compaction and seems legal, if a little unusual on its face.
2. During an overwrite operation, anything could happen, including the
revert of a delete (dropping of a delete file from the table). This could
occur due to reverting to an earlier snapshot in a system that writes to
the table. This could also be done by writing an older value into
current-snapshot-id in the table metadata, but that isn't the only way to
represent a revert.
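
The two cases can be sketched with the ids from the example above. Again this is a toy model in plain Python rather than Iceberg code: `visible_ids` is a hypothetical helper, each snapshot is reduced to one data file plus one equality delete on the id column, and sequence numbers are ignored.

```python
def visible_ids(data_ids, deleted_ids):
    """Ids a reader would see: the data file's ids minus the equality-deleted ids."""
    return sorted(set(data_ids) - set(deleted_ids))

# Snapshot N: data file [1, 2, 3, 4], equality delete on id for [1, 2].
snapshot_n = visible_ids([1, 2, 3, 4], [1, 2])

# Case 1, replace: the delete is fully or partly applied during compaction.
full_compaction = visible_ids([3, 4], [])
partial_compaction = visible_ids([2, 3, 4], [2])

# Case 2, overwrite or revert: the delete file is dropped, the data file is kept.
reverted = visible_ids([1, 2, 3, 4], [])

print(snapshot_n, full_compaction, partial_compaction, reverted)
```

Both compaction outcomes leave the visible ids at [3, 4], so an incremental reader has nothing to change. Only the revert case changes the logical contents, bringing ids 1 and 2 back.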

I think that those are the only two ways that an equality delete file can
go away. Am I missing one? And what are other implementers doing here? Are
there implementations out there which will remove equality deletes to
represent a row being restored?

—
Michael Leuchtenburg
Staff Software Engineer
m. 413.433.0739
