BLUF:
- We are using snapshot references to preserve custom table-level metadata
  that currently exists in snapshot summaries. Is this an anti-pattern or
  expected usage?
- If it is an anti-pattern, is there something else in the spec we can use
  for this purpose? If not, would it make sense to introduce table-level
  metadata in the spec?

Details below:

Hello Iceberg community,

We (Redpanda[1][2]) have built a log storage engine that, in addition to
writing log format data, writes data as Parquet files and commits them to
the Iceberg catalog. One of our requirements is to ensure exactly-once
delivery of records into Iceberg. To this end, we keep metadata in two
places:
- In the Iceberg table, we record the log position up to which data has
  been committed as a field in each new Iceberg snapshot’s summary.
- In our system, we checkpoint this same position, i.e. the position up to
  which we have committed to Iceberg.

It’s possible for these to diverge (e.g. in the event of a node failure in
between the above two events), but in such cases, the Iceberg table is
taken as the source of truth. As I understand it, this is the same
technique the Kafka Connect connector uses.
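
To illustrate the recovery step, here is a minimal Python sketch. The
summary key `example.committed-offset`, the `Snapshot` class and the
function name are hypothetical and only model the idea; the fallback to
the local checkpoint when no snapshot carries the key is also an
assumption, and it is exactly the case where duplicates become possible.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical summary key; the real field name is not given in this post.
OFFSET_KEY = "example.committed-offset"


@dataclass
class Snapshot:
    snapshot_id: int
    # Iceberg snapshot summaries are string-to-string maps.
    summary: dict = field(default_factory=dict)


def recover_committed_offset(snapshots: List[Snapshot], local_checkpoint: int) -> int:
    """Pick the offset to resume from, preferring what the table says."""
    # Snapshots are assumed to be ordered oldest to newest.
    for snap in reversed(snapshots):
        value = snap.summary.get(OFFSET_KEY)
        if value is not None:
            # The table is the source of truth, even if it disagrees
            # with the local checkpoint.
            return int(value)
    # Assumption: if no retained snapshot carries the key (for example
    # because it was expired), all that is left is the local checkpoint.
    return local_checkpoint
```

If the node failed after the Iceberg commit but before the local
checkpoint, the table holds the newer offset and wins.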

But there is a problem with this approach when considering snapshot expiry
alongside concurrent updates from multiple systems. While the default
snapshot expiration is 5 days, it’s conceivable that a user sets the
table’s snapshot expiry to something significantly lower to avoid metadata
bloat. On top of that, we cannot assume that our system is the only system
writing to Iceberg, and the current snapshot of the main branch is the only
snapshot guaranteed to be retained at all times. It’s thus conceivable that
external systems add snapshots to the table, and that snapshot expiry then
removes the snapshot metadata we require. If this happens during a moment
of divergence, exactly-once delivery can be violated and files can be
committed to the table more than once.

To mitigate this, we maintain an Iceberg tag for the latest snapshot
written by our system, and rely on the snapshot reference expiry policy[3]
to ensure that these tagged snapshots aren’t removed, with the assumption
that a user is more likely to tune down the `max-snapshot-age-ms` property
(to keep manifest list size small) than to tune down the `max-ref-age-ms`
property.

There are still at least a couple of issues with this approach:
- A user can still set `max-ref-age-ms` to something pathologically small
  and end up causing an exactly-once violation.
- It feels like we’re overloading the intended behavior of tags by using
  them to force explicit snapshot retention.
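
The interaction between the tag and expiry can be modeled roughly as
follows. This is a deliberately simplified Python sketch, not the actual
expiry algorithm: it ignores branch ancestry and `min-snapshots-to-keep`,
and it assumes a tag's age is measured from the timestamp of the snapshot
it points to. The class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

HOUR_MS = 60 * 60 * 1000


@dataclass
class Snap:
    snapshot_id: int
    timestamp_ms: int


@dataclass
class Tag:
    snapshot_id: int
    # None models "never expires" in this sketch.
    max_ref_age_ms: Optional[int] = None


def retained_snapshot_ids(
    snapshots: List[Snap],
    main_head_id: int,
    tags: Dict[str, Tag],
    now_ms: int,
    max_snapshot_age_ms: int,
) -> Set[int]:
    """Return the snapshot ids that survive one expiry pass in this model."""
    by_id = {s.snapshot_id: s for s in snapshots}
    keep = {main_head_id}  # the head of main is always retained
    for s in snapshots:
        if now_ms - s.timestamp_ms <= max_snapshot_age_ms:
            keep.add(s.snapshot_id)  # still young enough
    for tag in tags.values():
        age = now_ms - by_id[tag.snapshot_id].timestamp_ms
        if tag.max_ref_age_ms is None or age <= tag.max_ref_age_ms:
            keep.add(tag.snapshot_id)  # pinned by a live tag
    return keep


# Snapshot 1 is ours (carries the offset); snapshot 2 is from another writer.
snaps = [Snap(1, 0), Snap(2, 10 * HOUR_MS)]
now = 12 * HOUR_MS
untagged = retained_snapshot_ids(snaps, 2, {}, now, HOUR_MS)
tagged = retained_snapshot_ids(snaps, 2, {"ours": Tag(1)}, now, HOUR_MS)
short_ref = retained_snapshot_ids(snaps, 2, {"ours": Tag(1, HOUR_MS)}, now, HOUR_MS)
```

In this model the tag keeps snapshot 1 alive only as long as the ref
itself does not age out, which is the first issue listed above.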

Our question is: is there anything better that we can be doing here? Are
there other parts of the spec that can serve our needs? The table
properties field seems close to what we want, but:
- It is explicitly described as not being meant for arbitrary metadata[4].
- For it to be useful for our use case, we'd need some kind of table
  requirement that checks these properties atomically (today, we use
  snapshot-based table requirements when we commit).
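
For context, a snapshot-based requirement of this kind could look roughly
like the following when expressed as a REST catalog commit request. This
is a sketch that assumes the REST catalog's `assert-ref-snapshot-id`
requirement and its `add-snapshot` and `set-snapshot-ref` updates; the
helper name, the tag name and the abbreviated snapshot object are
illustrative only and not taken from any real deployment.

```python
def build_commit(expected_main_id: int, new_snapshot: dict, tag_name: str) -> dict:
    """Build a commit body that only applies if main still points where we expect."""
    new_id = new_snapshot["snapshot-id"]
    return {
        "requirements": [
            # The catalog rejects the commit if main has moved since we read it.
            {
                "type": "assert-ref-snapshot-id",
                "ref": "main",
                "snapshot-id": expected_main_id,
            },
        ],
        "updates": [
            {"action": "add-snapshot", "snapshot": new_snapshot},
            # Advance main to the new snapshot.
            {
                "action": "set-snapshot-ref",
                "ref-name": "main",
                "type": "branch",
                "snapshot-id": new_id,
            },
            # Move our tag to the same snapshot so expiry keeps it around.
            {
                "action": "set-snapshot-ref",
                "ref-name": tag_name,
                "type": "tag",
                "snapshot-id": new_id,
            },
        ],
    }


# Abbreviated snapshot object; a real one carries more fields.
snapshot = {
    "snapshot-id": 2,
    "parent-snapshot-id": 1,
    "summary": {"operation": "append", "example.committed-offset": "250"},
}
commit = build_commit(expected_main_id=1, new_snapshot=snapshot, tag_name="example-tag")
```

Note that the requirement here can only assert on refs, not on table
properties, which is the gap described in the second bullet.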

So if nothing existing fits, do folks have thoughts on generalized ways to
store custom metadata in the table? As an example, is there any appetite
for adding a different table-level metadata field to the spec? As Iceberg
is adopted by more systems, it's not hard to imagine similar requirements
popping up elsewhere. It may be useful in similar situations to update the
table only if certain metadata fields have not changed, without tying these
fields to specific snapshots.
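
To make the idea concrete, here is a purely hypothetical Python sketch of
a compare-and-set commit on table-level metadata. Nothing like
`custom_metadata` or this `commit` signature exists in the spec today; the
in-memory `TableModel` only illustrates the semantics we have in mind.

```python
from typing import Dict, List, Optional


class ConflictError(Exception):
    """Raised when an expected metadata value no longer matches."""


class TableModel:
    """In-memory stand-in for a table with a hypothetical metadata map."""

    def __init__(self) -> None:
        self.custom_metadata: Dict[str, str] = {}
        self.files: List[str] = []

    def commit(
        self,
        new_files: List[str],
        expected: Dict[str, Optional[str]],
        updates: Dict[str, str],
    ) -> None:
        # Check every expectation before changing anything, so the commit
        # is all-or-nothing. An expected value of None means "must be absent".
        for key, want in expected.items():
            if self.custom_metadata.get(key) != want:
                raise ConflictError(f"{key} changed since it was read")
        self.files.extend(new_files)
        self.custom_metadata.update(updates)


table = TableModel()
table.commit(["a.parquet"], {"committed-offset": None}, {"committed-offset": "100"})

# A retry that still believes nothing was committed is rejected,
# so the same file is not added twice.
try:
    table.commit(["a.parquet"], {"committed-offset": None}, {"committed-offset": "100"})
    retry_rejected = False
except ConflictError:
    retry_rejected = True
```

Because the check is on a table-level field rather than on a snapshot,
it would keep working regardless of which snapshots have been expired.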


Thanks,
Andrew

[1] https://www.redpanda.com/
[2] https://github.com/redpanda-data/redpanda
[3] https://iceberg.apache.org/spec/#snapshot-retention-policy
[4] https://iceberg.apache.org/spec/#table-metadata-fields
