Identifying the schema of an Iceberg snapshot

Vivekanand Vellanki Sun, 07 Nov 2021 22:01:32 -0800

Hi,

I am trying to understand how to identify the schema for an Iceberg
snapshot.


Looking at the spec, I see the following:
Snapshots

A snapshot consists of the following fields:
v1        v2        Field               Description
required  required  snapshot-id         A unique long ID
optional  optional  parent-snapshot-id  The snapshot ID of the snapshot’s parent. Omitted for any snapshot with no parent
          required  sequence-number     A monotonically increasing long that tracks the order of changes to a table
required  required  timestamp-ms        A timestamp when the snapshot was created, used for garbage collection and table inspection
optional  required  manifest-list       The location of a manifest list for this snapshot that tracks manifest files with additional metadata
optional            manifests           A list of manifest file locations. Must be omitted if manifest-list is present
optional  required  summary             A string map that summarizes the snapshot changes, including operation (see below)
optional  optional  schema-id           ID of the table’s current schema when the snapshot was created

Also, the table metadata portion of the spec says the following:

v1        v2        Field    Description
optional  required  schemas  A list of schemas, stored as objects with schema-id.

For a v2 Iceberg table, my understanding is that the reader needs to do the
following to figure out the schema of a snapshot:

   - Read the schema-id for the snapshot
   - Use the schemas field from the table metadata and find the schema
   corresponding to the snapshot's schema-id
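
A minimal sketch of those two steps in Python, assuming the table metadata JSON has already been parsed into a dict. The function name schema_for_snapshot and the sample metadata are hypothetical; the sample keeps only the fields relevant here, so it is not a complete metadata file. When a snapshot has no schema-id, the sketch returns None rather than guessing, since that is exactly the case I am asking about.

```python
import json
from typing import Optional


def schema_for_snapshot(metadata: dict, snapshot_id: int) -> Optional[dict]:
    """Return the schema whose schema-id matches the snapshot's schema-id.

    Returns None if the snapshot has no schema-id, or if no schema with
    that ID appears in the metadata's "schemas" list.
    """
    # Step 1: find the snapshot and read its (optional) schema-id.
    snapshot = next(
        (s for s in metadata.get("snapshots", []) if s["snapshot-id"] == snapshot_id),
        None,
    )
    if snapshot is None:
        raise KeyError(f"snapshot {snapshot_id} not found in table metadata")
    schema_id = snapshot.get("schema-id")
    if schema_id is None:
        # schema-id is optional; how to resolve this case is the open question.
        return None

    # Step 2: look up the matching entry in the table-level "schemas" list.
    for schema in metadata.get("schemas", []):
        if schema.get("schema-id") == schema_id:
            return schema
    return None


# Hypothetical, trimmed v2 metadata: only the fields used above are included.
metadata = json.loads("""
{
  "format-version": 2,
  "current-schema-id": 1,
  "schemas": [
    {"type": "struct", "schema-id": 0,
     "fields": [{"id": 1, "name": "id", "required": true, "type": "long"}]},
    {"type": "struct", "schema-id": 1,
     "fields": [{"id": 1, "name": "id", "required": true, "type": "long"},
                {"id": 2, "name": "data", "required": false, "type": "string"}]}
  ],
  "snapshots": [
    {"snapshot-id": 100, "timestamp-ms": 1636300000000, "schema-id": 0},
    {"snapshot-id": 200, "parent-snapshot-id": 100, "timestamp-ms": 1636300100000, "schema-id": 1},
    {"snapshot-id": 300, "parent-snapshot-id": 200, "timestamp-ms": 1636300200000}
  ]
}
""")

print(schema_for_snapshot(metadata, 100)["schema-id"])  # prints 0
print(schema_for_snapshot(metadata, 300))  # prints None (no schema-id on this snapshot)
```

Snapshot 300 in the sample has no schema-id, which is the situation the next question is about.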

Since schema-id is optional in V2 for a given snapshot, is this the correct
approach? How does this work if the schema-id field is missing?

For a V1 Iceberg table, how do we determine the schema of a particular
snapshot?

Thanks
Vivek
