Hi everyone,

I’d like to start a discussion about adopting the current set of changes to
the spec as v2. Once we adopt them, we won’t be able to include any more
breaking changes in v2; any new breaking change would require v3. This would
not stop or affect any of the ongoing work to add secondary indexes or new
metadata, because those additions are forward-compatible (old readers can
ignore them).
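
To make the distinction concrete, here is a rough Python sketch of how a
hypothetical v1-only reader could treat table metadata: it skips keys it
doesn’t understand, but refuses a format-version it doesn’t support. This is
not the Java reference implementation; `read_v1_metadata` and
`KNOWN_V1_FIELDS` are made up for illustration, and only the JSON keys come
from the spec.

```python
import json

# Hypothetical v1-only reader; not the Java reference implementation.
SUPPORTED_FORMAT_VERSION = 1
KNOWN_V1_FIELDS = {"format-version", "location", "schema"}  # abbreviated for the sketch


def read_v1_metadata(text: str) -> dict:
    metadata = json.loads(text)
    version = metadata.get("format-version", 1)
    if version > SUPPORTED_FORMAT_VERSION:
        # A breaking change bumps format-version, so this reader fails fast
        # instead of silently returning wrong results.
        raise ValueError(
            f"unsupported format-version {version} "
            f"(max {SUPPORTED_FORMAT_VERSION})"
        )
    # Forward-compatible additions are just extra keys: drop what we don't know.
    return {k: v for k, v in metadata.items() if k in KNOWN_V1_FIELDS}
```

An unknown key such as `sort-orders` is simply dropped, while a document with
`"format-version": 2` raises an error.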

The main feature that has been added to the spec is row-level deletes. This
requires a breaking change because all readers must be updated to apply the
deletes when reading. Existing readers don’t do that, so a new format
version is required to ensure correctness for tables that track delete
files. Older readers are not forward-compatible with v2 tables.
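
For anyone who hasn’t followed the delete work closely, here’s a simplified
Python sketch of what applying deletes means for a single data file. It’s an
illustration under stated assumptions, not the reference implementation: rows
are plain dicts, `PositionDelete` and `apply_deletes` are hypothetical names,
and the sequence number and partition rules that decide which delete files
apply are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical position delete: (data file path, 0-based row position)."""
    file_path: str
    pos: int


def apply_deletes(file_path, rows, position_deletes, equality_deletes, equality_columns):
    """Return the rows of one data file that survive the given deletes.

    rows: list of dicts in file order
    position_deletes: iterable of PositionDelete (possibly for other files)
    equality_deletes: list of dicts holding values for equality_columns
    equality_columns: names of the columns that equality deletes match on
    """
    # Only position deletes that point at this data file matter here.
    deleted_positions = {d.pos for d in position_deletes if d.file_path == file_path}
    deleted_keys = (
        {tuple(d[c] for c in equality_columns) for d in equality_deletes}
        if equality_columns
        else set()
    )

    live = []
    for pos, row in enumerate(rows):
        if pos in deleted_positions:
            continue  # removed by a position delete
        if deleted_keys and tuple(row[c] for c in equality_columns) in deleted_keys:
            continue  # removed by an equality delete
        live.append(row)
    return live
```

An old reader effectively skips this step and returns every row, which is why
it can’t safely read a table that tracks delete files.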

Row-level delete changes include:

   - Addition of equality and positional delete files
   - Addition of a content column in manifests to track the type of file
   (data, position deletes, equality deletes)
   - Addition of a sequence_number column in manifests to track when in time
   a data or delete file was added
   - New rules for inheriting sequence numbers through the metadata tree to
   keep metadata write amplification low
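
Roughly, the sequence number rules work like this (a Python sketch, not the
reference implementation; `ManifestEntry`, `inherit_sequence_numbers`, and
`delete_applies` are hypothetical names, and partition matching is omitted).
A newly added file can be written with no sequence number and inherit the one
assigned to its manifest at commit time, so a manifest can be written once and
reused if the commit is retried. A position delete applies to data files whose
sequence number is less than or equal to its own, while an equality delete
applies only to data files with a strictly lower one.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    path: str
    content: str  # "data", "position_deletes", or "equality_deletes"
    sequence_number: Optional[int]  # None: inherit from the manifest


def inherit_sequence_numbers(
    entries: List[ManifestEntry], manifest_sequence_number: int
) -> List[ManifestEntry]:
    # Files added in this commit may be written without a sequence number;
    # readers fill it in from the manifest's sequence number.
    return [
        e if e.sequence_number is not None
        else replace(e, sequence_number=manifest_sequence_number)
        for e in entries
    ]


def delete_applies(data_file: ManifestEntry, delete_file: ManifestEntry) -> bool:
    if delete_file.content == "position_deletes":
        # Position deletes can also target data files from the same commit.
        return data_file.sequence_number <= delete_file.sequence_number
    if delete_file.content == "equality_deletes":
        # Equality deletes only affect data files from earlier commits.
        return data_file.sequence_number < delete_file.sequence_number
    raise ValueError(f"not a delete file: {delete_file.content}")
```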

In addition, v2 tightens requirements for additions that were compatible
with v1. For example, in v1 we added schemas and current-schema-id to table
metadata to allow tracking multiple versions of the table schema and
tracking the schema that was current when a given snapshot was written. These
fields were optional in v1 but are required in v2. Also, the schema field is
removed in v2 in favor of the new fields.
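
In code, the tightened rule amounts to something like the following Python
sketch. The `current_schema` helper is hypothetical; `format-version`,
`schema`, `schemas`, `current-schema-id`, and `schema-id` are the metadata JSON
keys. A v1 reader can fall back to `schema` when the newer fields are missing;
a v2 reader cannot.

```python
def current_schema(metadata: dict) -> dict:
    """Return the current schema from parsed table metadata JSON (sketch)."""
    version = metadata.get("format-version", 1)
    if "schemas" in metadata and "current-schema-id" in metadata:
        schema_id = metadata["current-schema-id"]
        for schema in metadata["schemas"]:
            if schema["schema-id"] == schema_id:
                return schema
        raise ValueError(f"current-schema-id {schema_id} is not in schemas")
    if version >= 2:
        # Optional in v1, required in v2.
        raise ValueError("v2 metadata must have schemas and current-schema-id")
    # v1 tables written before these fields existed only have the single schema.
    return metadata["schema"]
```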

Features that are supported by v2 are:

   - Row-level deletes
   - Multiple schemas, snapshot schemas
   - Multiple partition specs, better partition evolution
   - Table sort orders
   - Tracking identifier fields
   - NaN value counts for double and float fields
   - Manifest file encryption metadata

The full list of v2 changes <https://iceberg.apache.org/spec/#version-2> is
documented at the end of the spec.

I should also mention that v2 is not the default for tables in the
reference implementation (Java). We will continue building out better
support for row-level deletes, but I think we can be confident that the
row-level delete design works and won’t need further breaking changes.

I think the next steps are to discuss the current v2 spec changes and make
sure everyone is comfortable adopting them. If that goes well, we’ll have a
vote to adopt the changes. Thanks for discussing, everyone!

Ryan
-- 
Ryan Blue
