Hi everyone,

I’d like to start a discussion about adopting the current set of changes to the spec as v2. Adopting the current set of changes means that we won’t be able to include any more breaking changes in v2. Any new breaking change would require v3. This would not stop or affect any of the ongoing work to add secondary indexes or new metadata, because those additions are forward-compatible (old readers can ignore them).

The main feature that has been added to the spec is row-level deletes. This requires a breaking change because all readers must be updated to apply the deletes when reading. Existing readers don’t do that, so a new format version is required to ensure correctness for tables that track delete files. Older readers are not forward-compatible with v2 tables.

Row-level delete changes include:

- Addition of equality and positional delete files
- Addition of content column in manifests to track the type of file (data, position deletes, equality deletes)
- Addition of sequence_number column in manifests to track when in time a data or delete file was added
- New rules for inheriting sequence numbers through the metadata tree to keep metadata write amplification low

In addition, v2 tightens requirements for additions that were compatible with v1. For example, in v1 we added schemas and current-schema-id to table metadata to allow tracking multiple versions of the table schema and tracking the schema that was current when a given snapshot was written. The table metadata fields were optional in v1 but became mandatory in v2. Also, the schema field is removed in v2 in favor of the new fields.

Features that are supported by v2 are:

- Row-level deletes
- Multiple schemas, snapshot schemas
- Multiple partition specs, better partition evolution
- Table sort orders
- Tracking identifier fields
- NaN value counts for double and float fields
- Manifest file encryption metadata

The full list of v2 changes <https://iceberg.apache.org/spec/#version-2> is documented at the end of the spec.

I should also mention that v2 is not the default for tables in the reference implementation (Java). We will continue building out better support for row-level deletes, but I think that we are confident that the row-level deletes design works and isn’t going to need breaking changes.

I think the next steps are to discuss the current v2 spec changes and make sure everyone is comfortable adopting them.
If that goes well, we’ll have a vote to adopt the changes.

Thanks for discussing, everyone!

Ryan

--
Ryan Blue