Hey everyone,

When row lineage was originally introduced, it was believed to be
incompatible with equality deletes, so we initially added lineage as an
optional feature that could be turned on.  Now that these features can co-exist
<https://lists.apache.org/thread/vhl7p72433m904y115hmh7vnnbjdz4xn>, we
would like to require lineage for v3, as there are benefits for feature
enablement and adoption.

We discussed this topic
<https://www.youtube.com/watch?v=9BBZKTfcU0s&t=1475s> in the community
sync, but I would like to raise this here on the dev list to have a broader
discussion around the spec changes proposed in this PR
<https://github.com/apache/iceberg/pull/12580>.

The main points of discussion were around what implications this
requirement has for writers, especially where tracking changes may be
difficult or expensive.  The proposal is to address this in the spec by
clarifying that whether changes are treated as upserts or as delete/add is
an engine implementation decision. The update to the spec would state that
engines *should* track row ids through modification, but depending on
the engine or the viability of tracking these changes, an engine may choose
to model updates as deletes/adds.
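
To make the distinction concrete, here is a toy sketch in Python. It is
not the Iceberg API: ToyTable and its methods are hypothetical names, and
modeling row id assignment as a simple counter is an assumption made only
for illustration. It shows an update that keeps the row id next to one
that is modeled as a delete followed by an add.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyTable:
    """Toy in-memory table for illustration; hypothetical, not Iceberg code."""

    next_row_id: int = 0  # assumed: ids come from a simple counter
    rows: Dict[int, dict] = field(default_factory=dict)  # row id -> data

    def insert(self, data: dict) -> int:
        """Add a new row and assign it a fresh row id."""
        row_id = self.next_row_id
        self.next_row_id += 1
        self.rows[row_id] = dict(data)
        return row_id

    def update_tracking_lineage(self, row_id: int, data: dict) -> int:
        """Update in place: the row keeps its id through the modification."""
        self.rows[row_id] = dict(data)
        return row_id

    def update_as_delete_add(self, row_id: int, data: dict) -> int:
        """Model the update as delete + add: the new row gets a fresh id."""
        del self.rows[row_id]
        return self.insert(data)


table = ToyTable()
a = table.insert({"key": 1, "value": "old"})
b = table.insert({"key": 2, "value": "old"})

a_after = table.update_tracking_lineage(a, {"key": 1, "value": "new"})
b_after = table.update_as_delete_add(b, {"key": 2, "value": "new"})
```

In this sketch a_after equals a, so a reader can follow the row across the
update, while b_after is a new id and the change looks like an unrelated
delete and insert.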

I'd love to get everyone's thoughts and feedback on this proposal, since we
still have the opportunity to change this for v3.
-Dan
