Hey everyone,

When row lineage was originally introduced, it was believed to be incompatible with equality deletes, so we initially added lineage as an optional feature that could be turned on. Now that these features can co-exist <https://lists.apache.org/thread/vhl7p72433m904y115hmh7vnnbjdz4xn>, we would like to require lineage for v3, because doing so has benefits for feature enablement and adoption.
We discussed this topic <https://www.youtube.com/watch?v=9BBZKTfcU0s&t=1475s> in the community sync, but I would like to raise it here on the dev list to have a broader discussion of the spec changes proposed in this PR <https://github.com/apache/iceberg/pull/12580>. The main points of discussion were the implications this requirement has for writers, especially where tracking changes may be difficult or expensive.

The proposal is to address this in the spec by clarifying that whether changes are treated as upserts or as deletes and adds is an engine implementation decision. The spec update would state that engines *should* track row IDs through modifications, but depending on the engine or the viability of tracking these changes, an engine may choose to model updates as deletes and adds.

I'd love to get everyone's thoughts and feedback on this proposal, since we still have the opportunity to change this for v3.

-Dan