+1 for the PR and always having the lineage metadata.

I think that is going to make the feature much more reliable. We don't gain
anything from allowing the feature to be turned off for compatibility, when
we have reasonable ways to interpret data written by any engine.

Ryan

On Wed, Mar 19, 2025 at 12:37 PM Daniel Weeks <dwe...@apache.org> wrote:

> Hey everyone,
>
> When row lineage was originally introduced, it was believed to be
> incompatible with equality deletes, and we initially added lineage as a
> feature that could be turned on. Now that these features can co-exist
> <https://lists.apache.org/thread/vhl7p72433m904y115hmh7vnnbjdz4xn>, we
> would like to require lineage for v3 as there are benefits for feature
> enablement and adoption.
>
> We discussed this topic
> <https://www.youtube.com/watch?v=9BBZKTfcU0s&t=1475s> in the community
> sync, but I would like to raise this here on the dev list to have a broader
> discussion around the spec changes proposed in this PR
> <https://github.com/apache/iceberg/pull/12580>.
>
> The main points of discussion were around what implications this
> requirement has for writers, especially where tracking changes may be
> difficult/expensive. The proposal is to address this in the spec by
> clarifying that treating changes as upserts vs. delete/add is an engine
> implementation decision. The update to the spec would state that
> engines *should* track row IDs through modifications, but depending on
> the engine or the viability of tracking these changes, an engine may choose
> to model updates as deletes/adds.
>
> I'd love to get everyone's thoughts/feedback on this proposal since we
> still have the opportunity to change this for v3.
> -Dan
>
