Here are the 3 PRs that add the corresponding tests:
https://github.com/apache/iceberg/pull/13648
https://github.com/apache/iceberg/pull/13649
https://github.com/apache/iceberg/pull/13650
I've tagged them with the 1.10 milestone, waiting for CI to complete :)

Best,
Kevin Liu

On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:

> Kevin, thanks for checking that. I will take a look at your backport PRs.
> Can you add them to the 1.10.0 milestone?
>
> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Thanks again for driving this, Steven! We're very close!!
>>
>> As mentioned in the community sync today, I wanted to verify feature
>> parity between Spark 3.5 and Spark 4.0 for this release, and I was able
>> to confirm it. More details in the other devlist thread:
>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>
>> Thanks,
>> Kevin Liu
>>
>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Another update on the release.
>>>
>>> The existing blocker PRs are almost done.
>>>
>>> During today's community sync, we identified the following issues/PRs
>>> to be included in the 1.10.0 release.
>>>
>>> 1. Backport of PR 13100 to the main branch. I have created a
>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that.
>>> There is a one-line difference compared to the original PR, due to the
>>> removal of the deprecated RemoveSnapshot class in the main branch for
>>> the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a
>>> single snapshot id, which should be supported by all REST catalog
>>> servers (see the sketch after this list).
>>> 2. Flink compaction doesn't support row lineage, so the compaction
>>> should fail for V3 tables. I created a PR
>>> <https://github.com/apache/iceberg/pull/13646> for that and will
>>> backport it after it is merged.
>>> 3. Spark: fix DataFrame joins based on different versions of the same
>>> table, which may lead to incorrect results. Anton is working on a fix.
>>> It requires a small behavior change (table state may be stale up to the
>>> refresh interval), hence it is better to include it in the 1.10.0
>>> release, where Spark 4.0 is first supported.
>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very
>>> close and will prioritize the review.
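>>>
>>> For item 1, expiring a single snapshot through the public API would
>>> look roughly like this (an untested sketch; RemoveSnapshots is the
>>> implementation behind the ExpireSnapshots interface):
>>>
>>>   import org.apache.iceberg.Table;
>>>
>>>   // Sketch: expire exactly one snapshot by id, leaving all other
>>>   // snapshots in place.
>>>   static void removeSingleSnapshot(Table table, long snapshotId) {
>>>     table.expireSnapshots()
>>>         .expireSnapshotId(snapshotId)
>>>         .commit();
>>>   }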
>>>
>>> Thanks,
>>> Steven
>>>
>>> The 1.10.0 milestone can be found here:
>>> https://github.com/apache/iceberg/milestone/54
>>>
>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>> 1.10.0 milestone.
>>>>
>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>> <ro...@confluent.io.invalid> wrote:
>>>>
>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we
>>>>> will not be able to publish the connector on Confluent Hub until this
>>>>> CVE[1] is fixed. Since we would not publish a snapshot build, if the
>>>>> fix doesn't make it into 1.10, we'd have to wait for 1.11 (or a dot
>>>>> release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>
>>>>> Thanks, Robin.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>
>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have approached the Confluent people
>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>> publishing the plugin.
>>>>>>
>>>>>> Please include the PR below in the 1.10.0 release, which fixes that:
>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > Engines may model operations as deleting/inserting rows or as
>>>>>>> modifications to rows that preserve row ids.
>>>>>>>
>>>>>>> Manu, I agree this sentence probably lacks some context. The first
>>>>>>> half (deleting/inserting rows) is probably about the row lineage
>>>>>>> handling with equality deletes, which is described elsewhere:
>>>>>>>
>>>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>> because engines using equality deletes avoid reading existing data
>>>>>>> before writing changes and can't provide the original row ID for
>>>>>>> the new rows. These updates are always treated as if the existing
>>>>>>> row was completely removed and a unique new row was added."
>>>>>>>
>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Steven, I missed that part, but the following sentence is a
>>>>>>>> bit hard to understand (maybe it's just me):
>>>>>>>>
>>>>>>>> "Engines may model operations as deleting/inserting rows or as
>>>>>>>> modifications to rows that preserve row ids."
>>>>>>>>
>>>>>>>> Can you please help explain?
>>>>>>>>
>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Manu,
>>>>>>>>>
>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>
>>>>>>>>> "When an existing row is moved to a different data file for any
>>>>>>>>> reason, writers should write _row_id and
>>>>>>>>> _last_updated_sequence_number according to the following rules:"
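>>>>>>>>>
>>>>>>>>> Concretely, a writer copying an unmodified row into a new data
>>>>>>>>> file keeps both fields, roughly like this (a sketch using a
>>>>>>>>> hypothetical Row record, not Iceberg's internal classes):
>>>>>>>>>
>>>>>>>>>   // Hypothetical row representation, for illustration only.
>>>>>>>>>   record Row(Long rowId, Long lastUpdatedSeq, Object[] columns) {}
>>>>>>>>>
>>>>>>>>>   // A carried-over row preserves _row_id and
>>>>>>>>>   // _last_updated_sequence_number instead of inheriting fresh
>>>>>>>>>   // values from the new data file.
>>>>>>>>>   static Row carryOver(Row existing) {
>>>>>>>>>     return new Row(existing.rowId(), existing.lastUpdatedSeq(),
>>>>>>>>>         existing.columns());
>>>>>>>>>   }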
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Another update on the release.
>>>>>>>>>>
>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>
>>>>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>>>>> merged and backported.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Steven
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang
>>>>>>>>>> <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>
>>>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>>>> should carry over existing lineage info instead of assigning
>>>>>>>>>>> new IDs? If not, we'd better first define it in the spec,
>>>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar
>>>>>>>>>>> <2am...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>>>> lineage before release is data file compaction. At the moment,
>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
>>>>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>>>>> files without projecting the lineage fields. This means that
>>>>>>>>>>>> when writing the new compacted data files we'd lose the
>>>>>>>>>>>> lineage information. There's no data change in a compaction,
>>>>>>>>>>>> but we do need to make sure the lineage info from carried-over
>>>>>>>>>>>> records is materialized in the newly compacted files, so they
>>>>>>>>>>>> don't get new IDs or inherit the new file sequence number. I'm
>>>>>>>>>>>> working on addressing this and would call it out as a blocker
>>>>>>>>>>>> as well (a rough sketch of the projection idea follows below
>>>>>>>>>>>> the thread).
>>>>>>>>>>>>
>>>>> --
>>>>> *Robin Moffatt*
>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
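
To illustrate Amogh's compaction point above: the fix direction is to
project the row-lineage metadata fields during the rewrite so they can be
written back verbatim. A rough, untested sketch against core Iceberg
follows; it assumes the MetadataColumns constants added on main for row
lineage (ROW_ID and LAST_UPDATED_SEQUENCE_NUMBER), and it is not the
actual change in PR 13555.

  import org.apache.iceberg.MetadataColumns;
  import org.apache.iceberg.Schema;
  import org.apache.iceberg.Table;
  import org.apache.iceberg.types.TypeUtil;

  // Sketch: widen the compaction read schema with the row-lineage
  // metadata fields so carried-over rows can be rewritten with their
  // original _row_id and _last_updated_sequence_number values.
  static Schema lineagePreservingProjection(Table table) {
    return TypeUtil.join(
        table.schema(),
        new Schema(MetadataColumns.ROW_ID,
            MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER));
  }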