Here are the 3 PRs that add the corresponding tests:
https://github.com/apache/iceberg/pull/13648
https://github.com/apache/iceberg/pull/13649
https://github.com/apache/iceberg/pull/13650
I've tagged them with the 1.10 milestone, waiting for CI to complete :)

Best,
Kevin Liu

On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:

> Kevin, thanks for checking that. I will take a look at your backport PRs.
> Can you add them to the 1.10.0 milestone?
>
> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Thanks again for driving this, Steven! We're very close!!
>>
>> As mentioned in the community sync today, I wanted to verify feature
>> parity between Spark 3.5 and Spark 4.0 for this release, and I was able
>> to confirm it. More details in the other devlist thread:
>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>
>> Thanks,
>> Kevin Liu
>>
>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Another update on the release.
>>>
>>> The existing blocker PRs are almost done.
>>>
>>> During today's community sync, we identified the following issues/PRs
>>> to be included in the 1.10.0 release.
>>>
>>> 1. Backport of PR 13100 to the main branch. I have created a
>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that.
>>> There is a one-line difference compared to the original PR, due to the
>>> removal of the deprecated RemoveSnapshot class in the main branch for
>>> the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a
>>> single snapshot id, which should be supported by all REST catalog
>>> servers (see the sketch after this list).
>>> 2. Flink compaction doesn't support row lineage, so the compaction
>>> should fail for V3 tables. I created a PR
>>> <https://github.com/apache/iceberg/pull/13646> for that and will
>>> backport it after it is merged.
>>> 3. Spark: fix DataFrame joins based on different versions of the same
>>> table, which may lead to incorrect results. Anton is working on a fix.
>>> It requires a small behavior change (table state may be stale up to the
>>> refresh interval), hence it is better to include it in the 1.10.0
>>> release, where Spark 4.0 is first supported.
>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very
>>> close and will prioritize the review.
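>>>
>>> For item 1, expiring a single snapshot through the public API would
>>> look roughly like this (an untested sketch; RemoveSnapshots is the
>>> implementation behind the ExpireSnapshots interface):
>>>
>>>   import org.apache.iceberg.Table;
>>>
>>>   // Sketch: expire exactly one snapshot by id, leaving all other
>>>   // snapshots in place.
>>>   static void removeSingleSnapshot(Table table, long snapshotId) {
>>>     table.expireSnapshots()
>>>         .expireSnapshotId(snapshotId)
>>>         .commit();
>>>   }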
>>>
>>> Thanks,
>>> Steven
>>>
>>> The 1.10.0 milestone can be found here:
>>> https://github.com/apache/iceberg/milestone/54
>>>
>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>> 1.10.0 milestone.
>>>>
>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>> <ro...@confluent.io.invalid> wrote:
>>>>
>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we
>>>>> will not be able to publish the connector on Confluent Hub until this
>>>>> CVE[1] is fixed. Since we would not publish a snapshot build, if the
>>>>> fix doesn't make it into 1.10, we'd have to wait for 1.11 (or a dot
>>>>> release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>
>>>>> Thanks, Robin.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>
>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have approached the Confluent people
>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>> publishing the plugin.
>>>>>>
>>>>>> Please include the PR below in the 1.10.0 release, which fixes that:
>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > Engines may model operations as deleting/inserting rows or as
>>>>>>> modifications to rows that preserve row ids.
>>>>>>>
>>>>>>> Manu, I agree this sentence probably lacks some context. The first
>>>>>>> half (deleting/inserting rows) is probably about the row lineage
>>>>>>> handling with equality deletes, which is described elsewhere:
>>>>>>>
>>>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>> because engines using equality deletes avoid reading existing data
>>>>>>> before writing changes and can't provide the original row ID for
>>>>>>> the new rows. These updates are always treated as if the existing
>>>>>>> row was completely removed and a unique new row was added."
>>>>>>>
>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Steven, I missed that part, but the following sentence is a
>>>>>>>> bit hard to understand (maybe it's just me):
>>>>>>>>
>>>>>>>> "Engines may model operations as deleting/inserting rows or as
>>>>>>>> modifications to rows that preserve row ids."
>>>>>>>>
>>>>>>>> Can you please help explain?
>>>>>>>>
>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Manu,
>>>>>>>>>
>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>
>>>>>>>>> "When an existing row is moved to a different data file for any
>>>>>>>>> reason, writers should write _row_id and
>>>>>>>>> _last_updated_sequence_number according to the following rules:"
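>>>>>>>>>
>>>>>>>>> Concretely, a writer copying an unmodified row into a new data
>>>>>>>>> file keeps both fields, roughly like this (a sketch using a
>>>>>>>>> hypothetical Row record, not Iceberg's internal classes):
>>>>>>>>>
>>>>>>>>>   // Hypothetical row representation, for illustration only.
>>>>>>>>>   record Row(Long rowId, Long lastUpdatedSeq, Object[] columns) {}
>>>>>>>>>
>>>>>>>>>   // A carried-over row preserves _row_id and
>>>>>>>>>   // _last_updated_sequence_number instead of inheriting fresh
>>>>>>>>>   // values from the new data file.
>>>>>>>>>   static Row carryOver(Row existing) {
>>>>>>>>>     return new Row(existing.rowId(), existing.lastUpdatedSeq(),
>>>>>>>>>         existing.columns());
>>>>>>>>>   }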
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Another update on the release.
>>>>>>>>>>
>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>
>>>>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>>>>> merged and backported.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Steven
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang
>>>>>>>>>> <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>
>>>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>>>> should carry over existing lineage info instead of assigning
>>>>>>>>>>> new IDs? If not, we'd better first define it in the spec,
>>>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar
>>>>>>>>>>> <2am...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>>>> lineage before release is data file compaction. At the moment,
>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
>>>>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>>>>> files without projecting the lineage fields. This means that
>>>>>>>>>>>> when writing the new compacted data files we'd lose the
>>>>>>>>>>>> lineage information. There's no data change in a compaction,
>>>>>>>>>>>> but we do need to make sure the lineage info from carried-over
>>>>>>>>>>>> records is materialized in the newly compacted files, so they
>>>>>>>>>>>> don't get new IDs or inherit the new file sequence number. I'm
>>>>>>>>>>>> working on addressing this and would call it out as a blocker
>>>>>>>>>>>> as well (a rough sketch of the projection idea follows below
>>>>>>>>>>>> the thread).
>>>>>>>>>>>>
>>>>> --
>>>>> *Robin Moffatt*
>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
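
To illustrate Amogh's compaction point above: the fix direction is to
project the row-lineage metadata fields during the rewrite so they can be
written back verbatim. A rough, untested sketch against core Iceberg
follows; it assumes the MetadataColumns constants added on main for row
lineage (ROW_ID and LAST_UPDATED_SEQUENCE_NUMBER), and it is not the
actual change in PR 13555.

  import org.apache.iceberg.MetadataColumns;
  import org.apache.iceberg.Schema;
  import org.apache.iceberg.Table;
  import org.apache.iceberg.types.TypeUtil;

  // Sketch: widen the compaction read schema with the row-lineage
  // metadata fields so carried-over rows can be rewritten with their
  // original _row_id and _last_updated_sequence_number values.
  static Schema lineagePreservingProjection(Table table) {
    return TypeUtil.join(
        table.schema(),
        new Schema(MetadataColumns.ROW_ID,
            MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER));
  }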