The 3 PRs above are merged. Thanks, everyone, for the review.

I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.

- docs: add subpage for REST Catalog Spec in "Specification" #13521
  <https://github.com/apache/iceberg/pull/13521>
- REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
  <https://github.com/apache/iceberg/pull/13599>
The first one changes the "REST Catalog Spec" link in the left nav of
https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated
page for IRC. The second one fixes the default behavior of the
`iceberg-rest-fixture` image to align with the general expectation when
creating a table in a catalog.

Please take a look. I would like to have both of these as part of the 1.10
release.

Best,
Kevin Liu

On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:

> Here are the 3 PRs that add the corresponding tests:
> https://github.com/apache/iceberg/pull/13648
> https://github.com/apache/iceberg/pull/13649
> https://github.com/apache/iceberg/pull/13650
>
> I've tagged them with the 1.10 milestone and am waiting for CI to
> complete :)
>
> Best,
> Kevin Liu
>
> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> Kevin, thanks for checking that. I will take a look at your backport
>> PRs. Can you add them to the 1.10.0 milestone?
>>
>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> Thanks again for driving this, Steven! We're very close!!
>>>
>>> As mentioned in the community sync today, I wanted to verify feature
>>> parity between Spark 3.5 and Spark 4.0 for this release, and I was
>>> able to confirm that they are at parity for this upcoming release.
>>> More details are in the other devlist thread:
>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>
>>> Thanks,
>>> Kevin Liu
>>>
>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> Another update on the release.
>>>>
>>>> The existing blocker PRs are almost done.
>>>>
>>>> During today's community sync, we identified the following
>>>> issues/PRs to be included in the 1.10.0 release.
>>>>
>>>> 1. Backport of PR 13100 to the main branch. I have created a
>>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for
>>>> that. There is a one-line difference from the original PR due to the
>>>> removal of the deprecated RemoveSnapshot class in the main branch
>>>> for the 1.10.0 target. Amogh has suggested using RemoveSnapshots
>>>> with a single snapshot id, which should be supported by all REST
>>>> catalog servers.
>>>> 2. Flink compaction doesn't support row lineage, so fail the
>>>> compaction for V3 tables. I created a PR
>>>> <https://github.com/apache/iceberg/pull/13646> for that and will
>>>> backport it after it is merged.
>>>> 3. Spark: fix data frame joins based on different versions of the
>>>> same table, which may lead to incorrect results. Anton is working on
>>>> a fix. It requires a small behavior change (table state may be stale
>>>> up to the refresh interval), so it is better to include it in the
>>>> 1.10.0 release, where Spark 4.0 is first supported.
>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very
>>>> close and will prioritize the review.
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>> The 1.10.0 milestone can be found here:
>>>> https://github.com/apache/iceberg/milestone/54
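Item 1 above suggests using RemoveSnapshots with a single snapshot id. As a
minimal sketch of what that could look like (not code from the thread;
`table` and `snapshotId` are placeholders), the public ExpireSnapshots API,
which RemoveSnapshots implements, can target one snapshot:

```java
import org.apache.iceberg.Table;

// Sketch: expire exactly one snapshot through the public API.
public class ExpireSingleSnapshot {
  // `table` is a loaded Iceberg Table; `snapshotId` identifies the
  // snapshot to remove. Both are placeholders for this illustration.
  static void run(Table table, long snapshotId) {
    table
        .expireSnapshots()            // backed by the RemoveSnapshots implementation
        .expireSnapshotId(snapshotId) // target a single snapshot id
        .commit();                    // commit the metadata change via the catalog
  }
}
```

Because this goes through the standard API surface rather than the removed
RemoveSnapshot class, it should behave the same against any REST catalog
server.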
>>>>
>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>>> 1.10.0 milestone.
>>>>>
>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>
>>>>>> Thanks, Ajantha. Just to confirm, from a Confluent point of view,
>>>>>> we will not be able to publish the connector on Confluent Hub until
>>>>>> this CVE[1] is fixed. Since we would not publish a snapshot build,
>>>>>> if the fix doesn't make it into 1.10, we'd have to wait for 1.11
>>>>>> (or a dot release of 1.10) to be able to include the connector on
>>>>>> Confluent Hub.
>>>>>>
>>>>>> Thanks, Robin.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>
>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I have approached Confluent people
>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin. It
>>>>>>> seems we have a CVE from a dependency that blocks us from
>>>>>>> publishing the plugin.
>>>>>>>
>>>>>>> Please include the PR below, which fixes that, in the 1.10.0
>>>>>>> release:
>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>
>>>>>>> - Ajantha
>>>>>>>
>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > Engines may model operations as deleting/inserting rows or as
>>>>>>>> modifications to rows that preserve row ids.
>>>>>>>>
>>>>>>>> Manu, I agree this sentence probably lacks some context. The
>>>>>>>> first half (deleting/inserting rows) is probably about row
>>>>>>>> lineage handling with equality deletes, which is described
>>>>>>>> elsewhere:
>>>>>>>>
>>>>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>>> because engines using equality deletes avoid reading existing
>>>>>>>> data before writing changes and can't provide the original row ID
>>>>>>>> for the new rows. These updates are always treated as if the
>>>>>>>> existing row was completely removed and a unique new row was
>>>>>>>> added."
>>>>>>>>
>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang
>>>>>>>> <owenzhang1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Steven. I missed that part, but the following sentence
>>>>>>>>> is a bit hard to understand (maybe it's just me):
>>>>>>>>>
>>>>>>>>> "Engines may model operations as deleting/inserting rows or as
>>>>>>>>> modifications to rows that preserve row ids."
>>>>>>>>>
>>>>>>>>> Can you please help explain?
>>>>>>>>>
>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Manu,
>>>>>>>>>>
>>>>>>>>>> The spec already covers the row lineage carry-over (for
>>>>>>>>>> replace):
>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>
>>>>>>>>>> "When an existing row is moved to a different data file for any
>>>>>>>>>> reason, writers should write _row_id and
>>>>>>>>>> _last_updated_sequence_number according to the following
>>>>>>>>>> rules:"
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Steven
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu
>>>>>>>>>> <stevenz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>
>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25
>>>>>>>>>>> closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>
>>>>>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>>>>>> merged and backported.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Steven
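The rules Steven quotes boil down to materializing any inherited lineage
values before a row is rewritten. Purely as an illustration of those rules
(the helper names and the boxed Long for "null means inherited" are
assumptions here, not Iceberg API):

```java
// Illustration of the spec's carry-over rules for a row that is moved to a
// new data file: a null stored value means the field was inherited, and it
// must be materialized before the rewrite so the row keeps its lineage.
public class RowLineageCarryOver {

  // _row_id: inherited values are the file's first_row_id plus the row's
  // position within that file.
  static long carriedRowId(Long storedRowId, long firstRowId, long rowPosition) {
    return storedRowId != null ? storedRowId : firstRowId + rowPosition;
  }

  // _last_updated_sequence_number: inherited values are the data file's
  // data sequence number.
  static long carriedLastUpdated(Long storedValue, long fileDataSequenceNumber) {
    return storedValue != null ? storedValue : fileDataSequenceNumber;
  }
}
```

A compaction is exactly this "moved for any reason" case: there is no data
change, so rewritten rows must keep their materialized values rather than
inherit fresh ones from the new file, which is the compaction issue
discussed below.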
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang
>>>>>>>>>>> <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>
>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>>>>> should carry over existing lineage info instead of assigning
>>>>>>>>>>>> new IDs? If not, we'd better first define it in the spec,
>>>>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar
>>>>>>>>>>>> <2am...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>>>>> lineage before the release is data file compaction. At the
>>>>>>>>>>>>> moment
>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>>>>>> it looks like compaction reads the records from the data
>>>>>>>>>>>>> files without projecting the lineage fields, which means
>>>>>>>>>>>>> that when the new compacted data files are written, we lose
>>>>>>>>>>>>> the lineage information. There is no data change in a
>>>>>>>>>>>>> compaction, but we do need to make sure the lineage info
>>>>>>>>>>>>> from carried-over records is materialized in the newly
>>>>>>>>>>>>> compacted files so they don't get new IDs or inherit the new
>>>>>>>>>>>>> file's sequence number. I'm working on addressing this, and
>>>>>>>>>>>>> I'd call it out as a blocker as well.
>>>>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Robin Moffatt*
>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
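For context on "projecting the lineage fields" from Amogh's note above: one
way to see whether a rewrite preserved lineage is to compare the lineage
columns before and after compaction. A hypothetical check, assuming a V3
table named `db.t` and that Spark exposes _row_id and
_last_updated_sequence_number the same way as its other Iceberg metadata
columns:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical sanity check: capture the lineage columns so the result can
// be compared before and after a compaction. The table name and the
// availability of these metadata columns in Spark are assumptions.
public class LineageCompactionCheck {
  static Dataset<Row> lineageSnapshot(SparkSession spark) {
    return spark.sql(
        "SELECT _row_id, _last_updated_sequence_number FROM db.t ORDER BY _row_id");
  }
}
```

If compaction drops the projection, the values after the rewrite would show
newly inherited IDs and sequence numbers instead of the originals.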