I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:

> 3. Spark: fix data frame joins based on different versions of the same
> table, which may lead to weird results. Anton is working on a fix. It
> requires a small behavior change (table state may be stale up to the
> refresh interval). Hence it is better to include it in the 1.10.0 release,
> where Spark 4.0 is first supported.
> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close
> and will prioritize the review.
>
> We still have the above two issues pending. 3 doesn't have a PR yet. The
> PR for 4 is not associated with the milestone yet.
>
> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Thanks everyone for the review. The 2 PRs are both merged.
>> Looks like there's only 1 PR left in the 1.10 milestone
>> <https://github.com/apache/iceberg/milestone/54> :)
>>
>> Best,
>> Kevin Liu
>>
>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>
>>> Thanks Kevin. The first change is not in the versioned docs, so it can
>>> be released at any time.
>>>
>>> Regards,
>>> Manu
>>>
>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>
>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>
>>>> I've added 2 more PRs to the 1.10 milestone. These are both
>>>> nice-to-haves:
>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>> <https://github.com/apache/iceberg/pull/13521>
>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture
>>>> #13599 <https://github.com/apache/iceberg/pull/13599>
>>>>
>>>> The first one changes the link for "REST Catalog Spec" in the left nav
>>>> of https://iceberg.apache.org/spec/ from the swagger.io link to a
>>>> dedicated page for the Iceberg REST Catalog (IRC).
>>>> The second one fixes the default behavior of the `iceberg-rest-fixture`
>>>> image to align with the general expectation when creating a table in a
>>>> catalog.
>>>>
>>>> Please take a look.
I would like to have both of these as part of the 1.10 release.
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Here are the 3 PRs to add the corresponding tests:
>>>>> https://github.com/apache/iceberg/pull/13648
>>>>> https://github.com/apache/iceberg/pull/13649
>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>
>>>>> I've tagged them with the 1.10 milestone; waiting for CI to complete :)
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> Kevin, thanks for checking that. I will take a look at your backport
>>>>>> PRs. Can you add them to the 1.10.0 milestone?
>>>>>>
>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>
>>>>>>> Thanks again for driving this, Steven! We're very close!
>>>>>>>
>>>>>>> As mentioned in the community sync today, I wanted to verify feature
>>>>>>> parity between Spark 3.5 and Spark 4.0 for this release, and I was
>>>>>>> able to confirm that they have feature parity for this upcoming
>>>>>>> release. More details are in the other devlist thread:
>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Another update on the release.
>>>>>>>>
>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>
>>>>>>>> During today's community sync, we identified the following
>>>>>>>> issues/PRs to include in the 1.10.0 release.
>>>>>>>>
>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a
>>>>>>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that.
>>>>>>>> There is a one-line difference compared to the original PR, due to
>>>>>>>> the removal of the deprecated RemoveSnapshot class in the main branch
>>>>>>>> for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with
>>>>>>>> a single snapshot id, which should be supported by all REST catalog
>>>>>>>> servers.
>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the
>>>>>>>> compaction for V3 tables. I created a PR
>>>>>>>> <https://github.com/apache/iceberg/pull/13646> for that and will
>>>>>>>> backport it after it is merged.
>>>>>>>> 3. Spark: fix data frame joins based on different versions of
>>>>>>>> the same table, which may lead to weird results. Anton is working on
>>>>>>>> a fix. It requires a small behavior change (table state may be stale
>>>>>>>> up to the refresh interval). Hence it is better to include it in the
>>>>>>>> 1.10.0 release, where Spark 4.0 is first supported.
>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is
>>>>>>>> very close and will prioritize the review.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> The 1.10.0 milestone can be found here:
>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>
>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>>>>>>> 1.10.0 milestone.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view,
>>>>>>>>>> we will not be able to publish the connector on Confluent Hub until
>>>>>>>>>> this CVE [1] is fixed.
>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't
>>>>>>>>>> make it into 1.10, then we'd have to wait for 1.11 (or a dot
>>>>>>>>>> release of 1.10) to be able to include the connector on Confluent
>>>>>>>>>> Hub.
>>>>>>>>>>
>>>>>>>>>> Thanks, Robin.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>
>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have approached the Confluent people
>>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>>>>>>> publishing the plugin.
>>>>>>>>>>>
>>>>>>>>>>> Please include the PR below, which fixes that, in the 1.10.0 release:
>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or
>>>>>>>>>>>> as modifications to rows that preserve row ids.
>>>>>>>>>>>>
>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The
>>>>>>>>>>>> first half (as deleting/inserting rows) is probably about the
>>>>>>>>>>>> row lineage handling with equality deletes, which is described
>>>>>>>>>>>> in another place:
>>>>>>>>>>>>
>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>>>>>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>>>>>>> because engines using equality deletes avoid reading existing data
>>>>>>>>>>>> before writing changes and can't provide the original row ID for
>>>>>>>>>>>> the new rows.
>>>>>>>>>>>> These updates are always treated as if the existing row was
>>>>>>>>>>>> completely removed and a unique new row was added."
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Steven. I missed that part, but the following sentence
>>>>>>>>>>>>> is a bit hard to understand (maybe it's just me):
>>>>>>>>>>>>>
>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as
>>>>>>>>>>>>> modifications to rows that preserve row ids.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please help explain?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "When an existing row is moved to a different data file for
>>>>>>>>>>>>>> any reason, writers should write _row_id and
>>>>>>>>>>>>>> _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25
>>>>>>>>>>>>>>> closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker
>>>>>>>>>>>>>>> is merged and backported.
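[Editorial aside: the carry-over rule from the spec quoted above can be modeled in a few lines. The sketch below is illustrative Python only, not Iceberg code; the helper name, the dict-based row shape, and the parameter names are invented for the example. It models the spec's inheritance semantics: a null `_row_id` is read as `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` is read as the data file's sequence number, so a writer moving rows to a new file must materialize those values first.]

```python
def materialize_lineage(rows, first_row_id, data_seq_number):
    """Model of the row-lineage carry-over rule: before rows are moved
    to a new data file, any inherited (null) lineage values must be
    made explicit so they survive the move.

    rows: list of dicts containing "_row_id" and
          "_last_updated_sequence_number" fields (possibly None)
    first_row_id: the source data file's assigned first_row_id
    data_seq_number: the source data file's data sequence number
    """
    out = []
    for pos, row in enumerate(rows):
        materialized = dict(row)
        # A null _row_id is inherited as first_row_id + row position.
        if materialized["_row_id"] is None:
            materialized["_row_id"] = first_row_id + pos
        # A null _last_updated_sequence_number inherits the file's
        # data sequence number.
        if materialized["_last_updated_sequence_number"] is None:
            materialized["_last_updated_sequence_number"] = data_seq_number
        out.append(materialized)
    return out
```

If a writer skipped this step and rewrote rows with null lineage fields, the rows would inherit fresh ids and the new file's sequence number, losing their identity.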
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>>>>>>>>> should carry over existing lineage info instead of assigning
>>>>>>>>>>>>>>>> new IDs? If not, we'd better first define it in the spec,
>>>>>>>>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>>>>>>>>> lineage before the release is data file compaction. At the moment,
>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
>>>>>>>>>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>>>>>>>>>> files without projecting the lineage fields. What this means
>>>>>>>>>>>>>>>>> is that on write of the new compacted data files we'd be
>>>>>>>>>>>>>>>>> losing the lineage information. There's no data change in a
>>>>>>>>>>>>>>>>> compaction, but we do need to make sure the lineage info from
>>>>>>>>>>>>>>>>> carried-over records is materialized in the newly compacted
>>>>>>>>>>>>>>>>> files so they don't get new IDs or inherit the new file
>>>>>>>>>>>>>>>>> sequence number. I'm working on addressing this, and I'd
>>>>>>>>>>>>>>>>> call it out as a blocker as well.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Robin Moffatt*
>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
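[Editorial aside: Amogh's point about compaction losing lineage is easy to see in a toy model. The sketch below is plain Python with invented names, not Iceberg's API. It "compacts" the rows of several files into one new file; unless the lineage columns are projected on read and written back, the carried-over rows come out with null `_row_id` and would pick up fresh ids and the new file's sequence number on commit.]

```python
LINEAGE_COLS = ("_row_id", "_last_updated_sequence_number")

def compact(files, project_lineage):
    """Toy bin-pack compaction: concatenate rows from the input files
    (each a list of row dicts) into one new file. If the lineage
    columns are not projected on read, they come back null, and the
    rewritten rows lose their identity -- the bug called out above."""
    new_file = []
    for rows in files:
        for row in rows:
            copied = dict(row)
            if not project_lineage:
                for col in LINEAGE_COLS:
                    # Column was not read, so it is null on write.
                    copied[col] = None
            new_file.append(copied)
    return new_file
```

Compacting with `project_lineage=True` preserves every row's `_row_id` and `_last_updated_sequence_number`; with `project_lineage=False`, all lineage values come back null.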