> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.

I thought the consensus in the community sync was that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.

On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:

I think Fokko's OOO. Should we help with that PR?

On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:

I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.

On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:

3. Spark: fix DataFrame joins based on different versions of the same table, which may lead to incorrect results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval), so it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.

We still have the above two issues pending. Item 3 doesn't have a PR yet; the PR for item 4 is not associated with the milestone yet.

On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:

Thanks everyone for the review. The 2 PRs are both merged. Looks like there's only 1 PR left in the 1.10 milestone <https://github.com/apache/iceberg/milestone/54> :)

Best,
Kevin Liu

On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

Thanks Kevin. The first change is not in the versioned docs, so it can be released anytime.

Regards,
Manu

On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:

The 3 PRs above are merged. Thanks everyone for the review.

I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.

- docs: add subpage for REST Catalog Spec in "Specification" #13521 <https://github.com/apache/iceberg/pull/13521>
- REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599 <https://github.com/apache/iceberg/pull/13599>

The first one changes the "REST Catalog Spec" link in the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for the Iceberg REST Catalog. The second one fixes the default behavior of the `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.

Please take a look. I would like to have both of these as part of the 1.10 release.

Best,
Kevin Liu

On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:

Here are the 3 PRs to add corresponding tests:
https://github.com/apache/iceberg/pull/13648
https://github.com/apache/iceberg/pull/13649
https://github.com/apache/iceberg/pull/13650

I've tagged them with the 1.10 milestone and am waiting for CI to complete :)

Best,
Kevin Liu

On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:

Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?

On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:

Thanks again for driving this, Steven! We're very close!!

As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release. I was able to confirm that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details are in the other dev-list thread:
https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f

Thanks,
Kevin Liu

On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:

Another update on the release.

The existing blocker PRs are almost done.

During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release:

1. Backport of PR 13100 to the main branch. I have created a cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that. There is a one-line difference compared to the original PR, due to the removal of the deprecated RemoveSnapshot class on the main branch targeting 1.10.0. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
2. Flink compaction doesn't support row lineage; fail the compaction for V3 tables. I created a PR <https://github.com/apache/iceberg/pull/13646> for that and will backport it after it is merged.
3. Spark: fix DataFrame joins based on different versions of the same table, which may lead to incorrect results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval), so it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.

Thanks,
Steven

The 1.10.0 milestone can be found here:
https://github.com/apache/iceberg/milestone/54

On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:

Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.

On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:

Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE [1] is fixed. Since we would not publish a snapshot build, if the fix doesn't make it into 1.10, then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.

Thanks, Robin.

[1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861

--
Robin Moffatt
Sr. Principal Advisor, Streaming Data Technologies

On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:

I have approached Confluent people <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems we have a CVE from a dependency that blocks us from publishing the plugin.

Please include the below PR in the 1.10.0 release, which fixes that:
https://github.com/apache/iceberg/pull/13561

- Ajantha

On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:

> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.

Manu, I agree this sentence probably lacks some context. The first half (deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:

"Row lineage does not track lineage for rows updated via Equality Deletes <https://iceberg.apache.org/spec/#equality-delete-files>, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."

On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

Thanks Steven, I missed that part, but the following sentence is a bit hard to understand (maybe it's just me):

> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.

Can you please help explain?

On Tue, Jul 15, 2025 at 4:41 AM Steven Wu <stevenz...@gmail.com> wrote:

Manu,

The spec already covers the row lineage carry-over (for replace):
https://iceberg.apache.org/spec/#row-lineage

"When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"

Thanks,
Steven

On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:

Another update on the release.

We have one open PR left in the 1.10.0 milestone <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs). Amogh is actively working on the last blocker PR: Spark 4.0: Preserve row lineage information on compaction <https://github.com/apache/iceberg/pull/13555>.

I will publish a release candidate after the above blocker is merged and backported.

Thanks,
Steven

On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

Hi Amogh,

Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better first define it in the spec, because all engines and implementations need to follow it.

On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:

One other area I think we need to make sure works with row lineage before the release is data file compaction. At the moment <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files, we'd be losing the lineage information. There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files, so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this, but I'd call it out as a blocker as well.
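[Editor's note] The row-lineage concern discussed in this thread (compaction rewriting rows without projecting `_row_id` and `_last_updated_sequence_number`) can be made concrete with a small sketch. The following is a hypothetical Python model of the v3 spec's inheritance rule, not Iceberg's actual Java implementation; the function name and the dict-based row representation are invented for illustration. Per the spec, a null `_row_id` is resolved as the data file's `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` inherits the data file's data sequence number; a rewrite such as compaction must materialize these resolved values rather than write nulls, or the rows would inherit fresh IDs and the new file's sequence number.

```python
def materialize_lineage(rows, source_first_row_id, source_data_seq):
    """Resolve inherited lineage fields before rewriting rows to a new file.

    rows: list of dicts with '_row_id' and '_last_updated_sequence_number'
          keys, where None means "inherited from the source data file".
    source_first_row_id: first_row_id assigned to the source data file.
    source_data_seq: data sequence number of the source data file.
    """
    out = []
    for pos, row in enumerate(rows):
        resolved = dict(row)
        # Null _row_id is inherited as first_row_id + position in the file.
        if resolved.get("_row_id") is None:
            resolved["_row_id"] = source_first_row_id + pos
        # Null _last_updated_sequence_number is inherited from the
        # source data file's sequence number.
        if resolved.get("_last_updated_sequence_number") is None:
            resolved["_last_updated_sequence_number"] = source_data_seq
        out.append(resolved)
    return out

# Compacting a file whose rows never materialized their lineage fields:
rows = [
    {"id": 1, "_row_id": None, "_last_updated_sequence_number": None},
    {"id": 2, "_row_id": 7, "_last_updated_sequence_number": 3},
]
compacted = materialize_lineage(rows, source_first_row_id=100, source_data_seq=5)
# Row at position 0 resolves to _row_id=100 and sequence number 5;
# the row with explicit lineage values is carried over unchanged.
```

If a rewrite instead dropped these columns and wrote nulls, readers would re-derive lineage from the *new* file's `first_row_id` and sequence number, which is exactly the loss of lineage described above.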