Thanks Steven! I did another pass to check for feature parity between Spark 3.5 and Spark 4.0 for this release, and everything looks good. There are a few test cases that have not been ported, but we can punt on those for now.
Best,
Kevin Liu

On Thu, Aug 28, 2025 at 7:08 PM Steven Wu <stevenz...@gmail.com> wrote:

> Thanks to Fokko and Ryan, the unknown type support PR was merged today.
>
> Everything in the 1.10.0 milestone is closed now.
>
> I will work on a release candidate next.
>
> On Fri, Aug 8, 2025 at 6:14 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hi Steven,
>>
>> Thanks for updating this thread.
>>
>> I've updated the UnknownType PR <https://github.com/apache/iceberg/pull/13445> to first block on the complex cases that will require some more discussion. This way we can also revisit this after the 1.10.0 release.
>>
>> Kind regards,
>> Fokko
>>
>> On Thu, Aug 7, 2025 at 23:56 Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Edited the subject line as we are into August.
>>>
>>> We are still waiting for the following two changes for the 1.10.0 release:
>>> * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
>>> * Unknown type support.
>>>
>>> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:
>>>
>>>> Hi Steven,
>>>>
>>>> A small regression with S3 signing has been reported to me. The fix is simple:
>>>>
>>>> https://github.com/apache/iceberg/pull/13718
>>>>
>>>> Would it still be possible to have it in 1.10, please?
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >
>>>> > Currently, the 1.10.0 milestone has no open PRs:
>>>> > https://github.com/apache/iceberg/milestone/54
>>>> >
>>>> > The variant PRs were merged this week and last week. There are still some variant-testing-related PRs, which are probably not blockers for the 1.10.0 release.
>>>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>>>> > * use short strings: https://github.com/apache/iceberg/pull/13284
>>>> >
>>>> > We are still waiting for the following two changes:
>>>> > * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
>>>> > * Unknown type support. Fokko raised a discussion thread on a blocking issue.
>>>> >
>>>> > Did I miss anything else?
>>>> >
>>>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>> >>
>>>> >> Hey all,
>>>> >>
>>>> >> The read path for the UnknownType needs some community discussion. I've raised a separate thread. PTAL.
>>>> >>
>>>> >> Kind regards from Belgium,
>>>> >> Fokko
>>>> >>
>>>> >> On Sat, Jul 26, 2025 at 00:58 Ryan Blue <rdb...@gmail.com> wrote:
>>>> >>>
>>>> >>> I thought we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done, so I'd opt to get it in.
>>>> >>>
>>>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> > I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>> >>>>
>>>> >>>> I thought the consensus in the community sync was that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>>>> >>>>
>>>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>
>>>> >>>>> I think Fokko's OOO. Should we help with that PR?
>>>> >>>>>
>>>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>> >>>>>
>>>> >>>>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>> >>>>>>
>>>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>> >>>>>>>
>>>> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR yet. PR for 4 is not associated with the milestone yet.
>>>> >>>>>>>
>>>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>>>> >>>>>>>>
>>>> >>>>>>>> Best,
>>>> >>>>>>>> Kevin Liu
>>>> >>>>>>>>
>>>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can be released anytime.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Regards,
>>>> >>>>>>>>> Manu
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>>>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
>>>> >>>>>>>>>> The second one fixes the default behavior of `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Best,
>>>> >>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> I've tagged them with the 1.10 milestone and am waiting for CI to complete :)
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best,
>>>> >>>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Thanks again for driving this, Steven! We're very close!!
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release. I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details are in the other devlist thread: https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Another update on the release.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release:
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> * Backport of PR 13100 to the main branch. I have created a cherry-pick PR for that. There is a one-line difference compared to the original PR, due to the removal of the deprecated RemoveSnapshot class in the main branch for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>> >>>>>>>>>>>>>> * Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR for that. Will backport after it is merged.
>>>> >>>>>>>>>>>>>> * Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>> >>>>>>>>>>>>>> * Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>> Steven
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here:
>>>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Thanks Ajantha.
>>>> >>>>>>>>>>>>>>>> Just to confirm, from a Confluent point of view: we will not be able to publish the connector on Confluent Hub until this CVE [1] is fixed. Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Thanks, Robin.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Please include the PR below, which fixes that, in the 1.10.0 release:
>>>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> - Ajantha
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Thanks Steven. I missed that part, but the following sentence is a bit hard to understand (maybe it's just me):
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
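To make the sentence Manu quotes concrete: below is a minimal sketch, assuming plain Java and hypothetical names (this is not Iceberg's actual writer API), of the two update models and what each one writes for the lineage fields. It follows the equality-delete behavior in the spec excerpt Steven quotes above; a null value means the field is inherited when the file is committed.

// Hypothetical illustration only; not Iceberg code.
// The two row-lineage fields a writer materializes (null means "inherit at commit").
record LineageFields(Long rowId, Long lastUpdatedSequenceNumber) {}

class UpdateModels {
  // Model 1: delete/insert (e.g. equality deletes). The engine never reads the
  // old row, so it cannot carry its _row_id. Both fields are left null; the
  // "new" row gets a fresh _row_id and the new data sequence number.
  static LineageFields deleteInsertUpdate() {
    return new LineageFields(null, null);
  }

  // Model 2: modification that preserves row ids (e.g. a copy-on-write rewrite).
  // The existing _row_id is kept, while _last_updated_sequence_number is left
  // null so it picks up the sequence number of the commit that changed the row.
  static LineageFields rowIdPreservingUpdate(long existingRowId) {
    return new LineageFields(existingRowId, null);
  }
}

Either way, the difference is visible through the lineage fields: model 1 produces a new row id for every update, while model 2 keeps the id stable across updates.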
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> Manu,
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>>>>>>>> Steven
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Another update on the release.
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR:
>>>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>>>>>>>>> Steven
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better first define it in the spec, because all engines and implementations need to follow it.
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before the release is data file compaction. At the moment, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files we'd be losing the lineage information. There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this, but I'd call it out as a blocker as well.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> --
>>>> >>>>>>>>>>>>>>>> Robin Moffatt
>>>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
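As an addendum to the compaction discussion above: the carry-over rule Amogh and the spec excerpt describe can be summarized in a short sketch. This is hypothetical plain Java, not Iceberg's actual compaction code, and it assumes the v3 inheritance rules (a null _row_id resolves to the source file's first_row_id plus the row's position, and a null _last_updated_sequence_number resolves to the source file's data sequence number).

// Hypothetical sketch of the lineage carry-over rule for compaction; not Iceberg code.
final class CompactionCarryOver {
  // Values to write for a row copied unchanged from a source data file into a
  // compacted file. Nulls in the source mean the value was inherited, so it is
  // first resolved against the source file before being written explicitly.
  static long[] lineageForRewrittenRow(
      Long sourceRowId,                 // _row_id read from the source file, possibly null
      Long sourceLastUpdatedSeqNumber,  // _last_updated_sequence_number from the source file, possibly null
      long sourceFirstRowId,            // first_row_id assigned to the source data file
      long rowPosition,                 // row's position within the source data file
      long sourceDataSequenceNumber) {  // data sequence number of the source data file
    long rowId = sourceRowId != null ? sourceRowId : sourceFirstRowId + rowPosition;
    long lastUpdated = sourceLastUpdatedSeqNumber != null
        ? sourceLastUpdatedSeqNumber
        : sourceDataSequenceNumber;
    // Compaction is not a data change: both values come from the source file,
    // never from the newly written file.
    return new long[] {rowId, lastUpdated};
  }

  private CompactionCarryOver() {}
}

If compaction instead left these fields null in the new file, the rewritten rows would be assigned new row ids and would inherit the new file's sequence number, which is exactly the loss of lineage described above.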