Re: Iceberg 1.10.0 release update - August 2025

Fokko Driesprong Fri, 08 Aug 2025 06:14:34 -0700

Hi Steven,

Thanks for updating this thread.


I've updated the UnknownType PR
<https://github.com/apache/iceberg/pull/13445> so that, for now, it blocks
the complex cases that still need more discussion. This way we can revisit
them after the 1.10.0 release.

Kind regards,
Fokko




On Thu, Aug 7, 2025 at 23:56, Steven Wu <[email protected]> wrote:

> edited the subject line as we are into August.
>
> We are still waiting for the following two changes for the 1.10.0 release
> * Anton's fix for the data frame join using the same snapshot, which will
> introduce a slight behavior change in spark 4.0.
> * unknown type support.
>
>
> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <[email protected]> wrote:
>
>> Hi Steven,
>>
>> A small regression with S3 signing has been reported to me. The fix is
>> simple:
>>
>> https://github.com/apache/iceberg/pull/13718
>>
>> Would it be still possible to have it in 1.10 please?
>>
>> Thanks,
>> Alex
>>
>>
>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <[email protected]> wrote:
>> >
>> > Currently, the 1.10.0 milestone has no open PRs
>> > https://github.com/apache/iceberg/milestone/54
>> >
>> > The variant PRs were merged this week and last week. There are still some
>> variant-testing-related PRs, which are probably not blockers for the 1.10.0
>> release.
>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>> > * use short strings: https://github.com/apache/iceberg/pull/13284
>> >
>> > We are still waiting for the following two changes
>> > * Anton's fix for the data frame join using the same snapshot, which
>> will introduce a slight behavior change in spark 4.0.
>> > * unknown type support. Fokko raised a discussion thread on a blocking
>> issue.
>> >
>> > Did I miss anything else?
>> >
>> >
>> >
>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <[email protected]>
>> wrote:
>> >>
>> >> Hey all,
>> >>
>> >> The read path for the UnknownType needs some community discussion.
>> I've raised a separate thread. PTAL
>> >>
>> >> Kind regards from Belgium,
>> >> Fokko
>> >>
>> >> On Sat, Jul 26, 2025 at 00:58, Ryan Blue <[email protected]> wrote:
>> >>>
>> >>> I thought that we said we wanted to get support out for v3 features
>> in this release unless there is some reasonable blocker, like Spark not
>> having geospatial types. To me, I think that means we should aim to get
>> variant and unknown done so that we have a complete implementation with a
>> major engine. And it should not be particularly difficult to get unknown
>> done so I'd opt to get it in.
>> >>>
>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <[email protected]>
>> wrote:
>> >>>>
>> >>>> > I believe we also wanted to get in at least the read path for
>> UnknownType. Fokko has a WIP PR for that.
>> >>>> I thought the consensus in the community sync was that this is not a
>> blocker, because it is a new feature implementation. If it is ready, it
>> will be included.
>> >>>>
>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> I think Fokko's OOO. Should we help with that PR?
>> >>>>>
>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <
>> [email protected]> wrote:
>> >>>>>>
>> >>>>>> I believe we also wanted to get in at least the read path for
>> UnknownType. Fokko has a WIP PR for that.
>> >>>>>>
>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <[email protected]>
>> wrote:
>> >>>>>>>
>> >>>>>>> 3. Spark: fix data frame join based on different versions of the
>> same table that may lead to weird results. Anton is working on a fix. It
>> requires a small behavior change (table state may be stale up to refresh
>> interval). Hence it is better to include it in the 1.10.0 release where
>> Spark 4.0 is first supported.
>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is
>> very close and will prioritize the review.
>> >>>>>>>
>> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR
>> yet. PR for 4 is not associated with the milestone yet.
>> >>>>>>>
>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <[email protected]>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Kevin Liu
>> >>>>>>>>
>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <
>> [email protected]> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc so
>> it can be released anytime.
>> >>>>>>>>>
>> >>>>>>>>> Regards,
>> >>>>>>>>> Manu
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <
>> [email protected]> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>> >>>>>>>>>>
>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both
>> nice-to-haves.
>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification"
>> #13521
>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest
>> fixture #13599
>> >>>>>>>>>>
>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the
>> left nav of https://iceberg.apache.org/spec/ from the swagger.io link to
>> a dedicated page for IRC.
>> >>>>>>>>>> The second one fixes the default behavior of
>> `iceberg-rest-fixture` image to align with the general expectation when
>> creating a table in a catalog.
>> >>>>>>>>>>
>> >>>>>>>>>> Please take a look. I would like to have both of these as part
>> of the 1.10 release.
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Kevin Liu
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <
>> [email protected]> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>> >>>>>>>>>>>
>> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to
>> complete :)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Kevin Liu
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <
>> [email protected]> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your
>> backport PRs. Can you add them to the 1.10.0 milestone?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to
>> verify feature parity between Spark 3.5 and Spark 4.0 for this release.
>> >>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have
>> feature parity for this upcoming release. More details in the other devlist
>> thread https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Kevin Liu
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Another update on the release.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> During today's community sync, we identified the following
>> issues/PRs to be included in the 1.10.0 release.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a
>> cherry-pick PR for that. There is a one line difference compared to the
>> original PR due to the removal of the deprecated RemoveSnapshot class in
>> main branch for 1.10.0 target. Amogh has suggested using RemoveSnapshots
>> with a single snapshot id, which should be supported by all REST catalog
>> servers.
>> >>>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the
>> compaction for V3 tables. I created a PR for that. Will backport after it
>> is merged.
>> >>>>>>>>>>>>>> 3. Spark: fix data frame join based on different versions of
>> the same table that may lead to weird results. Anton is working on a fix.
>> It requires a small behavior change (table state may be stale up to refresh
>> interval). Hence it is better to include it in the 1.10.0 release where
>> Spark 4.0 is first supported.
>> >>>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is
>> very close and will prioritize the review.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>> steven
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here.
>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR
>> in the 1.10.0 milestone.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>> <[email protected]> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point
>> of view, we will not be able to publish the connector on Confluent Hub
>> until this CVE[1] is fixed.
>> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix
>> doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release
>> of 1.10) to be able to include the connector on Confluent Hub.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks, Robin.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> [1]
>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish
>> the OSS Kafka Connect Iceberg sink plugin.
>> >>>>>>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us
>> from publishing the plugin.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Please include the below PR for 1.10.0 release which
>> fixes that.
>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> - Ajantha
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting
>> rows or as modifications to rows that preserve row ids.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some
>> context. The first half (as deleting/inserting rows) is probably about the
>> row lineage handling with equality deletes, which is described in another
>> place.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated
>> via Equality Deletes, because engines using equality deletes avoid reading
>> existing data before writing changes and can't provide the original row ID
>> for the new rows. These updates are always treated as if the existing row
>> was completely removed and a unique new row was added."
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following
>> sentence is a bit hard to understand (maybe just me)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting
>> rows or as modifications to rows that preserve row ids.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <[email protected]> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Manu
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry over
>> (for replace)
>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data
>> file for any reason, writers should write _row_id and
>> _last_updated_sequence_number according to the following rules:"
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>>> Steven
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> another update on the release.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>> (with 25 closed PRs). Amogh is actively working on the last blocker PR.
>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on
>> compaction
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above
>> blocker is merged and backported.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>>>> Steven
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace"
>> operation should carry over existing lineage info instead of assigning
>> new IDs? If not, we'd better define it in the spec first, because all engines
>> and implementations need to follow it.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <
>> [email protected]> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works
>> with row lineage before release is data file compaction. At the moment, it
>> looks like compaction will read the records from the data files without
>> projecting the lineage fields. What this means is that on write of the new
>> compacted data files we'd be losing the lineage information. There's no
>> data change in a compaction but we do need to make sure the lineage info
>> from carried over records is materialized in the newly compacted files so
>> they don't get new IDs or inherit the new file sequence number. I'm working
>> on addressing this as well, but I'd call this out as a blocker as well.
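
As a rough illustration of the compaction concern described above, here is a minimal sketch using a simplified in-memory model. `Row`, `DataFile`, `read_with_lineage`, and `compact` are hypothetical names for this example only and are not Iceberg APIs. The fallback rules (a null row ID resolves to the file's first row ID plus the row position, and a null sequence number resolves to the file's sequence number) are an assumption about how inherited lineage is resolved on read. The point is only that compaction has to project the resolved lineage fields and write them out explicitly, otherwise the rows would pick up new values from the compacted file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    value: str
    # None means "not materialized": the value is inherited from the file on read.
    row_id: Optional[int] = None
    last_updated_seq: Optional[int] = None


@dataclass
class DataFile:
    first_row_id: int      # assumed file-level starting row ID
    sequence_number: int   # assumed file-level data sequence number
    rows: List[Row]


def read_with_lineage(f: DataFile) -> List[Row]:
    """Project the lineage fields, resolving inherited (null) values."""
    resolved = []
    for pos, r in enumerate(f.rows):
        rid = r.row_id if r.row_id is not None else f.first_row_id + pos
        seq = (r.last_updated_seq
               if r.last_updated_seq is not None
               else f.sequence_number)
        resolved.append(Row(r.value, rid, seq))
    return resolved


def compact(files: List[DataFile], new_first_row_id: int,
            new_sequence_number: int) -> DataFile:
    """Rewrite several files into one, materializing lineage per row.

    Because every carried-over row now has explicit lineage values, it does
    not fall back to the new file's first_row_id or sequence_number.
    """
    rows = [r for f in files for r in read_with_lineage(f)]
    return DataFile(new_first_row_id, new_sequence_number, rows)


if __name__ == "__main__":
    a = DataFile(first_row_id=0, sequence_number=1, rows=[Row("a"), Row("b")])
    b = DataFile(first_row_id=2, sequence_number=2, rows=[Row("c")])
    merged = compact([a, b], new_first_row_id=100, new_sequence_number=3)
    for r in read_with_lineage(merged):
        print(r.value, r.row_id, r.last_updated_seq)
```

If `compact` copied the rows without calling `read_with_lineage`, the null fields would be resolved against the new file instead, which is the loss of lineage the message warns about.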
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>> Robin Moffatt
>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
>>
>
