Hi Steven,

A small regression with S3 signing has been reported to me. The fix is simple:
https://github.com/apache/iceberg/pull/13718

Would it still be possible to have it in 1.10, please?

Thanks,
Alex

On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>
> Currently, the 1.10.0 milestone has no open PRs:
> https://github.com/apache/iceberg/milestone/54
>
> The variant PRs were merged this week and last. There are still some variant testing related PRs, which are probably not blockers for the 1.10.0 release.
> * Spark variant read: https://github.com/apache/iceberg/pull/13219
> * use short strings: https://github.com/apache/iceberg/pull/13284
>
> We are still waiting for the following two changes:
> * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
> * Unknown type support. Fokko raised a discussion thread on a blocking issue.
>
> Did I miss anything else?
>
> On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>
>> Hey all,
>>
>> The read path for the UnknownType needs some community discussion. I've raised a separate thread. PTAL.
>>
>> Kind regards from Belgium,
>> Fokko
>>
>> On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>> I thought that we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done, so I'd opt to get it in.
>>>
>>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>> > I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>>
>>>> I thought the consensus in the community sync was that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>>>>
>>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>
>>>>> I think Fokko's OOO. Should we help with that PR?
>>>>>
>>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>
>>>>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>>>>
>>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>>
>>>>>>> We still have the above two issues pending. 3 doesn't have a PR yet. The PR for 4 is not associated with the milestone yet.
>>>>>>>
>>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Thanks everyone for the review. The 2 PRs are both merged. Looks like there's only 1 PR left in the 1.10 milestone :)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Thanks Kevin. The first change is not in the versioned doc, so it can be released anytime.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Manu
>>>>>>>>>
>>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>>>>>>
>>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
>>>>>>>>>>
>>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
>>>>>>>>>> The second one fixes the default behavior of the `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>>>>>>>>>
>>>>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Here are the 3 PRs to add corresponding tests:
>>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>>>>>
>>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again for driving this, Steven! We're very close!!
>>>>>>>>>>>>>
>>>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release.
>>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details in the other devlist thread:
>>>>>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a cherry-pick PR for that. There is a one-line difference compared to the original PR due to the removal of the deprecated RemoveSnapshot class in the main branch for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR for that. Will backport after it is merged.
>>>>>>>>>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The 1.10.0 milestone can be found here:
>>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed.
>>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please include the below PR in the 1.10.0 release, which fixes that.
>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part, but the following sentence is a bit hard to understand (maybe it's just me):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids."
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can you please help explain?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs?
>>>>>>>>>>>>>>>>>>>>>> If not, we'd better define it in the spec first, because all engines and implementations need to follow it.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files we'd be losing the lineage information. There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this as well, and I'd call it out as a blocker.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Robin Moffatt
>>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
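The carry-over rule quoted from the spec above (writers should materialize _row_id and _last_updated_sequence_number when rows move to a new data file) can be sketched in a few lines. This is a minimal, hypothetical Python model of the semantics only; the dict-based file layout and the compact() helper are invented for illustration and are not Iceberg's actual API or implementation:

```python
# Illustrative model of the row-lineage carry-over rule discussed in this
# thread. A null lineage value means "inherited": _row_id resolves to the
# source file's first_row_id plus the row's position, and the sequence
# number resolves to the source file's data sequence number. A compaction
# that drops these values would (incorrectly) let the rewritten rows
# inherit new ids and the new file's sequence number.

def compact(data_files):
    """Rewrite rows from several data files into one logical output,
    materializing the lineage columns instead of re-assigning them."""
    compacted = []
    for f in data_files:
        for pos, row in enumerate(f["rows"]):
            row_id = row["_row_id"]
            if row_id is None:
                row_id = f["first_row_id"] + pos
            seq = row["_last_updated_sequence_number"]
            if seq is None:
                seq = f["sequence_number"]
            compacted.append({**row,
                              "_row_id": row_id,
                              "_last_updated_sequence_number": seq})
    return compacted

# Two source files whose rows never materialized their lineage values.
files = [
    {"first_row_id": 100, "sequence_number": 5,
     "rows": [{"data": "a", "_row_id": None,
               "_last_updated_sequence_number": None}]},
    {"first_row_id": 200, "sequence_number": 7,
     "rows": [{"data": "b", "_row_id": None,
               "_last_updated_sequence_number": None}]},
]

result = compact(files)
# Lineage survives the rewrite: the rows keep their original row ids and
# sequence numbers rather than picking up values from the new file.
```

This is the behavior Amogh describes as missing when compaction reads records without projecting the lineage fields: with the fields projected and materialized, a rewrite changes file layout but not lineage.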