Have you checked repository.apache.org? I recall that the staging repo records
the client IP.

It’s likely that you have multiple public IPs in your local network and the
HTTP connections happen to go out via different IPs.
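
If you want to double-check that, a quick sanity check (just a sketch; it
assumes curl is available and uses api.ipify.org as one example of a public
IP echo service) is to print the egress IP a few times:

    # If the printed addresses alternate, that would explain why Nexus
    # opened more than one staging repo for the same upload.
    for i in 1 2 3 4 5; do
      curl -s https://api.ipify.org
      echo
    done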

I think it doesn’t matter, just listing all repo links in the vote thread is 
fine.

Thanks,
Cheng Pan



> On Sep 3, 2025, at 01:02, Steven Wu <stevenz...@gmail.com> wrote:
> 
> Sorry, the staging-binaries.sh PR link was wrong (missing a digit).
> 
> I thought this PR would fix the issue. Initially it worked well for a few
> runs, but later I hit the same problem again. Suggestions are appreciated!
> https://github.com/apache/iceberg/pull/13958
> 
> On Tue, Sep 2, 2025 at 9:51 AM Steven Wu <stevenz...@gmail.com> wrote:
>> Hi,
>> 
>> Just to update the community on the status.
>> 
>> Fokko also reached out about including Parquet Java 1.16.0 in this release.
>> The vote just passed in the Parquet community, and we are waiting for the
>> binary release. We will try to include it in the 1.10.0 release. Reviews
>> are welcome.
>> https://github.com/apache/iceberg/pull/1394
>> 
>> We also ran into a couple of issues with the release script/process.
>> 
>> 1) staging-binaries.sh has race conditions on concurrent publish, resulting
>> in 2 folders in the Maven staging repo.
>> 
>> I thought this PR would fix the issue. Initially it worked well for a few
>> runs, but later I hit the same problem again. Suggestions are appreciated!
>> https://github.com/apache/iceberg/pull/13958
>> 
>> 2) Yuya found out that the iceberg-api module wasn't published in the RC2 
>> staging (1243). 
>> https://repository.apache.org/content/repositories/orgapacheiceberg-1243/
>> 
>> The first release issue is the more annoying/impactful problem. The second
>> release issue is uncommon, as I didn't see it in a few other runs of
>> staging-binaries.sh.
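>> 
>> On the first issue: if the two staging folders come from two publish
>> processes running at the same time (just a guess about what
>> staging-binaries.sh is hitting), serializing the runs with a file lock is a
>> cheap experiment. This is a rough sketch only, with a made-up lock path and
>> an assumed script location:
>> 
>>     LOCK=/tmp/iceberg-staging-publish.lock   # hypothetical lock file
>>     (
>>       flock -x 9                    # block until any other publish finishes
>>       ./staging-binaries.sh         # script path assumed here
>>     ) 9>"$LOCK"
>> 
>> If the duplicate folders still appear with runs serialized like this, the
>> race is likely within a single run rather than across runs.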
>> 
>> Thanks,
>> Steven
>> 
>> 
>> 
>> On Sun, Aug 31, 2025 at 12:48 PM Steven Wu <stevenz...@gmail.com> wrote:
>>> I started a vote thread for 1.10.0 RC2.
>>> 
>>> I had to fix a couple of release script issues; hence, the first release
>>> candidate up for a vote is RC2.
>>> 
>>> On Fri, Aug 29, 2025 at 9:53 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> Thanks Steven! I did another pass to check for feature parity between 
>>>> Spark 3.5 and Spark 4.0 for this release and everything looks good. There
>>>> are a few test cases that have not been ported, but we can punt those for 
>>>> now.
>>>> 
>>>> Best,
>>>> Kevin Liu
>>>> 
>>>> On Thu, Aug 28, 2025 at 7:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>> Thanks to Fokko and Ryan, the unknown type support PR was merged today.
>>>>> 
>>>>> Everything in the 1.10.0 milestone is closed now.
>>>>> 
>>>>> I will work on a release candidate next.
>>>>> 
>>>>> On Fri, Aug 8, 2025 at 6:14 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>> Hi Steven,
>>>>>> 
>>>>>> Thanks for updating this thread.
>>>>>> 
>>>>>> I've updated the UnknownType PR 
>>>>>> <https://github.com/apache/iceberg/pull/13445> to first block on the 
>>>>>> complex cases that will require some more discussion. This way we can 
>>>>>> revisit this also after the 1.10.0 release.
>>>>>> 
>>>>>> Kind regards,
>>>>>> Fokko
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Aug 7, 2025 at 11:56 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>> Edited the subject line, as we are into August.
>>>>>>> 
>>>>>>> We are still waiting for the following two changes for the 1.10.0
>>>>>>> release:
>>>>>>> * Anton's fix for the data frame join using the same snapshot, which
>>>>>>> will introduce a slight behavior change in Spark 4.0.
>>>>>>> * Unknown type support.
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:
>>>>>>>> Hi Steven,
>>>>>>>> 
>>>>>>>> A small regression with S3 signing has been reported to me. The fix is 
>>>>>>>> simple:
>>>>>>>> 
>>>>>>>> https://github.com/apache/iceberg/pull/13718
>>>>>>>> 
>>>>>>>> Would it still be possible to have it in 1.10, please?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Alex
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Currently, the 1.10.0 milestone has no open PRs:
>>>>>>>> > https://github.com/apache/iceberg/milestone/54
>>>>>>>> >
>>>>>>>> > The variant PRs were merged this and last week. There are still some
>>>>>>>> > variant-testing-related PRs, which are probably not blockers for the
>>>>>>>> > 1.10.0 release.
>>>>>>>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>>>>>>>> > * use short strings: https://github.com/apache/iceberg/pull/13284
>>>>>>>> >
>>>>>>>> > We are still waiting for the following two changes:
>>>>>>>> > * Anton's fix for the data frame join using the same snapshot, which
>>>>>>>> > will introduce a slight behavior change in Spark 4.0.
>>>>>>>> > * Unknown type support. Fokko raised a discussion thread on a
>>>>>>>> > blocking issue.
>>>>>>>> >
>>>>>>>> > Did I miss anything else?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>> >>
>>>>>>>> >> Hey all,
>>>>>>>> >>
>>>>>>>> >> The read path for the UnknownType needs some community discussion. 
>>>>>>>> >> I've raised a separate thread. PTAL
>>>>>>>> >>
>>>>>>>> >> Kind regards from Belgium,
>>>>>>>> >> Fokko
>>>>>>>> >>
>>>>>>>> >> On Sat, Jul 26, 2025 at 12:58 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>> >>>
>>>>>>>> >>> I thought that we said we wanted to get support out for v3 
>>>>>>>> >>> features in this release unless there is some reasonable blocker, 
>>>>>>>> >>> like Spark not having geospatial types. To me, that means we should
>>>>>>>> >>> aim to get variant and unknown done so that we have a complete
>>>>>>>> >>> implementation with a major engine. And it should not be
>>>>>>>> >>> particularly difficult to get unknown done, so I'd opt to get it in.
>>>>>>>> >>>
>>>>>>>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>> > I believe we also wanted to get in at least the read path for 
>>>>>>>> >>>> > UnknownType. Fokko has a WIP PR for that.
>>>>>>>> >>>> I thought the consensus in the community sync was that this is not
>>>>>>>> >>>> a blocker, because it is a new feature implementation. If it is
>>>>>>>> >>>> ready, it will be included.
>>>>>>>> >>>>
>>>>>>>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> I think Fokko's OOO. Should we help with that PR?
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I believe we also wanted to get in at least the read path for 
>>>>>>>> >>>>>> UnknownType. Fokko has a WIP PR for that.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> 3. Spark: fix data frame joins based on different versions of
>>>>>>>> >>>>>>> the same table, which may lead to weird results. Anton is
>>>>>>>> >>>>>>> working on a fix. It requires a small behavior change (table
>>>>>>>> >>>>>>> state may be stale for up to the refresh interval). Hence it is
>>>>>>>> >>>>>>> better to include it in the 1.10.0 release, where Spark 4.0 is
>>>>>>>> >>>>>>> first supported.
>>>>>>>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is 
>>>>>>>> >>>>>>> very close and will prioritize the review.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> We still have the above two issues pending. Issue 3 doesn't
>>>>>>>> >>>>>>> have a PR yet, and the PR for issue 4 is not associated with
>>>>>>>> >>>>>>> the milestone yet.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>>>>>>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Best,
>>>>>>>> >>>>>>>> Kevin Liu
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc 
>>>>>>>> >>>>>>>>> so it can be released anytime.
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Regards,
>>>>>>>> >>>>>>>>> Manu
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both 
>>>>>>>> >>>>>>>>>> nice-to-haves.
>>>>>>>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in 
>>>>>>>> >>>>>>>>>> "Specification" #13521
>>>>>>>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest 
>>>>>>>> >>>>>>>>>> fixture #13599
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on
>>>>>>>> >>>>>>>>>> the left nav of https://iceberg.apache.org/spec/ from the
>>>>>>>> >>>>>>>>>> swagger.io link to a dedicated page for IRC.
>>>>>>>> >>>>>>>>>> The second one fixes the default behavior of the
>>>>>>>> >>>>>>>>>> `iceberg-rest-fixture` image to align with the general
>>>>>>>> >>>>>>>>>> expectation when creating a table in a catalog.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Please take a look. I would like to have both of these as 
>>>>>>>> >>>>>>>>>> part of the 1.10 release.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Best,
>>>>>>>> >>>>>>>>>> Kevin Liu
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI 
>>>>>>>> >>>>>>>>>>> to complete :)
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> Best,
>>>>>>>> >>>>>>>>>>> Kevin Liu
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at 
>>>>>>>> >>>>>>>>>>>> your backport PRs. Can you add them to the 1.10.0 
>>>>>>>> >>>>>>>>>>>> milestone?
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>> >>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>>>>>>>> >>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to 
>>>>>>>> >>>>>>>>>>>>> verify feature parity between Spark 3.5 and Spark 4.0 
>>>>>>>> >>>>>>>>>>>>> for this release.
>>>>>>>> >>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have 
>>>>>>>> >>>>>>>>>>>>> feature parity for this upcoming release. More details 
>>>>>>>> >>>>>>>>>>>>> in the other devlist thread 
>>>>>>>> >>>>>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>> >>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>>>>> Kevin Liu
>>>>>>>> >>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Another update on the release.
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> During today's community sync, we identified the 
>>>>>>>> >>>>>>>>>>>>>> following issues/PRs to be included in the 1.10.0 
>>>>>>>> >>>>>>>>>>>>>> release.
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> 1. Backport of PR 13100 to the main branch. I have
>>>>>>>> >>>>>>>>>>>>>> created a cherry-pick PR for that. There is a one-line
>>>>>>>> >>>>>>>>>>>>>> difference compared to the original PR, due to the
>>>>>>>> >>>>>>>>>>>>>> removal of the deprecated RemoveSnapshot class in the
>>>>>>>> >>>>>>>>>>>>>> main branch for the 1.10.0 target. Amogh has suggested
>>>>>>>> >>>>>>>>>>>>>> using RemoveSnapshots with a single snapshot id, which
>>>>>>>> >>>>>>>>>>>>>> should be supported by all REST catalog servers.
>>>>>>>> >>>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail
>>>>>>>> >>>>>>>>>>>>>> the compaction for V3 tables. I created a PR for that
>>>>>>>> >>>>>>>>>>>>>> and will backport it after it is merged.
>>>>>>>> >>>>>>>>>>>>>> 3. Spark: fix data frame joins based on different
>>>>>>>> >>>>>>>>>>>>>> versions of the same table, which may lead to weird
>>>>>>>> >>>>>>>>>>>>>> results. Anton is working on a fix. It requires a small
>>>>>>>> >>>>>>>>>>>>>> behavior change (table state may be stale for up to the
>>>>>>>> >>>>>>>>>>>>>> refresh interval). Hence it is better to include it in
>>>>>>>> >>>>>>>>>>>>>> the 1.10.0 release, where Spark 4.0 is first supported.
>>>>>>>> >>>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks
>>>>>>>> >>>>>>>>>>>>>> this is very close and will prioritize the review.
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>>>>>> steven
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here.
>>>>>>>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the 
>>>>>>>> >>>>>>>>>>>>>>> PR in the 1.10.0 milestone.
>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt 
>>>>>>>> >>>>>>>>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent 
>>>>>>>> >>>>>>>>>>>>>>>> point of view, we will not be able to publish the 
>>>>>>>> >>>>>>>>>>>>>>>> connector on Confluent Hub until this CVE[1] is fixed.
>>>>>>>> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the 
>>>>>>>> >>>>>>>>>>>>>>>> fix doesn't make it into 1.10 then we'd have to wait 
>>>>>>>> >>>>>>>>>>>>>>>> for 1.11 (or a dot release of 1.10) to be able to 
>>>>>>>> >>>>>>>>>>>>>>>> include the connector on Confluent Hub.
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>> [1] 
>>>>>>>> >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us 
>>>>>>>> >>>>>>>>>>>>>>>>> publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>> >>>>>>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks
>>>>>>>> >>>>>>>>>>>>>>>>> us from publishing the plugin.
>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>> Please include the PR below, which fixes that, in
>>>>>>>> >>>>>>>>>>>>>>>>> the 1.10.0 release.
>>>>>>>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>> > Engines may model operations as 
>>>>>>>> >>>>>>>>>>>>>>>>>> > deleting/inserting rows or as modifications to 
>>>>>>>> >>>>>>>>>>>>>>>>>> > rows that preserve row ids.
>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some 
>>>>>>>> >>>>>>>>>>>>>>>>>> context. The first half (as deleting/inserting 
>>>>>>>> >>>>>>>>>>>>>>>>>> rows) is probably about the row lineage handling 
>>>>>>>> >>>>>>>>>>>>>>>>>> with equality deletes, which is described in 
>>>>>>>> >>>>>>>>>>>>>>>>>> another place.
>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows 
>>>>>>>> >>>>>>>>>>>>>>>>>> updated via Equality Deletes, because engines using 
>>>>>>>> >>>>>>>>>>>>>>>>>> equality deletes avoid reading existing data before 
>>>>>>>> >>>>>>>>>>>>>>>>>> writing changes and can't provide the original row 
>>>>>>>> >>>>>>>>>>>>>>>>>> ID for the new rows. These updates are always 
>>>>>>>> >>>>>>>>>>>>>>>>>> treated as if the existing row was completely 
>>>>>>>> >>>>>>>>>>>>>>>>>> removed and a unique new row was added."
>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the 
>>>>>>>> >>>>>>>>>>>>>>>>>>> following sentence is a bit hard to understand 
>>>>>>>> >>>>>>>>>>>>>>>>>>> (maybe just me)
>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting 
>>>>>>>> >>>>>>>>>>>>>>>>>>> rows or as modifications to rows that preserve row 
>>>>>>>> >>>>>>>>>>>>>>>>>>> ids.
>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 4:41 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>> Manu
>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry 
>>>>>>>> >>>>>>>>>>>>>>>>>>>> over (for replace)
>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different 
>>>>>>>> >>>>>>>>>>>>>>>>>>>> data file for any reason, writers should write 
>>>>>>>> >>>>>>>>>>>>>>>>>>>> _row_id and _last_updated_sequence_number 
>>>>>>>> >>>>>>>>>>>>>>>>>>>> according to the following rules:"
>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> milestone (with 25 closed PRs). Amogh is
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> actively working on the last blocker PR:
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> "Spark 4.0: Preserve row lineage information
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> on compaction".
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> above blocker is merged and backported.
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "replace" operation should carry over existing
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> lineage info instead of assigning new IDs? If
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> not, we'd better first define it in the spec,
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> because all engines and implementations need
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> to follow it.
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> works with row lineage before release is data 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> file compaction. At the moment, it looks like 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> compaction will read the records from the data 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> files without projecting the lineage fields. 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> What this means is that on write of the new 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> compacted data files we'd be losing the 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> lineage information. There's no data change in 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> a compaction but we do need to make sure the 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> lineage info from carried over records is 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> materialized in the newly compacted files so 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> they don't get new IDs or inherit the new file 
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> sequence number. I'm working on addressing
>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> this, but I'd call it out as a blocker as well.
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>>>> --
>>>>>>>> >>>>>>>>>>>>>>>> Robin Moffatt
>>>>>>>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
