I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:

> 3. Spark: fix data frame joins based on different versions of the same
> table, which may lead to weird results. Anton is working on a fix. It
> requires a small behavior change (table state may be stale up to the
> refresh interval). Hence it is better to include it in the 1.10.0 release,
> where Spark 4.0 is first supported.
> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close
> and will prioritize the review.
>
> We still have the above two issues pending. 3 doesn't have a PR yet. The
> PR for 4 is not associated with the milestone yet.
>
> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Thanks everyone for the review. The 2 PRs are both merged.
>> Looks like there's only 1 PR left in the 1.10 milestone
>> <https://github.com/apache/iceberg/milestone/54> :)
>>
>> Best,
>> Kevin Liu
>>
>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>
>>> Thanks Kevin. The first change is not in the versioned docs, so it can
>>> be released at any time.
>>>
>>> Regards,
>>> Manu
>>>
>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>
>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>
>>>> I've added 2 more PRs to the 1.10 milestone. These are both
>>>> nice-to-haves:
>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>> <https://github.com/apache/iceberg/pull/13521>
>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture
>>>> #13599 <https://github.com/apache/iceberg/pull/13599>
>>>>
>>>> The first one changes the link for "REST Catalog Spec" in the left nav
>>>> of https://iceberg.apache.org/spec/ from the swagger.io link to a
>>>> dedicated page for the Iceberg REST Catalog (IRC).
>>>> The second one fixes the default behavior of the `iceberg-rest-fixture`
>>>> image to align with the general expectation when creating a table in a
>>>> catalog.
>>>>
>>>> Please take a look.
I would like to have both of these as part of the 1.10 release.
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Here are the 3 PRs to add the corresponding tests:
>>>>> https://github.com/apache/iceberg/pull/13648
>>>>> https://github.com/apache/iceberg/pull/13649
>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>
>>>>> I've tagged them with the 1.10 milestone; waiting for CI to complete :)
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> Kevin, thanks for checking that. I will take a look at your backport
>>>>>> PRs. Can you add them to the 1.10.0 milestone?
>>>>>>
>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>
>>>>>>> Thanks again for driving this, Steven! We're very close!
>>>>>>>
>>>>>>> As mentioned in the community sync today, I wanted to verify feature
>>>>>>> parity between Spark 3.5 and Spark 4.0 for this release, and I was
>>>>>>> able to confirm that they have feature parity for this upcoming
>>>>>>> release. More details are in the other devlist thread:
>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Another update on the release.
>>>>>>>>
>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>
>>>>>>>> During today's community sync, we identified the following
>>>>>>>> issues/PRs to include in the 1.10.0 release.
>>>>>>>>
>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a
>>>>>>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that.
>>>>>>>> There is a one-line difference compared to the original PR, due to
>>>>>>>> the removal of the deprecated RemoveSnapshot class in the main branch
>>>>>>>> for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with
>>>>>>>> a single snapshot id, which should be supported by all REST catalog
>>>>>>>> servers.
>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the
>>>>>>>> compaction for V3 tables. I created a PR
>>>>>>>> <https://github.com/apache/iceberg/pull/13646> for that and will
>>>>>>>> backport it after it is merged.
>>>>>>>> 3. Spark: fix data frame joins based on different versions of
>>>>>>>> the same table, which may lead to weird results. Anton is working on
>>>>>>>> a fix. It requires a small behavior change (table state may be stale
>>>>>>>> up to the refresh interval). Hence it is better to include it in the
>>>>>>>> 1.10.0 release, where Spark 4.0 is first supported.
>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is
>>>>>>>> very close and will prioritize the review.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> The 1.10.0 milestone can be found here:
>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>
>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>>>>>>> 1.10.0 milestone.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view,
>>>>>>>>>> we will not be able to publish the connector on Confluent Hub until
>>>>>>>>>> this CVE [1] is fixed.
>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't
>>>>>>>>>> make it into 1.10, then we'd have to wait for 1.11 (or a dot
>>>>>>>>>> release of 1.10) to be able to include the connector on Confluent
>>>>>>>>>> Hub.
>>>>>>>>>>
>>>>>>>>>> Thanks, Robin.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>
>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have approached the Confluent people
>>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>>>>>>> publishing the plugin.
>>>>>>>>>>>
>>>>>>>>>>> Please include the PR below, which fixes that, in the 1.10.0 release:
>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or
>>>>>>>>>>>> as modifications to rows that preserve row ids.
>>>>>>>>>>>>
>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The
>>>>>>>>>>>> first half (as deleting/inserting rows) is probably about the
>>>>>>>>>>>> row lineage handling with equality deletes, which is described
>>>>>>>>>>>> in another place:
>>>>>>>>>>>>
>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>>>>>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>>>>>>> because engines using equality deletes avoid reading existing data
>>>>>>>>>>>> before writing changes and can't provide the original row ID for
>>>>>>>>>>>> the new rows.
>>>>>>>>>>>> These updates are always treated as if the existing row was
>>>>>>>>>>>> completely removed and a unique new row was added."
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Steven. I missed that part, but the following sentence
>>>>>>>>>>>>> is a bit hard to understand (maybe it's just me):
>>>>>>>>>>>>>
>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as
>>>>>>>>>>>>> modifications to rows that preserve row ids.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please help explain?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "When an existing row is moved to a different data file for
>>>>>>>>>>>>>> any reason, writers should write _row_id and
>>>>>>>>>>>>>> _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25
>>>>>>>>>>>>>>> closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker
>>>>>>>>>>>>>>> is merged and backported.
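[Editorial aside: the carry-over rule from the spec quoted above can be modeled in a few lines. The sketch below is illustrative Python only, not Iceberg code; the helper name, the dict-based row shape, and the parameter names are invented for the example. It models the spec's inheritance semantics: a null `_row_id` is read as `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` is read as the data file's sequence number, so a writer moving rows to a new file must materialize those values first.]

```python
def materialize_lineage(rows, first_row_id, data_seq_number):
    """Model of the row-lineage carry-over rule: before rows are moved
    to a new data file, any inherited (null) lineage values must be
    made explicit so they survive the move.

    rows: list of dicts containing "_row_id" and
          "_last_updated_sequence_number" fields (possibly None)
    first_row_id: the source data file's assigned first_row_id
    data_seq_number: the source data file's data sequence number
    """
    out = []
    for pos, row in enumerate(rows):
        materialized = dict(row)
        # A null _row_id is inherited as first_row_id + row position.
        if materialized["_row_id"] is None:
            materialized["_row_id"] = first_row_id + pos
        # A null _last_updated_sequence_number inherits the file's
        # data sequence number.
        if materialized["_last_updated_sequence_number"] is None:
            materialized["_last_updated_sequence_number"] = data_seq_number
        out.append(materialized)
    return out
```

If a writer skipped this step and rewrote rows with null lineage fields, the rows would inherit fresh ids and the new file's sequence number, losing their identity.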
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>>>>>>>>> should carry over existing lineage info instead of assigning
>>>>>>>>>>>>>>>> new IDs? If not, we'd better first define it in the spec,
>>>>>>>>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>>>>>>>>> lineage before the release is data file compaction. At the moment,
>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
>>>>>>>>>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>>>>>>>>>> files without projecting the lineage fields. What this means
>>>>>>>>>>>>>>>>> is that on write of the new compacted data files we'd be
>>>>>>>>>>>>>>>>> losing the lineage information. There's no data change in a
>>>>>>>>>>>>>>>>> compaction, but we do need to make sure the lineage info from
>>>>>>>>>>>>>>>>> carried-over records is materialized in the newly compacted
>>>>>>>>>>>>>>>>> files so they don't get new IDs or inherit the new file
>>>>>>>>>>>>>>>>> sequence number. I'm working on addressing this, and I'd
>>>>>>>>>>>>>>>>> call it out as a blocker as well.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Robin Moffatt*
>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
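[Editorial aside: Amogh's point about compaction losing lineage is easy to see in a toy model. The sketch below is plain Python with invented names, not Iceberg's API. It "compacts" the rows of several files into one new file; unless the lineage columns are projected on read and written back, the carried-over rows come out with null `_row_id` and would pick up fresh ids and the new file's sequence number on commit.]

```python
LINEAGE_COLS = ("_row_id", "_last_updated_sequence_number")

def compact(files, project_lineage):
    """Toy bin-pack compaction: concatenate rows from the input files
    (each a list of row dicts) into one new file. If the lineage
    columns are not projected on read, they come back null, and the
    rewritten rows lose their identity -- the bug called out above."""
    new_file = []
    for rows in files:
        for row in rows:
            copied = dict(row)
            if not project_lineage:
                for col in LINEAGE_COLS:
                    # Column was not read, so it is null on write.
                    copied[col] = None
            new_file.append(copied)
    return new_file
```

Compacting with `project_lineage=True` preserves every row's `_row_id` and `_last_updated_sequence_number`; with `project_lineage=False`, all lineage values come back null.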