Thanks to Fokko and Ryan, the unknown type support PR was merged today. Everything in the 1.10.0 milestone is closed now.
I will work on a release candidate next.

On Fri, Aug 8, 2025 at 6:14 AM Fokko Driesprong <fo...@apache.org> wrote:

> Hi Steven,
>
> Thanks for updating this thread.
>
> I've updated the UnknownType PR
> <https://github.com/apache/iceberg/pull/13445> to first block on the
> complex cases that will require some more discussion. This way we can
> also revisit this after the 1.10.0 release.
>
> Kind regards,
> Fokko
>
> On Thu, Aug 7, 2025 at 23:56, Steven Wu <stevenz...@gmail.com> wrote:
>
>> Edited the subject line as we are into August.
>>
>> We are still waiting for the following two changes for the 1.10.0 release:
>> * Anton's fix for the data frame join using the same snapshot, which will
>>   introduce a slight behavior change in Spark 4.0.
>> * Unknown type support.
>>
>> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:
>>
>>> Hi Steven,
>>>
>>> A small regression with S3 signing has been reported to me. The fix is
>>> simple: https://github.com/apache/iceberg/pull/13718
>>>
>>> Would it still be possible to have it in 1.10, please?
>>>
>>> Thanks,
>>> Alex
>>>
>>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>> >
>>> > Currently, the 1.10.0 milestone has no open PRs:
>>> > https://github.com/apache/iceberg/milestone/54
>>> >
>>> > The variant PR was merged this and last week. There are still some
>>> > variant-testing-related PRs, which are probably not blockers for the
>>> > 1.10.0 release:
>>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>>> > * Use short strings: https://github.com/apache/iceberg/pull/13284
>>> >
>>> > We are still waiting for the following two changes:
>>> > * Anton's fix for the data frame join using the same snapshot, which
>>> >   will introduce a slight behavior change in Spark 4.0.
>>> > * Unknown type support. Fokko raised a discussion thread on a
>>> >   blocking issue.
>>> >
>>> > Did I miss anything else?
>>> >
>>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>> >>
>>> >> Hey all,
>>> >>
>>> >> The read path for the UnknownType needs some community discussion.
>>> >> I've raised a separate thread. PTAL.
>>> >>
>>> >> Kind regards from Belgium,
>>> >> Fokko
>>> >>
>>> >> On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote:
>>> >>>
>>> >>> I thought that we said we wanted to get support out for v3 features
>>> >>> in this release unless there is some reasonable blocker, like Spark
>>> >>> not having geospatial types. To me, that means we should aim to get
>>> >>> variant and unknown done so that we have a complete implementation
>>> >>> with a major engine. And it should not be particularly difficult to
>>> >>> get unknown done, so I'd opt to get it in.
>>> >>>
>>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>
>>> >>>> > I believe we also wanted to get in at least the read path for
>>> >>>> > UnknownType. Fokko has a WIP PR for that.
>>> >>>>
>>> >>>> I thought the consensus in the community sync was that this is not
>>> >>>> a blocker, because it is a new feature implementation. If it is
>>> >>>> ready, it will be included.
>>> >>>>
>>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>> >>>>>
>>> >>>>> I think Fokko's OOO. Should we help with that PR?
>>> >>>>>
>>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>> >>>>>>
>>> >>>>>> I believe we also wanted to get in at least the read path for
>>> >>>>>> UnknownType. Fokko has a WIP PR for that.
>>> >>>>>>
>>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>> 3. Spark: fix the data frame join based on different versions of
>>> >>>>>>> the same table that may lead to weird results. Anton is working
>>> >>>>>>> on a fix. It requires a small behavior change (table state may
>>> >>>>>>> be stale up to the refresh interval).
>>> >>>>>>> Hence it is better to include it in the 1.10.0 release, where
>>> >>>>>>> Spark 4.0 is first supported.
>>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is
>>> >>>>>>> very close and will prioritize the review.
>>> >>>>>>>
>>> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR
>>> >>>>>>> yet. The PR for 4 is not associated with the milestone yet.
>>> >>>>>>>
>>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>>> >>>>>>>>
>>> >>>>>>>> Best,
>>> >>>>>>>> Kevin Liu
>>> >>>>>>>>
>>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc,
>>> >>>>>>>>> so it can be released anytime.
>>> >>>>>>>>>
>>> >>>>>>>>> Regards,
>>> >>>>>>>>> Manu
>>> >>>>>>>>>
>>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both
>>> >>>>>>>>>> nice-to-haves:
>>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
>>> >>>>>>>>>>
>>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on
>>> >>>>>>>>>> the left nav of https://iceberg.apache.org/spec/ from the
>>> >>>>>>>>>> swagger.io link to a dedicated page for IRC.
>>> >>>>>>>>>> The second one fixes the default behavior of the
>>> >>>>>>>>>> `iceberg-rest-fixture` image to align with the general
>>> >>>>>>>>>> expectation when creating a table in a catalog.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Please take a look. I would like to have both of these as
>>> >>>>>>>>>> part of the 1.10 release.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Best,
>>> >>>>>>>>>> Kevin Liu
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests:
>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to
>>> >>>>>>>>>>> complete :)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Best,
>>> >>>>>>>>>>> Kevin Liu
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at
>>> >>>>>>>>>>>> your backport PRs. Can you add them to the 1.10.0 milestone?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Thanks again for driving this, Steven! We're very close!!
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to
>>> >>>>>>>>>>>>> verify feature parity between Spark 3.5 and Spark 4.0 for
>>> >>>>>>>>>>>>> this release. I was able to verify that Spark 3.5 and
>>> >>>>>>>>>>>>> Spark 4.0 have feature parity for this upcoming release.
>>> >>>>>>>>>>>>> More details in the other dev-list thread:
>>> >>>>>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>> Kevin Liu
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Another update on the release.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> During today's community sync, we identified the
>>> >>>>>>>>>>>>>> following issues/PRs to be included in the 1.10.0 release.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> * Backport of PR 13100 to the main branch. I have
>>> >>>>>>>>>>>>>>   created a cherry-pick PR for that. There is a one-line
>>> >>>>>>>>>>>>>>   difference compared to the original PR due to the
>>> >>>>>>>>>>>>>>   removal of the deprecated RemoveSnapshot class in the
>>> >>>>>>>>>>>>>>   main branch for the 1.10.0 target. Amogh has suggested
>>> >>>>>>>>>>>>>>   using RemoveSnapshots with a single snapshot id, which
>>> >>>>>>>>>>>>>>   should be supported by all REST catalog servers.
>>> >>>>>>>>>>>>>> * Flink compaction doesn't support row lineage. Fail the
>>> >>>>>>>>>>>>>>   compaction for V3 tables. I created a PR for that and
>>> >>>>>>>>>>>>>>   will backport it after it is merged.
>>> >>>>>>>>>>>>>> * Spark: fix the data frame join based on different
>>> >>>>>>>>>>>>>>   versions of the same table that may lead to weird
>>> >>>>>>>>>>>>>>   results. Anton is working on a fix. It requires a
>>> >>>>>>>>>>>>>>   small behavior change (table state may be stale up to
>>> >>>>>>>>>>>>>>   the refresh interval). Hence it is better to include
>>> >>>>>>>>>>>>>>   it in the 1.10.0 release, where Spark 4.0 is first
>>> >>>>>>>>>>>>>>   supported.
>>> >>>>>>>>>>>>>> * Variant support in core and Spark 4.0. Ryan thinks
>>> >>>>>>>>>>>>>>   this is very close and will prioritize the review.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>>> Steven
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here:
>>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the
>>> >>>>>>>>>>>>>>> PR in the 1.10.0 milestone.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent
>>> >>>>>>>>>>>>>>>> point of view, we will not be able to publish the
>>> >>>>>>>>>>>>>>>> connector on Confluent Hub until this CVE [1] is fixed.
>>> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the
>>> >>>>>>>>>>>>>>>> fix doesn't make it into 1.10, then we'd have to wait
>>> >>>>>>>>>>>>>>>> for 1.11 (or a dot release of 1.10) to be able to
>>> >>>>>>>>>>>>>>>> include the connector on Confluent Hub.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Thanks, Robin.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish
>>> >>>>>>>>>>>>>>>>> the OSS Kafka Connect Iceberg sink plugin. It seems
>>> >>>>>>>>>>>>>>>>> we have a CVE from a dependency that blocks us from
>>> >>>>>>>>>>>>>>>>> publishing the plugin.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Please include the below PR, which fixes that, in the
>>> >>>>>>>>>>>>>>>>> 1.10.0 release:
>>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> - Ajantha
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting
>>> >>>>>>>>>>>>>>>>>> > rows or as modifications to rows that preserve row
>>> >>>>>>>>>>>>>>>>>> > ids.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some
>>> >>>>>>>>>>>>>>>>>> context. The first half (as deleting/inserting rows)
>>> >>>>>>>>>>>>>>>>>> is probably about the row lineage handling with
>>> >>>>>>>>>>>>>>>>>> equality deletes, which is described in another place:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated
>>> >>>>>>>>>>>>>>>>>> via Equality Deletes, because engines using equality
>>> >>>>>>>>>>>>>>>>>> deletes avoid reading existing data before writing
>>> >>>>>>>>>>>>>>>>>> changes and can't provide the original row ID for
>>> >>>>>>>>>>>>>>>>>> the new rows. These updates are always treated as if
>>> >>>>>>>>>>>>>>>>>> the existing row was completely removed and a unique
>>> >>>>>>>>>>>>>>>>>> new row was added."
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Thanks Steven. I missed that part, but the
>>> >>>>>>>>>>>>>>>>>>> following sentence is a bit hard to understand
>>> >>>>>>>>>>>>>>>>>>> (maybe it's just me):
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> "Engines may model operations as deleting/inserting
>>> >>>>>>>>>>>>>>>>>>> rows or as modifications to rows that preserve row
>>> >>>>>>>>>>>>>>>>>>> ids."
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Manu,
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over
>>> >>>>>>>>>>>>>>>>>>>> (for replace):
>>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data
>>> >>>>>>>>>>>>>>>>>>>> file for any reason, writers should write _row_id
>>> >>>>>>>>>>>>>>>>>>>> and _last_updated_sequence_number according to the
>>> >>>>>>>>>>>>>>>>>>>> following rules:"
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>>>>>>>>> Steven
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Another update on the release.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>> >>>>>>>>>>>>>>>>>>>>> (with 25 closed PRs). Amogh is actively working
>>> >>>>>>>>>>>>>>>>>>>>> on the last blocker PR:
>>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on
>>> >>>>>>>>>>>>>>>>>>>>> compaction
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the
>>> >>>>>>>>>>>>>>>>>>>>> above blocker is merged and backported.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>>>>>>>>>> Steven
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the
>>> >>>>>>>>>>>>>>>>>>>>>> "replace" operation should carry over existing
>>> >>>>>>>>>>>>>>>>>>>>>> lineage info instead of assigning new IDs? If
>>> >>>>>>>>>>>>>>>>>>>>>> not, we should first define it in the spec,
>>> >>>>>>>>>>>>>>>>>>>>>> because all engines and implementations need to
>>> >>>>>>>>>>>>>>>>>>>>>> follow it.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure
>>> >>>>>>>>>>>>>>>>>>>>>>> works with row lineage before the release is
>>> >>>>>>>>>>>>>>>>>>>>>>> data file compaction. At the moment, it looks
>>> >>>>>>>>>>>>>>>>>>>>>>> like compaction will read the records from the
>>> >>>>>>>>>>>>>>>>>>>>>>> data files without projecting the lineage
>>> >>>>>>>>>>>>>>>>>>>>>>> fields. What this means is that on write of
>>> >>>>>>>>>>>>>>>>>>>>>>> the new compacted data files, we'd be losing
>>> >>>>>>>>>>>>>>>>>>>>>>> the lineage information. There's no data change
>>> >>>>>>>>>>>>>>>>>>>>>>> in a compaction, but we do need to make sure
>>> >>>>>>>>>>>>>>>>>>>>>>> the lineage info from carried-over records is
>>> >>>>>>>>>>>>>>>>>>>>>>> materialized in the newly compacted files so
>>> >>>>>>>>>>>>>>>>>>>>>>> they don't get new IDs or inherit the new file
>>> >>>>>>>>>>>>>>>>>>>>>>> sequence number. I'm working on addressing
>>> >>>>>>>>>>>>>>>>>>>>>>> this, but I'd call this out as a blocker as
>>> >>>>>>>>>>>>>>>>>>>>>>> well.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>> Robin Moffatt
>>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
>>
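[Editor's illustration] The carry-over rule Steven quotes from the spec, and the compaction gap Amogh describes, can be sketched as follows. This is a minimal, hedged illustration of the spec's inheritance semantics, not Iceberg's actual code: the `Row` class, the `materialize_lineage` helper, and the `first_row_id`/`data_seq_number` parameter names are invented for the example.

```python
# Sketch of the row-lineage carry-over rule: when existing rows are
# rewritten into a new data file with no data change (e.g. compaction),
# the writer materializes _row_id and _last_updated_sequence_number
# instead of leaving them null, so rows keep their identity rather than
# inheriting fresh values from the new file.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    data: dict
    row_id: Optional[int] = None           # _row_id
    last_updated_seq: Optional[int] = None # _last_updated_sequence_number

def materialize_lineage(rows, first_row_id, data_seq_number):
    """Resolve inherited lineage values for rows read from a source file.

    Per the spec's inheritance rules, a null _row_id is resolved as the
    source file's first_row_id plus the row's position, and a null
    _last_updated_sequence_number is resolved as the source file's data
    sequence number.
    """
    resolved = []
    for pos, row in enumerate(rows):
        rid = row.row_id if row.row_id is not None else first_row_id + pos
        seq = (row.last_updated_seq
               if row.last_updated_seq is not None
               else data_seq_number)
        resolved.append(Row(row.data, rid, seq))
    return resolved

# A compacting writer would resolve inheritance on read, then write the
# resolved values explicitly into the compacted file so lineage survives.
source = [Row({"v": 1}), Row({"v": 2}, row_id=7, last_updated_seq=3)]
compacted = materialize_lineage(source, first_row_id=100, data_seq_number=5)
# compacted[0] -> row_id=100, last_updated_seq=5 (inherited, now explicit)
# compacted[1] -> row_id=7,   last_updated_seq=3 (carried over unchanged)
```

The bug Amogh describes corresponds to skipping the resolve step: reading only `data` and writing the rows with null lineage fields, which would make them inherit new IDs and the new file's sequence number.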