Hi Steven,

Thanks for updating this thread. I've updated the UnknownType PR <https://github.com/apache/iceberg/pull/13445> to first block on the complex cases that will require some more discussion. This way we can also revisit this after the 1.10.0 release.

Kind regards,
Fokko

On Thu, Aug 7, 2025 at 23:56, Steven Wu <stevenz...@gmail.com> wrote:

> edited the subject line as we are into August.
>
> We are still waiting for the following two changes for the 1.10.0 release
> * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in spark 4.0.
> * unknown type support.
>
> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:
>
>> Hi Steven,
>>
>> A small regression with S3 signing has been reported to me. The fix is simple:
>>
>> https://github.com/apache/iceberg/pull/13718
>>
>> Would it still be possible to have it in 1.10 please?
>>
>> Thanks,
>> Alex
>>
>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>> >
>> > Currently, the 1.10.0 milestone has no open PRs
>> > https://github.com/apache/iceberg/milestone/54
>> >
>> > The variant PR was merged this and last week. There are still some variant testing related PRs, which are probably not blockers for 1.10.0 release.
>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>> > * use short strings: https://github.com/apache/iceberg/pull/13284
>> >
>> > We are still waiting for the following two changes
>> > * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in spark 4.0.
>> > * unknown type support. Fokko raised a discussion thread on a blocking issue.
>> >
>> > Did I miss anything else?
>> >
>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>> >>
>> >> Hey all,
>> >>
>> >> The read path for the UnknownType needs some community discussion. I've raised a separate thread.
>> >> PTAL
>> >>
>> >> Kind regards from Belgium,
>> >> Fokko
>> >>
>> >> On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote:
>> >>>
>> >>> I thought that we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, I think that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done so I'd opt to get it in.
>> >>>
>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>
>> >>>> > I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>> >>>> I thought in the community sync the consensus is that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>> >>>>
>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>> >>>>>
>> >>>>> I think Fokko's OOO. Should we help with that PR?
>> >>>>>
>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>> >>>>>>
>> >>>>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>> >>>>>>
>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>> >>>>>>>
>> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR yet.
>> >>>>>>> PR for 4 is not associated with the milestone yet.
>> >>>>>>>
>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Kevin Liu
>> >>>>>>>>
>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can be released anytime.
>> >>>>>>>>>
>> >>>>>>>>> Regards,
>> >>>>>>>>> Manu
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>> >>>>>>>>>>
>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
>> >>>>>>>>>>
>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
>> >>>>>>>>>> The second one fixes the default behavior of `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>> >>>>>>>>>>
>> >>>>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Kevin Liu
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>> >>>>>>>>>>>
>> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Kevin Liu
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release.
>> >>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details in the other devlist thread https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Kevin Liu
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Another update on the release.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> backport of PR 13100 to the main branch. I have created a cherry-pick PR for that. There is a one line difference compared to the original PR due to the removal of the deprecated RemoveSnapshot class in main branch for 1.10.0 target.
>> >>>>>>>>>>>>>> Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>> >>>>>>>>>>>>>> Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR for that. Will backport after it is merged.
>> >>>>>>>>>>>>>> Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>> >>>>>>>>>>>>>> Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>> steven
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here.
>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed.
>> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks, Robin.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish the OSS Kafka Connect Iceberg sink plugin.
>> >>>>>>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Please include the below PR for 1.10.0 release which fixes that.
>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> - Ajantha
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following sentence is a bit hard to understand (maybe just me)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Manu
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry over (for replace)
>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>>> Steven
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> another update on the release.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR.
>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>>>> Steven
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better first define it in the spec because all engines and implementations need to follow it.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files we'd be losing the lineage information. There's no data change in a compaction but we do need to make sure the lineage info from carried over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this too, and I'd call this out as a blocker as well.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>> Robin Moffatt
>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
>> >