Thanks Steven! I did another pass to check for feature parity between Spark 3.5 and Spark 4.0 for this release, and everything looks good. There are a few test cases that have not been ported, but we can punt on those for now.
Best,
Kevin Liu

On Thu, Aug 28, 2025 at 7:08 PM Steven Wu <stevenz...@gmail.com> wrote:

> Thanks to Fokko and Ryan, the unknown type support PR was merged today.
>
> Everything in the 1.10.0 milestone is closed now.
>
> I will work on a release candidate next.
>
> On Fri, Aug 8, 2025 at 6:14 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hi Steven,
>>
>> Thanks for updating this thread.
>>
>> I've updated the UnknownType PR <https://github.com/apache/iceberg/pull/13445> to first block on the complex cases that will require some more discussion. This way we can also revisit this after the 1.10.0 release.
>>
>> Kind regards,
>> Fokko
>>
>> On Thu, Aug 7, 2025 at 23:56 Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Edited the subject line as we are into August.
>>>
>>> We are still waiting for the following two changes for the 1.10.0 release:
>>> * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
>>> * Unknown type support.
>>>
>>> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:
>>>
>>>> Hi Steven,
>>>>
>>>> A small regression with S3 signing has been reported to me. The fix is simple:
>>>>
>>>> https://github.com/apache/iceberg/pull/13718
>>>>
>>>> Would it still be possible to have it in 1.10, please?
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >
>>>> > Currently, the 1.10.0 milestone has no open PRs:
>>>> > https://github.com/apache/iceberg/milestone/54
>>>> >
>>>> > The variant PRs were merged this week and last week. There are still some variant-testing-related PRs, which are probably not blockers for the 1.10.0 release.
>>>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>>>> > * use short strings: https://github.com/apache/iceberg/pull/13284
>>>> >
>>>> > We are still waiting for the following two changes:
>>>> > * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
>>>> > * Unknown type support. Fokko raised a discussion thread on a blocking issue.
>>>> >
>>>> > Did I miss anything else?
>>>> >
>>>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>> >>
>>>> >> Hey all,
>>>> >>
>>>> >> The read path for the UnknownType needs some community discussion. I've raised a separate thread. PTAL.
>>>> >>
>>>> >> Kind regards from Belgium,
>>>> >> Fokko
>>>> >>
>>>> >> On Sat, Jul 26, 2025 at 00:58 Ryan Blue <rdb...@gmail.com> wrote:
>>>> >>>
>>>> >>> I thought we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done, so I'd opt to get it in.
>>>> >>>
>>>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> > I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>> >>>>
>>>> >>>> I thought the consensus in the community sync was that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>>>> >>>>
>>>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>
>>>> >>>>> I think Fokko's OOO. Should we help with that PR?
>>>> >>>>>
>>>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>> >>>>>
>>>> >>>>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>> >>>>>>
>>>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>> >>>>>>>
>>>> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR yet. PR for 4 is not associated with the milestone yet.
>>>> >>>>>>>
>>>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>>>> >>>>>>>>
>>>> >>>>>>>> Best,
>>>> >>>>>>>> Kevin Liu
>>>> >>>>>>>>
>>>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can be released anytime.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Regards,
>>>> >>>>>>>>> Manu
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>>>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
>>>> >>>>>>>>>> The second one fixes the default behavior of `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Best,
>>>> >>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> I've tagged them with the 1.10 milestone and am waiting for CI to complete :)
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best,
>>>> >>>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Thanks again for driving this, Steven! We're very close!!
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release. I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details are in the other devlist thread: https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Another update on the release.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release:
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> * Backport of PR 13100 to the main branch. I have created a cherry-pick PR for that. There is a one-line difference compared to the original PR, due to the removal of the deprecated RemoveSnapshot class in the main branch for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>> >>>>>>>>>>>>>> * Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR for that. Will backport after it is merged.
>>>> >>>>>>>>>>>>>> * Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>> >>>>>>>>>>>>>> * Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>> Steven
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here:
>>>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Thanks Ajantha.
>>>> >>>>>>>>>>>>>>>> Just to confirm, from a Confluent point of view: we will not be able to publish the connector on Confluent Hub until this CVE [1] is fixed. Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Thanks, Robin.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Please include the PR below, which fixes that, in the 1.10.0 release:
>>>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> - Ajantha
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Thanks Steven. I missed that part, but the following sentence is a bit hard to understand (maybe it's just me):
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
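To make the sentence Manu quotes concrete: below is a minimal sketch, assuming plain Java and hypothetical names (this is not Iceberg's actual writer API), of the two update models and what each one writes for the lineage fields. It follows the equality-delete behavior in the spec excerpt Steven quotes above; a null value means the field is inherited when the file is committed.

// Hypothetical illustration only; not Iceberg code.
// The two row-lineage fields a writer materializes (null means "inherit at commit").
record LineageFields(Long rowId, Long lastUpdatedSequenceNumber) {}

class UpdateModels {
  // Model 1: delete/insert (e.g. equality deletes). The engine never reads the
  // old row, so it cannot carry its _row_id. Both fields are left null; the
  // "new" row gets a fresh _row_id and the new data sequence number.
  static LineageFields deleteInsertUpdate() {
    return new LineageFields(null, null);
  }

  // Model 2: modification that preserves row ids (e.g. a copy-on-write rewrite).
  // The existing _row_id is kept, while _last_updated_sequence_number is left
  // null so it picks up the sequence number of the commit that changed the row.
  static LineageFields rowIdPreservingUpdate(long existingRowId) {
    return new LineageFields(existingRowId, null);
  }
}

Either way, the difference is visible through the lineage fields: model 1 produces a new row id for every update, while model 2 keeps the id stable across updates.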
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> Manu,
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>>>>>>>> Steven
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Another update on the release.
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR:
>>>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>>>>>>>>> Steven
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better first define it in the spec, because all engines and implementations need to follow it.
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before the release is data file compaction. At the moment, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files we'd be losing the lineage information. There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this, but I'd call it out as a blocker as well.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> --
>>>> >>>>>>>>>>>>>>>> Robin Moffatt
>>>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
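As an addendum to the compaction discussion above: the carry-over rule Amogh and the spec excerpt describe can be summarized in a short sketch. This is hypothetical plain Java, not Iceberg's actual compaction code, and it assumes the v3 inheritance rules (a null _row_id resolves to the source file's first_row_id plus the row's position, and a null _last_updated_sequence_number resolves to the source file's data sequence number).

// Hypothetical sketch of the lineage carry-over rule for compaction; not Iceberg code.
final class CompactionCarryOver {
  // Values to write for a row copied unchanged from a source data file into a
  // compacted file. Nulls in the source mean the value was inherited, so it is
  // first resolved against the source file before being written explicitly.
  static long[] lineageForRewrittenRow(
      Long sourceRowId,                 // _row_id read from the source file, possibly null
      Long sourceLastUpdatedSeqNumber,  // _last_updated_sequence_number from the source file, possibly null
      long sourceFirstRowId,            // first_row_id assigned to the source data file
      long rowPosition,                 // row's position within the source data file
      long sourceDataSequenceNumber) {  // data sequence number of the source data file
    long rowId = sourceRowId != null ? sourceRowId : sourceFirstRowId + rowPosition;
    long lastUpdated = sourceLastUpdatedSeqNumber != null
        ? sourceLastUpdatedSeqNumber
        : sourceDataSequenceNumber;
    // Compaction is not a data change: both values come from the source file,
    // never from the newly written file.
    return new long[] {rowId, lastUpdated};
  }

  private CompactionCarryOver() {}
}

If compaction instead left these fields null in the new file, the rewritten rows would be assigned new row ids and would inherit the new file's sequence number, which is exactly the loss of lineage described above.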