Re: [DISCUSS] October board report

2025-10-08 Thread Steven Wu
> Flink: Merged dynamic sync supporting multiple tables and schema evolution Typo: dynamic sync -> dynamic sink We can also add this bullet for Flink * Flink: add support for Flink 2.0 On Wed, Oct 8, 2025 at 2:08 PM Ryan Blue wrote: > Hi everyone, > > Here is my initial draft for the October r

Re: [vote] Remove Spark 3.4 from the repo

2025-10-07 Thread Steven Wu
+1 On Tue, Oct 7, 2025 at 9:38 AM Eduard Tudenhöfner wrote: > +1 > > On Tue, Oct 7, 2025 at 5:18 PM slfan1989 wrote: > >> +1 (non-binding) >> I agree that removing Spark 3.4 in the upcoming 1.11 release makes sense. >> >> Best, >> Shilun Fan >> >> >> On Tue, Oct 7, 2025 at 10:59 PM Kevin Liu w

Re: Proposal discussion process (was Re: [DISCUSS] Iceberg REST FGAC OpenAPI proposal)

2025-10-01 Thread Steven Wu
Laurent, The general process for non-trivial improvement is design doc, community discussion, spec discussion and review, spec voting. Looking at the threads and community sync recordings, there seems to be pretty broad consensus on the direction of Iceberg expression and piggyback on loadTable r

Dedicated sync for Iceberg materialized view

2025-09-25 Thread Steven Wu
Hi all, Iceberg materialized view has been discussed in the community for a long time. Thanks Jan Kaul for driving the discussion and the spec PR. It has been stalled for a long time due to lack of consensus on 1 or 2 topics. In Wed's Iceberg community sync meeting, Talat brought up the question o

Re: [DISCUSS] Geometry type wraparound restriction

2025-09-22 Thread Steven Wu
> Proposed: For the X values of the Geography type only, xmin may be greater than xmax +1 on Jia's proposed spec clarification that only allows X wraparound for geography. On Sun, Sep 21, 2025 at 9:45 PM Jia Yu wrote: > Hi all, > > > I agree that it has been difficult to reach consensus on the

Re: [DISCUSS] Spec: bring back added-rows in the snapshot fields

2025-09-20 Thread Steven Wu
what it actually is. > > On Thu, 11 Sept 2025 at 16:55, Russell Spitzer > wrote: > >> +1, I also am fine with the name. >> >> On Wed, Sep 10, 2025 at 10:30 PM Steven Wu wrote: >> >>> >>> In the 1.10.0 RC5 voting thread >>> <https:

Re: [VOTE] Deprecation of Position Deletes with Row Data

2025-09-20 Thread Steven Wu
+1 On Wed, Sep 10, 2025 at 9:30 AM Russell Spitzer wrote: > +1 > > On Wed, Sep 10, 2025 at 10:58 AM Péter Váry > wrote: > >> The Position Deletes With Row (PDWR) feature, originally introduced in >> the Iceberg V2 specification, has been deprecated in V3. >> >> Following the discussion in the p

Re: [Discuss] Deprecating Spark 3.4

2025-09-19 Thread Steven Wu
Following up on Manu's question, why not just remove Spark 3.4 for the next 1.11 release? Or do we usually wait for one more release and remove it in the 1.12 release after marking 3.4 as deprecated in the engine status doc page? On Fri, Sep 19, 2025 at 9:12 AM Kevin Liu wrote: > > Given the man

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-18 Thread Steven Wu
+1 for the feature that can make retry safe for 500s and improve the client fault-tolerance of transient server failures. Peter and Dimitri raised a good question on the fingerprint. The IETF draft doesn't actually define the fingerprint algo. We can also go with simple checksum of the entire requ

Re: [DISCUSS] Removal of Individually Curated Blogs and Talks and Position on Vendor Documentation

2025-09-18 Thread Steven Wu
agree with cleaning up blogs and talks as discussed in the community sync that Kevin linked. Meanwhile, I am also looking forward to future Iceberg dev/announcement blogs, as discussed in the dev thread Max started. Regarding the

Re: [VOTE] Spec: bring back added-rows in the snapshot fields

2025-09-17 Thread Steven Wu
a Xu wrote: > >> +1 (non-binding) >> >> On Mon, Sep 15, 2025 at 11:07 PM Jean-Baptiste Onofré >> wrote: >> >>> +1 (non binding) >>> >>> Regards >>> JB >>> >>> Le lun. 15 sept. 2025 à 06:35, Steven Wu a >>

Re: [Discuss] Deprecating Spark 3.4

2025-09-17 Thread Steven Wu
+1 On Wed, Sep 17, 2025 at 3:07 PM Anurag Mantripragada wrote: > +1 > > Thanks, > Anurag > > On Sep 17, 2025, at 2:38 PM, Szehon Ho wrote: > > +1 > > Thanks > Szehon > > On Wed, Sep 17, 2025 at 2:37 PM Russell Spitzer > wrote: > >> +1. But I'm always on the aggressive side of dropping old rele

Re: [DISCUSS] FileFormat API proposal

2025-09-15 Thread Steven Wu
Peter, thanks for summarizing the 4 options. Both 0 and 1 seem good to me, as they are explicit and easier to deprecate and remove the position deletes in the future. Maybe option 0 is a tiny bit better as it is similar to the existing FileWriterFactory API. I will leave PR related comments in the

[VOTE] Spec: bring back added-rows in the snapshot fields

2025-09-14 Thread Steven Wu
Hi, I would like to raise a vote on a small spec fix that brings back added-rows in the snapshot fields. The spec issue was brought up by Christian Thiel in the 1.10.0 RC5 voting thread. Discussion thread: https://lists.apache.org/threa

Re: [VOTE][C++] Release Apache Iceberg C++ 0.1.0 RC4

2025-09-12 Thread Steven Wu
u wrote: >> >>> To Steven: >>> >>> Yes, please check the Audit section from >>> https://github.com/apache/iceberg-cpp/actions/runs/17609907663/job/50029231341 >>> >>> It ran >>> https://github.com/apache/iceberg-cpp/blob/main/dev/

[RESULT][VOTE] Release Apache Iceberg 1.10.0 RC5

2025-09-11 Thread Steven Wu
Thanks everyone who participated in the vote for Release Apache Iceberg RC. The vote result is:

Re: [ANNOUNCE] Apache Iceberg release 1.10.0

2025-09-11 Thread Steven Wu
Looks like the release notes indentation/bullets don't render correctly (while rendering was correct locally). There are also a couple other links/places that need to be fixed. I will follow up with a separate PR. On Thu, Sep 11, 2025 at 11:35 AM Steven Wu wrote: > I'm pleased to

Re: [ANNOUNCE] Apache Iceberg release 1.10.0

2025-09-11 Thread Steven Wu
The versioned doc and javadoc have been released for 1.10.0 https://iceberg.apache.org/docs/latest/ https://iceberg.apache.org/javadoc/latest/ On Thu, Sep 11, 2025 at 3:56 PM Steven Wu wrote: > Release notes fixup PR is merged (indentation, Spark 4.0, Flink 2.0 > download links, date) >

Re: [ANNOUNCE] Apache Iceberg release 1.10.0

2025-09-11 Thread Steven Wu
Release notes fixup PR is merged (indentation, Spark 4.0, Flink 2.0 download links, date) https://github.com/apache/iceberg/pull/14055 I am working on the versioned doc and Javadoc release On Thu, Sep 11, 2025 at 11:42 AM Steven Wu wrote: > Looks like the release notes indentation/bull

[ANNOUNCE] Apache Iceberg release 1.10.0

2025-09-11 Thread Steven Wu
I'm pleased to announce the release of Apache Iceberg 1.10.0 Apache Iceberg is an open table format for huge analytic datasets. Iceberg

Re: [VOTE] Release Apache Iceberg 1.10.0 RC5

2025-09-11 Thread Steven Wu
e binary works in Java 11 >>> >>> On Wed, Sep 10, 2025 at 2:20 PM Ryan Blue wrote: >>> >>>> I think we should continue to use `added-rows` as well. We can update >>>> the spec to explain that it should be the number of rows that will be >

Re: [VOTE][C++] Release Apache Iceberg C++ 0.1.0 RC4

2025-09-10 Thread Steven Wu
Does the release check script run RAT checks on license headers? On Wed, Sep 10, 2025 at 9:16 PM ying cai wrote: > +1 (non-binding) > > Ran dev/release/verify_rc.sh 0.1.0 4 successfully on apple m1 macOS > 15.6.1 with clang 19.1.7. > Verified checksums and signatures, also checked the build and

[DISCUSS] Spec: bring back added-rows in the snapshot fields

2025-09-10 Thread Steven Wu
In the 1.10.0 RC5 voting thread, Christian brought up an inconsistency between the spec and the Java implementation. The spec removed `added-rows` while the Java implementation continued to use and encode it. After some discu

Re: [VOTE] Release Apache Iceberg 1.10.0 RC5

2025-09-10 Thread Steven Wu
of the added and existing rows in all added manifest files. > > On Wed, Sep 10, 2025 at 12:37 PM Steven Wu wrote: > >> Adding the information back seems to be the right thing to do here. We >> can start a separate thread on how to move forward properly, as it is >>

Re: [VOTE] Release Apache Iceberg 1.10.0 RC5

2025-09-10 Thread Steven Wu
>>>> > verified signature and checksums >>>> > verified RAT license check >>>> > verified build/tests passing >>>> > ran some manual tests with GlueCatalog >>>> > >>>> > - Drew >>>> > >>&

Re: [VOTE] Release Apache Iceberg 1.10.0 RC5

2025-09-07 Thread Steven Wu
> <https://github.com/apache/iceberg/pull/13614?utm_source=chatgpt.com> — >> “Fix incorrect selection of incremental cleanup in expire snapshots.” I >> believe our test should be updated to reflect the behavior introduced by >> this fix. >> +1 (non-binding). >&

[VOTE] Release Apache Iceberg 1.10.0 RC5

2025-09-05 Thread Steven Wu
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.10.0 release. The commit ID is 2114bf631e49af532d66e2ce148ee49dd1dd1f1f * This corresponds to the tag: apache-iceberg-1.10.0-rc5 * https://github.com/apache/iceberg/commits/apache-iceberg-1.10.0-rc5 * https:/

Re: [VOTE] Release Apache Iceberg 1.10.0 RC4

2025-09-05 Thread Steven Wu
The quick PR is to unblock the 1.10 release. > > Regards > JB > > On Thu, Sep 4, 2025 at 18:03, Steven Wu wrote: > >> > I will create a PR to fix that (quickly/short term) and also another >> PR to remove the versions from the bundle artifacts (as we do in

Re: [VOTE] Release Apache Iceberg 1.10.0 RC4

2025-09-04 Thread Steven Wu
ibute this artifact): the versions >> in LICENSE don't match the versions of the artifacts in the lib folder >> (azure-core, etc) >> >> I will create a PR to fix that (quickly/short term) and also another >> PR to remove the versions from the bundle artifacts (as w

[VOTE] Release Apache Iceberg 1.10.0 RC4

2025-09-03 Thread Steven Wu
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.10.0 release. The commit ID is 12ab7fc3d6d53534da02decdab99133853b36dfd * This corresponds to the tag: apache-iceberg-1.10.0-rc4 * https://github.com/apache/iceberg/commits/apache-iceberg-1.10.0-rc4 * https:/

Re: Iceberg 1.10.0 release update - September 2025

2025-09-03 Thread Steven Wu
nks, > Cheng Pan > > > > On Sep 3, 2025, at 01:02, Steven Wu wrote: > > sorry, the PR link for the staging-binaries.sh was wrong (missing a digit). > > I thought this PR will fix the issue. Initially, it worked well with a few > runs. But later I am still experi

Re: Iceberg 1.10.0 release update - September 2025

2025-09-02 Thread Steven Wu
, 2025 at 9:51 AM Steven Wu wrote: > Hi, > > Just to update the community on the status. > > Fokko also reached out to include Parquet Java 1.16.0 in this release. > Vote just passed in the Parquet community. We are waiting for the binary > release. We will try to include it

Re: Iceberg 1.10.0 release update - September 2025

2025-09-02 Thread Steven Wu
tral. > > Thanks, > Cheng Pan > > > > On Sep 3, 2025, at 01:31, Steven Wu wrote: > > Thanks, Cheng. > > You are right. There were two public IPs in the two repositories. > > > https://stackoverflow.com/questions/15511484/mvn-releaseperform-creates-multipl

Re: Iceberg 1.10.0 release update - September 2025

2025-09-02 Thread Steven Wu
actually, the Parquet 1.16.0 has the wrong link https://github.com/apache/iceberg/pull/13941 On Tue, Sep 2, 2025 at 10:02 AM Steven Wu wrote: > sorry, the PR link for the staging-binaries.sh was wrong (missing a digit). > > I thought this PR will fix the issue. Initially, it worked we

Re: Iceberg 1.10.0 release update - September 2025

2025-09-02 Thread Steven Wu
more annoying/impacting problem. the second release issue is uncommon, as I didn't see it in a few other runs of staging-binaries.sh. Thanks, Steven On Sun, Aug 31, 2025 at 12:48 PM Steven Wu wrote: > I started a vote thread for 1.10.0 RC2. > > I have to fix a couple of release scrip

Re: [VOTE] Release Apache Iceberg 1.10.0 RC2

2025-08-31 Thread Steven Wu
w). Was > this intentional? > > https://repository.apache.org/content/repositories/orgapacheiceberg-1243/org/apache/iceberg/ > > BR, > Yuya > > On Mon, Sep 1, 2025 at 4:48 AM Steven Wu wrote: > >> Hi Everyone, >> >> I propose that we release the following RC

Re: Iceberg 1.10.0 release update - August 2025

2025-08-31 Thread Steven Wu
r this release and everything looks good. There > are a few test cases that have not been ported, but we can punt those for > now. > > Best, > Kevin Liu > > On Thu, Aug 28, 2025 at 7:08 PM Steven Wu wrote: > >> Thanks to Fokko and Ryan, the unknown type support PR w

[VOTE] Release Apache Iceberg 1.10.0 RC2

2025-08-31 Thread Steven Wu
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.10.0 release. The commit ID is 80809ce59a7f8ab95f82146ba7955628b580d271 * This corresponds to the tag: apache-iceberg-1.10.0-rc2 * https://github.com/apache/iceberg/commits/apache-iceberg-1.10.0-rc2 * https:/

Re: [Discuss] Java version problem for 1.10.0 release

2025-08-29 Thread Steven Wu
g/pull/13946>. > > Kind regards, > Fokko > > On Fri, Aug 29, 2025 at 07:15, Cheng Pan wrote: > >> Java 17 should be used for Maven repo deployment, see [1] >> >> [1] https://github.com/apache/iceberg/pull/13369 >> >> Thanks, >> Cheng Pan >>

[Discuss] Java version problem for 1.10.0 release

2025-08-28 Thread Steven Wu
Hi, When building the release candidate for 1.10.0 release, I ran into this Java version problem. Initially, I was using Java 21 locally and the deploy.gradle would fail the step of staging binaries. ``` > Releases must be built with Java 11 ``` So I switched the Java version to Java 11 as requi

Re: Iceberg 1.10.0 release update - August 2025

2025-08-28 Thread Steven Wu
've updated the UnknownType PR > <https://github.com/apache/iceberg/pull/13445> to first block on the > complex cases that will require some more discussion. This way we can > revisit this also after the 1.10.0 release. > > Kind regards, > Fokko > > > > > Op

Re: [QUESTION] What type promotion actually means

2025-08-21 Thread Steven Wu
> This means that you can have writers using different schema to write (use cases include different partitioning or "out-of-date" writers), but the data is still valid. +1 on Dan's point. Both batch and streaming writers can have stale schema. long-running streaming jobs may stay stale for extende

Re: [VOTE] mark 503 as non-retryable error code for the Update Table

2025-08-18 Thread Steven Wu
+1 On Mon, Aug 18, 2025 at 3:35 PM Prashant Singh wrote: > Hi All, > I propose an update to the Rest Spec to mark 503 as non-retryable error > code for the Update Table. As it can lead to table corruption otherwise. > The proposed language in the spec pr also gives rooms for servers who still >

Re: Iceberg 1.10.0 release update - August 2025

2025-08-07 Thread Steven Wu
at 6:56 AM Alexandre Dutra wrote: > Hi Steven, > > A small regression with S3 signing has been reported to me. The fix is > simple: > > https://github.com/apache/iceberg/pull/13718 > > Would it be still possible to have it in 1.10 please? > > Thanks, > Alex > >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-31 Thread Steven Wu
ike Spark not having >> geospatial types. To me, I think that means we should aim to get variant >> and unknown done so that we have a complete implementation with a major >> engine. And it should not be particularly difficult to get unknown done so >> I'd opt to get it in.

Re: [VOTE] Release Apache Iceberg Rust 0.6.0 RC1

2025-07-28 Thread Steven Wu
+1 (binding) Verified checksum, signature. Ran build and unit test on Mac OS (arm64). I couldn't run the full test because of my container env setup. Tried podman (instead of docker desktop) per doc. Got the same failures. On Mon, Jul 28, 2025 at 10:21 AM Kevin Liu wrote: > Thanks for verifyi

Re: [VOTE] Update the table statistics (puffin stats) spec

2025-07-28 Thread Steven Wu
+1 for fixing the mistake in spec On Mon, Jul 28, 2025 at 10:41 AM Steve wrote: > +1 for using long type for snapshotId > > On Mon, Jul 28, 2025 at 6:24 AM Péter Váry > wrote: > >> +1 for long >> >> Given that it is implemented as a long in every known implementation, we >> might not even want

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-25 Thread Steven Wu
t the read path for >> UnknownType. Fokko has a WIP PR >> <https://github.com/apache/iceberg/pull/13445> for that. >> >> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu wrote: >> >>> 3. Spark: fix data frame join based on different versions of the same >>

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-25 Thread Steven Wu
ve both of these as part of the >>> 1.10 release. >>> >>> Best, >>> Kevin Liu >>> >>> >>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu wrote: >>> >>>> Here are the 3 PRs to add corresponding tests. >>>> htt

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-23 Thread Steven Wu
5f > > Thanks, > Kevin Liu > > On Wed, Jul 23, 2025 at 12:17 PM Steven Wu wrote: > >> Another update on the release. >> >> The existing blocker PRs are almost done. >> >> During today's community sync, we identified the following issues/PRs to >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-23 Thread Steven Wu
4.0. Ryan thinks this is very close and will prioritize the review. Thanks, steven The 1.10.0 milestone can be found here. https://github.com/apache/iceberg/milestone/54 On Wed, Jul 16, 2025 at 9:15 AM Steven Wu wrote: > Ajantha/Robin, thanks for the note. We can include the PR

Re: [DISCUSS] v4 - Improved column statistics

2025-07-22 Thread Steven Wu
It seems reasonable to support stats for computed/calculated columns with assigned field ids. E.g., Flink has had "computed columns" for a long time. https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#columns CREATE TABLE MyTable ( `user_id` BIGINT, `price` DOUBLE, `qu

Re: [DISCUSS] Restructuring Docs side navigation

2025-07-21 Thread Steven Wu
ts would be most welcome, please. > > thanks, Robin > > On Wed, 9 Jul 2025 at 17:08, Steven Wu wrote: > >> I really like the new organization of the navigation panel. It is too >> congested currently. >> >> On Wed, Jul 9, 2025 at 7:22 AM Jean-Baptiste Onofré >

Re: [DISCUSS] V4 - indexing support

2025-07-18 Thread Steven Wu
ficant performance overhead in batch pipelines. >>>> >>>> Approach (a): >>>> https://docs.google.com/document/d/1Jz4Fjt-6jRmwqbgHX_u0ohuyTB9ytDzfslS7lYraIjk/ >>>> Converting equality deletes to positional deletes would be a great >>>> achievement. I'm wondering though

Re: [VOTE] Release Apache Iceberg 1.9.2 RC0

2025-07-16 Thread Steven Wu
+1 (binding) Verified signature, checksum, license. Ran some basic Flink SQL testing locally. On Wed, Jul 16, 2025 at 11:20 AM Fokko Driesprong wrote: > +1 (binding) > > Ran checksum and signature. Checked licenses and ran tests. > > Kind regards, > Fokko > > On Wed, Jul 16, 2025 at 14:43,

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-16 Thread Steven Wu
ink plugin. >> It seems we have a CVE from dependency that blocks us from publishing the >> plugin. >> >> Please include the below PR for 1.10.0 release which fixes that. >> https://github.com/apache/iceberg/pull/13561 >> >> - Ajantha >> >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Steven Wu
just me) > > Engines may model operations as deleting/inserting rows or as > modifications to rows that preserve row ids. > > Can you please help to explain? > > > On Tue, Jul 15, 2025 at 04:41, Steven Wu wrote: > >> Manu >> >> The spec already covers the row lineage carry over (

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Steven Wu
> Thanks, Steven On Mon, Jul 14, 2025 at 1:38 PM Steven Wu wrote: > another update on the release. > > We have one open PR left for the 1.10.0 milestone > <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs). > Amogh is actively working on the last blocke

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Steven Wu
another update on the release. We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR. Spark 4.0: Preserve row lineage information on compaction

[DISCUSS] V4 - indexing support

2025-07-09 Thread Steven Wu
Similar to other V4 threads, I am starting a thread to gauge interest in adding index support in Iceberg V4 and gather a focus group in this area. There have been a few discussions related to indexing recently. - Peter Vary and I are working on a proposal (WIP) to only write position delet

Re: [DISCUSS] Restructuring Docs side navigation

2025-07-09 Thread Steven Wu
I really like the new organization of the navigation panel. It is too congested currently. On Wed, Jul 9, 2025 at 7:22 AM Jean-Baptiste Onofré wrote: > Hi Manu > > At first glance, it's a great improvement with a good multi-level menu > (reducing the first level size of the menu). > > Thanks ! >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Steven Wu
e use case of long literal value for nano timestamp column. But there is no correctness issue. Hence I am favoring moving it out of the 1.10.0 milestone * there is no consensus on the path forward yet. On Thu, Jul 3, 2025 at 2:28 PM Szehon Ho wrote: > Thanks Steven! > > > On Jul 3, 2025

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-03 Thread Steven Wu
> incompatible for the Spark 4.0 jar. > > Thanks, > Szehon > > > > On Thu, Jul 3, 2025 at 1:17 PM Steven Wu wrote: > >> Szehon's backport PR has been merged. Another blocker (dangling DVs for >> rewrite) was also merged. >> Core, Spark: Propagate

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-03 Thread Steven Wu
it is a backport of > https://github.com/apache/iceberg/pull/13435 (merged by Amogh) as I > missed to do Spark 3.4, so also should be close. > > Thanks > Szehon > > > > On Wed, Jul 2, 2025 at 11:17 AM Steven Wu wrote: > >> During today's community sync meeting,

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-02 Thread Steven Wu
ean-Baptiste Onofré wrote: > Hi > > I've updated the PR about multi-args transforms today, but not sure I > will have reviews before 1.10.0. Let's try as best effort for 1.10, > else we will include in 1.11. > > Regards > JB > > On Tue, Jul 1, 2025 at 6:42 P

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-01 Thread Steven Wu
will also include these two PRs in the release scope https://github.com/apache/iceberg/pull/11868 https://github.com/apache/iceberg/pull/13245/ On Tue, Jul 1, 2025 at 9:42 AM Steven Wu wrote: > Hi, > > I plan to cut a release branch in the next 1 or 2 days. > > Waiting for t

Iceberg 1.10.0 release update - July 1, 2025

2025-07-01 Thread Steven Wu
Hi, I plan to cut a release branch in the next 1 or 2 days. Waiting for this row lineage related PR (and its 3.4 backport afterwards) https://github.com/apache/iceberg/pull/13070 Other items in the 1.10.0 milestone will probably have to be pushed to the next 1.11.0 release https://github.com/apa

Re: Flink: Current and Future state of the sink connectors

2025-06-25 Thread Steven Wu
I will reiterate one point that I mentioned before. For the Flink connector, sink is the dominant use case (compared to source). Regardless of the unit test issue, it is probably better to go a little slower on this switch of sink implementation. On Wed, Jun 25, 2025 at 4:22 PM Rodrigo Meneses wrote

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-25 Thread Steven Wu
the Flink streaming read only consumes `append` only commits. This is a snapshot commit `DataOperation` type. You were talking about row-level appends, delete etc. > 2. Add an option to read appended data of overwrite snapshots to allow users to de-duplicate downstream (opt-in config) For update

Re: Iceberg 0.10.0 release update - June 18, 2025

2025-06-25 Thread Steven Wu
n't want to potentially > amplify known incompliance problems by doing a release before they're fixed) > > Thanks, > Amogh Jahagirdar > > On Thu, Jun 19, 2025 at 2:36 AM Péter Váry > wrote: > >> If possible, I would love to have the File Format API interfaces ap

Re: Flink: Current and Future state of the sink connectors

2025-06-23 Thread Steven Wu
seems like a good plan. On Mon, Jun 23, 2025 at 11:28 AM Rodrigo Meneses wrote: > Hi devs, > > > I’d like to start a discussion about the current and future state of our > Flink Sink Connectors. > > > As it stands today, we currently have 3 sink implementations: > >1. FlinkSink [1] >2. I

Re: Iceberg 0.10.0 release update - June 18, 2025

2025-06-18 Thread Steven Wu
sorry, I meant 1.10.0 release. Thanks for catching the error, JB! On Wed, Jun 18, 2025 at 2:29 PM Jean-Baptiste Onofré wrote: > Hi > > I guess you mean 1.10.0 release :) > > Regards > JB > > On Wed, Jun 18, 2025 at 11:01 PM Steven Wu wrote: > > > > V3 relat

Iceberg 0.10.0 release update - June 18, 2025

2025-06-18 Thread Steven Wu
Reference implementations of the V3-related features haven't made much progress, which is probably not going to change significantly in the next 1 or 2 weeks. I propose cutting the release branch by the end of *next Friday (June 27)*. There are a few important features to be released like Spark 4.0 supp

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-16 Thread Steven Wu
+1 for a 1.9.2 release On Mon, Jun 16, 2025 at 10:53 AM Prashant Singh wrote: > Hey Kevin, > This goes well before 1.8, if you will see the issue that my PR refers to > is reported from iceberg 1.7, It has been there since the beginning of the > IRC client. > We were having similar debates on if

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-04 Thread Steven Wu
Also, thanks to Ismail for highlighting the BigQuery approach, >>>>>>>> that's helpful context! >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Xiaoxuan >>>>>

Re: [DISCUSS] Reduce memory pressure due to column stats in position delete files

2025-06-04 Thread Steven Wu
It seems like a reasonable approach for DeleteFileIndex . I saw equality delete file matching uses column stats. But it seems that column stats (like lower/upper bounds) aren't used for associating position delete files with a data file. Plus with file-scoped position delete files (V2), matching wo

Re: Wide tables in V4

2025-05-29 Thread Steven Wu
project that would be great >>>>>>>>> but it feels like we need to start exploring more drastic options >>>>>>>>> than footer encoding. >>>>>>>>> >>>>>>>>> On Mon, May 26, 2025 at 8:42 PM Gang Wu wrote

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Steven Wu
This will be great for users. metadata can self adapt. Start with a compacted one file. As the table grows in size, the metadata can adapt to a tree or linked structure. On Thu, May 29, 2025 at 3:44 PM Russell Spitzer wrote: > I’m also super excited about this idea > > On Thu, May 29, 2025 at 3:

Re: [DISCUSS] Apache Iceberg 1.10.0 release

2025-05-29 Thread Steven Wu
at Spark "testRemoveDanglingDVsAfterCompaction" creates a V3 table and performs delete compaction. On Thu, May 29, 2025 at 2:42 AM ConradJam wrote: > I would like to know whether Spark 3.5 can perform some basic queries or > provide file merging capabilities in the current or next version of V3? >

Re: [DISCUSS] Apache Iceberg 1.10.0 release

2025-05-28 Thread Steven Wu
at 11:28 AM Jean-Baptiste Onofré > wrote: > > > > Hi > > > > I think I have multi-args transforms in good shape to be in the scope > > for 1.10.0. Related to V3 spec, it would be great to include it in > > 1.10.0 release. > > > > Thanks ! > > Rega

[DISCUSS] Apache Iceberg 1.10.0 release

2025-05-27 Thread Steven Wu
As discussed in the community sync, we are planning for the next 1.10.0 release. I will serve as the release manager after chatting with Russell (the original RM volunteer). The adoption of V3 spec changes

Re: Wide tables in V4

2025-05-26 Thread Steven Wu
The Parquet metadata proposal (linked by Fokko) is mainly addressing the read performance due to bloated metadata. What Peter described in the description seems useful for some ML workload of feature engineering. A new set of features/columns are added to the table. Currently, Iceberg would requi

Re: [VOTE] Release Apache Iceberg 1.9.1 RC1

2025-05-23 Thread Steven Wu
+1 (binding) Checked signature, checksum, and licenses. "./gradlew build" passed with the source bundle. Ran Flink 1.20 with SQL On Fri, May 23, 2025 at 10:42 AM Russell Spitzer wrote: > Discussion was back here - > https://lists.apache.org/thread/497qxkq3nfplwo27fh959zhsc2o7hkmy > > On Thu, Ma

Re: [VOTE] [REST SPEC] Add row lineage fields.

2025-05-22 Thread Steven Wu
+1 (binding) On Thu, May 22, 2025 at 3:39 PM Prashant Singh wrote: > Hi All, > I propose an update to the Rest Spec to include the Row lineage fields. As > these need to be passed from server to client for reads, as it is > inferred during planning during server side via inheritance from Manifes

Re: [Discuss] Make identity(String sourceName, String targetName) Public

2025-05-21 Thread Steven Wu
It seems that the PR has made two valid arguments to support the change to public scope * identity transform builder is the only one where targetName builder is not public * handle the partition column rename use case So it seems reasonable to me. On Wed, May 21, 2025 at 2:49 PM Russell Spitzer

Re: [VOTE] Adopt the v3 spec changes

2025-05-20 Thread Steven Wu
+1 (binding) On Tue, May 20, 2025 at 5:25 AM Manu Zhang wrote: > +1 (non-binding). Thanks Ryan for driving this and everyone contributing > to the new features. > > Regards, > Manu > > On Tue, May 20, 2025 at 20:14, Péter Váry wrote: > >> +1 (binding) >> Well done everyone who was working on this! >> >> Fokko

Re: [VOTE] Release Apache Iceberg 1.9.1 RC0

2025-05-18 Thread Steven Wu
+1 (binding) Checked signature, checksum, and licenses. Also ran Flink 1.20 with SQL. Thanks Russell for driving the release! On Sun, May 18, 2025 at 2:27 PM huaxin gao wrote: > +1 (non-binding) > Verified signature, checksum and license. Thanks Russell for driving this > release! > > Huaxin >

Re: [VOTE] Clarify writer requirements in the spec to prevent orphan DVs

2025-05-14 Thread Steven Wu
+1 (binding) On Wed, May 14, 2025 at 9:31 AM Akashdeep Gupta wrote: > +1 (non binding) > > Regards, > Akashdeep Gupta > > > On Wed, May 14, 2025 at 9:59 PM Daniel Weeks wrote: > >> +1 (binding) >> >> On Wed, May 14, 2025 at 9:02 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-12 Thread Steven Wu
Agree with Peter that a 1:1 mapping of data files and inverted indexes is not as useful. With a columnar format like Parquet, this can also be achieved equivalently by reading the data file with projection on the identifier columns. On Mon, May 12, 2025 at 4:20 AM Péter Váry wrote: > Hi Xiaoxuan,

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Steven Wu
+1 (binding) On Mon, May 12, 2025 at 1:10 PM Ryan Blue wrote: > +1 (binding) > > On Mon, May 12, 2025 at 10:50 AM Szehon Ho > wrote: > >> +1 (binding) >> >> Thanks >> Szehon >> >> On Mon, May 12, 2025 at 9:19 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +1 (binding) >>> >>>

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-08 Thread Steven Wu
match. This also aligns with the identifier >> resolution being late binding. >> >> -Dan >> >> On Wed, May 7, 2025 at 10:45 PM Walaa Eldin Moustafa < >> wa.moust...@gmail.com> wrote: >> >>> Thanks Steven! So would you agree that resolution using defa

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Steven Wu
lt-catalog + default-namespace + table name to re-identify the correct > table, without UUID validation? > > +1 on involving other communities. I’m happy to help facilitate a > cross-community discussion if we aren’t able to reach a resolution here. > > Thanks, > Walaa. > >

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Steven Wu
>>>>>>>>> Generally we have different environments we want to support >>>>>>>>>>>> with the view spec: >>>>>>>>>>>> >>>>>>>>>>>> 1. Consistent catalog nam

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Steven Wu
Xiaoxuan, it is unclear to me what exactly we are trying to achieve here. It started with equality vs position deletes. But the proposal mentioned inverted indexes for every column. Note that equality deletes have equality fields (similar to primary key) concept. if we are only talking about row-le

Re: [DISCUSS] Finalizing the v3 spec

2025-05-07 Thread Steven Wu
For the deletion vector change, should we add the following constraint/requirement for the write path in the spec? I don't know if this is already the behavior of the Spark implementation. "if a data file is removed from the table, the corresponding DV reference must also be removed from delete man

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-04-25 Thread Steven Wu
The core issue is on the fall back behavior when `default-catalog` is not defined. Current view spec says the fallback should be the catalog where the view is defined. It doesn't really matter what the catalog is named (catalogX) by the read engine. - If a view refers to the tables in the same cata

Re: [VOTE] Update row lineage spec ID assignment

2025-04-17 Thread Steven Wu
+1 (binding) On Thu, Apr 17, 2025 at 11:09 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1 (binding) > > On Thu, Apr 17, 2025 at 11:54 AM Szehon Ho > wrote: > >> +1 (binding) Seems cleaner to me. >> >> Thanks >> Szehon >> >> On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer < >> russell.spit.

Re: [DISCUSS] Row lineage required for v3

2025-04-05 Thread Steven Wu
During the sync, we were mostly aligned that the row lineage semantics for updates depends on how the writer engine interprets/implements (e.g. Flink with equality deletes). Now, if we make it required for V3 tables, what if users don't need the row lineage feature? There is a bit of overhead (althou

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-13 Thread Steven Wu
for > conserving development resources and chose option 3, unless there are > objections from the userbase. > > On Wed, Mar 12, 2025, 18:45 Rodrigo Meneses wrote: > >> Once we deprecate FlinkSink, we should also upgrade IcebergSink from >> `Experimental` to `PublicEvolv

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-12 Thread Steven Wu
> run. I don't see real value in bringing over legacy sources / sinks to > a new Flink major release. > > -Max > > On Tue, Mar 11, 2025 at 10:46 PM Steven Wu wrote: > > > > I assume Flink 2.0 will remove the old source and sink interfaces. > > > > With
