Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-31 Thread Steven Wu
ike Spark not having >> geospatial types. To me, I think that means we should aim to get variant >> and unknown done so that we have a complete implementation with a major >> engine. And it should not be particularly difficult to get unknown done so >> I'd opt to get it in.

Re: [VOTE] Release Apache Iceberg Rust 0.6.0 RC1

2025-07-28 Thread Steven Wu
+1 (binding) Verified checksum, signature. Ran build and unit test on Mac OS (arm64). I couldn't run the full test due to my container env setup. Tried podman (instead of docker desktop) per doc. Got the same failures. On Mon, Jul 28, 2025 at 10:21 AM Kevin Liu wrote: > Thanks for verifyi

Re: [VOTE] Update the table statistics (puffin stats) spec

2025-07-28 Thread Steven Wu
+1 for fixing the mistake in spec On Mon, Jul 28, 2025 at 10:41 AM Steve wrote: > +1 for using long type for snapshotId > > On Mon, Jul 28, 2025 at 6:24 AM Péter Váry > wrote: > >> +1 for long >> >> Given that it is implemented as a long in every known implementation, we >> might not even want

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-25 Thread Steven Wu
t the read path for >> UnknownType. Fokko has a WIP PR >> <https://github.com/apache/iceberg/pull/13445> for that. >> >> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu wrote: >> >>> 3. Spark: fix data frame join based on different versions of the same >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-25 Thread Steven Wu
ve both of these as part of the >>> 1.10 release. >>> >>> Best, >>> Kevin Liu >>> >>> >>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu wrote: >>> >>>> Here are the 3 PRs to add corresponding tests. >>>> htt

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-23 Thread Steven Wu
5f > > Thanks, > Kevin Liu > > On Wed, Jul 23, 2025 at 12:17 PM Steven Wu wrote: > >> Another update on the release. >> >> The existing blocker PRs are almost done. >> >> During today's community sync, we identified the following issues/PRs to >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-23 Thread Steven Wu
4.0. Ryan thinks this is very close and will prioritize the review. Thanks, steven The 1.10.0 milestone can be found here. https://github.com/apache/iceberg/milestone/54 On Wed, Jul 16, 2025 at 9:15 AM Steven Wu wrote: > Ajantha/Robin, thanks for the note. We can include the PR

Re: [DISCUSS] v4 - Improved column statistics

2025-07-22 Thread Steven Wu
It seems reasonable to support stats for computed/calculated columns with assigned field ids. E.g., Flink has had "computed columns" for a long time. https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#columns CREATE TABLE MyTable ( `user_id` BIGINT, `price` DOUBLE, `qu

Re: [DISCUSS] Restructuring Docs side navigation

2025-07-21 Thread Steven Wu
ts would be most welcome, please. > > thanks, Robin > > On Wed, 9 Jul 2025 at 17:08, Steven Wu wrote: > >> I really like the new organization of the navigation panel. It is too >> congested currently. >> >> On Wed, Jul 9, 2025 at 7:22 AM Jean-Baptiste Onofré >

Re: [DISCUSS] V4 - indexing support

2025-07-18 Thread Steven Wu
ficant performance overhead in batch pipelines. >>>> >>>> Approach (a): >>>> https://docs.google.com/document/d/1Jz4Fjt-6jRmwqbgHX_u0ohuyTB9ytDzfslS7lYraIjk/ >>>> Converting equality deletes to positional deletes would be a great >>>> achievement. I'm wondering though

Re: [VOTE] Release Apache Iceberg 1.9.2 RC0

2025-07-16 Thread Steven Wu
+1 (binding) Verified signature, checksum, license. Ran some basic Flink SQL testing locally. On Wed, Jul 16, 2025 at 11:20 AM Fokko Driesprong wrote: > +1 (binding) > > Ran checksum and signature. Checked licenses and ran tests. > > Kind regards, > Fokko > > On Wed, Jul 16, 2025 at 14:43

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-16 Thread Steven Wu
ink plugin. >> It seems we have a CVE from dependency that blocks us from publishing the >> plugin. >> >> Please include the below PR for 1.10.0 release which fixes that. >> https://github.com/apache/iceberg/pull/13561 >> >> - Ajantha >> >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Steven Wu
just me) > > Engines may model operations as deleting/inserting rows or as > modifications to rows that preserve row ids. > > Can you please help to explain? > > > Steven Wu wrote on Tue, Jul 15, 2025 at 04:41: > >> Manu >> >> The spec already covers the row lineage carry over (

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Steven Wu
Thanks, Steven On Mon, Jul 14, 2025 at 1:38 PM Steven Wu wrote: > another update on the release. > > We have one open PR left for the 1.10.0 milestone > <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs). > Amogh is actively working on the last blocke

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Steven Wu
another update on the release. We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR. Spark 4.0: Preserve row lineage information on compaction

[DISCUSS] V4 - indexing support

2025-07-09 Thread Steven Wu
Similar to other V4 threads, I am starting a thread to gauge interest in adding index support in Iceberg V4 and gather a focus group in this area. There have been a few discussions related to indexing recently. - Peter Vary and I are working on a proposal (WIP) to only write position delet

Re: [DISCUSS] Restructuring Docs side navigation

2025-07-09 Thread Steven Wu
I really like the new organization of the navigation panel. It is too congested currently. On Wed, Jul 9, 2025 at 7:22 AM Jean-Baptiste Onofré wrote: > Hi Manu > > At first glance, it's a great improvement with a good multi-level menu > (reducing the first level size of the menu). > > Thanks ! >

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Steven Wu
e use case of long literal value for nano timestamp column. But there is no correctness issue. Hence I am favoring moving it out of the 1.10.0 milestone * there is no consensus on the path forward yet. On Thu, Jul 3, 2025 at 2:28 PM Szehon Ho wrote: > Thanks Steven! > > > On Jul 3, 2025

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-03 Thread Steven Wu
incompatible for the Spark 4.0 jar. > > Thanks, > Szehon > > > > On Thu, Jul 3, 2025 at 1:17 PM Steven Wu wrote: > >> Szehon's backport PR has been merged. Another blocker (dangling DVs for >> rewrite) was also merged. >> Core, Spark: Propagate

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-03 Thread Steven Wu
it is a backport of > https://github.com/apache/iceberg/pull/13435 (merged by Amogh) as I > missed to do Spark 3.4, so also should be close. > > Thanks > Szehon > > > > On Wed, Jul 2, 2025 at 11:17 AM Steven Wu wrote: > >> During today's community sync meeting,

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-02 Thread Steven Wu
ean-Baptiste Onofré wrote: > Hi > > I updated the PR about multi-args transforms today, but not sure I > will have reviews before 1.10.0. Let's try as best effort for 1.10, > else we will include in 1.11. > > Regards > JB > > On Tue, Jul 1, 2025 at 6:42 P

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-01 Thread Steven Wu
will also include these two PRs in the release scope https://github.com/apache/iceberg/pull/11868 https://github.com/apache/iceberg/pull/13245/ On Tue, Jul 1, 2025 at 9:42 AM Steven Wu wrote: > Hi, > > I plan to cut a release branch in the next 1 or 2 days. > > Waiting for t

Iceberg 1.10.0 release update - July 1, 2025

2025-07-01 Thread Steven Wu
Hi, I plan to cut a release branch in the next 1 or 2 days. Waiting for this row lineage related PR (and its 3.4 backport afterwards) https://github.com/apache/iceberg/pull/13070 Other items in the 1.10.0 milestone will probably have to be pushed to the next 1.11.0 release https://github.com/apa

Re: Flink: Current and Future state of the sink connectors

2025-06-25 Thread Steven Wu
I will reiterate one point that I mentioned before. For the Flink connector, sink is the dominant use case (compared to source). Regardless of the unit test issue, it is probably better to go a little slower on this switch of sink implementation. On Wed, Jun 25, 2025 at 4:22 PM Rodrigo Meneses wrote

Re: Append-only table scans in the presence of OVERWRITE snapshots

2025-06-25 Thread Steven Wu
the Flink streaming read only consumes `append` only commits. This is a snapshot commit `DataOperation` type. You were talking about row-level appends, delete etc. > 2. Add an option to read appended data of overwrite snapshots to allow users to de-duplicate downstream (opt-in config) For update

Re: Iceberg 0.10.0 release update - June 18, 2025

2025-06-25 Thread Steven Wu
n't want to potentially > amplify known incompliance problems by doing a release before they're fixed) > > Thanks, > Amogh Jahagirdar > > On Thu, Jun 19, 2025 at 2:36 AM Péter Váry > wrote: > >> If possible, I would love to have the File Format API interfaces ap

Re: Flink: Current and Future state of the sink connectors

2025-06-23 Thread Steven Wu
seems like a good plan. On Mon, Jun 23, 2025 at 11:28 AM Rodrigo Meneses wrote: > Hi devs, > > > I’d like to start a discussion about the current and future state of our > Flink Sink Connectors. > > > As it stands today, we currently have 3 sink implementations: > >1. FlinkSink [1] >2. I

Re: Iceberg 0.10.0 release update - June 18, 2025

2025-06-18 Thread Steven Wu
sorry, I meant 1.10.0 release. Thanks for catching the error, JB! On Wed, Jun 18, 2025 at 2:29 PM Jean-Baptiste Onofré wrote: > Hi > > I guess you mean 1.10.0 release :) > > Regards > JB > > On Wed, Jun 18, 2025 at 11:01 PM Steven Wu wrote: > > > > V3 relat

Iceberg 0.10.0 release update - June 18, 2025

2025-06-18 Thread Steven Wu
The reference implementations of V3-related features haven't made much progress, which is probably not going to change significantly in the next 1 or 2 weeks. I would propose to cut the release branch by the end of *next Friday (June 27)*. There are a few important features to be released like Spark 4.0 supp

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-16 Thread Steven Wu
+1 for a 1.9.2 release On Mon, Jun 16, 2025 at 10:53 AM Prashant Singh wrote: > Hey Kevin, > This goes well before 1.8, if you will see the issue that my PR refers to > is reported from iceberg 1.7, It has been there since the beginning of the > IRC client. > We were having similar debates on if

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-04 Thread Steven Wu
Also, thanks to Ismail for highlighting the BigQuery approach, >>>>>>>> that's helpful context! >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Xiaoxuan >>>>

Re: [DISCUSS] Reduce memory pressure due to column stats in position delete files

2025-06-04 Thread Steven Wu
It seems like a reasonable approach for DeleteFileIndex. I saw equality delete file matching uses column stats. But it seems that column stats (like lower/upper bounds) aren't used for associating position delete files with a data file. Plus with file-scoped position delete files (V2), matching wo

Re: Wide tables in V4

2025-05-29 Thread Steven Wu
project that would be great >>>>>>>>> but it feels like we need to start exploring more drastic options >>>>>>>>> than footer encoding. >>>>>>>>> >>>>>>>>> On Mon, May 26, 2025 at 8:42 PM Gang Wu wrote

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Steven Wu
This will be great for users. metadata can self adapt. Start with a compacted one file. As the table grows in size, the metadata can adapt to a tree or linked structure. On Thu, May 29, 2025 at 3:44 PM Russell Spitzer wrote: > I’m also super excited about this idea > > On Thu, May 29, 2025 at 3:

Re: [DISCUSS] Apache Iceberg 1.10.0 release

2025-05-29 Thread Steven Wu
at Spark "testRemoveDanglingDVsAfterCompaction" creates a V3 table and performs delete compaction. On Thu, May 29, 2025 at 2:42 AM ConradJam wrote: > I would like to know whether Spark 3.5 can perform some basic queries or > provide file merging capabilities in the current or next version of V3?

Re: [DISCUSS] Apache Iceberg 1.10.0 release

2025-05-28 Thread Steven Wu
at 11:28 AM Jean-Baptiste Onofré > wrote: > > > > Hi > > > > I think I have multi-args transforms in good shape to be in the scope > > for 1.10.0. Related to V3 spec, it would be great to include it in > > 1.10.0 release. > > > > Thanks ! > > Rega

[DISCUSS] Apache Iceberg 1.10.0 release

2025-05-27 Thread Steven Wu
As discussed in the community sync, we are planning for the next 1.10.0 release. I will serve as the release manager after chatting with Russell (the original RM volunteer). The adoption of V3 spec changes

Re: Wide tables in V4

2025-05-26 Thread Steven Wu
The Parquet metadata proposal (linked by Fokko) is mainly addressing the read performance due to bloated metadata. What Peter described in the description seems useful for some ML workload of feature engineering. A new set of features/columns are added to the table. Currently, Iceberg would requi

Re: [VOTE] Release Apache Iceberg 1.9.1 RC1

2025-05-23 Thread Steven Wu
+1 (binding) Checked signature, checksum, and licenses. "./gradlew build" passed with the source bundle. Ran Flink 1.20 with SQL On Fri, May 23, 2025 at 10:42 AM Russell Spitzer wrote: > Discussion was back here - > https://lists.apache.org/thread/497qxkq3nfplwo27fh959zhsc2o7hkmy > > On Thu, Ma

Re: [VOTE] [REST SPEC] Add row lineage fields.

2025-05-22 Thread Steven Wu
+1 (binding) On Thu, May 22, 2025 at 3:39 PM Prashant Singh wrote: > Hi All, > I propose an update to the Rest Spec to include the Row lineage fields. As > these need to be passed from server to client for reads, as it is > inferred during planning during server side via inheritance from Manifes

Re: [Discuss] Make identity(String sourceName, String targetName) Public

2025-05-21 Thread Steven Wu
It seems that the PR has made two valid arguments to support the change to public scope * identity transform builder is the only one where targetName builder is not public * handle the partition column rename use case So it seems reasonable to me. On Wed, May 21, 2025 at 2:49 PM Russell Spitzer

Re: [VOTE] Adopt the v3 spec changes

2025-05-20 Thread Steven Wu
+1 (binding) On Tue, May 20, 2025 at 5:25 AM Manu Zhang wrote: > +1 (non-binding). Thanks Ryan for driving this and everyone contributing > to the new features. > > Regards, > Manu > > Péter Váry wrote on Tue, May 20, 2025 at 20:14: > >> +1 (binding) >> Well done everyone who was working on this! >> >> Fokko

Re: [VOTE] Release Apache Iceberg 1.9.1 RC0

2025-05-18 Thread Steven Wu
+1 (binding) Checked signature, checksum, and licenses. Also ran Flink 1.20 with SQL. Thanks Russell for driving the release! On Sun, May 18, 2025 at 2:27 PM huaxin gao wrote: > +1 (non-binding) > Verified signature, checksum and license. Thanks Russell for driving this > release! > > Huaxin >

Re: [VOTE] Clarify writer requirements in the spec to prevent orphan DVs

2025-05-14 Thread Steven Wu
+1 (binding) On Wed, May 14, 2025 at 9:31 AM Akashdeep Gupta wrote: > +1 (non binding) > > Regards, > Akashdeep Gupta > > > On Wed, May 14, 2025 at 9:59 PM Daniel Weeks wrote: > >> +1 (binding) >> >> On Wed, May 14, 2025 at 9:02 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-12 Thread Steven Wu
agree with Peter that 1:1 mapping of data files and inverted indexes is not as useful. With columnar format like Parquet, this can also be achieved equivalently by reading the data file with projection on the identifier columns. On Mon, May 12, 2025 at 4:20 AM Péter Váry wrote: > Hi Xiaoxuan,

Re: [VOTE] Merge details about GZip metadata files to the spec.

2025-05-12 Thread Steven Wu
+1 (binding) On Mon, May 12, 2025 at 1:10 PM Ryan Blue wrote: > +1 (binding) > > On Mon, May 12, 2025 at 10:50 AM Szehon Ho > wrote: > >> +1 (binding) >> >> Thanks >> Szehon >> >> On Mon, May 12, 2025 at 9:19 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +1 (binding) >>> >>>

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-08 Thread Steven Wu
match. This also aligns with the identifier >> resolution being late binding. >> >> -Dan >> >> On Wed, May 7, 2025 at 10:45 PM Walaa Eldin Moustafa < >> wa.moust...@gmail.com> wrote: >> >>> Thanks Steven! So would you agree that resolution using defa

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Steven Wu
lt-catalog + default-namespace + table name to re-identify the correct > table, without UUID validation? > > +1 on involving other communities. I’m happy to help facilitate a > cross-community discussion if we aren’t able to reach a resolution here. > > Thanks, > Walaa. >

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Steven Wu
>>>>>>>> Generally we have different environments we want to support >>>>>>>>>>>> with the view spec: >>>>>>>>>>>> >>>>>>>>>>>> 1. Consistent catalog nam

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Steven Wu
Xiaoxuan, it is unclear to me what exactly we are trying to achieve here. It started with equality vs position deletes. But the proposal mentioned inverted indexes for every column. Note that equality deletes have equality fields (similar to primary key) concept. if we are only talking about row-le

Re: [DISCUSS] Finalizing the v3 spec

2025-05-07 Thread Steven Wu
For the deletion vector change, should we add the following constraint/requirement for the write path in the spec? I don't know if this is already the behavior of the Spark implementation. "if a data file is removed from the table, the corresponding DV reference must also be removed from delete man

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-04-25 Thread Steven Wu
The core issue is on the fall back behavior when `default-catalog` is not defined. Current view spec says the fallback should be the catalog where the view is defined. It doesn't really matter what the catalog is named (catalogX) by the read engine. - If a view refers to the tables in the same cata

Re: [VOTE] Update row lineage spec ID assignment

2025-04-17 Thread Steven Wu
+1 (binding) On Thu, Apr 17, 2025 at 11:09 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1 (binding) > > On Thu, Apr 17, 2025 at 11:54 AM Szehon Ho > wrote: > >> +1 (binding) Seems cleaner to me. >> >> Thanks >> Szehon >> >> On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer < >> russell.spit.

Re: [DISCUSS] Row lineage required for v3

2025-04-05 Thread Steven Wu
During the sync, we were mostly aligned that the row lineage semantics for updates depend on how the writer engine interprets/implements (e.g. Flink with equality deletes). Now, if we make it required for V3 tables, what if users don't need the row lineage feature. There is a bit of overhead (althou

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-13 Thread Steven Wu
for > conserving development resources and chose option 3, unless there are > objections from the userbase. > > On Wed, Mar 12, 2025, 18:45 Rodrigo Meneses wrote: > >> Once we deprecate FlinkSink, we should also upgrade IcebergSink from >> `Experimental` to `PublicEvolv

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-12 Thread Steven Wu
run. I don't see real value in bringing over legacy sources / sinks to > a new Flink major release. > > -Max > > On Tue, Mar 11, 2025 at 10:46 PM Steven Wu wrote: > > > > I assume Flink 2.0 will remove the old source and sink interfaces. > > > > With

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-11 Thread Steven Wu
ling it for removal in the Iceberg release following 1.9. > > > > To summarize, I'm proposing the following: > > > > Iceberg 1.9 > > - Remove FlinkSource in Flink 2.0 code path > > - Deprecate FlinkSink for all supported Flink versions > > > > Ice

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-06 Thread Steven Wu
> if Flink 2.0 release is going to release the old source and sink interfaces, t typo above: "release the old" -> "delete the old" On Thu, Mar 6, 2025 at 12:08 PM Steven Wu wrote: > There was a previous thread for the Flink source. The Javadoc deprecation

Re: [Flink] Remove FlinkSink for Flink 2.0

2025-03-06 Thread Steven Wu
There was a previous thread for the Flink source. The Javadoc deprecation note for the old `FlinkSource` currently says it will be removed in Iceberg 2.0 release https://lists.apache.org/thread/27kcvo3p86pysk9wrggq4vphzo03sv3l Now regarding deprecating and removing the old `FlinkSink` in favor of

Re: [VOTE] Java implementation notes around current-snapshot-id

2025-02-24 Thread Steven Wu
+1 On Mon, Feb 24, 2025 at 10:13 PM Péter Váry wrote: > +1 > > On Tue, Feb 25, 2025, 04:16 Steve Zhang > wrote: > >> +1 (nb) >> Thanks, >> Steve Zhang >> >> >> >> On Feb 24, 2025, at 6:32 PM, Renjie Liu wrote: >> >> +1 >> >> On Tue, Feb 25, 2025 at 7:00 AM Szehon Ho >> wrote: >> >>> +1 >>> >>

Re: [VOTE] Allow Row-Lineage with Equality Deletes

2025-02-19 Thread Steven Wu
+1 On Wed, Feb 19, 2025 at 2:15 PM Russell Spitzer wrote: > The PR: https://github.com/apache/iceberg/pull/12230 is basically ready > now. So let's do a last vote to make sure everyone is aware of the upcoming > change. > > Before: > Equality deletes are not allowed to be used when row-lineage

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-18 Thread Steven Wu
>>> Manu, Russell and Steven WDYT? >>> >>> Thanks >>> Manish >>> >>> On Sun, Feb 16, 2025 at 7:00 AM Manu Zhang >>> wrote: >>> >>>> Russell and Steven, >>>> >>>> Thanks for your great su

Re: Remove deprecated table properties

2025-02-18 Thread Steven Wu
ing set. Thoughts? >> On 18.02.25 16:21, Robert Stupp wrote: >> >> Agree with both Steve's. Personally, I'm okay with removing those >> properties - but using the proposed phased approach. >> On 17.02.25 23:25, Steven Wu wrote: >> >> I have some con

Re: Remove deprecated table properties

2025-02-17 Thread Steven Wu
I have some concerns on the issue of silent behavior change that Steve Zhang raised in the PR comment. E.g., users may set the location based on the deprecated table property. With this change, it would silently switch to a new location. This can potentially mess up orphan file cleanup etc. Maybe

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Steven Wu
+1. it seems reasonable to produce unpretty json by default. On Mon, Feb 17, 2025 at 8:35 AM Fokko Driesprong wrote: > Hey Ian, > > Thanks for raising this. The numbers you mention, do you know if this was > compressed or uncompressed? > > I have read other issues in github which mention gigabyt
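To make the size difference concrete, here is a minimal Python sketch that serializes the same metadata-like dictionary with and without pretty-printing. The dictionary is made up for illustration and is not a real Iceberg metadata file; the actual savings depend on the file and on whether it is compressed.

```python
import json

# Hypothetical, simplified stand-in for a table metadata document.
metadata = {
    "format-version": 2,
    "snapshots": [
        {"snapshot-id": i, "summary": {"operation": "append"}}
        for i in range(1000)
    ],
}

# Pretty-printed: a newline plus indentation for every nested element.
pretty = json.dumps(metadata, indent=2)

# Compact: no indentation and no spaces after the separators.
compact = json.dumps(metadata, separators=(",", ":"))

print(len(pretty), len(compact))
```

Both strings parse back to the same document, so only whitespace is lost when pretty-printing is turned off.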

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-14 Thread Steven Wu
pts include all the technical details > * Spec > Terms > > On Thu, Feb 13, 2025 at 11:27 AM Steven Wu wrote: > >> nm. Found the "Terms" from the left navigation menu under "Project". >> Earlier, I was looking for it from the "Project" tab which

Re: FileRewrite API refactor

2025-02-13 Thread Steven Wu
the RewritePositionDeleteFiles#FileGroupInfo > and RewriteFileGroup#FileGroupInfo if the community would not find that too > invasive. > > If we agree on the general approach, I could create a PR for the API > refactor first, and we can move forward from there. > What do you

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-13 Thread Steven Wu
group than "Project" group. On Thu, Feb 13, 2025 at 9:15 AM Steven Wu wrote: > Manu, how to navigate to the "terms" page? I can't find the "terms" link > from the project page. > > It seems reasonable to move the "Terms" content to the "

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-13 Thread Steven Wu
Manu, how do I navigate to the "terms" page? I can't find the "terms" link from the project page. It seems reasonable to move the "Terms" content to the "Concept" page for easier discoverability. On Thu, Feb 13, 2025 at 8:43 AM Manu Zhang wrote: > Does anyone have objections to this change? If no

Re: [VOTE] Add overwriteRequested to RegisterTableRequest in REST spec

2025-02-13 Thread Steven Wu
+1 here. already approved the PR yesterday On Thu, Feb 13, 2025 at 8:17 AM Russell Spitzer wrote: > +1 > > On Wed, Feb 12, 2025 at 5:30 PM Steve Zhang > wrote: > >> Hi Iceberg Community, >> >> I'm working on supporting the registration of iceberg metadata for an >> existing table in the cata

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-12 Thread Steven Wu
by >> tracking changes of the identifier fields between snapshots during the >> rewrite and produce changelog files? >> >> I'm not asking to add changelogs support since it is a large design >> choice. Just want to brainstorm it. >> >> On Wed, Feb 12, 2025 a

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Steven Wu
I am fine with the proposed spec change. While it "supports/allows" equality deletes, row lineage semantics needn't/can't be maintained properly for equality deletes (compared to position deletes). Gang pointed out a couple issues with the implications. But we have no choice but to live with those

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Steven Wu
+1 On Tue, Feb 11, 2025 at 8:55 AM Russell Spitzer wrote: > +1 > > On Tue, Feb 11, 2025 at 9:15 AM Fokko Driesprong wrote: > >> +1 >> >> On Tue, Feb 11, 2025 at 13:52 Jean-Baptiste Onofré wrote: >> >>> +1 (non binding) >>> >>> Regards >>> JB >>> >>> On Tue, Feb 11, 2025 at 3:38 AM Gabor Kasz

Re: Welcome Huaxin Gao as a committer!

2025-02-07 Thread Steven Wu
Congrats, Huaxin! On Fri, Feb 7, 2025 at 9:17 AM karuppayya wrote: > Congratulations Huaxin! > > On Thu, Feb 6, 2025 at 9:34 AM Wing Yew Poon > wrote: > >> Congratulations Huaxin! Awesome! >> >> >> On Thu, Feb 6, 2025 at 9:27 AM Yufei Gu wrote: >> >>> Congrats Huaxin! >>> >>> Yufei >>> >>> >>>

Re: FileRewrite API refactor

2025-02-06 Thread Steven Wu
many > interfaces/classes with similar implementations. This will make it hard to > understand the code. If we push all of the differences down the group level > then we might be better off pushing the writeMaxFileSize to the group level > as well. This way we can get rid of the PlanCo

Re: FileRewrite API refactor

2025-02-04 Thread Steven Wu
At a high level, it makes sense to separate out the planning and execution to promote reusing the planning code across engines. Just to add 4th class to Russell's list 1) RewriteGroup: A Container that holds all the files that are meant to be compacted along with information about them 2) Rewriter:

Re: [VOTE] Update partition stats spec for V3

2025-02-02 Thread Steven Wu
+1 The spec change makes sense. left a question in the PR. On Sun, Feb 2, 2025 at 8:52 PM roryqi wrote: > +1 > > Amogh Jahagirdar <2am...@gmail.com> wrote on Sun, Feb 2, 2025 at 10:16: > >> +1 >> >> On Sat, Feb 1, 2025 at 11:05 AM huaxin gao >> wrote: >> >>> +1 (non-binding) >>> >>> On Sat, Feb 1, 2025 at 8

Re: [DISCUSS/VOTE] Add in ChangeLog Reserved Field IDs to Spec and Decrement Row Lineage Reserved IDs

2025-01-26 Thread Steven Wu
+1 On Sun, Jan 26, 2025 at 3:01 PM John Zhuge wrote: > +1 (non-binding) > > John Zhuge > > > On Sun, Jan 26, 2025 at 2:59 PM Aihua Xu wrote: > >> +1 (non-binding). >> >> Thanks for fixing it. >> >> On Sun, Jan 26, 2025 at 11:30 AM Anton Okolnychyi >> wrote: >> >>> +1 good catch >>> >>> Sun, 26

Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage

2025-01-15 Thread Steven Wu
+1 On Wed, Jan 15, 2025 at 9:00 AM Russell Spitzer wrote: > Hi Everyone! > > PR: https://github.com/apache/iceberg/pull/11976/files > > Split out from #11948 > > Working on the row-lineage implementation made it clear that we needed a > way to get i

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Steven Wu
job should fail in this case. On Fri, Nov 1, 2024 at 10:57 AM Steven Wu wrote: > Shani, > > That is a good point. It is certainly a limitation for the Flink job to > track the inverted index internally (which is what I had in mind). It can't > be shared/synchronized with other

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Steven Wu
50 AM Shani Elharrar wrote: > Even if Flink can create this state, it would have to be maintained > against the Iceberg table, we wouldn't like duplicates (keys) if other > systems / users update the table (e.g manual insert / updates using DML). > > Shani. > > On 1 Nov 2

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-01 Thread Steven Wu
+1 (binding) Verified signature, checksum, license. Did Flink SQL local testing with the runtime jar. Didn't run build because Azure FileIO testing requires Docker environment. On Fri, Nov 1, 2024 at 5:02 AM Fokko Driesprong wrote: > Thanks Russell for running this release! > > +1 (binding) > >

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Steven Wu
>>> row by UUID. With position deletes each delete is expensive without an >>> index on that UUID. >>> With equality deletes each delete is cheap and while reads/compaction is >>> expensive but when updates are frequent and reads are sporadic that's a >>>

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-31 Thread Steven Wu
We probably all agree with the downside of equality deletes: it postpones all the work on the read path. In theory, we can implement position deletes only in the Flink streaming writer. It would require the tracking of last committed data files per key, which can be stored in Flink state (checkpoi
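The per-key tracking idea can be sketched roughly as follows. This is only an illustration in plain Python, not how Flink or Iceberg implement it: `RowLocation`, `KeyIndex`, and the file names are hypothetical, and a plain dictionary stands in for Flink state. The sketch also assumes the writer records a row position along with the data file, which the message does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RowLocation:
    """Hypothetical pointer to a row: data file path plus row position."""
    data_file: str
    position: int


class KeyIndex:
    """Hypothetical stand-in for per-key writer state (a dict, not Flink state)."""

    def __init__(self) -> None:
        self._locations: Dict[str, RowLocation] = {}

    def upsert(self, key: str, new_location: RowLocation) -> Optional[RowLocation]:
        # Return the previous location of the key (if any) and remember the new one.
        previous = self._locations.get(key)
        self._locations[key] = new_location
        return previous


index = KeyIndex()
position_deletes: List[RowLocation] = []

# (key, data file, row position) for each incoming upsert, in arrival order.
writes = [
    ("user-1", "file-a.parquet", 0),
    ("user-2", "file-a.parquet", 1),
    ("user-1", "file-b.parquet", 0),  # updates user-1, so the old row must be deleted
]

for key, data_file, position in writes:
    old = index.upsert(key, RowLocation(data_file, position))
    if old is not None:
        # The writer knows exactly where the old row lives, so it can emit a
        # position delete instead of an equality delete.
        position_deletes.append(old)

print(position_deletes)
```

The trade-off is that the index must hold an entry for every live key, and, as the replies in this thread point out, it would go stale if other writers modify the table.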

Re: [VOTE] Deletion Vectors in V3

2024-10-30 Thread Steven Wu
+1 On Wed, Oct 30, 2024 at 1:07 AM xianjin wrote: > +1 (non binding) > > On Wed, Oct 30, 2024 at 2:28 PM Jean-Baptiste Onofré > wrote: > >> +1 (non binding) >> >> Regards >> JB >> >> On Tue, Oct 29, 2024 at 10:45 PM Anton Okolnychyi >> wrote: >> > >> > Hi folks, >> > >> > We have been discussi

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread Steven Wu
agree with Ryan. Engines usually provide override capability that allows users to choose a different write format (than table default) if needed. There are many production use cases that write columnar formats (like Parquet) in streaming ingestion. I don't necessarily agree that it will be common

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Steven Wu
+1 On Thu, Oct 17, 2024 at 10:44 AM John Zhuge wrote: > +1 (non-binding) > > On Thu, Oct 17, 2024 at 10:21 AM Yufei Gu wrote: > >> +1 for deprecating it in 1.7 >> Yufei >> >> >> On Thu, Oct 17, 2024 at 9:51 AM Ajantha Bhat >> wrote: >> >>> +1 for dropping it. >>> >>> On Thu, Oct 17, 2024 at 8:

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Thread Steven Wu
+1 On Thu, Oct 10, 2024 at 2:52 PM Yufei Gu wrote: > +1 > Yufei > > > On Thu, Oct 10, 2024 at 3:47 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > >> +1, I've been reviewing this proposal/spec change for a bit and I think >> it's in a good state for the community to work on an implementation. >>

Re: Iceberg View Spec Improvements

2024-10-08 Thread Steven Wu
t case (in the doc they are more abstract/generic >> than the version you shared). Would be great to provide your feedback on >> the assumptions in the doc. >> >> Thanks, >> Walaa. >> >> >> On Tue, Oct 8, 2024 at 9:40 AM Steven Wu wrote: >> >>

Re: Iceberg View Spec Improvements

2024-10-08 Thread Steven Wu
I'd like to follow up on Russell's suggestion of using a federated catalog for resolving the catalog name/alias problem. I think Russell's idea is that the federated catalog standardizes the catalog names (for referencing). That could solve the problem. There are two cases. (1) single catalog: there i

Re: [DISCUSS] Iceberg Summit 2025 ?

2024-10-02 Thread Steven Wu
Regarding content, we can have multiple tracks. - technology deep dive: how things work internally especially with new features and innovations - ecosystem: interesting learnings and development from ecosystem integrations - use cases: success story, learnings, limitations from different industries

Re: [DISCUSS] Iceberg Materialzied Views

2024-10-01 Thread Steven Wu
ue I'd like to bring up about using UUIDs which is >> that these UUIDs are client generated and there's no validation that they >> are indeed globally unique identifiers. The catalog just persists whatever >> it is given without validating that the UUIDs are indeed

Re: [DISCUSS] Iceberg Summit 2025 ?

2024-09-28 Thread Steven Wu
+1 for hybrid with in-person elements. On Sat, Sep 28, 2024 at 4:23 PM Matt Topol wrote: > +1 from me as well, I would love to attend an in person/hybrid iceberg > summit. Workshops seem like a perfect way to help the community. > > On Sat, Sep 28, 2024, 7:11 PM Honah J. wrote: > >> +1 on hosti

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-27 Thread Steven Wu
24 at 1:52 AM Jean-Baptiste Onofré > wrote: > >> Hi Steven, >> >> I agree with you here. I think we can use semantics similar to >> ThreadPoolExecutor/ScheduledThreadPoolExecutor (like >> newFixedThreadPool, newWorkStealingPool, ...). >> >> Regards >

Re: [DISCUSS] Iceberg Materialzied Views

2024-09-25 Thread Steven Wu
"sql": "SELECT\n COUNT(1), CAST(event_ts AS DATE)\nFROM >>>>>>>> events\nGROUP BY 2", >>>>>>>> >>>>>>>> "dialect": "spark", >>>>>>>> >>>>>>>>

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-25 Thread Steven Wu
First, we should definitely add Javadoc to `ThreadPools.newWorkerPool` on its behavior with a shutdown hook. It is not obvious from the method name. I would actually go further to deprecate `newWorkerPool` in favor of `newExitingWorkerPool`. `newWorkerPool` method name easily leads to misuse, as the

Re: [VOTE] Drop Python3.8 Support in PyIceberg 0.8.0

2024-09-23 Thread Steven Wu
+1 (binding). makes sense. On Mon, Sep 23, 2024 at 9:38 AM Yufei Gu wrote: > +1 Thanks for bringing this up. > > Yufei > > > On Mon, Sep 23, 2024 at 9:27 AM Kevin Liu wrote: > >> +1 non-binding. Thanks for starting this conversation! >> >> >> On Fri, Sep 20, 2024 at 2:02 PM Sung Yun wrote: >>

Re: Code structuring question

2024-09-19 Thread Steven Wu
I'll share my take on this. My first choice would be leveraging the Java access modifiers, which enforce the visibility by the programming language. Users won't see non-public classes at all. That is best for the users. Peter mentioned the potential downside of collocating 50 classes under one pac

Re: [Discuss] test logging is broken and Avro 1.12.0 upgraded slf4j-api dep to 2.x

2024-09-16 Thread Steven Wu
ase, I think the best path forward is to upgrade to 2.x but not >> use the new API features that will cause problems if downstream libraries >> are not already on 2.x. >> >> Does that sound reasonable? >> >> On Wed, Sep 11, 2024 at 11:17 AM Steven Wu wrote:

Re: [Discuss] test logging is broken and Avro 1.12.0 upgraded slf4j-api dep to 2.x

2024-09-11 Thread Steven Wu
lf4j-api has never been broken."* On Mon, Sep 9, 2024 at 9:22 AM Steven Wu wrote: > Bump the thread to bring the awareness of the issue and implication of > slf4j 2.x upgrade. > > On Mon, Aug 26, 2024 at 12:24 PM Steve Zhang > wrote: > >> I believe dependabot tried
