Re: [DISCUSS] Ambiguity over 'total-position-deletes' as V2 (Legacy) or V3 (Deletion Vectors) in Scan Planning

2025-07-27 Thread Manu Zhang
Hi Jordan, FYI, Anton explained his rationale for not adding total-dvs in the original PR [1]. You may also refer to iceberg-java's implementation [2] for scan planning, which handles both position deletes and deletion vectors in a straightforward way. I'm curious which language you are building y

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-24 Thread Manu Zhang
>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat >>>>>>> wrote: >>>>>>> >>>>>>>> I have approached Confluent people >>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> >

Re: [ANNOUNCE] Welcome Prashant Singh as a new Apache Iceberg Committer

2025-07-22 Thread Manu Zhang
Congrats, Prashant! On Wed, Jul 23, 2025 at 7:47 AM Yuya Ebihara wrote: > Big congrats, Prashant! :) > > On Wed, Jul 23, 2025 at 6:40 AM Raúl Cumplido wrote: > >> Congratulations Prashant! >> >> On Tue, Jul 22, 2025 at 22:51, Honah J. wrote: >> >>> Congratulations, Prashant!!! >>> >>> On Tue,

Re: [DISCUSS] Restructuring Docs side navigation

2025-07-22 Thread Manu Zhang
> >>>> I really like the new organization of the navigation panel. It is too >>>> congested currently. >>>> >>>> On Wed, Jul 9, 2025 at 7:22 AM Jean-Baptiste Onofré >>>> wrote: >>>> >>>>> Hi Manu >>>

Re: [ANNOUNCE] Apache Iceberg release 1.9.2

2025-07-17 Thread Manu Zhang
Thanks Prashant for driving the release. Can you also create a new release at https://github.com/apache/iceberg/releases and update the links in the release notes? Regards, Manu On Fri, Jul 18, 2025 at 2:57 AM Prashant Singh wrote: > I'm pleased to announce the release of Apache Iceberg 1.9.2!

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-14 Thread Manu Zhang
ith 25 closed PRs). >> Amogh is actively working on the last blocker PR. >> Spark 4.0: Preserve row lineage information on compaction >> <https://github.com/apache/iceberg/pull/13555> >> >> I will publish a release candidate after the above blocker is merged and >>

Re: [DISCUSS] Restructuring Docs side navigation

2025-07-08 Thread Manu Zhang
> I do agree that we need to restructure the site to make things less >> cluttered and easier to find. Thanks Manu/Peter for working on this. >> It would be great to have a few more reviewers that can help out here and >> add their opinions about the restructure. >> >> Thank

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Manu Zhang
Hi Amogh, Is it defined in the table spec that a "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better first define it in the spec because all engines and implementations need to follow it. On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2a

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Manu Zhang
ons >> for this purpose. I feel that we should still give them the new >> functionality to clean expired metadata (specs, schemas) by extending the >> Spark and Flink procedures. >> >>> >> >>> Regards, >> >>> Gabor >> >>>

Re: [DISCUSS] June board report

2025-06-06 Thread Manu Zhang
Hi Ryan, Thanks for drafting the report. We also removed Spark 3.3 support and the Hadoop 2 dependency in the Java 1.9 release. Thanks, Manu On Sat, Jun 7, 2025 at 06:30, Kevin Liu wrote: > Hey Ryan, > > Thanks for putting this together. I have a few minor comments. > > > Rust 0.5.0 was released on 2025-05-26 > 0.5.0

Re: [DISCUSS] Restructuring Docs side navigation

2025-06-04 Thread Manu Zhang
Hi all, I know you've been busy finalizing v3 spec and discussing new features in v4 spec. When you find time, could you take a look at this as well? I think well-organized docs are also important to further grow the project and the community. Thanks, Manu On Mon, May 26, 2025 at 11:01 AM

[DISCUSS] Restructuring Docs side navigation

2025-05-25 Thread Manu Zhang
Hi all, I’ve noticed that the current Docs side navigation on the Iceberg website is still primarily structured around Iceberg Java. However, the project has evolved significantly, with many new language implementations and integrations. I’d like to highl

Re: [VOTE] Add commit timestamp to CommitReport

2025-05-21 Thread Manu Zhang
wrote: > +0 > > I don't see much value compared to reading the table metadata via snapshot > id. > I'm not against it, but it seems a bit redundant. Giving direct access > via REST Catalog could be discussed also. > > Regards > JB > > On Thu, May 8, 2

Re: [VOTE] Add commit timestamp to CommitReport

2025-05-20 Thread Manu Zhang
nt. > > Given this, I would currently vote -0 (with a strong preference for not > including additional information where it's not necessary). > > -Dan > > > > On Fri, May 16, 2025 at 8:11 AM Manu Zhang > wrote: > >> Thanks Yufei. I'm still seeking more

Re: [VOTE] Adopt the v3 spec changes

2025-05-20 Thread Manu Zhang
+1 (non-binding). Thanks Ryan for driving this and everyone contributing to the new features. Regards, Manu On Tue, May 20, 2025 at 20:14, Péter Váry wrote: > +1 (binding) > Well done everyone who was working on this! > > On Tue, May 20, 2025 at > 8:49, Fokko Driesprong wrote: > >> +1 (binding) >> >>

Re: [VOTE] Add commit timestamp to CommitReport

2025-05-16 Thread Manu Zhang
Thanks Yufei. I'm still seeking more votes here. Manu On Wed, May 14, 2025 at 1:37 AM Yufei Gu wrote: > +1 I'm OK to add it as long as it's optional. > > Yufei > > > On Mon, May 12, 2025 at 8:47 PM Manu Zhang > wrote: > >> Hi all, >> >

Re: [VOTE] Add commit timestamp to CommitReport

2025-05-12 Thread Manu Zhang
your vote. Feel free to ping me if you have any questions. Thanks, Manu On Fri, May 9, 2025 at 12:01 AM Manu Zhang wrote: > Hi all, > > I'd like to start a vote to add commit timestamp `timestamp-millis` to > CommitReport in PR 12990 <https://github.com/apache/iceberg

[VOTE] Add commit timestamp to CommitReport

2025-05-08 Thread Manu Zhang
Hi all, I'd like to start a vote to add commit timestamp `timestamp-millis` to CommitReport in PR 12990 . The timestamp info is valuable to schedule maintenance jobs, but currently we need to look it up from the snapshot metadata table. Please take a

Re: [DISCUSS] Finalizing the v3 spec

2025-05-06 Thread Manu Zhang
hings that are noted as part of v3 in the spec. The major additions are > new types, DVs, and row lineage. > > Ryan > > On Tue, May 6, 2025 at 3:32 AM Manu Zhang wrote: > >> I'm wondering what changes we are voting for here. Is it everything >> related to >> htt

Re: [DISCUSS] Finalizing the v3 spec

2025-05-06 Thread Manu Zhang
ing >> the readers/writers. After some discussion on the PR, we've decided to >> leave out the multi-arg bucket transform so the V3 spec can be finalized. >> So V3 only contains the scaffolding for multi-arg transforms. >> >>>> >> >>>>> For Ic

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Manu Zhang
Agree with Russell and JB that we should make an "RC" release for the V3 spec to test implementations, compatibility, etc. before finalizing it. Thanks, Manu On Wed, Apr 30, 2025 at 12:24 PM Jean-Baptiste Onofré wrote: > Hi Ryan > > It sounds good. > > About multi-args transforms, with the clarification we

Re: [ANNOUNCE] Apache Iceberg release 1.9.0

2025-04-28 Thread Manu Zhang
Thanks Ajantha for making the release. Have we updated the latest version on the website? Regards, Manu On Mon, Apr 28, 2025 at 15:05, Ajantha Bhat wrote: > I'm pleased to announce the release of Apache Iceberg 1.9.0! > > Apache Iceberg is an open table format for huge analytic datasets. Iceberg > delivers hig

Re: [VOTE] Release Apache PyIceberg 0.9.1rc1

2025-04-26 Thread Manu Zhang
+1 (non-binding) Built and tested with python 3.10. Thanks, Manu On Sat, Apr 26, 2025 at 12:02 PM Jean-Baptiste Onofré wrote: > +1 (non binding) > > I checked: > - LICENSE and NOTICE are good in the source distribution > - ASF header present in all expected files > - Hash and checksum are good

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-04-24 Thread Manu Zhang
> > For example, if we want to validate that the tables referenced in the view > exist, how can we do that when default-catalog isn't defined, since the > view hasn't been created or loaded yet? I don't think this is related to view spec. How do we validate that a table exists without a default ca

[DISCUSS] Release 1.8.2?

2025-04-21 Thread Manu Zhang
Hi all, I thought we had a consensus on releasing 1.8.2 and volunteered to be the release manager following these discussions[1][2]. However, when working with Fokko to make a release, he expressed concerns over the release. Let me quote his words here. I did some checks, and it looks like the vu

Re: [VOTE] Small spec change for default values

2025-04-21 Thread Manu Zhang
+1 (non-binding) except for some ambiguity between struct field and fields within struct (Russell already made a nice suggestion). Thanks, Manu On Tue, Apr 22, 2025 at 7:10 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1 (binding) > > On Mon, Apr 21, 2025 at 3:38 PM Russell Spitzer > wrote:

Re: [DISCUSS] Fix CVE-2025-30065 on 1.8.x / 1.7.x / 1.6.x?

2025-04-15 Thread Manu Zhang
> Ryan >> >> On Mon, Apr 14, 2025 at 2:49 AM Jean-Baptiste Onofré >> wrote: >> >>> Hi Manu, >>> >>> See my comments from few days ago (in the 1.9.x release discussion): >>> https://lists.apache.org/thread/4c4hg85c8qxq4cznp3drnyro88qp0rjr

Re: [DISCUSS] Fix CVE-2025-30065 on 1.8.x / 1.7.x / 1.6.x?

2025-04-13 Thread Manu Zhang
962430525040a2a5c3eed0fb5848a3R293> > method and ran the test suite of the parquet module, but it didn't > trigger on my end. > > Kind regards, > Fokko > > > On Sat, Apr 12, 2025 at 16:50, Manu Zhang wrote: > >> Hi all, >> >> https://nvd.nist.gov/

[DISCUSS] Fix CVE-2025-30065 on 1.8.x / 1.7.x / 1.6.x?

2025-04-12 Thread Manu Zhang
Hi all, https://nvd.nist.gov/vuln/detail/CVE-2025-30065 (10.0 critical) has been fixed on the main branch for 1.9+ (upgrade parquet to 1.15.1). Shall we fix on 1.8.x, 1.7.x and 1.6.x? There's an open issue[1] and PRs for 1.7.x[2] and 1.6.x[3] 1. https://github.com/apache/iceberg/issues/12749 2.

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-19 Thread Manu Zhang
I think a catalog service can also use Spark/Flink procedures for table maintenance, to utilize existing systems and cluster resources. If we no longer support new functionality in Spark/Flink procedures, we are effectively deprecating them, right? On Thu, Mar 20, 2025 at 00:07, Gabor Kaszab wrote: > Thanks fo

Re: [Discuss] Apache Iceberg 1.9.0 release

2025-03-18 Thread Manu Zhang
Hi Ajantha, Thanks for driving the release. Can we include https://github.com/apache/iceberg/pull/12120? On Tue, Mar 18, 2025 at 3:18 AM Steve Loughran wrote: > > Can I get this reviewed and merged; gives all hadoop filesystems with bulk > delete calls the ability to issue bulk deletes up to th

Re: [VOTE] Release Apache Iceberg 1.7.2 RC1

2025-03-04 Thread Manu Zhang
Hi JB, Is the tag apache-iceberg-1.7.2-rc1 created on main branch[1] instead of on 1.7.x[2]? 1. https://github.com/apache/iceberg/commit/6323aa9405f23e3992f243b1134cbafdbb24d73c 2. https://github.com/apache/iceberg/commit/f057e877bdbdb6a3361d45b777b8b1a8e56ee816 On Tue, Mar 4, 2025 at 5:17 PM J

Re: Time-based partitioning on long column type

2025-02-27 Thread Manu Zhang
, Sep 10, 2024 at 8:50 AM rdb...@gmail.com >> wrote: >> >>> Maybe we could update the time-based partition functions to be applied >>> to a long column directly. It would treat that column like a timestamp in >>> milliseconds. Would that work? I need to think m
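The idea quoted above, applying time-based partition functions to a long column by reading it as a timestamp in milliseconds, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical and not part of any Iceberg API, and the sketch assumes the long holds UTC epoch milliseconds and that the day value is a count of days from 1970-01-01.

```python
from datetime import date, timedelta

MILLIS_PER_DAY = 24 * 60 * 60 * 1000
EPOCH = date(1970, 1, 1)


def day_from_epoch_millis(value: int) -> int:
    """Hypothetical day transform: days since 1970-01-01 for a long of epoch millis.

    Floor division (not truncation) keeps values before 1970 in the correct day.
    """
    return value // MILLIS_PER_DAY


def day_ordinal_to_date(days: int) -> date:
    """Convert a day ordinal back to a calendar date, for display only."""
    return EPOCH + timedelta(days=days)


if __name__ == "__main__":
    sample_millis = 1_725_958_200_000  # an arbitrary sample value
    print(day_ordinal_to_date(day_from_epoch_millis(sample_millis)))
```

Whether the spec would interpret a long this way (and in which unit) was still an open question in the thread, so the unit here is an assumption.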

Re: [VOTE] Allow Row-Lineage with Equality Deletes

2025-02-19 Thread Manu Zhang
+1 (non-binding) Regards Manu On Thu, Feb 20, 2025 at 2:57 PM Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > On Wed, Feb 19, 2025 at 11:13 PM Russell Spitzer > wrote: > > > > The PR: https://github.com/apache/iceberg/pull/12230 is basically > ready now. So let's do a last vote to make

Re: [DISCUSS] Spark 3.3 support?

2025-02-18 Thread Manu Zhang
Since 1.8.0 has been released, I submitted https://github.com/apache/iceberg/pull/12279 to remove Spark 3.3 support on the main branch. Please help review. Thanks, Manu On Wed, Nov 20, 2024 at 6:32 AM Anton Okolnychyi wrote: > Here we go then: > > https://github.com/apache/iceberg/pull/11596 >

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-16 Thread Manu Zhang
ich shows the >>>> "Community" page. >>>> >>>> Anyway, I agree that the content of "Terms" is more aligned with the >>>> "Concept" group than "Project" group. >>>> >>>> On Thu, Feb 13, 2025 at 9:15 A

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-13 Thread Manu Zhang
Does anyone have objections to this change? If not, I'm going to open a PR. On Tue, Feb 11, 2025 at 11:43 AM Manu Zhang wrote: > Hi all, > > On the website, we have docs under Concepts with only Catalogs[1] and > Project/Terms[2] serving similar purposes. Do you think it

Re: [VOTE] Deprecate or remove distinct_count

2025-02-10 Thread Manu Zhang
Hi Jacob, Thanks for initiating the vote. Typically, we would first have a DISCUSSION thread to reach a consensus on the preferred option and then follow it up with a VOTE thread for confirmation. Maybe we can take this as a DISCUSSION thread? Best, Manu On Tue, Feb 11, 2025 at 7:20 AM Jacob M

[DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-10 Thread Manu Zhang
Hi all, On the website, we have docs under Concepts with only Catalogs[1] and Project/Terms[2] serving similar purposes. Do you think it's a good idea to consolidate the two pages? For example, move all items under Terms to Concepts for better visibility. 1. https://iceberg.apache.org/concepts/ca

Re: [DISCUSS] Apache Iceberg (java) 1.8.0 release

2025-02-10 Thread Manu Zhang
e milestone. The VOTE is > out, so we can start verifying. > > Kind regards, > Fokko > > On Mon, Feb 10, 2025 at 05:41, Manu Zhang wrote: > >> There's still https://github.com/apache/iceberg/pull/11216 under 1.8.0 >> milestone. >> Do we want to include it? >

[DISCUSS] Table name in table metadata

2025-02-09 Thread Manu Zhang
Hi all, >From time to time, users ask me about the status of their Iceberg tables by sending me a *path*, which they've received in a file system alert email. Usually I look for the corresponding *table name *and query metadata tables through Spark SQL. However, it's not easy to find the table nam

Re: [DISCUSS] Apache Iceberg (java) 1.8.0 release

2025-02-09 Thread Manu Zhang
There's still https://github.com/apache/iceberg/pull/11216 under 1.8.0 milestone. Do we want to include it? On Sun, Feb 9, 2025 at 3:01 PM Jean-Baptiste Onofré wrote: > Thanks Amogh > > I updated the PR with some cleanups. > > Regards > JB > > On Sun, Feb 9, 2025 at 4:04 AM Amogh Jahagirdar <2am

Re: [VOTE] Simplify multi-arg table metadata

2025-02-09 Thread Manu Zhang
+1 (non-binding) On Mon, Feb 10, 2025 at 10:25 AM roryqi wrote: > +1 > > On Mon, Feb 10, 2025 at 10:02, xianjin wrote: > >> +1 (non-binding) >> >> On Mon, Feb 10, 2025 at 2:03 AM Hussein Awala wrote: >> >>> +1 (non-binding) >>> >>> On Sun, Feb 9, 2025 at 6:15 PM Matt Topol >>> wrote: >>> +1 (non-bindin

Re: [VOTE] Release Apache Iceberg 1.7.2 rc0

2025-01-27 Thread Manu Zhang
Hi JB, Thanks for driving the release. It looks like the 1.7.2 milestone has more changes than the diff between 1.7.2-rc0 and 1.7.1

Re: [VOTE] Document Snapshot Summary Optional Fields as Subsection of Appendix F in Spec

2025-01-21 Thread Manu Zhang
+1 (non-binding) Thanks & Regards On Wed, Jan 22, 2025 at 8:06 AM Daniel Weeks wrote: > +1 (binding) > > On Tue, Jan 21, 2025 at 1:05 PM Szehon Ho wrote: > >> +1 (binding) >> >> Thanks >> Szehon >> >> On Tue, Jan 21, 2025 at 12:55 PM Yufei Gu wrote: >> >>> +1 Thanks Honah! >>> >>> Yufei >>> >

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-21 Thread Manu Zhang
understand what you're trying to achieve here and I feel like >>> the most important part is to have an updated version of the retention >>> procedure <https://iceberg.apache.org/spec/#snapshot-retention-policy> to >>> clearly state how this interacts with

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-17 Thread Manu Zhang
The intention is to set an upper limit for table size while keeping as many snapshots as possible. Setting max-snapshot-age-ms to a small value will lose history for some tables while setting min-snapshots-to-keep to a medium value will keep too much history for others. Lewis, William 于2025年1月18日
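To make the trade-off concrete, here is a minimal sketch of snapshot retention on a single linear history. The age and minimum-count rules follow the two settings named in the message; `max_to_keep` is a hypothetical parameter standing in for the proposed "at most N" limit, and the way it interacts with the other two (applied last, assumed to be at least `min_to_keep`) is an assumption, not something the thread settles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_keep(
    snapshots: List[Snapshot],
    now_ms: int,
    max_age_ms: int,
    min_to_keep: int,
    max_to_keep: Optional[int] = None,  # hypothetical "at most N" limit
) -> List[Snapshot]:
    """Return retained snapshots, newest first.

    A snapshot is retained if it is among the newest `min_to_keep`
    or is no older than `max_age_ms`. If `max_to_keep` is given, the
    result is then capped to the newest `max_to_keep` snapshots.
    """
    newest_first = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    kept = [
        s
        for i, s in enumerate(newest_first)
        if i < min_to_keep or now_ms - s.timestamp_ms <= max_age_ms
    ]
    if max_to_keep is not None:
        kept = kept[:max_to_keep]
    return kept


if __name__ == "__main__":
    hour = 3_600_000
    history = [Snapshot(i, i * hour) for i in range(10)]
    kept = snapshots_to_keep(history, now_ms=10 * hour, max_age_ms=3 * hour,
                             min_to_keep=1, max_to_keep=2)
    print([s.snapshot_id for s in kept])
```

With only an age limit, a frequently updated table can retain an unbounded number of snapshots inside the window; the cap bounds that without shortening the window for slowly changing tables.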

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-16 Thread Manu Zhang
Hi all, Do you have more comments on this feature? Do you have concerns about adding a new field to SnapshotRef? Thanks, Manu On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang wrote: > Hi Ajantha, > > `history.expire.min-snapshots-to-keep` is the *minimum number of > snapshots* we c

Re: [VOTE] Drop Hive runtime

2025-01-10 Thread Manu Zhang
This vote has passed with the following results: 3 +1 binding votes 2 +1 non-binding votes Thanks to everyone who participated in the discussions! Manu On Sun, Jan 5, 2025 at 1:15 PM Matt Topol wrote: > +1 (non-binding) > > On Sat, Jan 4, 2025, 11:20 PM Manu Zhang wrote: > >&

Re: [DISCUSS] Hive Support

2025-01-07 Thread Manu Zhang
Thanks Wing Yew for filling in the missing part. > > The built-in version is also used for other things that Spark may use from > Hive (aside from interaction with HMS), such as Hive SerDes. AFAIK, this is blocking Spark itself from upgrading the built-in version to Hive 4. Thanks Peter for the recap.

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-06 Thread Manu Zhang
gular. Maintaining by snapshot count makes >> a lot of sense and prevents table sizes from growing excessively when >> change rate is frequent. >> >> Thanks, >> Walaa. >> >> >> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang >> wrote: >> >>

[DISCUSS] Support keeping at most N snapshots

2025-01-06 Thread Manu Zhang
Hi all, While maintaining Iceberg tables for our customers, I find it's difficult to set a default snapshot expiration time (`history.expire.max-snapshot-age-ms`) for different workloads. The default value of 5 days looks good for daily batch jobs but is too long for frequently-updated jobs. I'm

Re: [VOTE] Drop Hive runtime

2025-01-04 Thread Manu Zhang
>> I did a pass on the PRs and they look good to me. >> >> Thanks Manu ! >> Regards >> JB >> >> On Wed, Dec 18, 2024 at 2:59 AM Manu Zhang >> wrote: >> > >> > Hi all, >> > >> > Thanks for sharing your ideas in the

Re: [DISCUSS] Hive Support

2025-01-04 Thread Manu Zhang
ions > which are used by Spark, and we need to exclude our own Hive version from > the Spark runtime. > > On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote: > >> Hi Peter, >> >>> I think we should make sure that the Iceberg Hive version is independent >>> f

Re: [DISCUSS] Add a implementation status page for iceberg

2024-12-23 Thread Manu Zhang
Thanks Renjie for landing the page and others for the review! Merry Christmas! | -+- A /=\ /\ /\___ _ __ _ __ ____ i/ O \i/ \/ \ / _ \| '__|| '__|\ \ / / /=\ / /\ /\ \| __/| | | |\ \/ /

Re: [DISCUSS] Hive Support

2024-12-18 Thread Manu Zhang
for Java version issues. As long as the API is >> compatible (and we haven't heard complaints that it is not) then I think >> users can override the version in their environments. >> >> Ryan >> >> On Sun, Dec 15, 2024 at 5:55 PM Manu Zhang >> wrote: >

[VOTE] Drop Hive runtime

2024-12-17 Thread Manu Zhang
Hi all, Thanks for sharing your ideas in the discussion of Hive support[1]. We have a consensus to drop Hive runtime and upgrade Hive metastore connector to Hive 4. However, it looks like we can't upgrade metastore support till Spark 4[2]. Hence, I went on to create a separate PR to remove Hive ru

Re: [DISCUSS] Hive Support

2024-12-15 Thread Manu Zhang
Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks wrote: >> >>> Hey Manu, >>> >>> I agree with the direction here, but we should probably hold a quick >>> procedural vote just to confirm since this is a significant change in >>> support for Hive. >>

Re: [DISCUSS] Hive Support

2024-12-11 Thread Manu Zhang
Thanks all for sharing your thoughts. It looks like there's a consensus on upgrading to Hive 4 and dropping hive-runtime. I've submitted a PR[1] as the first step. Please help review. 1. https://github.com/apache/iceberg/pull/11750 Thanks, Manu On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya wrote:

Re: New committer: Scott Donnelly

2024-12-11 Thread Manu Zhang
Congratulations Scott! Thanks, Manu On Wed, Dec 11, 2024 at 3:21 PM Eduard Tudenhöfner wrote: > Congrats Scott! > > On Wed, Dec 11, 2024 at 7:35 AM roryqi wrote: > >> Congrats! >> >> On Wed, Dec 11, 2024 at 14:26, Fenil Jain wrote: >> >>> Congratulations Scott! >>> >>> On Wed, Dec 11, 2024 at 8:56 AM Renji

Re: New committer: Matt Topol

2024-12-10 Thread Manu Zhang
Congratulations, Matt! Thanks, Manu On Wed, Dec 11, 2024 at 6:16 AM Steve Zhang wrote: > Congrats Matt! > > Thanks, > Steve Zhang > > > > On Dec 10, 2024, at 7:24 AM, Gang Wu wrote: > > Congrats Matt! > > >

Re: Storing catalog directly on object store

2024-11-27 Thread Manu Zhang
I think one major issue with current HadoopCatalog is that there's no way to manage tables by name. If adding one metadata layer on top of it, we need to handle more consistency challenges. Manu On Wed, Nov 27, 2024 at 8:03 PM Gabor Kaszab wrote: > Hi All, > > Xuanwo, I recall the reasoning aga

[DISCUSS] Enforce table properties at catalog level

2024-11-27 Thread Manu Zhang
Hi all, Currently, we can *enforce default table properties* at catalog level with configs like spark.sql.catalog.*catalog-name*.table-override.*propertyKey*[1]. It prevents users from overriding those properties when creating a table. However, users can still override later through altering the
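The gap described above can be sketched as follows. The `table-override.` prefix comes from the message; everything else is illustrative. The function names are hypothetical, the sketch assumes the catalog sees its properties with the `spark.sql.catalog.<catalog-name>.` part already stripped, and `rejected_alter_keys` models the kind of check the proposal seems to ask for rather than existing behavior.

```python
from typing import Dict, List

OVERRIDE_PREFIX = "table-override."


def enforced_properties(catalog_props: Dict[str, str]) -> Dict[str, str]:
    """Collect table properties the catalog enforces, with the prefix removed."""
    return {
        key[len(OVERRIDE_PREFIX):]: value
        for key, value in catalog_props.items()
        if key.startswith(OVERRIDE_PREFIX)
    }


def properties_at_create(
    catalog_props: Dict[str, str], requested: Dict[str, str]
) -> Dict[str, str]:
    """At table creation, enforced values win over what the user requested."""
    return {**requested, **enforced_properties(catalog_props)}


def rejected_alter_keys(
    catalog_props: Dict[str, str], changes: Dict[str, str]
) -> List[str]:
    """Hypothetical check: keys an ALTER would have to reject to keep enforcement."""
    enforced = enforced_properties(catalog_props)
    return sorted(
        key for key, value in changes.items()
        if key in enforced and enforced[key] != value
    )


if __name__ == "__main__":
    catalog = {"type": "hive", "table-override.write.format.default": "parquet"}
    print(properties_at_create(catalog, {"write.format.default": "orc"}))
    print(rejected_alter_keys(catalog, {"write.format.default": "orc"}))
```

Without a check like the second one, the enforced value holds only at creation time, which is the behavior the message points out.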

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Manu Zhang
> > The current behavior's intent is not to check whether the metadata is > valid, it is to detect whether the table is an Iceberg table. Is there a way to detect this from HiveCatalog without loading the table? On Wed, Nov 27, 2024 at 2:01 PM Péter Váry wrote: > I think we have an agreement,

Re: [DISCUSS] Hive Support

2024-11-22 Thread Manu Zhang
as well. > > Kind regards, > Fokko > > On Fri, Nov 22, 2024 at 07:38, Péter Váry wrote: > >> I would prefer B, and only revert to A if we find that B becomes too >> complicated. >> >> On Fri, Nov 22, 2024, 04:26 Manu Zhang wrote: >> >>> Hi Pe

Re: [DISCUSS] Hive Support

2024-11-21 Thread Manu Zhang
> > Thanks, > Peter > > On Thu, Nov 21, 2024 at > 14:21, Jean-Baptiste Onofré wrote: > >> Hi Manu >> >> It sounds like a plan. I think it makes sense to drop Hive 2 & 3 and >> encourage use of Hive 4 (mostly documentation task). >> >>

Re: [DISCUSS] Hive Support

2024-11-20 Thread Manu Zhang
ache/iceberg/pull/10996, this is Hive 3. Does > this present any problem? > > > On Tue, Nov 19, 2024 at 10:26 PM Manu Zhang > wrote: > >> To clarify, the changes discussed here don't affect hive connectors in >> engines, which either use the built-in hive version (S

Re: [VOTE] Deprecate and remove last-column-id

2024-11-20 Thread Manu Zhang
Thanks Fokko. To be clear, are you proposing to deprecate last-column-id in 1.8.0 and remove in 1.9.0+? On Tue, Nov 19, 2024 at 4:18 PM Fokko Driesprong wrote: > Hi everyone, > > Based on the positive feedback on the [DISCUSS] thread >

Re: [DISCUSS] Hive Support

2024-11-20 Thread Manu Zhang
To clarify, the changes discussed here don't affect hive connectors in engines, which either use the built-in hive version (Spark) or can be upgraded to hive 3 (Flink). On Wed, Nov 20, 2024 at 2:19 PM Manu Zhang wrote: > Okay, let me add this option > > D. Drop Hive 2 & 3 supp

Re: [DISCUSS] Hive Support

2024-11-19 Thread Manu Zhang
> > As Hive 2 and 3 do not support Java 11+, and Iceberg 1.8 requires Java > 11+, the combination is invalid. How about simply dropping support for Hive > 2&3 and suggesting the Hive user upgrade Hive 4 to gain the built-in > Iceberg support? > > Thanks, > Cheng Pan > >

[DISCUSS] Hive Support

2024-11-19 Thread Manu Zhang
Hi all, We previously reached consensus[1] to deprecate Hive 2 in 1.7 and drop in 1.8. However, when working on the removal PR[2], multiple tests failed in Hive 3 due to not supporting JDK11[3]. The fix has been back-ported to branch-3.1[4] but not released yet. As announced on Hive website, Hive

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Manu Zhang
velopment. >> Hence, my specific question was about kafka connect upsert operation. >> >> @Manu: I meant the delta writers for kafka connect Iceberg sink (which in >> turn used for upserting the CDC records) >> https://github.com/apache/iceberg/issues/10842 >> >> >

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Manu Zhang
I second Anton's proposal to standardize on a view-based approach to handle CDC cases. Actually, it's already been explored in detail[1] by Jack before. [1] Improving Change Data Capture Use Case for Apache Iceberg

Fwd: Notification: Iceberg Community Sync (Recorded) @ Thu Nov 14, 2024 1am - 2am (GMT+8) (Manu Zhang)

2024-11-14 Thread Manu Zhang
:00 PM Subject: Notification: Iceberg Community Sync (Recorded) @ Thu Nov 14, 2024 1am - 2am (GMT+8) (Manu Zhang) To: Manu Zhang Iceberg Community Sync (Recorded) Join with Google Meet – Google Meet Link: https://meet.google.com/ujy-njjo-vre Triweekly Iceberg meeting for anyone wanting to get involved

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-04 Thread Manu Zhang
+1 (non-binding) Built with JDK17 and tested with SparkCatalog / SparkSessionCatalog (type hive) on Spark 3.5.0. BTW, according to https://iceberg.apache.org/how-to-release/#validating-a-source-release-candidate , shouldn't the release announcement include links to the GitHub change comparison? On Mon, Nov

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-30 Thread Manu Zhang
I think Apache Paimon could point us in the direction of supporting streaming upserts use cases. We are already working on some of the building blocks like deletion vectors and Flink compaction. +1 to the proposal since users are not recommended to use equality deletes for streaming upserts anyway

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-10-30 Thread Manu Zhang
Thanks Russell for making the release. Not a blocker, but I really wish this doc change[1] could be included in 1.7.0. [1] https://github.com/apache/iceberg/pull/11417 On Thu, Oct 31, 2024 at 7:00 AM Russell Spitzer wrote: > Convenience binary artifacts are staged on Nexus. The Maven repository

Re: [REVIEW] 1.7.0 Remaining Milestone PR's

2024-10-28 Thread Manu Zhang
Hi Russell, Spark 3.5: Fix NotSerializableException when migrating Spark tables > Given this can't make it in time, I submitted a PR[1] to add a warning in the doc that "parallelism > 1" doesn't work for migration procedures. Please help review it. [

Re: [DISCUSS] Apache Iceberg 1.7.0 Release Cutoff

2024-10-23 Thread Manu Zhang
Can we also include https://github.com/apache/iceberg/pull/11157? Much appreciated if I can get more eyes on it. Thanks, Manu On Wed, Oct 23, 2024 at 11:03 PM Russell Spitzer wrote: > Keep up coming :) I did a pass on Prashant's as well > > On Wed, Oct 23, 2024 at 12:47 AM Jean-Baptiste Onofré

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Manu Zhang
+1 On Fri, Oct 18, 2024 at 8:50 AM Rodrigo Meneses wrote: > +1 > > On Thu, Oct 17, 2024 at 4:38 PM Bryan Keller wrote: > >> +1 >> >> >> On Oct 17, 2024, at 1:51 PM, Anton Okolnychyi >> wrote: >> >> +1 >> >> On Thu, Oct 17, 2024 at 13:42, Steven Wu wrote: >> >>> +1 >>> >>> On Thu, Oct 17, 2024 at

Re: Meeting Minutes 2024-10-02

2024-10-11 Thread Manu Zhang
Thanks Brian for sharing the notes. Some questions here. Target release date set for October 31st, 2023 > Should be 2024?😀 Proposal to create new Iceberg C++ library approved In this thread[1], Xuanwo and Renjie mentioned iceberg-rust implementation and c++ bindings. Do you have a strong opinion

Re: Spec changes for deletion vectors

2024-10-11 Thread Manu Zhang
Hi Ryan, Do you mean the doc Improve Position Deletes in V3 by Anton? I don't recall Anton using the term "deletion vector" in his proposal. On Sat, Oct 12, 2024 at 12:30 AM Micah Kornfield wrote: > I

Re: [Discuss] Apache Iceberg 1.6.2 release because of Avro CVE ?

2024-10-10 Thread Manu Zhang
Hi Ajantha, There is a bug[1] in migration procedures (e.g. add_files) when the option `parallelism` is larger than 1. I've submitted a fix[2] against the main branch and would like to back-port to 1.6.x. [1] https://github.com/apache/iceberg/issues/11147 [2] https://github.com/apache/iceberg/pul

Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-09 Thread Manu Zhang
I'd vote for JDBC catalog as it's simple for a quick-start guide. Setting up a REST Service with docker image could be cumbersome. We can have another page for REST Catalog. Regards, Manu On Thu, Oct 10, 2024 at 2:50 AM Marc Cenac wrote: > I support the idea of updating the docs to replace the

Re: [Notice] Update to catalog sync meeting timezone 2

2024-09-25 Thread Manu Zhang
BTW, have recent community sync recordings been uploaded to YouTube? I only see catalog sync recordings. On Wed, Sep 25, 2024 at 1:56 AM Sung Yun wrote: > Thank you Kevin! > > Sung > > On 2024/09/24 17:51:54 Kevin Liu wrote: > > > https://docs.google.com/document/d/1iPGVCIcr-M0XtAiudOguWAvmqIdVg

[DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-22 Thread Manu Zhang
Hi Iceberg and Spark community, I'd like to bring your attention to a recent change[1] in Spark 3.5.3 that effectively breaks Iceberg's SparkSessionCatalog[2] and blocks Iceberg upgrading to Spark 3.5.3[3]. SparkSessionCatalog, as a customized Spark V2 session catalog, supports creating a V1 tabl

Re: [DISCUSS] Row Lineage Proposal

2024-09-13 Thread Manu Zhang
Thanks Russell. Not a question on the proposal itself, but I find it a bit hard to follow and maintain all three specs in one place. We are also publishing an unfinalized spec to the website. Would it be better to maintain the spec in a "copy-on-write" style, i.e. each spec having its own format file

Re: [DISCUSS] Define calendar used in specification?

2024-09-12 Thread Manu Zhang
Spark doc refers to this Java 8 page https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html, which says > > This chronology defines the rules of the ISO calendar system. This > calendar system is based on the ISO-8601 standard, which is the *de facto* > world > calendar. > +

Re: [DISCUSS] Drop Hive 2 support

2024-09-01 Thread Manu Zhang
remove in 1.8). > Engines still depending on Hive 2 will have to use Iceberg <= 1.6, which > is OK. > > Regards > JB > > On Tue, Aug 27, 2024 at 2:34 AM Manu Zhang > wrote: > > > > Hi all, > > > > I'd like to start a discussion on droppin

Re: [DISCUSS] Drop Hive 2 support

2024-08-28 Thread Manu Zhang
g-api >>> iceberg-bundled-guava >>> iceberg-core >>> iceberg-nessie >>> iceberg-orc >>> iceberg-parquet >>> iceberg-snowflake >>> >>> Best >>> Piotr >>> >>> >>> >>> On Tue, 27 Aug 2024 at

Re: write distribution change when setting local order to a partition table

2024-08-27 Thread Manu Zhang
ake a closer look. >> >> On Tue, Jul 16, 2024 at 06:48, Manu Zhang wrote: >> >>> Hi all, >>> >>> When I recently set a local order to a partitioned table, its write >>> distribution was altered from HASH to NONE. That's unexpected but the >

Re: [DISCUSS] Improving Position Deletes in V3

2024-08-26 Thread Manu Zhang
Anton, Thanks for the write-up. It's very entertaining to read. I left some minor comments on the doc. +1 to the proposal. Regards, Manu On Tue, Aug 27, 2024 at 7:21 AM Steven Wu wrote: > Anton, > > Thanks a lot for the improvement proposal and great write-up with > quantitative supporting arg

Re: [VOTE] Merge REST Spec change to add RemovePartitionSpecsUpdate update type

2024-08-26 Thread Manu Zhang
+1 (non-binding) On Tue, Aug 27, 2024 at 11:00 AM xianjin wrote: > +1 (non-binding) > Sent from my iPhone > > On Aug 27, 2024, at 4:22 AM, Fokko Driesprong wrote: > > > +1 > > On Mon, Aug 26, 2024 at 22:00, Yufei Gu wrote: > >> +1 >> Yufei >> >> >> On Mon, Aug 26, 2024 at 11:06 AM Ryan Blue >

[DISCUSS] Drop Hive 2 support

2024-08-26 Thread Manu Zhang
Hi all, I'd like to start a discussion on dropping Hive 2 support, which reached EOL three months ago[1]. It's also a prerequisite for migration to Hadoop 3, as shown by Steve's PR[2]. For your reference, I have a draft PR[3] to show the needed changes. Since we've not deprecated Hive 2 support y

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Manu Zhang
+1 (non-binding) On Tue, Aug 20, 2024 at 07:44, Micah Kornfield wrote: > +1 (non-binding) > > On Mon, Aug 19, 2024 at 4:33 PM Steve Zhang > wrote: > >> +1 (non-binding) >> >> Thanks, >> Steve Zhang >> >> >> >> On Aug 19, 2024, at 1:47 PM, John Zhuge wrote: >> >> +1 (non-binding) >> >> On Mon, Aug 19, 2024 at

Re: [DISCUSS] Variant Spec Location

2024-08-14 Thread Manu Zhang
+1 to copy the spec into our repository. I think the best way to keep compatibility is building integration tests. Thanks, Manu On Wed, Aug 14, 2024 at 8:27 PM Péter Váry wrote: > Thanks Russell and Aihua for pushing Variant support! > > Given the differences between the supported types and the

Re: Flink Table Maintenance - Tag based locking

2024-08-05 Thread Manu Zhang
in the doc, in most cases having concurrent runs are a waste of >> resources, because of the commit conflicts. >> >> If we decide to pursue the Flink only solution, I would implement a JDBC >> based locking implementation for the LockFactory interface based on the >> fee

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Manu Zhang
Not familiar with Flink, I'm wondering how Flink resolves concurrency issues in common Flink use cases. For example, how does Flink prevent two jobs from writing to the same file? On the other hand, an Iceberg tag is eventually an atomic change to a file. It's the same as using a file lock. I don'

Re: [VOTE] Drop Java 8 support in Iceberg 1.7.0

2024-08-04 Thread Manu Zhang
Thanks Ryan for reaching out. It's great to have a path ahead for everyone. Thanks, Manu On Fri, Aug 2, 2024 at 11:55 PM Ryan Blue wrote: > To follow up on this, I also reached out to Manu who was the only -1 vote. > I can understand his concern about forcing people to stay on the 1.6 > release

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-31 Thread Manu Zhang
A reminder: GitHub Discussions has been enabled on iceberg-rust and there are already interesting ideas open for discussion. Please weigh in. On Mon, Jul 15, 2024 at 9:39 PM Renjie Liu wrote: > Hi: > > >> But one minor concern
