Re: Welcome Huaxin Gao as a committer!

2025-02-06 Thread Gang Wu
Congrats Huaxin! Best, Gang On Thu, Feb 6, 2025 at 5:10 PM Szehon Ho wrote: > Hi everyone, > > The Project Management Committee (PMC) for Apache Iceberg has > invited Huaxin Gao to become a committer, and I am happy to announce that > she has accepted. Huaxin has done a lot of impressive work

Re: [discuss] Standardizing Naming Schemes for Language-Specific Configurations

2025-01-23 Thread Gang Wu
Generally it makes sense to define separate language-specific configurations. I think we need to think about the following items: 1. Is it python-specific to add the prefix? Should Rust/Go be -rs/-go as the convention? 2. Which part of the spec is the best place to describe this? It seems that we

Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage

2025-01-15 Thread Gang Wu
+1 (non-binding) On Thu, Jan 16, 2025 at 2:30 PM Péter Váry wrote: > +1 > > Steven Wu wrote (on Thu, Jan 16, 2025, > 0:46): > >> +1 >> >> On Wed, Jan 15, 2025 at 9:00 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> Hi Everyone! >>> >>> PR: https://github.com/apache/ic

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-18 Thread Gang Wu
IIUC, iceberg-parquet depends on iceberg-arrow for the vectored reader implementation (though partially supported). Should we relocate iceberg-arrow together? Since I have mentioned that the vectored reader implementation is partially supported, is it a direction that needs to be improved? There i

Re: [DISCUSS] December board report

2024-12-11 Thread Gang Wu
For C++, I think it is aimed at a full-featured C++ library (not only a puffin implementation). On Thu, Dec 12, 2024 at 6:14 AM rdb...@gmail.com wrote: > I'll update it. Thanks! > > (By the way, the Avro default value support was in the Java section) > > On Wed, Dec 11, 2024 at 2:00 PM Matt T

Re: New committer: Matt Topol

2024-12-10 Thread Gang Wu
Congrats Matt! On Tue, Dec 10, 2024 at 8:57 PM Sung Yun wrote: > Congratulations Matt! > > On 2024/12/10 12:49:25 Alex Dutra wrote: > > Congratulations, Matt! Go!! > > > > On Tue, Dec 10, 2024 at 1:08 PM Péter Váry > > wrote: > > > > > Congratulations Matt! > > > > > > On Tue, Dec 10, 2024, 12:

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Gang Wu
umplido wrote: > >> This sounds awesome. I am looking forward to the slack channel being >> available so I can also help! >> >> On Fri, 22 Nov 2024 at 10:03, Gang Wu wrote: >> > >> > Thanks for the support, Fokko and JB! >> > >> >

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Gang Wu
from > the Impala community we could add some additional auxiliary functionality > for the V3 positional deletes later on. > > 2) I learned that a part of the community is interested in having a C++ > implementation of the Iceberg lib in general for their C++ engine. cc @Gang >

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Gang Wu
d that a part of the community is interested in having a C++ > implementation of the Iceberg lib in general for their C++ engine. cc @Gang > Wu > > There seemed to be general support from the community to start up such a > sub-project, so I'm reaching out now to ask for some gu

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-30 Thread Gang Wu
Thanks Russell for bringing this up! +1 on deprecating equality deletes. IMHO, this is something that should reside only in the ingestion engine. Best, Gang On Thu, Oct 31, 2024 at 5:07 AM Russell Spitzer wrote: > Background: > > 1) Position Deletes > > > Writers determine what rows are delet

Re: [VOTE] Deletion Vectors in V3

2024-10-29 Thread Gang Wu
+1 (non-binding) Best, Gang On Wed, Oct 30, 2024 at 5:46 AM Anton Okolnychyi wrote: > Hi folks, > > We have been discussing the new layout for position deletes in V3 for a > while now. It seems the community reached consensus. I'd like to start a > vote on adding deletion vectors to the V3 spec

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-08-29 Thread Gang Wu
Hi, It won't be an issue if there is already an iceberg-cpp implementation. However, it is unfortunate to see duplicate efforts from different query engines to implement their own C++ Iceberg readers and writers. Is it a good chance to add an official C++ implementation by providing a puffin reader/wr

Re: [DISCUSS] Variant Spec Location

2024-08-23 Thread Gang Wu
>>> >>> This could be developed separately and then be represented in Arrow >>> using an extension type (perhaps a canonical one as in >>> https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html). >>> >>> What do other Arrow developers

Re: [DISCUSS] Variant Spec Location

2024-08-22 Thread Gang Wu
>> >> Hi Gang, >> >> Sorry, but can you give a pointer to the start of this discussion thread >> in a readable format (for example a mailing-list archive)? It appears >> that dev@arrow wasn't cc'ed from the start and that can make it >> difficult to und

Re: [DISCUSS] Variant Spec Location

2024-08-21 Thread Gang Wu
usion > > extension that operates on this [1], and already have some ideas on how > > such an extension type might be defined. I'm not yet caught up on the > > shredded specification, but I think having just the binary format would > be > > beneficial for in-memory an

Re: Type promotion in v3

2024-08-19 Thread Gang Wu
Hi Micah, If we go with the approach that type promotion results in a change in the field-id, what happens when a certain field has been changed multiple times? Does it mean that we end up tracking the lineage of field change history? Thanks, Gang On Tue, Aug 20, 2024 at 7:34 AM Micah Kornf

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Gang Wu
. > > It is worth noting that we also need to standardize many functions > related to it. > > A neutral place to maintain it is a great choice. > > - As Gang Wu said, a standalone project is good, just like RoaringBitmap > [1]. > - As Ryan said, Parquet community is a ne

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Gang Wu
different and I don't think this should >> block forking the spec, but we should make sure that the decision is >> publicly documented within both communities. >> >> Thanks, >> Micah >> >> On Thu, Aug 15, 2024 at 7:47 AM Russell Spitzer < >> russel

Re: [DISCUSS] Variant Spec Location

2024-08-14 Thread Gang Wu
Sorry for chiming in late. From the discussion in https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq, I don't quite understand why it is logistically complicated to create a sub-project to hold the variant spec and impl. IMHO, copying the variant type spec into Apache Iceberg has so

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-06 Thread Gang Wu
Just my two cents: not all tables have a partition definition, and table-level stats would benefit these tables. In addition, NDV might not be easily populated from partition-level statistics. Thanks, Gang On Tue, Aug 6, 2024 at 9:48 PM Xianjin YE wrote: > Thanks for raising the discussion Hu

Re: [ANNOUNCE] Welcoming new committers and PMC members

2024-07-23 Thread Gang Wu
Congrats! On Tue, Jul 23, 2024 at 10:17 PM Russell Spitzer wrote: > "so many" :) > > On Tue, Jul 23, 2024 at 9:14 AM Russell Spitzer > wrote: > >> This is truly an exciting day. To have to many qualified folks being >> recognized by the Iceberg project fills me with pride. I can't wait to see >

Re: [Discuss] Geospatial Support

2024-06-05 Thread Gang Wu
> The min/max stats are discussed in the doc (Phase 2), depending on the non-trivial encoding. Just want to add that min/max stats filtering could be supported by the file format natively. Adding a geometry type to the Parquet spec is under discussion: https://github.com/apache/parquet-format/pull/240 Best

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-05-14 Thread Gang Wu
> We may need some guidance on just how many we need to look at; > we were planning on Spark and Trino, but weren't sure how much > further down the rabbit hole we needed to go. There are some engines living outside the Java world. It would be good if the proposal could cover the effort it takes t

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-05-10 Thread Gang Wu
Hi, This sounds very interesting! IIUC, the current variant type in Apache Spark stores data in the BINARY type. When it comes to subcolumnarization, does it require the file format (e.g. Apache Parquet/ORC/Avro) to support the variant type natively? Best, Gang On Sat, May 11, 2024 at 1:07 PM T