Re: [Early Feedback] Variant and Subcolumnarization Support

2024-05-10 Thread Gang Wu
Hi, This sounds very interesting! IIUC, the current variant type in Apache Spark stores data in the BINARY type. When it comes to subcolumnarization, does it require the file format (e.g. Apache Parquet/ORC/Avro) to support the variant type natively? Best, Gang On Sat, May 11, 2024 at 1:07 PM T

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-05-14 Thread Gang Wu
> We may need some guidance on just how many we need to look at; > we were planning on Spark and Trino, but weren't sure how much > further down the rabbit hole we needed to go. There are some engines living outside the Java world. It would be good if the proposal could cover the effort it takes t

Re: [Discuss] Geospatial Support

2024-06-05 Thread Gang Wu
> The min/max stats are discussed in the doc (Phase 2), depending on the non-trivial encoding. Just want to add that min/max stats filtering could be supported by the file format natively. Adding a geometry type to the Parquet spec is under discussion: https://github.com/apache/parquet-format/pull/240 Best

Re: [ANNOUNCE] Welcoming new committers and PMC members

2024-07-23 Thread Gang Wu
Congrats! On Tue, Jul 23, 2024 at 10:17 PM Russell Spitzer wrote: > "so many" :) > > On Tue, Jul 23, 2024 at 9:14 AM Russell Spitzer > wrote: > >> This is truly an exciting day. To have to many qualified folks being >> recognized by the Iceberg project fills me with pride. I can't wait to see >

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-06 Thread Gang Wu
Just my two cents. Not all tables have a partition definition, and table-level stats would benefit those tables. In addition, NDV might not be easily populated from partition-level statistics. Thanks, Gang On Tue, Aug 6, 2024 at 9:48 PM Xianjin YE wrote: > Thanks for raising the discussion Hu
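The NDV remark above rests on a general property: distinct counts are not additive across partitions, because the same value can appear in more than one partition. A minimal sketch with made-up data (the partition names, values, and the `ndv` helper are illustrative assumptions, not taken from the thread or from Iceberg):

```python
# Hypothetical per-partition column values (made-up data for illustration).
partitions = {
    "p1": ["a", "b", "c"],
    "p2": ["b", "c", "d"],
}


def ndv(values):
    """Exact number of distinct values in an iterable."""
    return len(set(values))


# NDV computed independently for each partition.
per_partition_ndv = {name: ndv(vals) for name, vals in partitions.items()}

# Naively summing partition-level NDVs double-counts values shared by partitions.
sum_of_partition_ndv = sum(per_partition_ndv.values())

# The table-level NDV has to be computed over the union of all values.
table_ndv = ndv(v for vals in partitions.values() for v in vals)

print(per_partition_ndv, sum_of_partition_ndv, table_ndv)
```

Here the sum of the partition-level counts overestimates the table-level count, which is why table-level NDV generally needs either a pass over the data or mergeable sketches rather than plain per-partition numbers.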

Re: [DISCUSS] Variant Spec Location

2024-08-14 Thread Gang Wu
Sorry for chiming in late. From the discussion in https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq, I don't quite understand why it is logistically complicated to create a sub-project to hold the variant spec and impl. IMHO, copying the variant type spec into Apache Iceberg has so

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Gang Wu
different and I don't think this should >> block forking the spec, but we should make sure that the decision is >> publicly documented within both communities. >> >> Thanks, >> Micah >> >> On Thu, Aug 15, 2024 at 7:47 AM Russell Spitzer < >> russel

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Gang Wu
. > > It is worth noting that we also need to standardize many functions > related to it. > > A neutral place to maintain it is a great choice. > > - As Gang Wu said, a standalone project is good, just like RoaringBitmap > [1]. > - As Ryan said, Parquet community is a ne

Re: Type promotion in v3

2024-08-19 Thread Gang Wu
Hi Micah, If we go with the approach that type promotion results in a change in the field-id, what happens when a certain field has been changed multiple times? Does it mean that we end up with tracking the lineage of field change history? Thanks, Gang On Tue, Aug 20, 2024 at 7:34 AM Micah Kornf

Re: [DISCUSS] Variant Spec Location

2024-08-21 Thread Gang Wu
usion > > extension that operates on this [1], and already have some ideas on how > > such an extension type might be defined. I'm not yet caught up on the > > shredded specification, but I think having just the binary format would > be > > beneficial for in-memory an

Re: [DISCUSS] Variant Spec Location

2024-08-22 Thread Gang Wu
>> >> Hi Gang, >> >> Sorry, but can you give a pointer to the start of this discussion thread >> in a readable format (for example a mailing-list archive)? It appears >> that dev@arrow wasn't cc'ed from the start and that can make it >> difficult to und

Re: [DISCUSS] Variant Spec Location

2024-08-23 Thread Gang Wu
>>> >>> This could be developed separately and then be represented in Arrow >>> using an extension type (perhaps a canonical one as in >>> https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html). >>> >>> What do other Arrow developers

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-08-29 Thread Gang Wu
Hi, It won't be an issue if there is already an iceberg-cpp implementation. However, it is unfortunate to see duplicate efforts from different query engines to implement their own C++ Iceberg readers and writers. Is this a good chance to add an official C++ implementation by providing a Puffin reader/wr

Re: [VOTE] Deletion Vectors in V3

2024-10-29 Thread Gang Wu
+1 (non-binding) Best, Gang On Wed, Oct 30, 2024 at 5:46 AM Anton Okolnychyi wrote: > Hi folks, > > We have been discussing the new layout for position deletes in V3 for a > while now. It seems the community reached consensus. I'd like to start a > vote on adding deletion vectors to the V3 spec

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-30 Thread Gang Wu
Thanks Russell for bringing this up! +1 on deprecating equality deletes. IMHO, this is something that should reside only in the ingestion engine. Best, Gang On Thu, Oct 31, 2024 at 5:07 AM Russell Spitzer wrote: > Background: > > 1) Position Deletes > > > Writers determine what rows are delet

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Gang Wu
d that a part of the community is interested in having a C++ > implementation of the Iceberg lib in general for their C++ engine. cc @Gang > Wu > > There seemed to be general support from the community to start up such a > sub-project, so I'm reaching out now to ask for some gu

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Gang Wu
from > the Impala community we could add some additional auxiliary functionality > for the V3 positional deletes later on. > > 2) I learned that a part of the community is interested in having a C++ > implementation of the Iceberg lib in general for their C++ engine. cc @Gang >

Re: Welcome Huaxin Gao as a committer!

2025-02-06 Thread Gang Wu
Congrats Huaxin! Best, Gang On Thu, Feb 6, 2025 at 5:10 PM Szehon Ho wrote: > Hi everyone, > > The Project Management Committee (PMC) for Apache Iceberg has > invited Huaxin Gao to become a committer, and I am happy to announce that > she has accepted. Huaxin has done a lot of impressive work

Re: [VOTE] Add Geometry and Geography types for V3

2025-02-06 Thread Gang Wu
+1 (non-binding) On Fri, Feb 7, 2025 at 8:20 AM Daniel Weeks wrote: > +1 > > On Thu, Feb 6, 2025, 4:02 PM Russell Spitzer > wrote: > >> +1 >> >> On Fri, Feb 7, 2025 at 12:57 AM Denny Lee wrote: >> >>> +1 (non-binding) - super exciting! >>> >>> On Thu, Feb 6, 2025 at 3:52 PM rdb...@gmail.com >

Re: New committer: Matt Topol

2024-12-10 Thread Gang Wu
Congrats Matt! On Tue, Dec 10, 2024 at 8:57 PM Sung Yun wrote: > Congratulations Matt! > > On 2024/12/10 12:49:25 Alex Dutra wrote: > > Congratulations, Matt! Go!! > > > > On Tue, Dec 10, 2024 at 1:08 PM Péter Váry > > wrote: > > > > > Congratulations Matt! > > > > > > On Tue, Dec 10, 2024, 12:

Re: [DISCUSS] December board report

2024-12-11 Thread Gang Wu
For C++, I think it aims to be a full-featured C++ library (not only a Puffin implementation). On Thu, Dec 12, 2024 at 6:14 AM rdb...@gmail.com wrote: > I'll update it. Thanks! > > (By the way, the Avro default value support was in the Java section) > > On Wed, Dec 11, 2024 at 2:00 PM Matt T

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-18 Thread Gang Wu
IIUC, iceberg-parquet depends on iceberg-arrow for the vectorized reader implementation (though it is only partially supported). Should we relocate iceberg-arrow together with it? Since I have mentioned that the vectorized reader implementation is only partially supported, is that a direction that needs to be improved? There i

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Gang Wu
umplido wrote: > >> This sounds awesome. I am looking forward to the slack channel being >> available so I can also help! >> >> El vie, 22 nov 2024 a las 10:03, Gang Wu () escribió: >> > >> > Thanks for the support, Fokko and JB! >> > >> >

Re: [discuss] Standardizing Naming Schemes for Language-Specific Configurations

2025-01-23 Thread Gang Wu
Generally it makes sense to define separate language-specific configurations. I think we need to think about the following items: 1. Is adding the prefix Python-specific? Should Rust/Go use -rs/-go as the convention? 2. Which part of the spec is the best place to describe this? It seems that we

Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage

2025-01-15 Thread Gang Wu
+1 (non-binding) On Thu, Jan 16, 2025 at 2:30 PM Péter Váry wrote: > +1 > > Steven Wu ezt írta (időpont: 2025. jan. 16., Cs, > 0:46): > >> +1 >> >> On Wed, Jan 15, 2025 at 9:00 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> Hi Everyone! >>> >>> PR: https://github.com/apache/ic

Re: [VOTE] Simplify multi-arg table metadata

2025-02-09 Thread Gang Wu
+1 (non-binding) (she says hi to your cat!) Best, Gang On Sun, Feb 9, 2025 at 5:02 PM Fokko Driesprong wrote: > (Second attempt, the cat ran over the keyboard) > > Hey everyone, > > After the positiv

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Gang Wu
+1 (non-binding) On Wed, Feb 12, 2025 at 6:17 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1 thanks for driving this Gabor! > > On Wed, Feb 12, 2025 at 2:35 AM rdb...@gmail.com wrote: > >> +1 >> >> On Tue, Feb 11, 2025 at 10:50 AM Steve Zhang >> wrote: >> >>> +1 nb >>> >>> Thanks, >>> Steve

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Gang Wu
eletes. Instead, this spec change now says > the updated row is a complete new row with new row_id. > > On Tue, Feb 11, 2025 at 7:39 PM Gang Wu wrote: > >> Hi Russell, >> >> Thanks for supporting equality deletes to row lineage! >> >> > accept that "

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Gang Wu
Hi Russell, Thanks for supporting equality deletes in row lineage! > accept that "updates" will be treated as "delete" and "insert" I would say that it has the obvious drawbacks below (though it is better than no support): 1) updates will be populated differently when outputting changelogs to use

Re: [VOTE] Allow Row-Lineage with Equality Deletes

2025-02-19 Thread Gang Wu
+1 (non-binding) On Thu, Feb 20, 2025 at 7:12 AM Steven Wu wrote: > +1 > > On Wed, Feb 19, 2025 at 2:15 PM Russell Spitzer > wrote: > >> The PR: https://github.com/apache/iceberg/pull/12230 is basically ready >> now. So let's do a last vote to make sure everyone is aware of the upcoming >> cha

Re: [DISCUSS] Introduce C FFI for iceberg rust

2025-02-17 Thread Gang Wu
Thanks Xuanwo! Looking forward to the possibility of iceberg-cpp integration with the C FFI! Best, Gang On Tue, Feb 18, 2025 at 3:21 PM Renjie Liu wrote: > Hi: > > Thanks Xuanwo for raising this. > > As xuanwo mentioned, rust implementation + c binding will provide a good > foundation for cros

Re: Clarification on sorting floating-point numbers

2025-02-27 Thread Gang Wu
FYI: there was an effort from Jan (cc'd) to introduce a total order for floating-point numbers on the Parquet side: [1][2]. [1] https://github.com/apache/parquet-format/pull/221 [2] https://github.com/apache/parquet-format/pull/196 On Thu, Feb 27, 2025 at 4:24 AM Devin Smith wrote: > The spec h
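For readers unfamiliar with the idea of a total order on floating-point numbers, here is a minimal sketch of an IEEE 754 totalOrder-style comparison for 64-bit doubles, done by mapping each value's bit pattern to an unsigned integer key. This is an illustrative assumption about what such an ordering looks like; it is not claimed to match what the linked Parquet pull requests specify, and `total_order_key` is a hypothetical helper name.

```python
import struct


def total_order_key(x: float) -> int:
    """Map a double to an unsigned 64-bit key whose integer order is a total order.

    Resulting order: -NaN < -inf < negative numbers < -0.0 < +0.0
                     < positive numbers < +inf < +NaN.
    """
    # Reinterpret the double's 8 bytes as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & (1 << 63):
        # Negative sign bit: flip every bit so larger magnitudes sort first.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Non-negative: set the sign bit so these sort after all negatives.
    return bits | (1 << 63)


values = [1.0, float("nan"), -0.0, 0.0, float("-inf"), -1.0, float("inf")]
ordered = sorted(values, key=total_order_key)
print(ordered)
```

Unlike the default `<` comparison, this key distinguishes -0.0 from +0.0 and gives NaN a fixed position, so sorting and min/max statistics become deterministic.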

Re: [VOTE] Minor simplifications for Geo Spec

2025-03-18 Thread Gang Wu
Makes sense. +1 (non-binding) On Wed, Mar 19, 2025 at 8:07 AM Jia Yu wrote: > +1 (non-binding) > > Thank you! > > On 2025/03/19 00:01:00 Szehon Ho wrote: > > Hi everyone, > > > > While working on the reference implementation for Geometry/Geography > spec, > > we noticed some parts that can be s