Re: [QUESTION][Parquet][Decimal] Why not implement the INT32/INT64 to store Decimal logical type in parquet file

2023-01-06 Thread Gang Wu
I have created an issue and will work on it: [C++][Parquet] Parquet writer supports writing int32/int64 for decimal type · Issue #15239 · apache/arrow (github.com) Best, Gang On Sat, Jan 7, 2023 at 1:39 AM Micah Kornfield wrote: > > > > Hi Kun, >
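For context on the question in this thread: the Parquet format lets the DECIMAL logical type annotate INT32 for precisions 1 through 9 and INT64 for precisions 1 through 18, with byte-array types covering larger precisions. The sketch below illustrates that selection rule only; the function name is hypothetical and is not part of any Arrow or Parquet API, and a real writer may still choose a byte-array type for small precisions.

```python
def smallest_physical_type(precision: int) -> str:
    """Pick the narrowest Parquet physical type that can hold a decimal.

    Hypothetical helper for illustration; thresholds follow the Parquet
    format's DECIMAL rules (INT32: precision <= 9, INT64: precision <= 18).
    """
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    # Larger precisions need a byte-array representation.
    return "FIXED_LEN_BYTE_ARRAY"


if __name__ == "__main__":
    for p in (5, 9, 10, 18, 19, 38):
        print(p, smallest_physical_type(p))
```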

Re: [C++] Parquet and Arrow overlap

2023-02-01 Thread Gang Wu
Hi Will, AFAIK, the Apache Parquet community no longer considers contributions to parquet-cpp when promoting new committers after the donation to Apache Arrow. It would be a dilemma for the parquet-cpp contributors if neither the Apache Arrow community nor the Apache Parquet community recognizes their

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Gang Wu
Congrats, Will! Best, Gang On Tue, Mar 14, 2023 at 9:21 AM Junming Chen wrote: > Congrats, Will!😄 > > From: David Li > Sent: Tuesday, March 14, 2023 5:16 AM > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow PMC member: Will Jones > > Congrats, Wil

Re: Proposal: add a bot to close PRs that haven't been updated in 30 days

2023-03-31 Thread Gang Wu
From a contributor perspective, it would be great if a bot could detect a PR is waiting for review for a certain period of time and then automatically notify reviewers if possible. On Sat, Apr 1, 2023 at 12:21 AM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > On Fri, 31 Mar 202

Re: Arrow community meeting April 12 at 16:00 UTC

2023-04-15 Thread Gang Wu
AFAIK, the Parquet PMC no longer governs parquet-cpp in practice. We should probably raise the issue to the priv...@parquet.apache.org for a formal discussion. Best, Gang On Sat, Apr 15, 2023 at 7:52 PM Andrew Lamb wrote: > > Rust Parquet was donated directly to the Arrow project and > dev

Re: [DISCUSS][C++][Parquet] Expose the API to customize the compression parameter

2023-04-23 Thread Gang Wu
It is a good idea to extend the Codec factory to offer more options. However, I don't think adding a `window_bits` parameter as `compression_level` is a good approach as it does not apply to some codecs. IMO, the proposed new `Codec::Options` can be as simple as a std::map. To avoid misuse, we nee

Re: [Format] Is it legal to have a struct array with a shorter length than its children?

2023-05-05 Thread Gang Wu
IMHO, this is valid. As you have demonstrated in the example, a sliced struct array will result in a length shorter than its child arrays. This kind of flexibility can make it easy to reuse child arrays within the struct array. > Struct: a nested layout consisting of a collection of named child fi

Re: Arrow community meeting May 10 at 16:00 UTC

2023-05-10 Thread Gang Wu
quet C++ issues to be tagged properly in Jira > - These issues are: PARQUET-2201, PARQUET-2225, PARQUET-2232, > PARQUET-2250 > - We need to: > - Tag them with fix version cpp-12.0.0 > - Mark the cpp-12.0.0 version as closed > - Create a new cpp-13.0.0 version >

Re: [ANNOUNCE] New Arrow committer: Gang Wu

2023-05-16 Thread Gang Wu
ang! > > > > > > > > > > > > On Mon, May 15, 2023, 9:57 AM Ian Cook > > > wrote: > > > > > > > > > > > > > Congratulations Gang! > > > > > > > > > > > > > > On Mon, May 15, 2

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-01 Thread Gang Wu
IMO, adding a row_index column from the reader is orthogonal to the mask implementation. Table formats (e.g. Apache Iceberg and Delta) require the knowledge of row index to finalize row deletion. It would be trivial to natively support row index from the file reader. Best, Gang On Fri, Jun 2,

Re: [DISCUSS][Format] Draft implementation of string view array format

2023-06-15 Thread Gang Wu
Hi Ben, The posted benchmark [1] looks pretty good to me. However, I want to raise a possible issue from the perspective of parquet-cpp. Parquet-cpp uses a customized parquet::ByteArray type [2] for string/binary; I would expect some regression of conversions between parquet reader/writer and the

Re: [Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

2023-06-15 Thread Gang Wu
+ dev@parquet On Fri, Jun 16, 2023 at 7:43 AM Jacob Wujciak-Jens wrote: > +1 on the update but also on properly communicating the change to avoid > surprising issues :) > > On Thu, Jun 15, 2023 at 7:53 PM Joris Van den Bossche < > jorisvandenboss...@gmail.com> wrote: > > > On Thu, 15 Jun 2023 at

Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-06-29 Thread Gang Wu
+1 (non-binding) Thanks, Gang On Thu, Jun 29, 2023 at 3:35 AM Benjamin Kietzman wrote: > Hello, > > I'd like to propose adding Utf8View arrays to the arrow format. > Previous discussion in [1], columnar format description in [2], > flatbuffers changes in [3]. > > There are implementations avail

Re: [C++][Parquet] Handling empty files while reading Parquet files using C++

2023-07-03 Thread Gang Wu
Hi Luca, It seems to me that the problem comes from node->type_length(). It should be 0 instead of 64. Could you please check the value of column_index_ in the CheckColumn() before it throws? If you need further assistance, please create an issue on Github and it would be good to provide a file to

Re: [VOTE] Release Apache Arrow 13.0.0 - RC0

2023-07-24 Thread Gang Wu
Hi, Sorry to reply without a vote. I tried to run the verify-release-candidate.sh script on my Mac M1 but it took me forever to fix various environment issues. Is it better to verify this in a pure docker environment instead? Thanks, Gang On Tue, Jul 25, 2023 at 12:30 PM Yibo Cai wrote: > +1.

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-30 Thread Gang Wu
I am also in favor of the idea of an alternative layout. IIRC, a new alternative layout still goes through a process of standardization, though it is the choice of each implementation to decide whether to support it now or later. I'd like to ask if we can provide the flexibility for implementations or downstream pro

Re: [C++] Potential cache/memory leak when reading parquet

2023-09-06 Thread Gang Wu
Hi Jin, Do you have more information about the parquet file? What came to my mind is this issue: https://github.com/apache/arrow/issues/35393 If you have observed something, please feel free to create a new issue and post what you have found there. Thanks, Gang On Wed, Sep 6, 2023 at 11:56 PM Li

Re: [C++] Potential cache/memory leak when reading parquet

2023-09-06 Thread Gang Wu
As suggested in other comments, I also highly recommend using a heap profiling tool to investigate what's going on there. BTW, 800 columns look suspicious to me. Could you try to test them without reading any batch? Not sure if the file metadata is the root cause. Or you may want to try another

Re: [Java][Discuss]: consensus for JDK 8 deprecation

2023-09-14 Thread Gang Wu
Thanks for bringing this up! I have two concerns about dropping Java 8 support: - As a low level library, users have to add specific flags [1] to use Java 9 and up with Arrow to resolve issues with java.nio. This has been annoying for our customers constantly. If this is not resolved, I would say

Re: [DISCUSS][C++] Raw pointer string views

2023-09-26 Thread Gang Wu
Could you please simply describe the layout of DuckDB and Velox so we can know what kind of conversion is required from the raw pointer variant? If any engine simply represents string array in the form of something like std::vector, should we provide a similar variant in C++ to minimize the convers

Re: [Java][Discuss]: consensus for JDK 8 deprecation

2023-10-10 Thread Gang Wu
tch releases if necessary for JDK 8 users > * There is an open question to decide if JDK 11 should be dropped > simultaneously > > Gang Wu, I'm curious what are your thoughts given your initial concerns? > > -Dane > > On Sat, Oct 7, 2023 at 12:00 AM Jacob Wujciak-

Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher

2023-10-15 Thread Gang Wu
Congrats! On Sun, Oct 15, 2023 at 10:49 PM David Li wrote: > Congrats & welcome Curt! > > On Sun, Oct 15, 2023, at 09:03, wish maple wrote: > > Congratulations! > > > > Raúl Cumplido 于2023年10月15日周日 20:48写道: > > > >> Congratulations and welcome! > >> > >> El dom, 15 oct 2023, 13:57, Ian Cook es

Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-22 Thread Gang Wu
Congrats Xuwei! Best, Gang On Mon, Oct 23, 2023 at 12:56 PM Sutou Kouhei wrote: > On behalf of the Arrow PMC, I'm happy to announce that Xuwei Fu > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > -- > kou >

Re: [ANNOUNCE] New Arrow PMC member: Raúl Cumplido

2023-11-13 Thread Gang Wu
Congratulations! Best, Gang On Tue, Nov 14, 2023 at 7:31 AM Jonathan Keane wrote: > Congratulations and welcome! > > -Jon >

Re: [ANNOUNCE] New Arrow PMC chair: Andy Grove

2023-11-27 Thread Gang Wu
Congrats Andy! Thanks Andrew for the past year as well. Best, Gang On Mon, Nov 27, 2023 at 10:59 PM Matt Topol wrote: > Congrats Andy! > > On Mon, Nov 27, 2023 at 9:44 AM Gavin Ray wrote: > > > Yay, congrats Andy! Well-deserved! > > > > On Mon, Nov 27, 2023 at 9:13 AM Kevin Gurney > > > > >

Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-07 Thread Gang Wu
Congrats! On Fri, Dec 8, 2023 at 8:37 AM Dewey Dunnington wrote: > Congrats! > > On Thu, Dec 7, 2023 at 4:28 PM Andrew Lamb wrote: > > > > Congratulations! > > > > On Thu, Dec 7, 2023 at 3:09 PM Kevin Gurney > > > wrote: > > > > > Congratulations, Felipe! > > >

Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-27 Thread Gang Wu
+1 (non-binding) On Sun, Jan 28, 2024 at 10:25 AM James Duong wrote: > +1 (non-binding) > > Get Outlook for Android > > From: Matt Topol > Sent: Saturday, January 27, 2024 3:22:01 PM > To: dev@arrow.apache.org > Subject: Re: [VOTE] Acce

Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-17 Thread Gang Wu
Congrats Bryce! Best, Gang On Mon, Mar 18, 2024 at 10:44 AM wish maple wrote: > Congrats! > > Best, > Xuwei Fu > > Nic Crane 于2024年3月18日周一 10:24写道: > > > On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has > > accepted an invitation to become a committer on Apache Arrow. Wel

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-20 Thread Gang Wu
Hi Andrei, What is your use case? IMHO, exposing this kind of configuration will force users to know how the writer will split row groups, which does not look simple to me. Best, Gang On Thu, Mar 21, 2024 at 2:25 AM Andrei Lazăr wrote: > Hi all, > > I would like to propose adding support for wr

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-25 Thread Gang Wu
ce of the same compression algorithm over different row > > groups in my Parquet files. Therefore, I was thinking that the best > > compression configuration for my data would be to use a different > algorithm > > for every column, for every row group in my files. In a real-world

Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-12 Thread Gang Wu
Congrats! On Fri, Apr 12, 2024 at 9:11 PM Patrick Horan wrote: > Congratulations! > > On Thu, Apr 11, 2024, at 11:10 AM, Raúl Cumplido wrote: > > Congratulations Sarah! > > > > El jue, 11 abr 2024 a las 13:13, Sutou Kouhei () > escribió: > > > > > > Hi, > > > > > > On behalf of the Arrow PMC, I'

Re: [VOTE] Release Apache Arrow 16.0.0 - RC0

2024-04-17 Thread Gang Wu
+1 (non-binding) Successfully verified C++ on macOS 12.5.1 with AppleClang 13.1.6.13160021 by running `TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 16.0.0 0` Best, Gang On Thu, Apr 18, 2024 at 3:20 AM Rok Mihevc wrote: > +1 > > I've successfully verified sources on Ubuntu 22.04: >

Re: Fwd: PyArrow Using Parquet V2

2024-04-24 Thread Gang Wu
Spark leverages parquet writer from parquet-mr, which hard-codes the format version to 1 [1] even when v2 features are enabled. That's why I said in dev@parquet that we cannot really tell if a parquet file is v1 or v2 simply from the format version field. [1] https://github.com/apache/parquet-mr/b

Re: Fwd: PyArrow Using Parquet V2

2024-04-24 Thread Gang Wu
write through V2 encoding as per Parquet community > V2 is not final yet*. > > Do we have any date when the parquet-mr jar will have Parquet V2 writing > functionality so that Spark can adhere to it. > > *or if i will add this "hadoopConfiguration.set(“parquet.writer.versio

Re: [ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread Gang Wu
Congratulations Dane! Best, Gang On Tue, May 7, 2024 at 10:12 PM Ian Cook wrote: > Congratulations Dane! > > On Tue, May 7, 2024 at 10:10 AM Alenka Frim > wrote: > > Yay, congratulations Dane!! > > > > On Tue, May 7, 2024 at 4:00 PM Rok Mihevc wrote: > > > > > Congrats Dane! > > >

Re: [VOTE] Release Apache Arrow 16.1.0 - RC1

2024-05-09 Thread Gang Wu
+1 (non-binding) > TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 16.1.0 1 Release candidate 16.1.0-RC1 looks good! Best, Gang On Thu, May 9, 2024 at 9:34 PM Ruoxi Sun wrote: > +1 (non-binding) > > On my M1 Mac, OS version Sonoma 14.4.1 (23E224), compiler Apple clang > version 15.0.

Re: Fwd: [C++] Parquet and Arrow overlap

2024-05-10 Thread Gang Wu
> Thanks, > Jacob > > Arrow committer > > On 2024/04/25 05:31:18 Gang Wu wrote: > > I know we have some non-Java committers and PMCs. But after the > parquet-cpp > > donation, it seems that no one worked on Parquet from arrow (cpp, rust, > go, > > etc.) >

Re: Fwd: [C++] Parquet and Arrow overlap

2024-05-12 Thread Gang Wu
> > > > > > > Thank you, that sounds great! On first glance some seem to be rather > > old > > > > and probably don't apply anymore. > > > > > > > > > BTW, do we really need to make a full copy of them to have a mirror > >

Re: [DISCUSS] Drop Java 8 support

2024-05-26 Thread Gang Wu
Hi, IMHO, Apache Parquet Java [1] cannot drop Java 8 in all 1.x releases to keep maximum backward compatibility. There was a discussion on the 2.x major release [2] and v3 format [3]. I think it is a good chance to drop Java 8 from the 2.x release. [1] https://github.com/apache/parquet-java [2] h

Re: [DISCUSS] Migration of parquet-cpp issues to GitHub

2024-05-28 Thread Gang Wu
+1 on this. IIUC, I didn't see any objection to this in the discussion [1]. Perhaps we can directly proceed to a vote? Sorry, I had intended to initiate the vote but got distracted by other stuff. [1] https://lists.apache.org/thread/jf9wos3t6xxk6xdyx2dof1jlkbpkr56p Best, Gang On Wed, May

Re: [VOTE] Migration of parquet-cpp issues to Arrow's issue tracker

2024-05-29 Thread Gang Wu
+1 (binding for Parquet) Thanks! Gang On Wed, May 29, 2024 at 10:47 PM Fokko Driesprong wrote: > +1 (non-binding) > > Op wo 29 mei 2024 om 16:46 schreef Felipe Oliveira Carvalho < > felipe...@gmail.com>: > > > +1 (non-binding) > > > > On Wed, 29 May 2024 at 11:30 Micah Kornfield > > wrote: > >

Re: [DISCUSS] Migration of parquet-cpp issues to GitHub

2024-05-29 Thread Gang Wu
Just want to mention that these apache/parquet-* Github repositories have not yet enabled issues and INFRA tickets are required before migration. Best, Gang On Thu, May 30, 2024 at 1:55 AM Micah Kornfield wrote: > SGTM +1 > > On Wed, May 29, 2024 at 10:50 AM Rok Mihevc wrote: > > > On Wed, May

Re: [DISCUSS] Migration of parquet-cpp issues to GitHub

2024-05-30 Thread Gang Wu
che/arrow-nanoarrow/blob/81711045e8bb4ded1cb3b5a6fa354b35f18aa4e7/.asf.yaml#L24-L25 > > On Wed, May 29, 2024 at 10:39 PM Gang Wu wrote: > > > > Just want to mention that these apache/parquet-* Github repositories > > have not yet enabled issues and INFRA tickets are required before > > mi

Re: [python][parquet] enable_store_decimal_as_integer

2024-06-11 Thread Gang Wu
Hi Brian, I agree that it is pretty straightforward to implement it in the way you described. Feel free to open a PR when you are ready. Thanks! Gang On Tue, Jun 11, 2024 at 7:53 AM Brian Kiefer wrote: > Hello, > > I am interested in exposing the `enable_store_decimal_as_integer` parquet > opt

Re: [DISCUSS] Migration of parquet-cpp issues to GitHub

2024-06-12 Thread Gang Wu
> On Fri, May 31, 2024 at 10:04 AM Rok Mihevc wrote: > > > Would we also want to add issue templates to encourage some structure? > See > > [1] for inspiration. > > > > [1] https://github.com/apache/arrow/blob/main/.github/ISSUE_TEMPLATE > > > > On Fri, May 31,

Re: [DISCUSS] Migration of parquet-cpp issues to GitHub

2024-06-12 Thread Gang Wu
non-parquet-cpp repos before the > action. > > Agreed. Did we discuss this enough to call for a vote yet? > > On Wed, Jun 12, 2024 at 5:23 PM Gang Wu wrote: > > > Thanks Rok for the update! > > > > Yes, the copied issues look good to me. Perhaps we need a separate

Re: [VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-14 Thread Gang Wu
+1 (non-binding) Verified C++ on my M1 Mac by running: - TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 17.0.0 2 BTW, I ran into this issue as well: https://github.com/apache/arrow/issues/43167 Best, Gang On Mon, Jul 15, 2024 at 1:39 PM Jean-Baptiste Onofré wrote: > +1 (non binding)

Re: [VOTE][Format] Opaque canonical extension type

2024-07-24 Thread Gang Wu
+1 (non-binding) Checked spec change and C++ impl. On Wed, Jul 24, 2024 at 6:52 PM Joel Lubinitsky wrote: > +1 (non-binding) > > Go implementation LGTM > > On Wed, Jul 24, 2024 at 5:12 AM Raúl Cumplido wrote: > > > +1 (binding) > > > > Format change looks good to me. I haven't reviewed the ind

Re: [VOTE][Format] Bool8 Canonical Extension Type

2024-08-06 Thread Gang Wu
+1 (non-binding) Looked through the spec and C++ impl. Best, Gang On Tue, Aug 6, 2024 at 11:55 AM wish maple wrote: > +1 (non-binding) > > Best, > Xuwei Fu > > David Li 于2024年8月6日周二 10:20写道: > > > +1 (binding) > > > > On Tue, Aug 6, 2024, at 10:17, Sutou Kouhei wrote: > > > +1 (binding) > > >

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Gang Wu
. > > It is worth noting that we also need to standardize many functions > related to it. > > A neutral place to maintain it is a great choice. > > - As Gang Wu said, a standalone project is good, just like RoaringBitmap > [1]. > - As Ryan said, Parquet community is a ne

Re: [DISCUSS] Variant Spec Location

2024-08-21 Thread Gang Wu
usion > > extension that operates on this [1], and already have some ideas on how > > such an extension type might be defined. I'm not yet caught up on the > > shredded specification, but I think having just the binary format would > be > > beneficial for in-memory an

Re: [DISCUSS] Variant Spec Location

2024-08-22 Thread Gang Wu
hread > in a readable format (for example a mailing-list archive)? It appears > that dev@arrow wasn't cc'ed from the start and that can make it > difficult to understand what this is about. > > Regards > > Antoine. > > > Le 22/08/2024 à 08:32, Gang Wu a écrit : >

Re: [DISCUSS][C++] Indent #if (preprocessor directives)

2024-08-28 Thread Gang Wu
I believe this is already done by clang-format [1] [1] https://github.com/apache/arrow/pull/43798/files#diff-1026e0038b722990204a42bed8a6f7c0ec2302aa79e3fad1959d62ba968edfa2 Best, Gang On Wed, Aug 28, 2024 at 4:35 PM Antoine Pitrou wrote: > > Is there a way to ensure this is done automatically

Re: [VOTE] Release Apache Arrow 18.0.0 - RC0

2024-10-18 Thread Gang Wu
+1 (non-binding) Ran following command on my Mac M1 with AppleClang 13.1.6.13160021: > TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 18.0.0 0 Best, Gang On Fri, Oct 18, 2024 at 7:23 PM Ruoxi Sun wrote: > +1 (non-binding) > > On my M1 Mac, OS version Sonoma 14.5 (23F79), AppleClang 1

Re: [ANNOUNCE] New Arrow committer: Rossi Sun

2024-10-22 Thread Gang Wu
Welcome Rossi! Best, Gang On Wed, Oct 23, 2024 at 12:50 PM David Li wrote: > Welcome Rossi! > > On Wed, Oct 23, 2024, at 11:41, wish maple wrote: > > Congrats Ruoxi! > > > > Best, > > Xuwei Fu > > > > Felipe Oliveira Carvalho 于2024年10月23日周三 08:18写道: > > > >> Great news! Congratulations. > >> >

Re: [ANNOUNCE] New Arrow PMC chair: Neil Richardson

2024-10-30 Thread Gang Wu
Thanks Andy and congrats Neal! Best, Gang On Wed, Oct 30, 2024 at 8:21 PM Rok Mihevc wrote: > Thanks Andy and Neil! > > Rok > > On Wed, Oct 30, 2024 at 1:18 PM Vibhatha Abeykoon > wrote: > > > Congratulations Neil! > > > > Vibhatha Abeykoon > > > > > > On Wed, Oct 30, 2024 at 5:41 PM Kevin Gur

Re: [ANNOUNCE] New Arrow PMC member: Curt Hagenlocher

2024-10-30 Thread Gang Wu
Congratulations! On Thu, Oct 31, 2024 at 7:50 AM Kevin Gurney wrote: > Congratulations, Curt! > > From: David Li > Sent: Wednesday, October 30, 2024 7:48:03 PM > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow PMC member: Curt Hagenlocher > > Congr

Re: [ANNOUNCE] New Arrow committer: Will Ayd

2024-10-02 Thread Gang Wu
Congrats and welcome! Best regards, Gang On Wed, Oct 2, 2024 at 10:16 PM Vibhatha Abeykoon wrote: > Congratulations, Will! > > On Wed, Oct 2, 2024 at 3:18 PM Joris Van den Bossche < > jorisvandenboss...@gmail.com> wrote: > > > Congratulations Will, and we are happy to have you! > > > > On Wed,

Re: [ANNOUNCE] New Arrow committer: Adam Reeve

2024-11-18 Thread Gang Wu
Congrats Adam! On Tue, Nov 19, 2024 at 9:11 AM Curt Hagenlocher wrote: > Congratulations Adam! > > On Mon, Nov 18, 2024 at 4:47 PM Ian Cook wrote: > > > Congratulations Adam! > > > > On Mon, Nov 18, 2024 at 19:31 Sutou Kouhei wrote: > > > > > On behalf of the Arrow PMC, I'm happy to announce t

Re: [VOTE] Release Apache Arrow 18.1.0 - RC2

2024-11-18 Thread Gang Wu
+1 (non-binding) Ran TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 18.1.0 2 on my Mac M1. ``` 100% tests passed, 0 tests failed out of 96 Label Time Summary: arrow-compute-tests= 35.99 sec*proc (13 tests) arrow-tests= 155.56 sec*proc (39 tests) arrow_acero=

Re: [DISCUSS] Split Java release process

2024-11-18 Thread Gang Wu
+1 on splitting the Java codebase! I'm not an active contributor/reviewer to the Java codebase, though I have made several contributions to it in the past. I can volunteer to be a release manager on the Java side if I can help. I have some experience in releasing orc and parquet-java in the past. Best

Re: [ANNOUNCE] New Arrow committer: Laurent Goujon

2024-11-25 Thread Gang Wu
Congrats and welcome Laurent! On Mon, Nov 25, 2024 at 7:57 PM Ruoxi Sun wrote: > Congratulations Laurent! > > > *Regards,* > *Rossi SUN* > > > Antoine Pitrou 于2024年11月25日 周一18:26写道: > > > > > Welcome to the team Laurent! > > > > > > Le 25/11/2024 à 10:39, Raúl Cumplido a écrit : > > > Thanks and

Re: [VOTE] Statistics through the C data interface

2024-12-04 Thread Gang Wu
+1 (binding) I've left a minor comment to solicit concrete examples of data in the statistics array if this is reasonable. Best, Gang On Thu, Dec 5, 2024 at 11:17 AM wish maple wrote: > +1 (non-binding) > > Best, > Xuwei Fu > > Sutou Kouhei 于2024年12月5日周四 10:58写道: > > > Hi, > > > > I would lik

Re: [ANNOUNCE] New Arrow PMC member: Gang Wu

2024-12-04 Thread Gang Wu
Committee (PMC) for Apache Arrow has invited > > Gang Wu to become a PMC member and we are pleased to announce > > that Gang Wu has accepted. > > > > Congratulations and welcome! > > >

Re: [ANNOUNCE] New Arrow PMC member: Bryce Mecum

2025-02-05 Thread Gang Wu
Congrats Bryce! On Thu, Feb 6, 2025 at 9:57 AM Ruoxi Sun wrote: > Congrats Bryce, well deserved! > > > *Regards,* > *Rossi SUN* > > > David Li 于2025年2月6日 周四08:50写道: > > > Congrats Bryce!! > > > > On Thu, Feb 6, 2025, at 07:58, Rok Mihevc wrote: > > > Congrats Bryce! > > > > > > On Wed, Feb 5, 20

Re: [VOTE][Java] Release Apache Arrow Java 18.2.0 RC5

2025-02-06 Thread Gang Wu
+1 Ran dev/release/verify_rc.sh 18.2.0 5 on my MacBook M1 with openjdk version "17.0.13" 2024-10-15 OpenJDK Runtime Environment Homebrew (build 17.0.13+0) OpenJDK 64-Bit Server VM Homebrew (build 17.0.13+0, mixed mode, sharing) Thanks JB! Best, Gang On Thu, Feb 6, 2025 at 10:21 PM Jean-Baptiste

Re: [ANNOUNCE] New Arrow committer: Ed Seidl (etseidl)

2025-01-31 Thread Gang Wu
Congrats Ed! On Fri, Jan 31, 2025 at 4:15 PM Antoine Pitrou wrote: > > Congratulations and welcome, Ed! > > > Le 29/01/2025 à 11:18, Andrew Lamb a écrit : > > On behalf of the Arrow PMC, I'm happy to announce that Ed Seidl > > has accepted an invitation to become a committer on Apache > > Arrow.

Re: [C++] Bump required CMake version

2024-12-10 Thread Gang Wu
+1 I'm excited to use new features of CMake. On Tue, Dec 10, 2024 at 6:02 PM Raúl Cumplido wrote: > I am also +1 on bumping the CMake version to 3.25 > > El mar, 10 dic 2024 a las 3:36, wish maple () > escribió: > > > +1 on 3.25 > > > > Best, > > Xuwei Fu > > > > Ruoxi Sun 于2024年12月10日周二 08:36

Re: When is bit width 0 in dictionary encoded parquet files?

2024-12-18 Thread Gang Wu
IIUC, the bit-width could be 0 when the dictionary contains a single entry and then all entry ids are zeros. I haven't tried to create such a file and I suspect that RLE (rather than bit-packing) is in use in this case. Best, Gang On Thu, Dec 19, 2024 at 9:43 AM Marko Divjak wrote: > Hi, > > I
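The reasoning in this reply can be checked with a little arithmetic. The sketch below assumes the writer picks the minimum number of bits needed to represent the largest dictionary index; the function name is hypothetical, and an actual writer is free to choose a wider bit width.

```python
def index_bit_width(num_dictionary_entries: int) -> int:
    """Minimum bits needed to encode indices 0 .. num_dictionary_entries - 1.

    Hypothetical helper for illustration only.
    """
    if num_dictionary_entries < 1:
        raise ValueError("dictionary must contain at least one entry")
    # The largest index is num_dictionary_entries - 1; int.bit_length()
    # returns 0 for the value 0, so a single-entry dictionary needs 0 bits.
    return (num_dictionary_entries - 1).bit_length()


if __name__ == "__main__":
    for n in (1, 2, 3, 256, 257):
        print(n, index_bit_width(n))
```

With a single dictionary entry every index is 0, so no bits are required per value, which matches the case described in the reply.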

Re: [VOTE] Split Java release process

2024-11-21 Thread Gang Wu
+1 (non-binding) On Fri, Nov 22, 2024 at 10:10 AM Jacob Wujciak wrote: > +1 (non-binding) > > Am Fr., 22. Nov. 2024 um 03:06 Uhr schrieb David Li : > > > > +1 (binding) > > > > On Fri, Nov 22, 2024, at 10:47, Sutou Kouhei wrote: > > > +1 (binding) > > > > > > (I replied to wrong thread...) > > >

Re: [VOTE] Release Apache Arrow 19.0.0 - RC0

2025-01-13 Thread Gang Wu
+1 (binding) Ran TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 19.0.0 0 on my Mac M1 with AppleClang 16.0.0.1626 Best, Gang On Mon, Jan 13, 2025 at 2:11 AM Bryce Mecum wrote: > Hi, > > I would like to propose the following release candidate (RC0) of Apache > Arrow version 19.0.

Re: [VOTE] Release Apache Arrow 19.0.1 - RC1

2025-02-13 Thread Gang Wu
+1 Verified the C++ build on my MacOS M1. Thanks! On Thu, Feb 13, 2025 at 2:12 PM Ruoxi Sun wrote: > +1 (non-binding) > > On my M1 Mac, OS version Sonoma 14.7.1 (23H222), AppleClang > 15.0.0.15000309, verified cpp: > > TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 19.0.1 1 > > *Regar

Re: [ANNOUNCE] New Arrow committer: Matthijs Brobbel (mbrobbel)

2025-03-22 Thread Gang Wu
Congrats! On Sun, Mar 23, 2025 at 12:21 AM Bryce Mecum wrote: > Congrats! > > On Fri, Mar 21, 2025 at 1:51 PM Andrew Lamb wrote: > > > > Hi, > > > > On behalf of the Arrow PMC, I'm happy to announce that Matthijs Brobbel > > has accepted an invitation to become a committer on Apache > > Arrow.

Re: [ANNOUNCE] New Arrow PMC member: Jacob Wujciak

2025-03-16 Thread Gang Wu
Congrats Jacob! On Mon, Mar 17, 2025 at 1:23 PM Sutou Kouhei wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Jacob Wujciak to become a PMC member and we are pleased to announce > that Jacob Wujciak has accepted. > > Congratulations and welcome! > > >

Re: [ANNOUNCE] New Arrow PMC member: Ian Cook

2025-03-20 Thread Gang Wu
Congrats Ian! On Thu, Mar 20, 2025 at 4:10 PM Ruoxi Sun wrote: > Congrats Ian! > > And my special appreciation for your consistently wonderful hosting of the > community meeting! > > *Regards,* > *Rossi SUN* > > > On Thu, Mar 20, 2025 at 4:05 PM Sutou Kouhei wrote: > > > The Project Management

Re: [ANNOUNCE] New Arrow PMC member: Rok Mihevc

2025-04-04 Thread Gang Wu
Congrats Rok! On Thu, Mar 20, 2025 at 7:31 AM Felipe Oliveira Carvalho < felipe...@gmail.com> wrote: > Congratulations Rok! Well deserved. > > On Wed, Mar 19, 2025 at 7:42 PM David Li wrote: > > > Congrats Rok! > > > > On Thu, Mar 20, 2025, at 06:09, Fokko Driesprong wrote: > > > Congrats Rok! >

Re: [VOTE] Enable GitHub Discussions for apache/arrow-*

2025-04-05 Thread Gang Wu
+1 On Fri, Mar 21, 2025 at 10:58 PM Dewey Dunnington < dewey.dunning...@gmail.com> wrote: > +1! > > On Fri, Mar 21, 2025 at 7:58 AM Neal Richardson < > neal.p.richard...@gmail.com> > wrote: > > > +1 > > > > On Fri, Mar 21, 2025 at 5:18 AM Raúl Cumplido wrote: > > > > > +1 > > > > > > happy to tr

Re: [DISCUSS] Turtle canonical extension type

2025-04-01 Thread Gang Wu
+1 (binding) I'll propose a Rabbit canonical extension type next year. Best, Gang On Wed, Apr 2, 2025 at 10:49 AM wish maple wrote: > Out of curiosity, so this turtle type is like an array > containing the info arrow stream ipc batches? > > Do binary values have some alignas rule? And > is `l

Re: [PROPOSAL] Apache Arrow Java 19.0.0 release ?

2025-04-20 Thread Gang Wu
+1 If this is not in a hurry, I'd like to be the release manager to go through the release process. On Sun, Apr 20, 2025 at 2:21 PM Jean-Baptiste Onofré wrote: > As the release will include only dependency updates and fixes (no new > feature for now), 18.3 makes more sense to me. > > Thoughts ?

Re: [PROPOSAL] Apache Arrow Java 19.0.0 release ?

2025-04-22 Thread Gang Wu
> On Tue, Apr 22, 2025 at 4:03 PM Gang Wu wrote: > > > > It seems that there is a consensus to release 18.3.0. I'll follow up with > > it. > > > > On Mon, Apr 21, 2025 at 12:12 AM Jacob Wujciak-Jens > > wrote: > > > > > Yep in that case

Re: [PROPOSAL] Apache Arrow Java 19.0.0 release ?

2025-04-22 Thread Gang Wu
> > El dom, 20 abr 2025 a las 10:19, Jacob Wujciak-Jens () > > escribió: > > > > > Following SemVer if it's only fixes and dependencies it should be a > patch > > > version bump so 18.2.1 > > > > > > Gang Wu schrieb am So., 20. Apr. 2025, 10

Re: [VOTE] Release Apache Arrow 20.0.0 - RC2

2025-04-23 Thread Gang Wu
+1 (binding) Ran TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 20.0.0 2 on my Mac laptop. I ran into a flaky test failure below and retried with success: [ RUN ] TaskScheduler.AbortContOnTaskErrorParallel /var/folders/1_/mzm1z4sj7m72mk8lbs_z71

Re: [ANNOUNCE] New Arrow committer: Jean-Baptiste Onofré

2025-03-11 Thread Gang Wu
Congrats JB! On Tue, Mar 11, 2025 at 5:02 PM Ruoxi Sun wrote: > Congratulations JB! > > *Regards,* > *Rossi SUN* > > > On Tue, Mar 11, 2025 at 4:06 AM Weston Pace wrote: > > > Congrats and welcome JB! > > > > On Mon, Mar 10, 2025 at 12:42 PM Bryce Mecum > wrote: > > > > > Congrats JB. And welc

Re: [DISCUSS][Java] Will arrow support overall compression at the FieldVector level in the future, rather than separately compressing each ArrowBuffer within the FieldVector

2025-02-20 Thread Gang Wu
(I'm not sure why I cannot see the original post but only David's reply) I agree with David that this is not a Java issue. We might need more evidence to support new compression strategies. Yunhong, do you have any experiment results to support your statement? IMHO, from the perspective of entro

Re: [PROPOSAL] Arrow Java quarter release pace

2025-02-24 Thread Gang Wu
A quarterly schedule sounds good. Thanks! Best, Gang On Mon, Feb 24, 2025 at 8:23 PM Jean-Baptiste Onofré wrote: > Some subprojects have their own page. But yeah. Agree. > > Regards > JB > > Le lun. 24 févr. 2025 à 11:45, David Li a écrit : > > > This sounds good to me. We may want to start ad

Re: [VOTE][Java] Release Apache Arrow Java 18.3.0 RC2

2025-05-12 Thread Gang Wu
ork/arrow-java/arrow-java/binaries/,,' > *.sha* > for x in *.sha256; do sha256sum -c $x; done > for x in *.sha512; do sha512sum -c $x; done > > (JNI and .jar aren't checked.) > > Thanks, > -- > kou > > > In > "[VOTE][Java] Release Apache

[VOTE][Java] Release Apache Arrow Java 18.3.0 RC2

2025-05-10 Thread Gang Wu
Hi, I would like to propose the following release candidate (RC2) of Apache Arrow Java version 18.3.0. This release candidate is based on commit: 8e84e4c8bbe041f362690e4ea54280ec682dbb1f [1] The source release rc2 is hosted at [2]. Please download, verify checksums and signatures, run the unit

[RESULT][VOTE][Java] Release Apache Arrow Java 18.3.0 RC2

2025-05-13 Thread Gang Wu
Hi, This vote passed with 4 +1 binding votes and 1 +1 non binding vote. The vote thread is https://lists.apache.org/thread/43nl349xgtgq18z5sp95mjkxov6m32qr Thanks everyone for your vote! Best, Gang

[ANNOUNCE] Apache Arrow Java 18.3.0 released

2025-05-13 Thread Gang Wu
The Apache Arrow community is pleased to announce the Arrow Java 18.3.0 release. The release is available now from our website: https://arrow.apache.org/install/ and https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-java-18.3.0/ Read about what's new in the release at: https

Re: [DISCUSS][C++] Switch to C++20

2025-05-19 Thread Gang Wu
+1 On Tue, May 20, 2025 at 7:09 AM Ruoxi Sun wrote: > > Is it fair to say most users of Arrow C++ do that via Python/R or shared > libraries? Making the migration to a recent C++ standard relatively safe? > > I would say so. For some C++ dependents I know of, they either don't depend > on very r

Re: [VOTE] Release nanoarrow 0.7.0 RC1

2025-06-30 Thread Gang Wu
+1 Verified on my macOS BTW, I think [1] is https://github.com/apache/arrow-nanoarrow/milestone/7?closed=1 On Tue, Jul 1, 2025 at 9:21 AM Sutou Kouhei wrote: > +1 (binding) > > I ran the following on Debian GNU/Linux sid: > > dev/release/verify-release-candidate.sh 0.7.0 1 > > with: > > *

Re: [ANNOUNCE] New Arrow PMC member: Alenka Frim

2025-07-01 Thread Gang Wu
Congrats, Alenka! On Tue, Jul 1, 2025 at 4:07 PM Jacob Wujciak wrote: > Well deserved, congratulations Alenka! > > Raúl Cumplido schrieb am Di., 1. Juli 2025, 09:38: > > > The Project Management Committee (PMC) for Apache Arrow has invited > Alenka > > Frim to become a PMC member and we are ple