On 2023/04/23 09:38:02 "Yang, Yang10" wrote:
> Hi,
>
> As discussed in this issue: https://github.com/apache/arrow/issues/35287,
currently Arrow only supports one parameter: compression_level to be
customized. We would like to make more compression parameters (such as
window_bits) customizable when
I think the ArrayVector can have the following benefits:
1. Converting a Batch in Velox or another system to an Arrow array could be much
more lightweight.
2. Modifying, filtering, and copying an array or string could be much more
lightweight.
Velox can make a Vector mutable; it seems that an Arrow array cannot. Seems it
m
I have two Parquet-related bug fixes and I wonder if we can release them in
12.0.1:
1. https://github.com/apache/arrow/pull/35428
2. https://github.com/apache/arrow/pull/35520
Patch 1 fixes a bug that can make BYTE_STREAM_SPLIT data unreadable if the
previous Parquet page is larger than the incoming one.
Patch 2
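For readers unfamiliar with BYTE_STREAM_SPLIT, here is a minimal sketch of the byte layout, assuming 4-byte little-endian floats. The function names are illustrative only and are not part of the Parquet C++ API, and the sketch does not reproduce the page-size bug itself.

```python
import struct


def byte_stream_split_encode(values, width=4):
    """Encode float32 values so that byte k of every value lands in stream k."""
    raw = struct.pack(f"<{len(values)}f", *values)
    # raw[k::width] selects byte k of each value; the streams are concatenated.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_split_decode(data, width=4):
    """Inverse of the encoder: interleave the streams back into whole values."""
    count = len(data) // width
    streams = [data[k * count:(k + 1) * count] for k in range(width)]
    raw = bytes(streams[k][i] for i in range(count) for k in range(width))
    return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    encoded = byte_stream_split_encode([1.0, 2.0])
    print(encoded.hex())
    print(byte_stream_split_decode(encoded))
```

Grouping bytes of the same significance together tends to compress better for floating-point data, which is the motivation for the encoding.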
On 2023/06/15 16:24:44 Joris Van den Bossche wrote:
> Hi all,
>
> Bringing up https://github.com/apache/arrow/issues/35746 to the
> mailing list: this issue proposes to bump the default Parquet version
> we use for writing to Parquet files in the C++ library (and in the
> various bindings including
Hi,
According to the Arrow specification, for nested structures like
StructArray[1] or FixedListArray[2], when the parent is not valid, the
corresponding child slot is left "undefined".
If the child is a BinaryArray and its parent is not valid, would a validity
member point to an undefined address?
And
/c6frlr9gcxy8qdhbmv8cn3rdjbrqxb1v
[4] https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps
Thanks,
Xuwei Fu
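To make the question concrete, here is a minimal sketch of how a validity bitmap is read, assuming the least-significant-bit numbering described in [4], a zero array offset, and a struct-like parent whose child has the same length. The helper names are hypothetical and are not Arrow APIs.

```python
def pack_validity(flags):
    """Pack a list of booleans into a validity bitmap (LSB bit numbering)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def is_valid(bitmap, i):
    """Slot i is valid when bit (i % 8) of byte (i // 8) is set."""
    return bool(bitmap[i // 8] & (1 << (i % 8)))


def logical_child_validity(parent_bitmap, child_bitmap, length):
    """A child slot is logically valid only if parent and child are both valid."""
    return [
        is_valid(parent_bitmap, i) and is_valid(child_bitmap, i)
        for i in range(length)
    ]


if __name__ == "__main__":
    parent = pack_validity([True, False, True])
    child = pack_validity([True, True, False])
    print(logical_child_validity(parent, child, 3))
```

In this sketch the child's own bit under a null parent is simply ignored, so whatever the child stores in that slot never has to be interpreted.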
On 2023/06/28 15:03:11 wish maple wrote:
> Hi,
>
> By looking at the arrow standard, when it comes to nested structure, like
> StructArray[1] or FixedListArray[2], when parent is no
ity = true`, their offset might point to an invalid
position.
Am I right?
On 2023/06/29 12:10:52 Antoine Pitrou wrote:
>
> On 29/06/2023 at 13:42, wish maple wrote:
> > Thanks all!
> > So, in general:
> > 1. For our Binary Like [1] format, and List formats [2], if the
Hi,
Looking into the Arrow compute code, I found that it uses
`TypeHolder` [1], and an expression might call `GetTypes` to get the input or
output types. The documentation for `TypeHolder` says that it's a container for
dynamically created `shared_ptr`. However, my view is:
1. It's widely used, an
Hi, Li
Parquet 2.6 has been supported for a long time, and recently, in Parquet C++
and Python, Parquet 2.6 has been set as the default version of the Parquet
writer [1] [2].
So I think you can just use it! However, I don't know whether nanoarrow
supports it.
Best,
Xuwei Fu
[1] https://lists.apache.o
+1 (non-binding)
It would help a lot when processing UTF-8 related data!
Xuwei
On Tue, Aug 22, 2023 at 00:11 Andrew Lamb wrote:
> +1
>
> This is a great example of collaboration
>
> On Sat, Aug 19, 2023 at 4:10 PM Chao Sun wrote:
>
> > +1 (non-binding)!
> >
> > On Fri, Aug 18, 2023 at 12:59 PM Felipe Olive
I've run into lots of Parquet Dataset issues. The main problem is that currently
we have 2 sets of APIs,
and they have different scan options. Sometimes different interfaces,
like `to_batches()` or
others, enable different scan options.
I think [2] is similar to your problem. 1-4 are some issues
rmation (perhaps
> metadata) per file scanned?
>
> On Wed, Sep 6, 2023 at 12:10 PM wish maple wrote:
>
> > I've met lots of Parquet Dataset issues. The main problem is that
> currently
> > we have 2 sets or API
> > and they have different scan-options. And sometimes diff
By the way, you can try using a memory profiler like [1] or [2].
It would help to find out how the memory is used.
Best,
Xuwei Fu
[1] https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling
[2] https://google.github.io/tcmalloc/gperftools.html
Felipe Oliveira Carvalho 于2023年9月7日周
+1
LGTM, thanks!
On Sat, Sep 30, 2023 at 00:49 Ian Cook wrote:
> +1 (non-binding)
>
> Thanks very much Felipe for your persistence and your commitment to
> addressing the numerous questions and comments that have been raised
> since the beginning of the discussion on this in April.
>
> On Fri, Sep 29, 2023 a
Congratulations!
On Sun, Oct 15, 2023 at 20:48 Raúl Cumplido wrote:
> Congratulations and welcome!
>
> On Sun, Oct 15, 2023, 13:57, Ian Cook wrote:
>
> > Congratulations Curt!
> >
> > On Sun, Oct 15, 2023 at 05:32 Andrew Lamb wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Curt Ha
The Arrow IPC file format is great; it focuses on in-memory representation and
direct computation.
Basically, it supports compression and dictionary encoding, and the file can
be deserialized
into the in-memory Arrow format with zero copy.
Parquet provides some strong functionality, like Statistics, which could
help prunin
he format affords. It is comparatively
> > > expensive
> > > > > to encode and decode, and instead relies on index structures and
> > > > > statistics to accelerate access.
> > > > >
> > > > > Both are therefore perfectly viable options d
Thanks kou and everyone in the Arrow community!
I've learned a lot while learning about and contributing to Arrow and
Parquet. Thanks for everyone's help.
Hope we can bring more fancy features in the future!
Best,
Xuwei Fu
On Mon, Oct 23, 2023 at 12:48 Sutou Kouhei wrote:
> On behalf of the Arrow PMC, I'm
Congrats Raul!
Best,
Xuwei Fu
On Tue, Nov 14, 2023 at 03:28 Andrew Lamb wrote:
> The Project Management Committee (PMC) for Apache Arrow has invited
> Raúl Cumplido to become a PMC member and we are pleased to announce
> that Raúl Cumplido has accepted.
>
> Please join me in congratulating them.
>
> Andre
Hi,
Parquet is divided into an Arrow part and a Parquet part.
1. The lowest layer of the Parquet part is the Parquet decoder, in [1].
Floating-point columns might use PLAIN, RLE_DICTIONARY, or BYTE_STREAM_SPLIT
encoding.
2. parquet::ColumnReader sits above the decoder; each row group might
have
one or tw
Congrats Andy!
Best,
Xuwei Fu
On Mon, Nov 27, 2023 at 20:47 Andrew Lamb wrote:
> I am pleased to announce that the Arrow Project has a new PMC chair and VP
> as per our tradition of rotating the chair once a year. I have resigned and
> Andy Grove was duly elected by the PMC and approved unanimously by the
>
Congrats Felipe!!!
Best,
Xuwei Fu
On Thu, Dec 7, 2023 at 23:42 Benjamin Kietzman wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Felipe Oliveira
> Carvalho
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> Ben Kietzman
>
+1 (binding)
Verified C++ and Python on my M1 macOS machine.
Best,
Xuwei Fu
On Fri, Dec 15, 2023 at 00:19 Jean-Baptiste Onofré wrote:
> +1 (non binding)
>
> I checked:
> - hash and signature are OK
> - build is OK as soon as submodule are added (see the discussion on
> another thread)
> - LICENSE and NOTICE look go
Hi, all.
We're proposing Page Filtering in the parquet-cpp implementation[1]. Currently,
parquet-cpp and Arrow only support RowGroup/ColumnChunk level pruning. Now
we can support filtering with the Parquet PageIndex[2]. The interface can
also be used to help implement the Iceberg positional delete f
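As a rough illustration of page-level pruning, here is a minimal sketch that assumes each page exposes its first row, row count, and min/max statistics, as a page index would. The `PageStats` class and `candidate_row_ranges` function are hypothetical and do not reflect the interface proposed in [1].

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    first_row: int   # index of the first row stored in the page
    num_rows: int    # number of rows in the page
    min_value: int   # minimum value recorded for the page
    max_value: int   # maximum value recorded for the page


def candidate_row_ranges(pages, lo, hi):
    """Return [start, end) row ranges of pages that may hold values in [lo, hi]."""
    ranges = []
    for page in pages:
        # Skip pages whose value range cannot overlap the predicate.
        if page.max_value < lo or page.min_value > hi:
            continue
        start, end = page.first_row, page.first_row + page.num_rows
        if ranges and ranges[-1][1] == start:
            # Merge with the previous range when the pages are adjacent.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    pages = [
        PageStats(0, 100, 0, 9),
        PageStats(100, 100, 10, 19),
        PageStats(200, 100, 20, 29),
        PageStats(300, 100, 5, 35),
    ]
    print(candidate_row_ranges(pages, 0, 5))
```

The surviving row ranges are only candidates: rows inside a kept page still need the predicate applied, since min/max statistics can only rule pages out.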
+1
Verified C++ and Python on M1 macOS
Best,
Xuwei Fu
On Mon, Mar 4, 2024 at 17:05 Raúl Cumplido wrote:
> Hi,
>
> I would like to propose the following release candidate (RC0) of Apache
> Arrow version 15.0.1. This is a release consisting of 37
> resolved GitHub issues[1].
>
> This release candidate is based
I was working on this previously [1], but I have forgotten the context. I'll
move it forward now.
[1] https://github.com/apache/arrow/pull/37400
Best regards,
Xuwei Fu
On Sun, Mar 17, 2024 at 03:14 Andrei Lazăr wrote:
> Hi,
>
> I would like proposing extending the C++ library to add support for writing
> blo
Congrats!
Best,
Xuwei Fu
On Mon, Mar 18, 2024 at 10:24 Nic Crane wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has
> accepted an invitation to become a committer on Apache Arrow. Welcome, and
> thank you for your contributions!
>
> Nic
>
Congrats Joel!
Best,
Xuwei Fu
On Mon, Apr 1, 2024 at 22:59 Matt Topol wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Joel Lubinitsky has
> accepted an invitation to become a committer on Apache Arrow. Welcome, and
> thank you for your contributions!
>
> --Matt
>
+1 (non binding)
Best,
Xuwei Fu
ulk ingestion support for Flight SQL
On Fri, Apr 5, 2024 at 16:38 David Li wrote:
> Hello,
>
> Joel Lubinitsky has proposed adding bulk ingestion support to Arrow Flight
> SQL [1]. This provides a path for uploading an Arrow dataset to a Flight
> SQL server to create or append
The issue [1] describes the syntax change in Arrow Parquet. In
general, when reading a Parquet file with a legacy timestamp that was not
written by Arrow, isAdjustedToUTC is ignored during the read, and filtering
a file like this does not work.
When casting from a "deprecated
Congrats!
Best,
Xuwei Fu
On Thu, Apr 11, 2024 at 23:22 Kevin Gurney wrote:
> Congratulations, Sarah!! Well deserved!
>
> From: Jacob Wujciak
> Sent: Thursday, April 11, 2024 11:14 AM
> To: dev@arrow.apache.org
> Subject: Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore
>
>
Congrats!
Best,
Xuwei Fu
On Tue, May 7, 2024 at 21:53 Joris Van den Bossche wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Dane Pitkin has
> accepted an invitation to become a committer on Apache Arrow. Welcome,
> and thank you for your contributions!
>
> Joris
>
+1 (binding)
TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 16.1.0 1
Release candidate 16.1.0 works well on my M1 macOS machine.
Best,
Xuwei Fu
On Fri, May 10, 2024 at 09:30 David Li wrote:
> +1 (binding)
>
> Tested sources with Conda on Debian 12/x86_64 (binaries failed due to
> download flakiness)
>
> On
Ah, only PMC members can cast binding votes.
Please count my vote as non-binding.
Best,
Xuwei Fu
On Fri, May 10, 2024 at 10:39 wish maple wrote:
> +1 (binding)
>
> TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 16.1.0 1
> Release candidate 16.1.0 works well on my M1 MacOS
>
> Best,
> Xuwei Fu
>
Some configs, like use_threads, would be true in Python but false in C++.
Maybe we can fill in all configs explicitly with the same values.
Best,
Xuwei Fu
On Thu, Jun 13, 2024 at 13:32 J N wrote:
> Hello,
> We all know that there inherent overhead in Python, and we wanted to
> compare the performance of reading d
+1 (non-binding)
Checked spec change and C++ impl.
Best,
Xuwei Fu
On Wed, Jul 24, 2024 at 20:51 Gang Wu wrote:
> +1 (non-binding)
>
> Checked spec change and C++ impl.
>
> On Wed, Jul 24, 2024 at 6:52 PM Joel Lubinitsky
> wrote:
>
> > +1 (non-binding)
> >
> > Go implementation LGTM
> >
> > On Wed, Jul 24,
+1 (non-binding)
Best,
Xuwei Fu
On Tue, Aug 6, 2024 at 10:20 David Li wrote:
> +1 (binding)
>
> On Tue, Aug 6, 2024, at 10:17, Sutou Kouhei wrote:
> > +1 (binding)
> >
> > In
> > "[VOTE][Format] Bool8 Canonical Extension Type" on Mon, 5 Aug 2024
> > 08:59:42 -0400,
> > Joel Lubinitsky wrote:
> >
> >> H
+1 (non-binding)
Best,
Xuwei Fu
On Mon, Aug 26, 2024 at 15:48 Raúl Cumplido wrote:
> +1 (binding)
>
> On Mon, Aug 26, 2024, 6:23, Matt Topol wrote:
>
> > +1 (binding)
> >
> > On Mon, Aug 26, 2024, 12:08 AM Ruoxi Sun wrote:
> >
> > > +1 non-binding
> > >
> > >
> > > *Regards,*
> > > *Rossi SUN*
> > >
> >
+1 (non-binding)
LGTM for the Parquet part
Best,
Xuwei Fu
On Wed, Aug 28, 2024 at 09:07 Sutou Kouhei wrote:
> Hi,
>
> How about indenting preprocessor directives for readability?
>
> Issue: https://github.com/apache/arrow/issues/43796
> PR: https://github.com/apache/arrow/pull/43798
>
> For example:
>
> Before:
Congrats!
Best,
Xuwei Fu
On Sat, Oct 5, 2024 at 19:15 David Li wrote:
> Welcome, Will!
>
> On Wed, Oct 2, 2024, at 23:25, Gang Wu wrote:
> > Congrats and welcome!
> >
> > Best regards,
> > Gang
> >
> > On Wed, Oct 2, 2024 at 10:16 PM Vibhatha Abeykoon
> > wrote:
> >
> >> Congratulations, Will!
> >>
> >> On
Congrats Ruoxi!
Best,
Xuwei Fu
On Wed, Oct 23, 2024 at 08:18 Felipe Oliveira Carvalho wrote:
> Great news! Congratulations.
>
> —
> Felipe
>
> On Tue, 22 Oct 2024 at 16:03 Weston Pace wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Rossi Sun has
> > accepted an invitation to become a co
+1 (non-binding)
Best,
Xuwei Fu
On Tue, Oct 29, 2024 at 07:51 David Li wrote:
> +1 (binding) for me
>
> On Sat, Oct 26, 2024, at 10:39, Ian Cook wrote:
> > Oh ok, thanks Matt, I understand.
> >
> > In that case I am +1 on the proposal but I would like to see notes added
> to
> > the documentation to make th
Thanks Andy and congrats Neal!
On Wed, Oct 30, 2024 at 19:28 Andrew Lamb wrote:
> I am pleased to announce that the Arrow Project has a new PMC chair and VP
> as per our tradition of rotating the chair once a year. Andy Grove has
> resigned and
> Neil Richardson was duly elected by the PMC and approved unani
Congrats Adam!
Best,
Xuwei Fu
On Tue, Nov 19, 2024 at 08:31 Sutou Kouhei wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Adam Reeve
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> --
> kou
>
>
Congrats!
Best,
Xuwei Fu
On Mon, Nov 25, 2024 at 17:35 David Li wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Laurent Goujon has
> accepted an invitation to become a committer on Apache Arrow. Welcome, and
> thank you for your contributions!
>
> --
> David
>
+1 (non-binding)
Best,
Xuwei Fu
On Thu, Dec 5, 2024 at 10:58 Sutou Kouhei wrote:
> Hi,
>
> I would like to propose standardizing how to pas statistics
> through the C data interface.
>
> Motivation:
>
> * We want to pass not only Apache Arrow data but also
> statistics of them through the C data interface
Congrats!
Best,
Xuwei Fu
On Wed, Dec 4, 2024 at 05:20 Sutou Kouhei wrote:
> The Project Management Committee (PMC) for Apache Arrow has invited
> Gang Wu to become a PMC member and we are pleased to announce
> that Gang Wu has accepted.
>
> Congratulations and welcome!
>
Congrats!
Best,
Xuwei Fu
On Thu, Feb 6, 2025 at 15:47 Raúl Cumplido wrote:
> Congrats Bryce!
>
> On Thu, Feb 6, 2025, 6:22, Weston Pace wrote:
>
> > Congrats Bryce!
> >
> > On Wed, Feb 5, 2025 at 8:35 PM Saurabh Singh
> > wrote:
> >
> > > Congratulations Bryce.
> > >
> > > On Thu, 6 Feb 2025 at 07:41, Ga
Congratulations Ed! Well deserved!
Best,
Xuwei Fu
On Wed, Jan 29, 2025 at 20:19 Weston Pace wrote:
> Congratulations Ed!
>
> On Wed, Jan 29, 2025 at 2:20 AM Andrew Lamb
> wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Ed Seidl
> > has accepted an invitation to become a committer on Apac
+1 on 3.25
Best,
Xuwei Fu
On Tue, Dec 10, 2024 at 08:36 Ruoxi Sun wrote:
> I would +1 on 3.25.
>
> Thanks kou for driving this.
>
> *Regards,*
> *Rossi SUN*
>
>
> On Tue, Dec 10, 2024 at 7:41 AM Jacob Wujciak-Jens
> wrote:
>
> > +1 on 3.25
> >
> > Thanks for the summary kou.
> >
> > On Mon, Dec 9, 2024 at
Congratulations Rok!
Best,
Xuwei Fu
On Thu, Mar 20, 2025 at 03:10 Antoine Pitrou wrote:
>
> Hello all,
>
> The Project Management Committee (PMC) for Apache Arrow has invited
> Rok Mihevc to become a PMC member and we are pleased to announce that
> Rok has accepted.
>
> Regards
>
> Antoine.
>
Out of curiosity, is this turtle type like an array
containing the info of Arrow stream IPC batches?
Do binary values have some alignment rule? And
are `label` and `value` both non-nullable?
Best,
Xuwei Fu
On Wed, Apr 2, 2025 at 02:52 Weston Pace wrote:
> I've written a draft at [1] but for simplicity's sake I
Congrats Ian!
Best,
Xuwei Fu
On Thu, Mar 20, 2025 at 16:04 Sutou Kouhei wrote:
> The Project Management Committee (PMC) for Apache Arrow has invited
> Ian Cook to become a PMC member and we are pleased to announce
> that Ian Cook has accepted.
>
> Congratulations and welcome!
>
+1 (non-binding)
Best,
Xuwei Fu
On Tue, May 20, 2025 at 00:14 Antoine Pitrou wrote:
>
> Hello,
>
> I am proposing that we switch Arrow C++ to require C++20.
>
> C++20 will offer support for more C++ language and standard library
> features, such as:
>
> - concepts
> - generic lambdas with explicit type param
When I went through the Parquet Variant spec, I found that an Arrow
extension type might be a must, because decoding Parquet row
by row is so inefficient.
I've drafted a decoding tool in Parquet C++, and it is ready for review now [1].
[1] https://github.com/apache/arrow/pull/46372
Best,
Xuwei Fu
Matt
+1
Best,
Xuwei Fu
On Mon, Jul 14, 2025 at 20:41 Alenka Frim wrote:
> +1 from me too!
>
> I really like that this topic is being shared in the form of a blog post —
> it's well written and nice to read. I especially like the introduction!
> I’d also be happy to read a bit more about hash joins here, as Nic
>
Congrats, Alenka!
Best,
Xuwei Fu
On Tue, Jul 1, 2025 at 17:13 Krisztián Szűcs wrote:
> Congrats Alenka!
>
> > On 2025. Jul 1., at 9:38, Raúl Cumplido wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> Alenka
> > Frim to become a PMC member and we are pleased to announce tha