Great news. Congratulations Alenka!
--
Felipe
On Wed, Jul 2, 2025 at 6:27 PM Nic Crane wrote:
> Congrats Alenka!
>
> On Wed, 2 Jul 2025 at 05:03, Alenka Frim wrote:
>
> > Thank you all for the support and for welcoming me so openly
> > into the community right from the start. I’m really lookin
What about adding a canonical extension type so teams using Arrow don't
have to keep reinventing timestamp and duration types?
We could use Decimal128 as the storage type for these, since we are missing
128-bit integers (another debate).
--
Felipe
On Sun, Jun 22, 2025 at 9:48 AM Antoine Pitrou wrote:
>
+1
Is it fair to say most users of Arrow C++ consume it via Python/R or shared
libraries, which would make the migration to a recent C++ standard relatively safe?
--
Felipe
On Mon, May 19, 2025 at 1:14 PM Antoine Pitrou wrote:
>
> Hello,
>
> I am proposing that we switch Arrow C++ to require C++20.
>
> C++
> As far as relying on union types, the reason we can't do so is because
> the specific purpose of this Variant type is that we don't know the
> types up front, it's dynamic.
This is why "VARIANT" is a misnomer for this type. It's a DYNAMIC type, not
a VARIANT (a type that can be a sum of multiple
+1 from me for the same reasons listed by Weston above.
On Tue, Apr 15, 2025 at 6:02 AM Weston Pace wrote:
> +1 from me, assuming this is acceptable to domoritz / trxcllnt. I feel we
> have struggled to find maintainers for JS (outside of a few dedicated and
> extremely helpful ones).
>
> Ideally
Congrats!
On Sat, 22 Mar 2025 at 13:23 Gang Wu wrote:
> Congrats!
>
> On Sun, Mar 23, 2025 at 12:21 AM Bryce Mecum wrote:
>
> > Congrats!
> >
> > On Fri, Mar 21, 2025 at 1:51 PM Andrew Lamb
> wrote:
> > >
> > > Hi,
> > >
> > > On behalf of the Arrow PMC, I'm happy to announce that Matthijs Bro
Hi Alina,
I don't speak for the whole community, but one approach I took in the past
[1] was opening a huge PR that does "everything", and as the reviews on that
one progressed, I would extract smaller, self-contained PRs that got more
detailed reviews. Once those were merged, I would rebase and repe
Hi,
All this complexity is being added everywhere when arrow-rs could simply
check the alignment of external buffers when it ingests them and reallocate
them to ensure alignment.
I'm in favor of producers of Arrow arrays, like a Flight client, ensuring
alignment as early as possible (when buffers are allocated for arrays
de
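For illustration, a minimal C++17 sketch of the "check on ingest, reallocate if misaligned" approach. This is not arrow-rs code; the helpers IsAligned, CopyToAligned and EnsureAligned are hypothetical, and the 64-byte alignment used below is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical helper: true if `ptr` is a multiple of `alignment`.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Frees memory obtained from the aligned form of operator new.
struct AlignedDeleter {
  std::size_t alignment;
  void operator()(std::byte* p) const {
    ::operator delete(p, std::align_val_t{alignment});
  }
};

using AlignedBuffer = std::unique_ptr<std::byte[], AlignedDeleter>;

// Copies `size` bytes into a fresh buffer aligned to `alignment`
// (which must be a power of two).
inline AlignedBuffer CopyToAligned(const void* data, std::size_t size,
                                   std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // Allocate at least one byte so the returned pointer is always valid.
  void* raw = ::operator new(size == 0 ? 1 : size, std::align_val_t{alignment});
  if (size > 0) std::memcpy(raw, data, size);
  return AlignedBuffer(static_cast<std::byte*>(raw), AlignedDeleter{alignment});
}

// Ingest an external buffer: use it as-is when already aligned (zero-copy),
// otherwise copy it into `storage` and return the aligned copy.
inline const std::byte* EnsureAligned(const void* data, std::size_t size,
                                      std::size_t alignment,
                                      AlignedBuffer& storage) {
  if (IsAligned(data, alignment)) {
    return static_cast<const std::byte*>(data);
  }
  storage = CopyToAligned(data, size, alignment);
  return storage.get();
}
```

Only misaligned buffers pay for a copy; everything downstream can then assume alignment.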
Congrats! 🚀
On Fri, Mar 21, 2025 at 7:23 AM Nic Crane wrote:
> Congrats!
>
> On Thu, 20 Mar 2025, 23:15 Ed Seidl, wrote:
>
> > Congrats Ian!
> >
> > Cheers,
> > Ed
> >
> > On 2025/03/20 08:04:03 Sutou Kouhei wrote:
> > > The Project Management Committee (PMC) for Apache Arrow has invited
> > >
Congratulations Rok! Well deserved.
On Wed, Mar 19, 2025 at 7:42 PM David Li wrote:
> Congrats Rok!
>
> On Thu, Mar 20, 2025, at 06:09, Fokko Driesprong wrote:
> > Congrats Rok!
> >
> > On Wed, 19 Mar 2025 at 22:08 Adam Reeve wrote:
> >
> >> Congratulations Rok!
> >>
> >> On Thu, 20 Mar 2025 at
Great news! Congratulations Jacob.
--
Felipe
On Tue, Mar 18, 2025 at 3:33 AM Jean-Baptiste Onofré
wrote:
> Congrats Jacob !
>
> Regards
> JB
>
> On Mon, Mar 17, 2025 at 6:23 AM Sutou Kouhei wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Jacob Wujciak to
Great news! Congratulations, Bryce.
On Wed, Feb 5, 2025 at 6:15 PM Neal Richardson
wrote:
> Congrats, Bryce!
>
> On Wed, Feb 5, 2025 at 2:09 PM William Ayd
> wrote:
>
> > Congrats!
> >
> > > On Feb 5, 2025, at 2:51 PM, Ian Cook wrote:
> > >
> > > Congratu
+1
On Mon, Dec 23, 2024 at 2:37 AM Sutou Kouhei wrote:
> Hi,
>
> I would like to propose standardizing how to represent
> statistics as Apache Arrow array.
>
> Motivation:
>
> * We want to pass not only Apache Arrow data but also
> statistics of them through the C data interface for query
>
> I think it's fair not to mention any other Arrow-like transport mechanism
> since the benefits of transporting the statistics as an Arrow array are
> less clear right now.
When we (or applications) start thinking about more advanced statistics
like compressed histograms and sketch data structure
Congratulations Gang Wu!
On Tue, Dec 3, 2024 at 9:21 PM Weston Pace wrote:
> Congratulations!
>
> On Tue, Dec 3, 2024, 3:21 PM Ian Cook wrote:
>
> > Congratulations and thanks for all your great work—not just on Arrow but
> on
> > so many parts of the surrounding ecosystem!
> >
> > On Tue, Dec
+1 from me.
I reviewed the PR some time ago and it's not a trivial protocol, but the
complexity seems warranted and necessary.
On Thu, Oct 24, 2024 at 6:02 PM Dewey Dunnington
wrote:
> Thanks Matt for putting this together!
>
> I was initially concerned about the complexity of the proposal;
> h
Great news! Congratulations.
—
Felipe
On Tue, 22 Oct 2024 at 16:03 Weston Pace wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Rossi Sun has
> accepted an invitation to become a committer on Apache Arrow. Welcome,
> and thank you for your contributions!
>
Hi Susmit,
For an example of what David Li is proposing, you can take a look at this
project (https://github.com/voltrondata/sqlflite). It's a Flight SQL server
(in C++ though) that can forward queries to either SQLite or DuckDB.
--
Felipe
On Wed, Oct 16, 2024 at 10:22 AM David Li wrote:
> If
I say we remove it.
Arrow has great header hygiene (compared to other codebases I've worked
on). With a little bit more effort we can probably eliminate long header
include chains.
--
Felipe
On Wed, Oct 2, 2024 at 6:53 AM Antoine Pitrou wrote:
>
> Hello,
>
> Long ago, we added a ARROW_USE_PREC
In what language are you implementing your Flight service?
For C++, you implement a flight::ServerMiddlewareFactory, which allows you to
populate response headers with the returned JWT token [1].
--
Felipe
[1]
https://github.com/voltrondata/sqlflite/blob/f8c72976ea9eef7c7de264b7f93da3ae1fa2bcd7/src
+1 (non-binding)
Micah Kornfield, you have a good point. A spec without at least two
implementations is not a serious spec.
But I think we can run the vote now and defer merging the
implementations until we are confident both are almost
complete and tested, and the spec text co
+1 (non-binding)
--
Felipe
On Tue, Aug 6, 2024 at 6:24 AM Gang Wu wrote:
> +1 (non-binding)
>
> Looked through the spec and C++ impl.
>
> Best,
> Gang
>
> On Tue, Aug 6, 2024 at 11:55 AM wish maple wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Xuwei Fu
> >
> > On Tue, Aug 6, 2024 at 10:20 David Li wrote:
I think it would confuse implementers of the spec and people implementing
kernels way too much. “The bool Arrow type” should probably not start
meaning two different things.
—
Felipe
On Fri, 19 Jul 2024 at 01:26 Micah Kornfield wrote:
> As Boolean is already in the arrow type system I think it
Hi,
The markers are necessary to offer file system semantics on top of object
stores. You will get a ton of subtle bugs otherwise.
If, instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
interface that wraps local filesystems and object stores with object-store
semantics (i.e. no con
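A toy model of why the markers matter. Assumptions: a flat std::map stands in for an object store such as S3, and the helpers (CreateDir, WriteFile, DeleteFile, DirExists) are hypothetical, not the arrow::FileSystem API. Without a marker object, deleting the last file under a prefix makes the "directory" vanish.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy object store: a flat map from keys to contents. Object stores have
// no real directories, only keys.
using ObjectStore = std::map<std::string, std::string>;

// Hypothetical convention: directory "a/b" is an empty marker object "a/b/".
inline void CreateDir(ObjectStore& store, const std::string& dir) {
  store[dir + "/"] = "";
}

inline void WriteFile(ObjectStore& store, const std::string& path,
                      const std::string& contents) {
  store[path] = contents;
}

inline void DeleteFile(ObjectStore& store, const std::string& path) {
  store.erase(path);
}

// A directory "exists" if its marker exists or any key lives under it.
// Keys sharing a prefix are contiguous in a sorted map, so the first key
// >= prefix tells us whether any such key exists.
inline bool DirExists(const ObjectStore& store, const std::string& dir) {
  const std::string prefix = dir + "/";
  auto it = store.lower_bound(prefix);
  return it != store.end() && it->first.compare(0, prefix.size(), prefix) == 0;
}
```

With a marker, "create dir, write file, delete file" leaves an empty directory behind, as on a local file system; without one, the directory silently disappears.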
o leverage the fact that libraries handle
unions gracefully, this could be:
map, dense_union<...needed types based on stat kinds
in the keys...>>
X is either sparse or dense.
A possible alternative is to use a custom struct instead of map and reduce
the levels of nesting:
struct>
--
f {
> > >> >> >>>let left = left.as_primitive();
> > >> >> >>>let right = right.as_primitive();
> > >> >> >>>res = binary(left, right, |l, r| gcd(l, r));
> > >> >> >>>Arc::new(res
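The quoted arrow-rs fragment applies gcd element-wise through a binary kernel. A rough C++17 analog, assuming a simplified nullable array (one std::optional per slot instead of Arrow's values buffer plus validity bitmap); Binary and Gcd are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Simplified stand-in for a nullable Int64 array.
using NullableInt64 = std::vector<std::optional<std::int64_t>>;

// Element-wise binary kernel: applies `op` where both inputs are valid and
// produces null otherwise (null propagation).
template <typename Op>
NullableInt64 Binary(const NullableInt64& left, const NullableInt64& right, Op op) {
  assert(left.size() == right.size());
  NullableInt64 out(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    if (left[i] && right[i]) out[i] = op(*left[i], *right[i]);
  }
  return out;
}

inline NullableInt64 Gcd(const NullableInt64& left, const NullableInt64& right) {
  return Binary(left, right,
                [](std::int64_t l, std::int64_t r) { return std::gcd(l, r); });
}
```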
Hi,
You can promise that well-known int32 statistic keys will never be higher
than a certain value (2^18) [1], like TCP/IP ports (well-known ports are in [0,
2^10)), but for non-standard statistics from open-source products, key=0
combined with a string label is the way to go; otherwise collisions woul
On Fri, Jun 28, 2024 at 11:07 AM Andrew Lamb wrote:
>
> Hi Xuanwo,
>
> Sorry for the delay in responding. I think the ability to easily write
> functions that "feel" like native functions in whatever language and be
> able to generate arrow / vectorized versions of them is quite valuable.
> This
On Sun, Jun 9, 2024 at 7:53 PM Sutou Kouhei wrote:
>
> Hi,
>
> In
> "Re: [DISCUSS] Statistics through the C data interface" on Sun, 9 Jun 2024
> 22:11:54 +0200,
> Antoine Pitrou wrote:
>
> Fields:
> | Name | Type | Comments |
> ||---
+1. I think the benefits outweigh the risks.
On Wed, Jun 5, 2024 at 3:05 PM Anja wrote:
>
> I did want to start off by acknowledging that all of the pros you listed
> for mimalloc are accurate.
>
> I did want to contribute the times that people have been caught off-guard
> by the perceived increa
te:
>
>
>
> On 7 Jun 2024 at 18:30, Felipe Oliveira Carvalho wrote:
> > On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote:
> >>
> >>
> >> On 7 Jun 2024 at 04:27, Felipe Oliveira Carvalho wrote:
> >>> I've been thinking about how t
On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote:
>
>
> On 7 Jun 2024 at 04:27, Felipe Oliveira Carvalho wrote:
> > I've been thinking about how to encode statistics on Arrow arrays and
> > how to keep the set of statistics known by both producers and
> >
I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by both producers and
consumers (i.e. standardized).
The statistics array(s) could be a
map<
// the column index, or null if the statistics refer to the whole table or batch
column:
+1 (non-binding)
On Wed, 29 May 2024 at 11:30 Micah Kornfield wrote:
> +1 (non-binding for Parquet, Binding for Arrow if that makes a difference)
>
>
>
> On Wed, May 29, 2024 at 7:15 AM Rok Mihevc wrote:
>
> > # sending this to both dev@arrow and dev@parquet
> >
> > Hi all,
> >
> > Following th
I want to +1 what Dewey is saying here and add some comments.
Sutou Kouhei wrote:
> ADBC may be a bit larger to use only for transmitting statistics. ADBC has
> statistics related APIs but it has more other APIs.
It's impossible to keep the responsibility of communication protocols
cleanly separa
Great news. Congratulations Dane!
On Tue, May 7, 2024 at 7:57 PM Vibhatha Abeykoon wrote:
>
> Congratulations Dane!!!
>
> Vibhatha Abeykoon
>
>
> On Wed, May 8, 2024 at 4:02 AM Jacob Wujciak wrote:
>
> > Congrats!
> >
> > On Tue, 7 May 2024 at 23:19, Bryce Mecum wrote:
> >
> > > Congr
Isn't that easily decodable from the UUID data itself?
If you allow the version to be specified as metadata, you now have to
validate and make sure it's consistent with the version encoded in the
contents of the UUID column. And UUID versions are more of a concern
for UUID generation than consumpt
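For instance, assuming the UUID column stores the 16 raw bytes in standard (big-endian) order, as a 16-byte fixed-size binary would, the version is simply the high nibble of byte 6 per RFC 4122 / RFC 9562. A minimal sketch with a hypothetical UuidVersion helper:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The 16 bytes of a UUID in network (big-endian) order.
using Uuid = std::array<std::uint8_t, 16>;

// The version is the high 4 bits of byte 6, i.e. the "M" in
// xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx.
inline int UuidVersion(const Uuid& u) { return u[6] >> 4; }
```

So a separate "version" metadata field would be redundant and could only disagree with the data.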
The OP used UUID as an example. Would that be enough, or is the request for
a flexible mechanism that allows the creation of one-off nominal types for
very specific use cases?
—
Felipe
On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou wrote:
>
> Yes, JSON and UUID are obvious candidates for new canoni
Algebraic Data Types (Sums and Products) are very abstract. This means
they don't fully specify a concrete/physical layout [1]: different
physical layouts can match the same algebraic definition. As an
in-memory data format specification, Arrow doesn't and shouldn't
rigidly specify concretization r
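To make this concrete, here are two physical layouts for the same logical sum type (int32 | string), loosely modeled on Arrow's sparse and dense unions. This is a simplified sketch with hypothetical names (no validity, no raw buffers); both layouts decode to the same logical values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// The single logical (algebraic) type: int32 | string.
using Value = std::variant<std::int32_t, std::string>;

// Sparse: every child has one slot per logical element; the type id
// selects which child holds the real value.
struct SparseLayout {
  std::vector<std::int8_t> type_ids;  // 0 = int32, 1 = string
  std::vector<std::int32_t> ints;     // same length as type_ids
  std::vector<std::string> strings;   // same length as type_ids
};

// Dense: children only hold values of their own type; offsets point
// into the child selected by the type id.
struct DenseLayout {
  std::vector<std::int8_t> type_ids;
  std::vector<std::int32_t> offsets;
  std::vector<std::int32_t> ints;
  std::vector<std::string> strings;
};

inline Value Get(const SparseLayout& a, std::size_t i) {
  if (a.type_ids[i] == 0) return a.ints[i];
  return a.strings[i];
}

inline Value Get(const DenseLayout& a, std::size_t i) {
  const auto off = static_cast<std::size_t>(a.offsets[i]);
  if (a.type_ids[i] == 0) return a.ints[off];
  return a.strings[off];
}
```

Sparse trades memory for O(1) positional access without offsets; dense is compact but needs the extra indirection. Neither is implied by the algebraic definition alone.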
Two comments:
——
Since this library is analogous to things like ADBC, ODBC, and JDBC, it’s
more of a “driver” than a “connector”. This might make your life easier
when explaining what it does.
It’s not a black-and-white thing, but “connector” might imply networking to
some people.
I believe you
> I have found Twitter an extremely effective way for an open-source
project to communicate with the “exo-community” — people who are interested
in the project but not so invested that they join the email list. An open
source project needs to perform pretty much all of the functions of a
for-profit
wrote:
> > > >
> > > > > Congratulations, Felipe!
> > > > > ________
> > > > > From: Daniël Heres
> > > > > Sent: Thursday, December 7, 2023 2:59 PM
> > > > > To: dev@arrow.apache.org
> >
Congratulations! Well deserved.
On Mon, Nov 13, 2023 at 5:16 PM Neal Richardson
wrote:
> Congratulations!
>
> On Mon, Nov 13, 2023 at 3:10 PM Matt Topol wrote:
>
> > Congratulations Raul!!
> >
> > On Mon, Nov 13, 2023, 3:09 PM Antoine Pitrou wrote:
> >
> > >
> > > Welcome Raul, we're glad to h
Congratulations Xuwei!
—
Felipe
On Mon, 23 Oct 2023 at 10:26 Vibhatha Abeykoon wrote:
> Congratulations Xuwei!
>
> On Mon, Oct 23, 2023 at 6:38 PM Weston Pace wrote:
>
> > Congratulations Xuwei!
> >
> > On Mon, Oct 23, 2023 at 3:38 AM wish maple
> wrote:
> >
> > > Thanks kou and every nice pe
+1
On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington
wrote:
> +1!
>
> On Wed, Oct 18, 2023 at 2:14 PM Matt Topol wrote:
> >
> > +1
> >
> > On Wed, Oct 18, 2023 at 1:05 PM Antoine Pitrou
> wrote:
> >
> > > +1
> > >
> > > On 18 Oct 2023 at 19:02, Benjamin Kietzman wrote:
> > > > Hello all,
> > >
It’s not the best choice, since the format is really focused on in-memory
representation and direct computation, but you can do it:
https://arrow.apache.org/docs/python/feather.html
—
Felipe
On Tue, 17 Oct 2023 at 23:26 Nara wrote:
> Hi,
>
> Is it a good idea to use Apache Arrow as a file format? Loo
The Zulip is
https://ursalabs.zulipchat.com/
On Tue, Oct 17, 2023 at 9:55 PM Will Jones wrote:
> Hi Curt,
>
> I think the most visible place for now would be creating an issue for
> discussion.
>
> In the future, if you and some others want to have a place to discuss C#
> development, you could
> > > But I also reiterate my plea that these existing parsers get fixed so
> as
> > > to entirely validate the format string instead of stopping early.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
Hello,
I'm writing to propose "+vl" and "+vL" as format strings for list-view and
large list-view arrays passing through the Arrow C data interface [1].
The previous proposal was considered a bad idea because existing parsers of
these format strings might be looking at only the first `l` (or `L`)
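A sketch of the failure mode with two hypothetical parsers (these are not the functions in Arrow's bridge.cc). A lenient parser that dispatches on the character after '+' and stops early would silently read "+lv" as a plain list, while "+vl" makes the same old parser fail loudly instead.

```cpp
#include <cassert>
#include <string_view>

enum class Kind { kList, kLargeList, kListView, kLargeListView, kUnknown };

// Lenient: inspects only the character after '+', ignores the rest.
inline Kind LenientParse(std::string_view fmt) {
  if (fmt.size() >= 2 && fmt[0] == '+') {
    if (fmt[1] == 'l') return Kind::kList;
    if (fmt[1] == 'L') return Kind::kLargeList;
  }
  return Kind::kUnknown;
}

// Strict: validates the entire format string.
inline Kind StrictParse(std::string_view fmt) {
  if (fmt == "+l") return Kind::kList;
  if (fmt == "+L") return Kind::kLargeList;
  if (fmt == "+vl") return Kind::kListView;
  if (fmt == "+vL") return Kind::kLargeListView;
  return Kind::kUnknown;
}
```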
n. My vote would be +1 for +vl and
> +vL.
>
> On Thu, Oct 5, 2023 at 6:40 PM Felipe Oliveira Carvalho
> wrote:
> >
> > > Union format strings share enough properties that having them in the
> > > same switch case doesn't result in additional complexity...lis
here a reason
> >> that +lv and +Lv were chosen over a single-character version (i.e.,
> >> maybe +v and +V)? A single-character version is (slightly) easier to
> >> parse in C.
> >>
> >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
>
to parse the format string are already rather
> unwieldy...it would be a nice quality-of-life improvement (although by
> no means a required one) to use a separate character.
>
> On Thu, Oct 5, 2023 at 3:34 PM Felipe Oliveira Carvalho
> wrote:
> >
> > This ma
where this discussion may have occurred...is there a reason
> that +lv and +Lv were chosen over a single-character version (i.e.,
> maybe +v and +V)? A single-character version is (slightly) easier to
> parse in C.
>
> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
> wrot
Hello,
I'm writing to propose "+lv" and "+Lv" as format strings for list-view and
large list-view arrays passing through the Arrow C data interface [1].
The vote will be open for at least 72 hours.
[ ] +1 - I'm in favor of this new C Data Format string
[ ] +0
[ ] -1 - I'm against adding this new
> > >> > There'll probably be some minor comments to the format PR, but those
> > >> > don't deter from accepting these new layouts into the standard.
> > >> >
> > >> > Regards
> > >> >
> > >> > Antoi
run into similar issues as [1]?
>
> Kind Regards,
>
> Raphael Taylor-Davies
>
> [1]: https://lists.apache.org/thread/l8t1vj5x1wdf75mdw3wfjvnxrfy5xomy
>
> On 29/09/2023 13:09, Felipe Oliveira Carvalho wrote:
> > Hello,
> >
> > I'd like to propose adding L
Hello,
I'd like to propose adding ListView and LargeListView arrays to the Arrow
format.
Previous discussion is in [1][2]; the columnar format description and
Flatbuffers changes are in [3].
There are implementations available in both C++ [4] and Go [5]. I'm working
on the integration tests, which I will pus
My take here is that Ben did an excellent job of hiding the fact that C++
has two variations of the format, without leaking the pointer version through
the interfaces via which Arrow arrays are communicated to other
implementations.
As things stand right now, there is no zero-copy transfer of point
> (a) stays pretty stable throughout the scan (stays < 1G), (b) keeps
> increasing during the scan (looks linear to the number of files scanned).
I wouldn't take this to mean a memory leak, but rather the memory allocator
not releasing virtual memory that has been allocated throughout the scan.
Could you r
I marked the C++ implementation PR ready for review today and will soon be
working on the Go implementation.
https://github.com/apache/arrow/pull/35345
Note that, unlike Velox's ArrayVector, the Arrow implementation
(ListView) also features a 64-bit version (LargeListView) to be
symmetri
+1 (non-binding)
—
Felipe
On Fri, 18 Aug 2023 at 18:48 Jacob Wujciak-Jens
wrote:
> +1 (non-binding)
>
> On Fri, Aug 18, 2023 at 6:04 PM L. C. Hsieh wrote:
>
> > +1 (binding)
> >
> > On Fri, Aug 18, 2023 at 5:53 AM Neal Richardson
> > wrote:
> > >
> > > +1
> > >
> > > Thanks all for the though
Hello,
I'm writing to propose "+r" as the format string for
run-end encoded arrays passing through the Arrow C data interface [1].
Feel free to also discuss in the linked PR, which contains the changes to
bridge.cc and the reference docs.
[1] https://arrow.apache.org/docs/format/CDataInterfac
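For context on what a "+r" array carries: a run-end encoded array stores run ends plus one value per run, and a logical index is resolved with a binary search over the run ends. A minimal sketch, with int32 run ends chosen for the example and hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// run_ends[k] is the exclusive logical end of run k; values[k] is the
// value repeated throughout that run.
template <typename T>
struct RunEndEncoded {
  std::vector<std::int32_t> run_ends;  // strictly increasing, > 0
  std::vector<T> values;               // same length as run_ends
};

// The run holding `logical` is the first one whose end is > logical.
template <typename T>
std::size_t PhysicalIndex(const RunEndEncoded<T>& a, std::int32_t logical) {
  auto it = std::upper_bound(a.run_ends.begin(), a.run_ends.end(), logical);
  assert(it != a.run_ends.end());  // logical must be < logical length
  return static_cast<std::size_t>(it - a.run_ends.begin());
}

template <typename T>
const T& ValueAt(const RunEndEncoded<T>& a, std::int32_t logical) {
  return a.values[PhysicalIndex(a, logical)];
}
```

For example, run_ends {3, 5, 6} with values {A, B, C} represent the logical array A A A B B C.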
at a single logical type may have
> multiple physical layouts. I agree. E.g. variable size list<32>,
> variable size list<64>, and REE are the physical layouts that, combined with the
> logical type "string", give you "string", "large string", an
A major difficulty in making the Arrow array types open for extension [1]
is that as soon as we define (a) a universal representation* or (b) an
abstract interface, we close the door to vectorization. (a) prevents
new vectorization-friendly formats, and (b) limits the implementation
of new vec
int8(), int16(), … each return the same shared_ptr on every call, and it gets
inc-ref’d on every "creation".
But code taking type pointers shouldn't assume they point to `static`
storage. Every use of a non-owning TypeHolder should rely on something
else ensuring the shared_ptr is alive while the TypeHolder is
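A small sketch of these ownership rules, using hypothetical stand-ins (DataType, int8_type, MakeCustomType, TypeHolderSketch) rather than Arrow's real classes. The factory hands out copies of one static shared_ptr, while a non-owning holder is only safe while some owner keeps the type alive.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a data type.
struct DataType {
  int id;
};

// Singleton factory: every call returns a copy of the same shared_ptr,
// incrementing its reference count.
inline std::shared_ptr<DataType> int8_type() {
  static const std::shared_ptr<DataType> instance =
      std::make_shared<DataType>(DataType{8});
  return instance;
}

// A non-singleton type: its lifetime depends entirely on its owners.
inline std::shared_ptr<DataType> MakeCustomType(int id) {
  return std::make_shared<DataType>(DataType{id});
}

// Non-owning holder: must not outlive the shared_ptr it was taken from.
struct TypeHolderSketch {
  const DataType* type = nullptr;
};
```

Code that only ever sees singletons like int8_type() would appear to work even with dangling holders, which is why the non-static case is the one to reason about.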
Values in the `offsets` Buffer of a ListArray can’t be left undefined
because the length of a valid entry before a NULL entry is the offset
associated with that NULL entry minus the previous offset.
The ListViewArray format I’m working on doesn’t have that restriction
because all the information a
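To illustrate the difference, a sketch with plain vectors (hypothetical helpers, int32 offsets). In a ListArray, entry i spans [offsets[i], offsets[i+1]), so a null entry's offset is also the end of the previous entry; in a ListView, each entry carries its own offset and size, so entries don't depend on their neighbors and can even be out of order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Values = std::vector<int>;

// ListArray: n + 1 offsets; entry i is [offsets[i], offsets[i + 1]).
inline Values ListEntry(const std::vector<std::int32_t>& offsets,
                        const Values& child, std::size_t i) {
  return Values(child.begin() + offsets[i], child.begin() + offsets[i + 1]);
}

// ListView: n offsets and n sizes; entry i is
// [offsets[i], offsets[i] + sizes[i]), independent of other entries.
inline Values ListViewEntry(const std::vector<std::int32_t>& offsets,
                            const std::vector<std::int32_t>& sizes,
                            const Values& child, std::size_t i) {
  return Values(child.begin() + offsets[i],
                child.begin() + offsets[i] + sizes[i]);
}
```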
> don't see anything inherently wrong with it, and if it ain't broke we
> really shouldn't be trying to fix it.
>
> Kind Regards,
>
> Raphael Taylor-Davies
>
> On 14 June 2023 17:52:52 BST, Felipe Oliveira Carvalho
> wrote:
>
> Genera
tView aspires to, such an addition could require non trivial changes to
> many / all of those implementations (and the APIs they expose).
>
> Andrew
>
> On Wed, Jun 14, 2023 at 12:53 PM Felipe Oliveira Carvalho <
> felipe...@gmail.com> wrote:
>
> > General appr
just between
> systems?"
>
>
> On Wed, Jun 14, 2023 at 2:07 AM Antoine Pitrou wrote:
>
> >
> > I agree that ListView cannot be an extension type, given that it
> > features a new layout, and therefore cannot reasonably be backed by an
> > existing storage type (AFAICT
ypes might be deprecated in favor of view variants [2]. Others
> > > were
> > > > > > > worried that it might undermine the perception that the Arrow
> > > format
> > > > is
> > > > > > > stable. I think it might be worth thinking about "
+1 for me.
The C structs are clean and leave good room for extension.
--
Felipe
On Thu, May 25, 2023 at 12:04 PM David Li wrote:
> +1 for me.
>
> (Heads up: on the PR, there was some discussion since the last email and
> the meaning of 'experimental' was clarified.)
>
> On Tue, May 23, 2023, a
Have you considered using fixed-length binary values for these?
Crypto algorithms might logically be defined in terms of mathematical
operations on integers, but their efficient implementation tends to feature
inlined operations at the machine word level instead of generic add, div,
mod, mul opera
. For example,
> operations
> >> that slice these containers can be implemented in a zero-copy manner by
> >> just rearranging the lengths/offsets indices, without ever touching the
> >> larger internal buffers. This is a similar motivation as for StringView
> >>
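A sketch of such a zero-copy operation on a ListView-like container. ListViewSketch and Take are hypothetical, and the shared_ptr stands in for a shared values buffer. Reordering entries only gathers offsets and sizes; with a classic ListArray, whose offsets must be non-decreasing, the same reorder would require copying child values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// The child values are shared; only offsets/sizes are per-array.
struct ListViewSketch {
  std::shared_ptr<const std::vector<int>> values;
  std::vector<std::int32_t> offsets;
  std::vector<std::int32_t> sizes;
};

// Selects entries by index without touching the values buffer.
inline ListViewSketch Take(const ListViewSketch& in,
                           const std::vector<std::size_t>& indices) {
  ListViewSketch out{in.values, {}, {}};
  out.offsets.reserve(indices.size());
  out.sizes.reserve(indices.size());
  for (std::size_t idx : indices) {
    out.offsets.push_back(in.offsets[idx]);
    out.sizes.push_back(in.sizes[idx]);
  }
  return out;
}
```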
ort for the type, including compute kernels? Or are they likely to
> just
> > convert this type to ListArray at import boundaries?
> >
> > Because if it turns out to be the latter, then we might as well ask Velox
> > to export this type as ListArray and save the rest of
> I am actually trying to switch to arrow_static.lib.
Perhaps the issue is that arrow_static.lib was linked against a static CRT
that's not the one you are using in your project?
On Fri, May 12, 2023 at 3:13 PM Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) <
avertl...@bloomberg.net> wrote:
> This is not only
1m2ggz2kdq
> > >>
> > >> On Tue, Apr 25, 2023 at 3:13 PM Will Jones
> > wrote:
> > >>
> > >>> Hi Felipe,
> > >>>
> > >>> Thanks for the introduction. I'd be interested to hear about the
> > >>
Congratulations, Matt!
On Wed, 3 May 2023 at 14:37 Andrew Lamb wrote:
> The Project Management Committee (PMC) for Apache Arrow has invited
> Matt Topol (zeroshade) to become a PMC member and we are pleased to
> announce
> that Matt has accepted.
>
> Congratulations and welcome!
>
Following Weston's suggestion above, I've renamed files and classes in my WIP
implementation:
ArrayView -> ListView
On Wed, Apr 26, 2023 at 11:08 AM Ian Cook wrote:
> +1 to what Weston and Joris suggested regarding the name. "ListView"
> seems like the best name to use for this layout in Arrow.
>
>
Hi folks,
I would like to start a public discussion on the inclusion of a new array
format in Arrow: the array-view array. The name is also up for debate.
This format is inspired by Velox's ArrayVector format [1]. Logically, this
array represents an array of arrays. Each element is an array-view (of
+1 for "pull request title *and* description".
Being able to read descriptions without leaving the editor is handy.
Keeping that information tracked in the repo means we don’t depend on
GitHub to reconstruct the history of the project.
On Tue, 31 Jan 2023 at 06:43 Antoine Pitrou wrote:
>
> +1 f