+1, I am supportive of this change as well.
Regards
Antoine.
Le 16/05/2025 à 10:48, Sutou Kouhei a écrit :
Hi,
This is a similar discussion to the "[DISCUSS] Split Go
release process" thread[1], the "[DISCUSS] Split Java
release process" thread[2], the "[DISCUSS] Split R release
process" t
Hello,
I am proposing that we switch Arrow C++ to require C++20.
C++20 will offer support for more C++ language and standard library
features, such as:
- concepts
- generic lambdas with explicit type parameters
- designated initializers
- calendar and timezone functions (currently, our Wind
Le 12/05/2025 à 18:20, Matt Topol a écrit :
> It's not just Parquet Variant, it's also Iceberg (which has
> standardized on this) and Spark in-memory (where this encoding scheme
> originated).
Ok, but it's called Parquet Variant now, since that's where the binary
spec lives:
https://github.co
Hello,
I'm sure the technical details can be ironed out, but the question is
more whether someone is willing to do the maintenance work required to
keep Arrow working on big-endian platforms, and if possible enable it
for more components (most of us don't have access to such a platform).
If
Hi Matt,
Thanks for putting this together.
I think we should make clear that this extension type is for
transporting Parquet Variants. If we were to design a Variant type
specifically for Arrow, it would probably look a bit different (in
particular, we would make a better use of validity bi
ith
selection!
--Matt
On Fri, May 9, 2025, 8:26 AM Raúl Cumplido wrote:
Hi,
I already plan to attend PyData Paris so I would like to volunteer too.
Thanks,
Raúl
El vie, 9 may 2025 a las 13:33, Antoine Pitrou escribió:
Hi JB,
I'm volunteering too.
Regards
Antoine.
Le 09/05/2025
Hi JB,
I'm volunteering too.
Regards
Antoine.
Le 09/05/2025 à 13:00, Jean-Baptiste Onofré a écrit :
Hi everyone,
The Arrow PMC is pleased to announce Arrow Summit 25.
The Arrow Summit 2025 is a community event, organised by a Selection Committee.
The event’s focus is to build community arou
+1 (binding)
Le 07/05/2025 à 10:48, Raúl Cumplido a écrit :
Hi,
I would like to propose splitting the JS implementation and the
corresponding release process to its own repository.
Motivation:
* We want to reduce needless major releases to avoid unnecessary user
burden.
* We want to avoid
Hello,
"Skyhook" is a little-known C++ component that interfaces Arrow with the
Ceph distributed filesystem. It received its last non-trivial change in 2022:
https://github.com/apache/arrow/commit/546c3771a209cbcac5e03cf26e07bcd8c9601d5a
You won't find much documentation for it except for an
Hello,
+1 from me (binding).
Side question: is the FlightSQL spec versioned?
Regards
Antoine.
Le 27/04/2025 à 11:45, David Li a écrit :
Hello,
Mateusz Rzeszutek has proposed adding a "remarks" field in xDBC column metadata
in Flight SQL [1]. This better aligns Flight SQL with existing A
Hi Ben and all,
Sorry for chiming in late. I do find the URI-and-kv-pairs interface
attractive.
That said, some filesystem options can't reasonably be expressed as
strings. For example, `S3Options` has a `std::shared_ptr<const KeyValueMetadata> default_metadata` and a
`std::shared_ptr`.
So, p
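To illustrate the limitation described above, here is a minimal sketch in Python (standard library only, not the Arrow API). The function name `options_from_uri`, the example URI, and the `RetryStrategy` class are hypothetical. The sketch only shows that everything parsed from a URI query string comes out as a plain string, so an object-valued option has nowhere to go.

```python
from urllib.parse import urlsplit, parse_qsl


def options_from_uri(uri):
    """Split a filesystem URI into (scheme, path, string-valued options)."""
    parts = urlsplit(uri)
    # Every value parsed from the query string is a plain str.
    options = dict(parse_qsl(parts.query))
    path = parts.netloc + parts.path
    return parts.scheme, path, options


scheme, path, options = options_from_uri(
    "s3://my-bucket/data?region=eu-west-1&scheme=https")
print(scheme, path, options)


# Hypothetical object-valued option: it carries behaviour, not just data,
# so it has no natural encoding as a key/value string pair.
class RetryStrategy:
    def should_retry(self, attempt):
        return attempt < 3
```

A typed options struct can hold an instance of something like `RetryStrategy` directly, while a URI-only interface would need a registry of named strategies or a separate API for such cases.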
er hygiene (compared to other codebases I've worked
on). With a little bit more effort we can probably eliminate long header
include chains.
--
Felipe
On Wed, Oct 2, 2024 at 6:53 AM Antoine Pitrou wrote:
Hello,
Long ago, we added an ARROW_USE_PRECOMPILED_HEADERS to the Arrow C++
CMake
As I suggested in
https://github.com/apache/arrow/pull/44279#issuecomment-2757128297 , do
we want to make this an `enum` for a more future-proof API?
i.e., instead of:
```
bool ensure_alignment = false;
```
have:
```
enum Alignment {
kNoAlignment, kNaturalAlignment
};
Alignment ensure_ali
Hello all,
The Project Management Committee (PMC) for Apache Arrow has invited
Rok Mihevc to become a PMC member and we are pleased to announce that
Rok has accepted.
Regards
Antoine.
Le 27/03/2025 à 18:14, Raphael Taylor-Davies a écrit :
It's obviously preferable to be zero-copy but it's certainly not
mandatory, especially as the data being shared is assumed to be
read-only in most use cases.
In which case we should probably remove the comment about alignment from
the C i
Hello,
Le 27/03/2025 à 17:53, Raphael Taylor-Davies a écrit :
The current ambiguity, however, makes it hard to set reasonable
defaults, as it isn't clear if FFI should be zero-copy and therefore
have alignment restrictions or not.
It's obviously preferable to be zero-copy but it's certainl
Indeed, it doesn't sound like a terrific use of Arrow maintainer time...
Especially as there's a growing feeling that Flight was not very well
designed, and should perhaps be slowly obsoleted in favor of more
focussed initiatives (such as the Arrow-over-HTTP effort that's still not
finished :-
Congratulations Jacob :)
Le 17/03/2025 à 18:28, Jacob Wujciak a écrit :
Thank you everyone!
Bryce Mecum schrieb am Mo., 17. März 2025, 17:25:
Congrats!
On Sun, Mar 16, 2025 at 10:23 PM Sutou Kouhei wrote:
The Project Management Committee (PMC) for Apache Arrow has invited
Jacob Wujciak
We can start with users@ and, if the experience is subpar, switch to
something else.
Regards
Antoine.
Le 16/03/2025 à 15:04, Weston Pace a écrit :
+1
A possible reason for hesitation is that it provides us yet another
stream that requires maintainer attention
I had been lukewarm on di
Hi JB,
This is a great idea, I like it.
+1 for doing this in Europe, also less risky these days given the
geopolitical context.
Half a day is probably too short given the breadth of topics. Though, of
course, the longer the more difficult to organize (and the more expensive).
Regards
An
I agree with Neal that the decoupling is less obviously desirable on the
R side. About the number of R-related CI jobs, is there still a need for
testing so many different configurations?
Le 03/03/2025 à 15:32, Neal Richardson a écrit :
Thanks for raising this, Kou. I'm personally torn on
Hi Nic,
Le 17/02/2025 à 12:18, Nic Crane a écrit :
It'd give us useful insights into where we have gaps in our docs where we
can improve things, or what are common things that users struggle with.
Would Kapa provide us with stats about usage of their service?
What do folks think of the i
Hi Kou,
Le 17/02/2025 à 07:43, Sutou Kouhei a écrit :
Here are some ideas to improve our PR template:
1. Remove them entirely:
[...]
2. Keep minimal notes as a normal text not a comment
something like:
I think our template is useful (it forces us to better describe PRs), so
I'm in
+1. I don't think it makes sense to keep them since we are not able to
produce nightly builds anymore.
Regards
Antoine.
Le 07/02/2025 à 12:06, Raúl Cumplido a écrit :
Hi,
In the past we used to run nightly jobs for our conda recipes.
The CI jobs were turned off approximately 6 months ago
Hello,
The Project Management Committee (PMC) for Apache Arrow has invited
Bryce Mecum to become a PMC member and we are pleased to announce
that Bryce has accepted.
Congratulations and welcome!
Regards
Antoine.
Congratulations and welcome, Ed!
Le 29/01/2025 à 11:18, Andrew Lamb a écrit :
On behalf of the Arrow PMC, I'm happy to announce that Ed Seidl
has accepted an invitation to become a committer on Apache
Arrow. Welcome, and thank you for your contributions!
Andrew
Hi Neal,
I've tweaked the wording in the community health section a bit, please
review!
Regards
Antoine.
Le 06/01/2025 à 22:45, Neal Richardson a écrit :
Thanks Andrew, and thanks to everyone else who has added stuff. I went
through the dev mailing list to look for notable discussions/vo
Hi Bryce,
This sounds good to me.
Thanks
Antoine.
Le 18/12/2024 à 19:08, Bryce Mecum a écrit :
Hello all,
I'd like to propose a feature freeze date of Monday, January 6th, 2025
for the upcoming 19.0.0 release of Arrow. Please take a look through
the milestone [1] to ensure it includes the
Yes, exactly. There's actually a 0-bitwidth example for
DELTA_BINARY_PACKED in the spec (see "Example 1"):
https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5
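As a rough illustration of how a zero bit width arises, here is a simplified Python sketch. It handles a single run of values only (no block or miniblock headers, no bit packing), and the helper name `miniblock_bit_width` is made up for this example.

```python
def miniblock_bit_width(values):
    """Return (min_delta, bit_width) for one run of at least two integers."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Deltas are stored relative to the minimum delta, so they are all >= 0.
    relative = [d - min_delta for d in deltas]
    return min_delta, max(relative).bit_length()


print(miniblock_bit_width([1, 2, 3, 4, 5]))           # (1, 0)
print(miniblock_bit_width([7, 5, 3, 1, 2, 3, 4, 5]))  # (-2, 2)
```

When all deltas are equal, every relative delta is 0 and no bits are needed per value, which is the 0-bitwidth case.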
Regards
Antoine.
Le 19/12/2024 à 05:04, Micah Kornfield a écrit :
I seem to re
Hi,
I'm not a Rust user, but I would expect invalid input files to return
regular errors, not panic. Unlike API usage errors, invalid input files
are not a bug in the calling code.
This is also much nicer for bindings in high-level languages such as
Python.
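To make the distinction concrete, here is a small sketch in Python rather than Rust. The names `InvalidFileError` and `check_magic` are hypothetical, and the check is deliberately simplistic (it only looks at the `PAR1` magic bytes at both ends of a Parquet file).

```python
MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


class InvalidFileError(ValueError):
    """Raised when the input bytes are not a well-formed file."""


def check_magic(data):
    # Passing the wrong type is an API usage error: a bug in the caller.
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("data must be bytes")
    # A corrupt or truncated file is an ordinary, recoverable error.
    if len(data) < 2 * len(MAGIC):
        raise InvalidFileError("file too short")
    if not (data.startswith(MAGIC) and data.endswith(MAGIC)):
        raise InvalidFileError("bad magic bytes")
    return True


try:
    check_magic(b"definitely not a parquet file")
except InvalidFileError as exc:
    print("rejected:", exc)
```

A caller, or a binding layer, can catch the dedicated error and report it to the user, which is not possible if the library aborts instead.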
Regards
Antoine.
On Tue, 17 Dec 2
My vote is on CMake 3.25.
Best regards
Antoine.
Le 09/12/2024 à 22:08, Sutou Kouhei a écrit :
Hi,
Currently, we require CMake 3.16 or later:
https://github.com/apache/arrow/blob/e0f8c5e8e6f8b328a997f7e21bc6fd1a01b3b3fd/cpp/CMakeLists.txt#L18
cmake_minimum_required(VERSION 3.16)
We want
I don't think a second implementation is strictly necessary because this
is just defining a schema and some conventions around it. Though of
course a second implementation is always better to have.
Regards
Antoine.
Le 05/12/2024 à 17:47, Matt Topol a écrit :
* I implemented this proposal
Hi,
While I'm generally in favor of accepting this soon, I'm -1 on accepting
it right now because it seems the PR hasn't had enough review attention
on it (I posted some comments).
A spec is an important document that will bind us for years, so let's
make sure we write something that will
Hi,
Le 02/12/2024 à 08:12, Jerry Adair a écrit :
Hi Weston,
Thank you for the reply.
IIRC, this is a limitation given to us by the AWS C++ SDK. See [1]. The AWS
C++ SDK has static state and they do not manage it with static local variables.
As a result, the initialization and finalizati
Welcome to the team Laurent!
Le 25/11/2024 à 10:39, Raúl Cumplido a écrit :
Thanks and welcome Laurent!
El lun, 25 nov 2024 a las 10:36, David Li escribió:
On behalf of the Arrow PMC, I'm happy to announce that Laurent Goujon has
accepted an invitation to become a committer on Apache A
+1 (binding)
Le 22/11/2024 à 02:31, Sutou Kouhei a écrit :
Hi,
I would like to propose splitting Java release process.
Motivation:
* We want to reduce needless major releases because major
releases require users' change
* We want to reduce apache/arrow's release cost
Approach:
1. Extract
Hi Kou,
Thanks a lot for bringing this.
I'm +1 on the principle, both for splitting the Java release process and
moving the Java implementation into another repository.
We do need to find more maintainers for Arrow Java, but that is true
regardless of whether the Java implementation stays
+1, with the same comments as Felipe and Dewey.
Just one condition from me: the API should be marked experimental.
Regards
Antoine.
Le 24/10/2024 à 23:17, Felipe Oliveira Carvalho a écrit :
+1 from me.
I reviewed the PR some time ago and it's not a trivial protocol, but the
complexity
I also agree that letting conda-forge carry the patch until 19.0.0 is a
reasonable solution. It's much more light-weight than having us issue a
new RC just for it, unfortunately.
Regards
Antoine.
Le 24/10/2024 à 17:07, Raúl Cumplido a écrit :
El jue, 24 oct 2024 a las 0:14, Sutou Kouhei
Welcome Rossi, and thanks a lot for all your contributions, past and future!
Le 22/10/2024 à 21:02, Weston Pace a écrit :
On behalf of the Arrow PMC, I'm happy to announce that Rossi Sun has
accepted an invitation to become a committer on Apache Arrow. Welcome,
and thank you for your contribut
I see that there's a European variant of that event which seems more
adapted for at least some of the Arrow development community:
https://eu.communityovercode.org/
Le 04/10/2024 à 10:50, Raúl Cumplido a écrit :
Hi Jarek,
It seems really interesting, I won't be able to attend. Do you know
Hello,
Long ago, we added an ARROW_USE_PRECOMPILED_HEADERS to the Arrow C++
CMake options in the hope of speeding up builds by reducing C++ header
parsing time.
However, we later started to use a concurrent (*) solution added in
CMake itself: CMAKE_UNITY_BUILD, which merges batches of sourc
Hello Will, and thanks a lot for your involvement!
Le 01/10/2024 à 18:55, Dewey Dunnington a écrit :
On behalf of the Arrow PMC, I'm happy to announce that Will Wyd has
accepted an invitation to become a committer on Apache Arrow. Welcome,
and thank you for your contributions!
-dewey
Hi Kou,
That sounds fine to me.
Regards
Antoine.
Le 01/10/2024 à 03:55, Sutou Kouhei a écrit :
Hi,
The current decimal implementation omits the fractional part
if the fractional part is 0. For example: "0.E+1" not "0.0E+1"
Most environments such as Python, Node.js, PostgreSQL and
MySQL a
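As a data point for one of the environments named above, the following sketch shows how Python's standard `decimal` module treats a coefficient with a trailing dot and no fractional digits. It says nothing about the other environments.

```python
from decimal import Decimal

# A coefficient written as "0." (no fractional digits) is accepted on input.
value = Decimal("0.E+1")

# On output, Python writes the same number without the dot.
print(str(value))                # 0E+1
print(value == Decimal("0E+1"))  # True
```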
*they receive
Le 30/09/2024 à 11:57, Antoine Pitrou a écrit :
There might be a misunderstanding, but this is a report for the Apache
Software Foundation (they recent reports from hundreds of projects).
It's not really useful to copy our release notes there.
Regards
Antoine.
Le
There might be a misunderstanding, but this is a report for the Apache
Software Foundation (they recent reports from hundreds of projects).
It's not really useful to copy our release notes there.
Regards
Antoine.
Le 30/09/2024 à 11:46, Vibhatha Abeykoon a écrit :
Hi Andy,
Thanks for sha
se's (Databend, Doris, Druid,
DeepLake, Firebolt, Lance, Oxla, Pinot, QuestDB, SingleStore, etc.) native
at-rest partition file formats.
On Fri, 13 Sept 2024 at 16:43, Antoine Pitrou wrote:
Hello,
I'm perplexed by this discussion. If you want to send highly-compressed
files over
Hello,
I'm perplexed by this discussion. If you want to send highly-compressed
files over the network that is already possible: just send Parquet
files via HTTP(S) (or another protocol of choice).
Arrow Flight is simply a *streaming* protocol that allows
sending/requesting the Arrow format over
Hi,
I sympathize with the security argument. If no other library allows for
embedding the Azure password directly in the URL, then I would be ok for
deprecating it.
Regards
Antoine.
Le 10/09/2024 à 03:24, Sutou Kouhei a écrit :
Hi,
The current Azure file system URI accepts account key
Hi,
I don't have a specific opinion on this, but as a data point, this
already happens from time to time (though rarely).
Regards
Antoine.
Le 11/09/2024 à 17:32, Joris Van den Bossche a écrit :
Hi all,
This is a discussion specifically for the GitHub development workflow
we use in the m
+1 (binding).
Can you open a PR with the spec updates?
Regards
Antoine.
Le 04/09/2024 à 23:17, Matt Topol a écrit :
Based on various discussions among the ecosystem and to continue expanding
the zero-copy interoperability for Arrow to be used with different
libraries and databases (such as
Is there a way to ensure this is done automatically?
Regards
Antoine.
On Wed, 28 Aug 2024 10:05:45 +0900 (JST)
Sutou Kouhei wrote:
> Hi,
>
> How about indenting preprocessor directives for readability?
>
> Issue: https://github.com/apache/arrow/issues/43796
> PR: https://github.com/apache
+1 (binding)
Le 26/08/2024 à 04:37, Sutou Kouhei a écrit :
Hi,
I would like to propose splitting Go release process.
Motivation:
* We want to reduce needless major releases because major
releases require users' change
Approach:
1. Extract go/ in apache/arrow to apache/arrow-go like
a
Le 22/08/2024 à 17:08, Curt Hagenlocher a écrit :
(I also happen to want a canonical Arrow representation for variant data,
as this type occurs in many databases but doesn't have a great
representation today in ADBC results. That's why I filed [Format] Consider
adding an official variant type
u, Aug 22, 2024 at 3:51 PM Antoine Pitrou wrote:
Hi Gang,
Sorry, but can you give a pointer to the start of this discussion thread
in a readable format (for example a mailing-list archive)? It appears
that dev@arrow wasn't cc'ed from the start and that can make it
difficult to unde
Hi Gang,
Sorry, but can you give a pointer to the start of this discussion thread
in a readable format (for example a mailing-list archive)? It appears
that dev@arrow wasn't cc'ed from the start and that can make it
difficult to understand what this is about.
Regards
Antoine.
Le 22/08/2
Binding +1 (but posted one minor comment on the format PR).
Thank you Joel!
Regards
Antoine.
Le 05/08/2024 à 14:59, Joel Lubinitsky a écrit :
Hello Devs,
I would like to propose a new canonical extension type: Bool8
The prior mailing list discussion thread can be found at [1].
The format
I don't have any concrete data to test this against, but using 64-bit
offsets sounds like an obvious improvement to me.
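A toy Python sketch of why wider offsets help, assuming (this is not stated in the visible text) that the current offsets are 32-bit unsigned integers. The helper `row_offsets` is hypothetical and only emulates fixed-width wraparound on cumulative byte offsets.

```python
def row_offsets(row_sizes, bits):
    """Cumulative byte offsets, wrapped to `bits`-wide unsigned integers."""
    mask = (1 << bits) - 1
    offsets, total = [0], 0
    for size in row_sizes:
        total += size
        offsets.append(total & mask)  # emulate fixed-width wraparound
    return offsets


# Three rows of 2 GiB each: 6 GiB in total, more than 32 bits can address.
sizes = [2**31, 2**31, 2**31]
print(row_offsets(sizes, 32))  # [0, 2147483648, 0, 2147483648]
print(row_offsets(sizes, 64))  # [0, 2147483648, 4294967296, 6442450944]
```

With 32 bits the third offset wraps back to 0 and collides with the first row. With 64 bits the offsets stay strictly increasing.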
Regards
Antoine.
Le 01/08/2024 à 13:05, Ruoxi Sun a écrit :
Hello everyone,
We've identified an issue with Acero's hash join/aggregation, which is
currently limited to
Le 22/07/2024 à 21:25, Joel Lubinitsky a écrit :
If Canonical Extensions had existed at the time, I think there's a chance
we may have ended up with int32 Date as a first class type and int64
MillisecondDate as a Canonical Extension type.
Agreed.
Are there any lessons we've
learned from im
I can't
> find now that new types should be implemented as extension types if
> possible for these (and perhaps other) reasons.
>
>
> On Fri, Jul 19, 2024 at 5:39 AM Antoine Pitrou wrote:
> >
> >
> > Agreed with Felipe. This is meant for communicating with no
out any provisions on the
specification that might make this impossible.
-dewey
[1]
https://github.com/duckdb/duckdb/blob/85a82d86aa11a2695fc045deaf4f88fc63dd4fec/src/common/arrow/appender/bool_data.cpp#L28-L37
On Tue, Jul 16, 2024 at 11:25 AM Antoine Pitrou <
anto...@python.org>
Hi Kou,
Le 18/07/2024 à 11:33, Sutou Kouhei a écrit :
Here is my idea of how to proceed with this:
1. Extract go/ in apache/arrow to apache/arrow-go like
apache/arrow-rs
* Filter go/ related commits from apache/arrow and create
apache/arrow-go with them like we did for apache/arrow-rs
Hello,
Thanks all for this discussion. Given that there was no strong argument
against doing this, I decided to move forward and the change was made in
https://github.com/apache/arrow/pull/40875
Regards
Antoine.
On Wed, 5 Jun 2024 17:18:36 +0200
Antoine Pitrou wrote:
> Hello,
>
>
Hi Carl,
Le 08/07/2024 à 18:43, Carl Boettiger a écrit :
As an observer to both communities, I'm interested in if there is or might
be more communication between the Pangeo community's focus on Zarr
serialization with what the Arrow team has done with Parquet. I recognize
that these are diff
Hi Joel,
This looks good to me on the principle. Can you split the spec and the
implementation(s) into separate PRs?
Regards
Antoine.
Le 16/07/2024 à 13:18, Joel Lubinitsky a écrit :
Hi Arrow devs,
I'm working on adding an extension type for 8-bit booleans, and wanted to
start a discuss
[1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html
# --
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
On Monday, July 15th, 2024 at 07:59, Antoine Pitrou wrote:
No, because these marke
No, because these markers also communicate the information to other
implementations of S3 abstractions.
An example of this is: https://docs.cyberduck.io/protocols/s3/#folders
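A toy Python model of the behaviour under discussion, assuming the common convention of a zero-byte object whose key ends with "/". The class `FlatObjectStore` is hypothetical and is not the Arrow S3 filesystem.

```python
class FlatObjectStore:
    """Toy key/value store with no native notion of directories."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data=b""):
        self.objects[key] = data

    def delete(self, key):
        del self.objects[key]

    def create_dir(self, path):
        # The marker is a zero-byte object whose key ends with "/".
        self.put(path.rstrip("/") + "/")

    def dir_exists(self, path):
        prefix = path.rstrip("/") + "/"
        return any(key.startswith(prefix) for key in self.objects)


with_marker = FlatObjectStore()
with_marker.create_dir("foo")
with_marker.put("foo/bar.txt", b"hello")
with_marker.delete("foo/bar.txt")
print(with_marker.dir_exists("foo"))     # True: the marker "foo/" remains

without_marker = FlatObjectStore()
without_marker.put("foo/bar.txt", b"hello")
without_marker.delete("foo/bar.txt")
print(without_marker.dir_exists("foo"))  # False: nothing left under "foo/"
```

Because the marker is an ordinary object, any other tool listing the same bucket sees it too, which is how the information reaches other implementations.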
Regards
Antoine.
Le 13/07/2024 à 07:15, Aldrin a écrit :
...then I still expect the directory /foo to exist
Rig
Hi,
Le 12/07/2024 à 12:21, Hyunseok Seo a écrit :
*### Why Maintain Empty Directory Markers?*
From what I understand, object stores like S3 do not have a concept of
directories. The motivation behind maintaining these markers could be to
manage the object store as if it were a traditional fi
Hmmm, I struggle to understand why a `(int32, utf8)` tuple for statistic
keys would be any simpler to implement than either `int32` *or* `utf8`
*or* `dictionary(int32, utf8)`.
Let's keep in mind that we would like to keep things simple for
consumers and producers of statistics.
We should al
Is this UDF implementation based on DataFusion? If so, it makes sense
for it to be part of the DataFusion project.
OTOH, if it can work with any data in the Arrow format, then it would
sound weird to maintain it in the DataFusion repo IMHO.
Regards
Antoine.
Le 28/06/2024 à 21:52, Andrew
I'll note that PyArrow also allows defining user-defined functions and
they are vectorized (the function arguments can be PyArrow arrays or
scalars, depending on the context in which a function is being executed):
https://arrow.apache.org/docs/python/compute.html#user-defined-functions
My vo
Le 12/06/2024 à 04:45, Sutou Kouhei a écrit :
It seems that we need to disable MI_OVERRIDE explicitly to
not define malloc() in libmimalloc.so:
https://github.com/microsoft/mimalloc/blob/03020fbf81541651e24289d2f7033a772a50f480/CMakeLists.txt#L10
Yes, that's what we do when building the bund
Sorry, I had forgotten to comment on this. I think this is generally a
good idea, but it would obviously need more eyes on it :-)
Can other people go and take a look at David's PR below?
Le 25/05/2024 à 04:47, David Li a écrit :
I've put up a draft PR here: https://github.com/apache/arrow/
Le 11/06/2024 à 10:35, Sutou Kouhei a écrit :
Hi,
In <2a32f61c-dd22-4f3f-bc98-822dcb6b0...@python.org>
"Re: [Discuss][C++] Switch to mimalloc by default?" on Tue, 11 Jun 2024
10:21:12 +0200,
Antoine Pitrou wrote:
I was thinking about find_package(). Good to know
Le 11/06/2024 à 10:01, Sutou Kouhei a écrit :
2. Is it OK that we add support for system mimalloc?
Hmm... that sounds legitimate, but with the caveat that a system
mimalloc can override the standard malloc/free functions. Would that
affect an application using Arrow C++?
Are you saying th
Hi Kou,
Le 09/06/2024 à 09:16, Sutou Kouhei a écrit :
Questions:
1. Do we need to keep jemalloc support? Compatibility? Can we
drop support for jemalloc to decrease maintenance cost?
I'm not sure there's much maintenance cost. I expect some people might
prefer jemalloc, and perhaps it
Le 09/06/2024 à 08:33, Sutou Kouhei a écrit :
Fields:
| Name   | Type          | Comments |
|--------|---------------|----------|
| column | utf8          | (2)      |
| key    | utf8 not null | (3)      |
1. Should the key be
Le 09/06/2024 à 09:01, Sutou Kouhei a écrit :
Hi,
One thing that a plain integer makes more difficult is representing
non-standard statistics. For example some engine might want to expose
elaborate quantile-based statistics even if it not officially defined
here. With a `utf8` or `dictionary(
Le 07/06/2024 à 18:30, Felipe Oliveira Carvalho a écrit :
On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote:
Le 07/06/2024 à 04:27, Felipe Oliveira Carvalho a écrit :
I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by
Le 07/06/2024 à 04:27, Felipe Oliveira Carvalho a écrit :
I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by both producers and
consumers (i.e. standardized).
The statistics array(s) could be a
map<
// the column index or n
Hi Kou,
Thanks for pushing for this!
Le 06/06/2024 à 11:27, Sutou Kouhei a écrit :
4. Standardize Apache Arrow schema for statistics and
transmit statistics via separated API call that uses the
C data interface
[...]
I think that 4. is the best approach in these candidates.
I agr
Hello,
Arrow C++ features a MemoryPool abstraction that allows using different
allocators interchangeably. Several MemoryPool implementations are
provided with Arrow C++ (though one can also build their own):
- a jemalloc-based implementation, currently the default on Linux
- a mimalloc-bas
(Gang Wu, Antoine Pitrou, Wes McKinney)
9x +1 non-binding (Micah Kornfield, Felipe Oliveira Carvalho, Fokko
Driesprong, Alenka Frim, Andy Grove, Raúl Cumplido, Sutou Kouhei, Jiashen
Zhang, Rok Mihevc)
Arrow:
6x +1 binding (Micah Kornfield, Antoine Pitrou, Andy Grove, Raúl Cumplido,
Wes McKinney
Hi Li!
Sorry for the delay.
It seems the problem lies here:
https://github.com/apache/arrow/blob/9f5899019d23b2b1eae2fedb9f6be8827885d843/cpp/src/arrow/filesystem/s3fs.cc#L1858
The Future is marked finished with the ObjectOutputStream's mutex taken,
and the Future's callback then triggers a c
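The general hazard of running a completion callback while a non-reentrant mutex is held can be sketched as follows. This is a Python analogy with a hypothetical `Stream` class, not the code in `s3fs.cc`, and it assumes the callback ends up needing the same lock. The callback uses a non-blocking acquire so that the sketch reports the problem instead of hanging.

```python
import threading


class Stream:
    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a plain mutex
        self.reentry_ok = []

    def on_finished(self):
        # Callback that needs the stream's lock again (for example to close).
        acquired = self.lock.acquire(blocking=False)
        self.reentry_ok.append(acquired)
        if acquired:
            self.lock.release()

    def finish_holding_lock(self):
        with self.lock:
            self.on_finished()  # callback runs with the lock taken

    def finish_after_unlock(self):
        with self.lock:
            pass                # update internal state under the lock
        self.on_finished()      # run callbacks once the lock is released


s = Stream()
s.finish_holding_lock()
s.finish_after_unlock()
print(s.reentry_ok)  # [False, True]
```

With a blocking acquire, the first variant would deadlock. Marking completion only after the lock is released avoids that.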
+1 (binding).
Thanks for taking this up, Rok!
Regards
Antoine.
Le 29/05/2024 à 16:14, Rok Mihevc a écrit :
# sending this to both dev@arrow and dev@parquet
Hi all,
Following the ML discussion [1] I would like to propose a vote for
parquet-cpp issues to be moved from Parquet Jira [2] to Arr
Is it somehow possible to be a "member" of this account to indicate that
we have PMC status, or is that not possible within the LinkedIn
membership/permissions model?
Le 24/05/2024 à 18:04, Ian Cook a écrit :
Following the discussion [1] earlier this year about the status of the
Apache Ar
> 2. We'll provide pre-defined keys such as "max", "min",
>    "byte_width" and "distinct_count" but users can also use
>    application specific keys.
> 3. If true, then the value is approximate or best-effort.
Le 23/05/2024 à 16:09, Felipe Oliveira Carvalho a écrit :
Protocols that produce/consume statistics might want to use the C Data
Interface as a primitive for passing Arrow arrays of statistics.
This is also my opinion.
I think what we are slowly converging on is the need for a spec to
desc
Hi Kou,
I agree with Dewey that this is overstretching the capabilities of the C
Data Interface. In particular, stuffing a pointer as metadata value and
decreeing it immortal doesn't sound like a good design decision.
Why not simply pass the statistics ArrowArray separately in your
produce
I think these flags should be advisory and consumers should be free to
ignore them. However, some consumers apparently would benefit from them
to more faithfully represent the producer's intention.
For example, in Arrow C++, we could perhaps have a ImportDatum function
whose actual return t
+1 (binding)
Le 19/04/2024 à 22:22, Rok Mihevc a écrit :
Hi all,
Following initial requests [1][2] and recent tangential ML discussion [3] I
would like to propose a vote to add language for UUID canonical extension
type to CanonicalExtensions.rst as in PR [4] and written below.
A draft C++ and
+1 (binding) for the current proposal, i.e. with the RFC 8259
requirement and the 3 current String types allowed.
Regards
Antoine.
Le 30/04/2024 à 19:26, Rok Mihevc a écrit :
Hi all, thanks for the votes and comments so far.
I've amended [1] the proposed language with the RFC-8259 requiremen
mes, and so we could use this in that context).
I think that I would still prefer a canonical extension type (with storage
type null) over a new dedicated type.
On Wed, Apr 17, 2024 at 5:39 AM Antoine Pitrou wrote:
Ah! Well, I think this could be an interesting proposal, but someone
should
Ah! Well, I think this could be an interesting proposal, but someone
should put together a more formal proposal, perhaps as a draft PR.
Regards
Antoine.
Le 17/04/2024 à 11:57, David Li a écrit :
For an unsupported/other extension type.
On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
What
Out of curiosity, did you notice this by chance or do you have some kind
of script that processes ASF mailing-list archives for possible voting
irregularities?
Regards
Antoine.
Le 17/04/2024 à 10:44, Christofer Dutz a écrit :
When looking at whimsy, I can’t see any person named Sutou Kou
eation of one-off nominal types for
very specific use-cases?
—
Felipe
On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou wrote:
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
A
:06 Antoine Pitrou wrote:
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
Antoine.
Le 10/04/2024 à 22:55, Wes McKinney a écrit :
In the past we have discussed adding a
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
Antoine.
Le 10/04/2024 à 22:55, Wes McKinney a écrit :
In the past we have discussed adding a canonical type for UUID and JSON.
Hello John,
Arrow IPC files can be backed quite naturally by shared memory, simply
by memory-mapping them for reading. So if you have some pieces of shared
memory containing Arrow IPC files, and they are reachable using a
filesystem mount point, you're pretty much done.
You can see an exam
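A minimal sketch of the memory-mapping idea using only Python's standard library. The payload is a placeholder that merely starts with the `ARROW1` magic, not a valid IPC file, and the helper names are hypothetical. On Linux, a path under /dev/shm would be backed by shared memory; a temporary directory is used here for portability.

```python
import mmap
import os
import tempfile


def write_file(path, payload):
    with open(path, "wb") as f:
        f.write(payload)


def read_mapped(path):
    """Map the file read-only; return the mapping and a zero-copy view."""
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapped, memoryview(mapped)


path = os.path.join(tempfile.mkdtemp(), "example.arrow")
write_file(path, b"ARROW1" + b"\x00" * 10)  # placeholder, not real IPC data

mapped, view = read_mapped(path)
print(bytes(view[:6]))  # b'ARROW1'
view.release()          # release the view before closing the mapping
mapped.close()
```

With PyArrow installed, `pyarrow.memory_map(path, "r")` together with `pyarrow.ipc.open_file` plays the same role for real IPC files.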
It seems that perhaps this discussion should be rebooted for each
individual component, one at a time?
Let's start with something simple and obvious, with some frequent
contribution activity, such as perhaps Go?
Le 09/04/2024 à 14:27, Joris Van den Bossche a écrit :
I am also in favor o