On Tue, Oct 8, 2019 at 3:34 PM Wes McKinney wrote:
>
> hi Jacques,
>
> On Tue, Oct 8, 2019 at 1:54 PM Jacques Nadeau wrote:
> >
> > I'm removing all my objections to this work.
> >
> > I wish there was more feedback from additional community members. I
> > continue to be concerned about fragmentat
hi Jacques,
On Tue, Oct 8, 2019 at 1:54 PM Jacques Nadeau wrote:
>
> I'm removing all my objections to this work.
>
> I wish there was more feedback from additional community members. I continue
> to be concerned about fragmentation. I don't agree with the arguments here
> that we need to add a n
I'm not sure whether flatbuffers is actually an issue in the end, but keeping it
out of the C-API definitely simplifies it a bit adoption-wise. I don't think,
though, that using protobuf would make a difference here.
In general, I really like the C-interface work as sadly C-APIs are still the
I'm removing all my objections to this work.
I wish there was more feedback from additional community members. I
continue to be concerned about fragmentation. I don't agree with the
arguments here that we need to add a new api to make it easy for people to
*not* use Arrow codebase. It seems like a p
Hi Wes,
I agree that for third parties "A" (Field data structures) is the most useful.
At least in my mind the discussion was for both first and third parties. I
was trying to point out that "A" is less necessary as a first step for
first-party integrations and could potentially require more effort if
On Wed, Oct 2, 2019 at 11:05 PM Micah Kornfield wrote:
>
> I've tried to summarize my understanding of the debate so far and give some
> initial thoughts. I think there are two potentially different sets of users
> that we are targeting with a stable C API/ABI: ourselves and external
> parties.
>
>
I've tried to summarize my understanding of the debate so far and give some
initial thoughts. I think there are two potentially different sets of users
that we are targeting with a stable C API/ABI: ourselves and external
parties.
1. Different language implementations within the Arrow project that
On Wed, Oct 2, 2019 at 10:19 PM Wes McKinney wrote:
>
> On Wed, Oct 2, 2019 at 7:46 PM Jacques Nadeau wrote:
> >
> > I'd like to hear more opinions from others on this topic. This conversation
> > seems mostly dominated by comments from myself, Wes and Antoine.
> >
> > I think it is reasonable to
On Wed, Oct 2, 2019 at 7:46 PM Jacques Nadeau wrote:
>
> I'd like to hear more opinions from others on this topic. This conversation
> seems mostly dominated by comments from myself, Wes and Antoine.
>
> I think it is reasonable to argue that keeping any ABI (or header/struct
> pattern) as narrow
I'd like to hear more opinions from others on this topic. This conversation
seems mostly dominated by comments from myself, Wes and Antoine.
I think it is reasonable to argue that keeping any ABI (or header/struct
pattern) as narrow as possible would allow us to minimize overlap with the
existing
I had an e-mail editing snafu, so you can ignore the bottom "inline"
portion since it's just a restatement of what is written more clearly
above.
On Tue, Oct 1, 2019 at 9:32 PM Wes McKinney wrote:
>
> hi Jacques,
>
> I think we've veered off course a bit and maybe we could reframe the
> discussion
hi Jacques,
I think we've veered off course a bit and maybe we could reframe the discussion.
Goals
* A "drop-in" header-only C file that projects can use as a
programming interface either internally only or to expose in-memory
data structures between C functions at call sites. Ideally little to
n
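As an illustration of the "drop-in" header goal, here is a minimal sketch of what such a header might contain. The field layout is an assumption: it follows the ArrowArray struct that the Arrow C data interface later standardized, and the draft discussed in this thread may differ. The include-guard name DROPIN_ARROW_C_DATA_H is hypothetical. The __cplusplus guards let the same file compile as C or C++, with no library to link.

```cpp
// Hypothetical drop-in header sketch (layout assumed from the later
// Arrow C data interface; not the draft under discussion).
#ifndef DROPIN_ARROW_C_DATA_H
#define DROPIN_ARROW_C_DATA_H

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

struct ArrowArray {
  int64_t length;       // number of logical elements
  int64_t null_count;   // number of null elements (or -1 if unknown)
  int64_t offset;       // logical offset into the buffers
  int64_t n_buffers;    // number of entries in `buffers`
  int64_t n_children;   // number of entries in `children`
  const void** buffers;              // e.g. validity bitmap, values
  struct ArrowArray** children;      // child arrays for nested types
  struct ArrowArray* dictionary;     // dictionary values, if any
  void (*release)(struct ArrowArray*);  // producer-supplied cleanup
  void* private_data;                // opaque producer state
};

#ifdef __cplusplus
}  // extern "C"
#endif

#endif  // DROPIN_ARROW_C_DATA_H
```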
On Tue, Oct 1, 2019 at 3:22 PM Jed Brown wrote:
>
> I'd just like to chime in with the use case of in-situ data analysis for
> simulations. This domain tends to be cautious with dependencies and
> there is a lot of C and Fortran, but the in-situ analysis tools will
> preferably reside in separate
As currently designed, it's entirely in-process. Shared memory with
buffer lifetime handling is taken care of by something like Plasma.
Regards
Antoine.
On 01/10/2019 at 22:22, Jed Brown wrote:
> I'd just like to chime in with the use case of in-situ data analysis for
> simulations. This
I'd just like to chime in with the use case of in-situ data analysis for
simulations. This domain tends to be cautious with dependencies and
there is a lot of C and Fortran, but the in-situ analysis tools will
preferably reside in separate processes while sharing memory via shared
memory (/dev/shm
I disagree with this statement:
- the IPC format is meant for serialization while the C data protocol is
meant for in-memory communication, so different concerns apply
If that is how a particular implementation presents it, that is a
weakness of the implementation, not the format. The prim
hi Antoine,
On Tue, Oct 1, 2019 at 4:29 AM Antoine Pitrou wrote:
>
>
> On 01/10/2019 at 00:39, Wes McKinney wrote:
> > A couple things:
> >
> > * I think a C protocol / FFI for Arrow array/vectors would be better
> > to have the same "shape" as an assembled array. Note that the C
> > structs he
On 01/10/2019 at 00:39, Wes McKinney wrote:
> A couple things:
>
> * I think a C protocol / FFI for Arrow array/vectors would be better
> to have the same "shape" as an assembled array. Note that the C
> structs here have very nearly the same "shape" as the data structure
> representing a C++
A couple things:
* I think it would be better for a C protocol / FFI for Arrow arrays/vectors
to have the same "shape" as an assembled array. Note that the C
structs here have very nearly the same "shape" as the data structure
representing a C++ Array object [1]. The disassembly and reassembly
here is sub
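To make the "same shape" point concrete, the sketch below describes a toy int32 array (validity bitmap plus values buffer) by pointing the struct's buffers directly at the producer's memory, so a consumer reads element i with no disassembly or reassembly step. ToyInt32Array, DescribeInt32, IsValid and Int32Value are hypothetical names, and the struct layout is assumed from the later Arrow C data interface rather than taken from this draft.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Struct layout assumed from the later Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical producer-side "assembled" array: a validity bitmap plus a
// contiguous values buffer, roughly the shape of a C++ ArrayData.
struct ToyInt32Array {
  std::vector<uint8_t> validity;  // LSB-first bitmap; empty means all valid
  std::vector<int32_t> values;
  int64_t null_count = 0;
};

// No-op release: in this sketch the producer keeps ownership of the data.
static void NoopRelease(ArrowArray* a) { a->release = nullptr; }

// Describe `src` by pointing at its buffers directly. Nothing is copied or
// serialized. `src` and `buffers` must outlive `out`.
void DescribeInt32(const ToyInt32Array& src, const void* buffers[2],
                   ArrowArray* out) {
  buffers[0] = src.validity.empty() ? nullptr : src.validity.data();
  buffers[1] = src.values.data();
  out->length = static_cast<int64_t>(src.values.size());
  out->null_count = src.null_count;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = &NoopRelease;
  out->private_data = nullptr;
}

// Consumer side: check the validity bit for element i.
bool IsValid(const ArrowArray& a, int64_t i) {
  const auto* bitmap = static_cast<const uint8_t*>(a.buffers[0]);
  if (bitmap == nullptr) return true;  // no bitmap: every slot is valid
  const int64_t j = a.offset + i;
  return ((bitmap[j / 8] >> (j % 8)) & 1) != 0;
}

// Consumer side: read element i straight from the values buffer.
int32_t Int32Value(const ArrowArray& a, int64_t i) {
  return static_cast<const int32_t*>(a.buffers[1])[a.offset + i];
}
```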
FlatCC is still a dependency, with generated files etc.
Perhaps you want to evaluate FlatCC on a schema-like example and see
what the generated code and compile instructions look like?
I'll point out again that the format string in my proposal uses an
extremely simple mini-format, that should be
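For a sense of how little code such a mini-format needs on the consumer side, here is a lookup sketch. The codes used ("i" for int32, "u" for UTF-8 strings, "+l" for list, and so on) are the ones the Arrow C data interface eventually adopted; they may not match the draft in this thread exactly, and DescribeFormat is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

struct FormatEntry {
  const char* code;  // format string as it would appear in the struct
  const char* name;  // human-readable type name (names chosen for this sketch)
};

// A subset of codes from the later Arrow C data interface (not exhaustive;
// parameterized types such as decimals are not covered here).
constexpr FormatEntry kFormats[] = {
    {"n", "null"},    {"b", "bool"},    {"c", "int8"},    {"C", "uint8"},
    {"s", "int16"},   {"S", "uint16"},  {"i", "int32"},   {"I", "uint32"},
    {"l", "int64"},   {"L", "uint64"},  {"e", "float16"}, {"f", "float32"},
    {"g", "float64"}, {"z", "binary"},  {"u", "utf8"},
    {"+l", "list"},   {"+s", "struct"},
};

// Exact-match lookup; anything not in the table is reported as "unknown".
std::string DescribeFormat(const std::string& fmt) {
  for (const FormatEntry& e : kFormats) {
    if (fmt == e.code) return e.name;
  }
  return "unknown";
}
```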
FlatCC seems germane: https://github.com/dvidelabs/flatcc
It compiles flatbuffer schemas down to (idiomatic?) C
Perhaps the schema and batch serialization problems should be solved by
storing everything in the flatbuffer format.
Then the results of running flatcc plus a few simple helpers can be
One basic design point is to allow exchanging Arrow data with no
mandatory dependency (the exception is JSON and base64 if you want to
act on metadata, but that's highly optional, and those are extremely
widespread formats). I'm afraid that Flatbuffers may be a deterrent:
not only does it introduce
On 29/09/2019 at 19:59, Jacques Nadeau wrote:
>
> It seems like you're saying: "flatbuffers is too complex an encoding, let's
> create a new encoding".
Most of the spec is a plain C-level struct in the native ABI, so it
avoids any kind of encoding issue. And, yes, flatbuffers must be dealt
w
There are two pieces of serialized data needed to communicate a record
batch from one library to another:
* Serialized schema (i.e. what's in Schema.fbs)
* Serialized "data header", i.e. the "RecordBatch" message in Message.fbs
You _do_ need to use a Flatbuffers library to fully create these
messa
On Sun, Sep 29, 2019 at 12:59 AM Antoine Pitrou wrote:
>
> On 29/09/2019 at 06:10, Jacques Nadeau wrote:
> > * No dependency on Flatbuffers.
> > * No buffer reassembly (data is already exposed in logical Arrow format).
> > * Zero-copy by design.
> > * Easy to reimplement from scratch.
> >
> > I
On 29/09/2019 at 06:10, Jacques Nadeau wrote:
> * No dependency on Flatbuffers.
> * No buffer reassembly (data is already exposed in logical Arrow format).
> * Zero-copy by design.
> * Easy to reimplement from scratch.
>
> I don't see how the flatbuffer pattern for data headers doesn't accompl
* No dependency on Flatbuffers.
* No buffer reassembly (data is already exposed in logical Arrow format).
* Zero-copy by design.
* Easy to reimplement from scratch.
I don't see how the flatbuffer pattern for data headers doesn't accomplish
all of these things. At its definition, it is a very simple r
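The "zero-copy by design" property in the list above can be sketched as a producer that hands over ownership of an existing buffer instead of copying it: the values vector is moved into heap state owned by the struct, and the release callback frees it. ExportState, ExportInt32 and ReleaseInt32 are hypothetical names; the struct layout and the convention of setting release to null once released are assumed from the later Arrow C data interface.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Struct layout assumed from the later Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical heap state owned by the exported struct.
struct ExportState {
  std::vector<int32_t> values;
  const void* buffers[2];
};

// Frees the producer state and marks the struct as released.
static void ReleaseInt32(ArrowArray* array) {
  assert(array->release != nullptr);  // must not be released twice
  delete static_cast<ExportState*>(array->private_data);
  array->private_data = nullptr;
  array->release = nullptr;
}

// Moves `values` into the export: the data pointer is handed over, not copied.
void ExportInt32(std::vector<int32_t> values, ArrowArray* out) {
  auto* state = new ExportState{std::move(values), {nullptr, nullptr}};
  state->buffers[0] = nullptr;  // no validity bitmap: all values valid
  state->buffers[1] = state->values.data();
  out->length = static_cast<int64_t>(state->values.size());
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = state->buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = &ReleaseInt32;
  out->private_data = state;
}
```

Moving a std::vector keeps its element storage in place, so the pointer the consumer sees is the same one the producer originally allocated.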
I'm not clear on why we need to introduce something beyond what flatbuffers
already provides. Can someone explain that to me? I'm not really a fan of
introducing a second representation of the same data (as I understand it).
On Thu, Sep 19, 2019 at 1:15 PM Wes McKinney wrote:
> This is helpful,
This is helpful, I will leave some comments on the proposal when I
can, sometime in the next week.
I agree that it would likely be opening a can of worms to create a
semantic mapping between a generalized type grammar and Arrow's
specific logical types defined in Schema.fbs. If we go down this
rou
I've posted a draft specification PR here, this should help orient the
discussion a bit:
https://github.com/apache/arrow/pull/5442
Regards
Antoine.
On Wed, 18 Sep 2019 19:52:38 +0200
Antoine Pitrou wrote:
> Hello,
>
> One thing that was discussed in the sync call is the ability to easily
>
I suppose it could be possible for an Arrow array to describe itself
using the ndtypes vocabulary at some point. However, this is
non-trivial, both on the producer and consumer side. Moreover, both
sides must ensure they use the same ndtypes description.
The idea here is to have a C data proto
I know some on this list are familiar, but many may not have seen ndtypes
in xnd: https://github.com/xnd-project/ndtypes
It generalizes PEP 3118 for cross-language data-structure handling.
Either taking a dependency on the small C library libndtypes or reusing its
concepts could work.
-Travis
On We
On Thu, Sep 19, 2019 at 10:56 Antoine Pitrou wrote:
>
> On 19/09/2019 at 19:52, Zhuo Peng wrote:
> >
> > The problems are only potential and theoretical, and won't bite anyone
> > until it occurs though, and it's more likely to happen with pip/wheel
> > than with conda.
> >
> > But anyways, t
On 19/09/2019 at 19:52, Zhuo Peng wrote:
>
> The problems are only potential and theoretical, and won't bite anyone
> until it occurs though, and it's more likely to happen with pip/wheel than
> with conda.
>
> But anyways, this idea is still nice. I could imagine at least in arrow's
> Python
On Thu, Sep 19, 2019 at 10:18 AM Antoine Pitrou wrote:
>
> No, the plan for this proposal is to avoid providing a C API. Each
> Arrow implementation could produce and consume the C data protocol, for
> example the C++ Array class could add these methods:
>
> class Array {
> // ...
>
> public:
On 19/09/2019 at 19:11, Uwe L. Korn wrote:
> Hello,
>
> I like this proposal as it will make interfacing inside a process between
> various Arrow supports much easier. I'm a bit critical though of using a
> string as the format representation as one needs to parse it correctly.
> Couldn't w
No, the plan for this proposal is to avoid providing a C API. Each
Arrow implementation could produce and consume the C data protocol; for
example, the C++ Array class could add these methods:
class Array {
  // ...
 public:
  // Export array to the C data protocol
  void Share(ArrowArray* out);
  // ...
};
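On the consuming side, an implementation might wrap an incoming struct in an owner object so that the producer's release callback runs exactly once. This is a sketch under assumptions: ImportedArray and CountingRelease are hypothetical, the struct layout is assumed from the later Arrow C data interface, and so is the convention of "moving" a struct by copying its fields and then nulling out the source's release pointer.

```cpp
#include <cassert>
#include <cstdint>

// Struct layout assumed from the later Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical consumer-side owner: takes over an incoming struct and calls
// its release callback exactly once, when the owner is destroyed.
class ImportedArray {
 public:
  // Copy the fields, then mark the source as released so the caller
  // will not release it a second time.
  explicit ImportedArray(ArrowArray* src) : raw_(*src) { src->release = nullptr; }

  ~ImportedArray() {
    if (raw_.release != nullptr) raw_.release(&raw_);
  }

  ImportedArray(const ImportedArray&) = delete;
  ImportedArray& operator=(const ImportedArray&) = delete;

  int64_t length() const { return raw_.length; }

 private:
  ArrowArray raw_;
};

// Test hook for this sketch: a producer release callback that counts calls.
static int g_release_calls = 0;
static void CountingRelease(ArrowArray* a) {
  ++g_release_calls;
  a->release = nullptr;
}
```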
Hello,
I like this proposal as it will make interfacing inside a process between
various Arrow supports much easier. I'm a bit critical though of using a string
as the format representation as one needs to parse it correctly. Couldn't we
use the enums we already have and reimplement them as C-d
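One consideration when weighing enums against a format string is how parameterized types are carried. In the sketch below, the enum-based description needs extra fields for a decimal's precision and scale, whereas the format string carries them inline at the cost of a small parse. ToyTypeId, ToyTypeDesc and ParseFormat are hypothetical; the "d:P,S" decimal code is taken from the later Arrow C data interface, and only the two-parameter form is accepted here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical enum-style type ids, standing in for C defines.
enum ToyTypeId { TOY_INT32 = 1, TOY_UTF8 = 2, TOY_DECIMAL = 3 };

// With an enum, parameters need their own fields next to the id.
struct ToyTypeDesc {
  ToyTypeId id;
  int precision;  // only meaningful for TOY_DECIMAL
  int scale;      // only meaningful for TOY_DECIMAL
};

// With a format string, parameters travel inside the string itself.
// Returns false (leaving *out untouched) for unsupported strings.
bool ParseFormat(const std::string& fmt, ToyTypeDesc* out) {
  if (fmt == "i") { *out = ToyTypeDesc{TOY_INT32, 0, 0}; return true; }
  if (fmt == "u") { *out = ToyTypeDesc{TOY_UTF8, 0, 0}; return true; }
  int precision = 0;
  int scale = 0;
  char trailing = 0;
  // Exactly two integers after "d:" and nothing after them.
  if (std::sscanf(fmt.c_str(), "d:%d,%d%c", &precision, &scale, &trailing) == 2) {
    *out = ToyTypeDesc{TOY_DECIMAL, precision, scale};
    return true;
  }
  return false;
}
```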
Hi Antoine,
I'm also interested in a stable ABI (previously I posted on this mailing
list about the ABI issues I had [1]). Does having such an ABI-stable
C-struct imply that there will be a set of C-APIs exposed by the Arrow
(C++) library (which I think would lead to a solution to all the inherit
On 19/09/2019 at 09:39, Micah Kornfield wrote:
> I like the idea of a stable ABI that can be used for in-process
> communication. For instance, there was a recent question on
> Stack Overflow on how to solve this [1].
>
> A couple of thoughts/questions:
> * Would ArrowArray
I like the idea of a stable ABI that can be used for in-process
communication. For instance, there was a recent question on
Stack Overflow on how to solve this [1].
A couple of thoughts/questions:
* Would ArrowArray also need a self reference for children arrays?
* Should trans
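On the question of children: one way to express nesting is for each struct to hold an array of pointers to child structs, as the later Arrow C data interface does with its children field. The sketch below walks such a tree and has the parent's release callback release its children before marking itself released. CountNodes and ReleaseTree are hypothetical helpers, and the layout is assumed rather than taken from the draft.

```cpp
#include <cassert>
#include <cstdint>

// Struct layout assumed from the later Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;  // pointers to child structs (nested types)
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Count this node plus all descendants by following the children pointers.
int64_t CountNodes(const ArrowArray& a) {
  int64_t n = 1;
  for (int64_t i = 0; i < a.n_children; ++i) {
    n += CountNodes(*a.children[i]);
  }
  return n;
}

// Parent release: release each child that is still alive, then mark this
// struct as released.
void ReleaseTree(ArrowArray* a) {
  for (int64_t i = 0; i < a->n_children; ++i) {
    ArrowArray* child = a->children[i];
    if (child->release != nullptr) child->release(child);
  }
  a->release = nullptr;
}
```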