Agreed. I hope that I didn't come off as flippant with respect to
performance.
I was hoping to convey that focusing on performance before we have
the semantics and high-level design nailed down is not time well spent.
I think the current design doesn't depend on the format,
which is a goo
> Personally, I do not care about the speed of IR processing right now.
> Any non-trivial (and probably trivial too) computation done
> by an IR consumer will dwarf the cost of IR processing. Of course,
> we shouldn't prematurely pessimize either, but there's no reason
> to spend time worrying abou
I believe you would need a JSON-compatible version of the type system
(including binary values) because you'd need to at least encode
literals. However, I don't think that creating a human-readable
encoding of the Arrow type system is a bad thing in and of itself. We
have tickets and get question
>
> I just thought of one other requirement: the format needs to support
> arbitrary byte sequences.
>
Can you clarify why this is needed? Is it that custom_metadata maps should
allow byte sequences as values?
On Fri, Aug 13, 2021 at 10:00 AM Phillip Cloud wrote:
> On Fri, Aug 13, 2021 at 11:43
On Fri, Aug 13, 2021 at 11:43 AM Antoine Pitrou wrote:
>
> On 13/08/2021 at 17:35, Phillip Cloud wrote:
> >
> >> I.e. make the ability to read and write by humans be more important than
> >> speed of validation.
> >
> > I think I differ on whether the IR should be easy to read and write by
> >
On 13/08/2021 at 17:35, Phillip Cloud wrote:
I.e. make the ability to read and write by humans be more important than
speed of validation.
I think I differ on whether the IR should be easy to read and write by
humans.
IR is going to be predominantly read and written by machines, though of
On Fri, Aug 13, 2021 at 8:03 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:
> Hi,
>
> > The requirements for the compute IR as I see it are:
> >
> > * Implementations in IR producer and consumer languages.
> > * Strongly typed or the ability to easily validate a payload
> >
>
> What abou
On Fri, Aug 13, 2021 at 2:03 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:
> Hi,
>
> > The requirements for the compute IR as I see it are:
> >
> > * Implementations in IR producer and consumer languages.
> > * Strongly typed or the ability to easily validate a payload
> >
>
> What abou
Hi,
> The requirements for the compute IR as I see it are:
>
> * Implementations in IR producer and consumer languages.
> * Strongly typed or the ability to easily validate a payload
>
What about:
1. easy to read and write by a large number of programming languages
2. easy to read and write by hum
On Thu, Aug 12, 2021 at 1:03 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:
> I agree with Antoine that we should weigh the pros and cons of flatbuffers
> (or protobuf or thrift for that matter) over a more human-friendly,
> simpler, format like json or MsgPack. I also struggle a bit t
I agree with Antoine that we should weigh the pros and cons of flatbuffers
(or protobuf or thrift for that matter) over a more human-friendly,
simpler, format like json or MsgPack. I also struggle a bit to reason with
the complexity of using flatbuffers for this.
E.g. there is no async support for
On 12/08/2021 at 15:05, Wes McKinney wrote:
It seems that one adjacent problem here is how to make it simpler for
third parties (especially ones that act as front end interfaces) to
build and serialize/deserialize the IR structures with some kind of
ready-to-go middleware library, written in
On Thu, Aug 12, 2021 at 3:16 PM Neal Richardson wrote:
>
> > Maintain this "Arrow types and ComputeIR library" as an always
> > zero-dependency library to facilitate vendoring
>
> Would/should this hypothetical zero-dep, vendorable library also include
> the IPC format? Or if you want to interact wi
> Maintain this "Arrow types and ComputeIR library" as an always
> zero-dependency library to facilitate vendoring
Would/should this hypothetical zero-dep, vendorable library also include
the IPC format? Or if you want to interact with IPC in that case, the C
data interface is the best/only option?
On Thu, Aug 12, 2021 at 9:06 AM Wes McKinney wrote:
> It seems that one adjacent problem here is how to make it simpler for
> third parties (especially ones that act as front end interfaces) to
> build and serialize/deserialize the IR structures with some kind of
> ready-to-go middleware library,
It seems that one adjacent problem here is how to make it simpler for
third parties (especially ones that act as front end interfaces) to
build and serialize/deserialize the IR structures with some kind of
ready-to-go middleware library, written in a language like C++.
To do that, one would need t
I support the idea of an independent repo that has the arrow flatbuffers
format definition files.
My rationale is that the Rust implementation has a copy of the `format`
directory [1] and potential drift worries me (a bit). Having a single
source of truth for the format that is not part of the lar
On Wed, Aug 11, 2021, 19:05 Weston Pace wrote:
> >> The benefit is that IR components don't interact much with
> >> `flatbuffers` or `flatc` directly.
> >>
> [...]
> >>
> >> One counter-proposal might be to just put the compute IR IDL in a
> >> separate repo,
> >> but that isn't tenable becau
>> The benefit is that IR components don't interact much with `flatbuffers` or
>> `flatc` directly.
>>
[...]
>>
>> One counter-proposal might be to just put the compute IR IDL in a separate repo,
>> but that isn't tenable because the compute IR needs arrow's type information
>> contained in `Sch
On 11/08/2021 at 23:06, Phillip Cloud wrote:
On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou wrote:
On 11/08/2021 at 22:16, Phillip Cloud wrote:
Yeah, that is a drawback here, though I don't see needing to run flatc as a
major downside given the upside
of not having to write additiona
On Wed, Aug 11, 2021 at 4:21 PM David Li wrote:
> If the worry is public distribution (i.e. requiring all downstream
> projects to also run flatc in their builds) we could perhaps ship a package
> that just consists of the generated code (though that's definitely more
> packaging burden, and won'
On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou wrote:
>
> On 11/08/2021 at 22:16, Phillip Cloud wrote:
> >
> > Yeah, that is a drawback here, though I don't see needing to run flatc
> > as a major downside given the upside
> > of not having to write additional code to move between formats.
>
On 11/08/2021 at 22:20, David Li wrote:
If the worry is public distribution (i.e. requiring all downstream projects to
also run flatc in their builds) we could perhaps ship a package that just
consists of the generated code (though that's definitely more packaging burden,
and won't help wh
On 11/08/2021 at 22:16, Phillip Cloud wrote:
Yeah, that is a drawback here, though I don't see needing to run flatc as a
major downside given the upside
of not having to write additional code to move between formats.
That's only an advantage if you already know how to read the Arrow IPC
f
If the worry is public distribution (i.e. requiring all downstream projects to
also run flatc in their builds) we could perhaps ship a package that just
consists of the generated code (though that's definitely more packaging burden,
and won't help when you're doing development against in-progres
On Wed, Aug 11, 2021 at 4:05 PM Antoine Pitrou wrote:
>
> On 11/08/2021 at 22:02, Phillip Cloud wrote:
> > On Wed, Aug 11, 2021 at 3:58 PM Antoine Pitrou wrote:
> >
> >>
> >> On 11/08/2021 at 21:56, Phillip Cloud wrote:
> >>> I can see how that might be a bit circular. Let me start from th
On 11/08/2021 at 22:02, Phillip Cloud wrote:
On Wed, Aug 11, 2021 at 3:58 PM Antoine Pitrou wrote:
On 11/08/2021 at 21:56, Phillip Cloud wrote:
I can see how that might be a bit circular. Let me start from the
perspective of requirements. We want to be able to reuse the arrow's types
On Wed, Aug 11, 2021 at 3:58 PM Antoine Pitrou wrote:
>
> On 11/08/2021 at 21:56, Phillip Cloud wrote:
> > I can see how that might be a bit circular. Let me start from the
> > perspective of requirements. We want to be able to reuse the arrow's types
> > and schema, without having to write a
On 11/08/2021 at 21:56, Phillip Cloud wrote:
I can see how that might be a bit circular. Let me start from the
perspective of requirements. We want to be able to reuse the arrow's types
and schema, without having to write additional code to move back and forth
between compute IR and not-compu
I can see how that might be a bit circular. Let me start from the
perspective of requirements. We want to be able to reuse the arrow's types
and schema, without having to write additional code to move back and forth
between compute IR and not-compute-IR. I think that leaves only flatbuffers
as an o
On Wed, Aug 11, 2021 at 3:51 PM Antoine Pitrou wrote:
>
>
> On 11/08/2021 at 21:39, Phillip Cloud wrote:
> > The benefit is that IR components don't interact much with
> > `flatbuffers` or `flatc` directly.
> >
> [...]
> >
> > One counter-proposal might be to just put the compute IR IDL in a
On 11/08/2021 at 21:39, Phillip Cloud wrote:
The benefit is that IR components don't interact much with `flatbuffers` or
`flatc` directly.
[...]
One counter-proposal might be to just put the compute IR IDL in a separate repo,
but that isn't tenable because the compute IR needs arrow's ty
Hi all,
I'd like to bring up an idea from a recent thread ([1]) about moving the
`format/` directory out of the primary apache/arrow repository.
I understand from that thread there are some concerns about using submodules,
and I definitely sympathize with those concerns.
In talking with David Li