Re: [DISCUSS] Splitting out the Arrow format directory

Phillip Cloud Fri, 13 Aug 2021 09:00:52 -0700

On Fri, Aug 13, 2021 at 11:43 AM Antoine Pitrou <anto...@python.org> wrote:


>
> On 13/08/2021 at 17:35, Phillip Cloud wrote:
> >
> >> I.e. make the ability to read and write by humans be more important than
> >> speed of validation.
> >
> > I think I differ on whether the IR should be easy to read and write by
> > humans. The IR is going to be predominantly read and written by machines,
> > though of course we will need a way to inspect it for debugging.
>
> But the code executed by machines is written by humans.  I think that's
> mostly where the contention resides: is it easy to code, in any given
> language, the routines required to produce or consume the IR?
>

Definitely not for FlatBuffers, since FlatBuffers is IMO annoying to use in
any language except C++, and it's borderline annoying there too. Protobuf is
similar (less annoying in Rust, but still annoying in Python and C++ IMO),
though I think any binary format is going to be less human-friendly, by
construction.

If we were to use something like JSON or msgpack, can someone sketch out the
interaction between the IR and the rest of Arrow's type system?

Would we need a mapping from a JSON-encoded Arrow type to the in-memory
representation of that type in each language?
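To make the question concrete, here is a minimal Python sketch of what such a
mapping might look like. The JSON layout (the `name`, `bitWidth`, `isSigned`
and `valueType` keys) and the `IntType`/`Utf8Type`/`ListType` classes are
hypothetical stand-ins, not Arrow's actual IR or any library's API.

```python
import json
from dataclasses import dataclass
from typing import Union


# Hypothetical in-memory Arrow type classes for one host language.
@dataclass(frozen=True)
class IntType:
    bit_width: int
    signed: bool


@dataclass(frozen=True)
class Utf8Type:
    pass


@dataclass(frozen=True)
class ListType:
    value_type: "ArrowType"


ArrowType = Union[IntType, Utf8Type, ListType]


def decode_type(obj: dict) -> ArrowType:
    """Convert a (hypothetical) JSON-encoded type into an in-memory type."""
    name = obj["name"]
    if name == "int":
        return IntType(bit_width=obj["bitWidth"], signed=obj["isSigned"])
    if name == "utf8":
        return Utf8Type()
    if name == "list":
        # Nested types recurse into their child type.
        return ListType(value_type=decode_type(obj["valueType"]))
    raise ValueError(f"unsupported type name: {name!r}")


def encode_type(t: ArrowType) -> dict:
    """Inverse of decode_type: in-memory type -> JSON-compatible dict."""
    if isinstance(t, IntType):
        return {"name": "int", "bitWidth": t.bit_width, "isSigned": t.signed}
    if isinstance(t, Utf8Type):
        return {"name": "utf8"}
    if isinstance(t, ListType):
        return {"name": "list", "valueType": encode_type(t.value_type)}
    raise TypeError(f"unsupported type: {t!r}")


# Round trip: list<int32> as JSON text -> in-memory type -> JSON text.
text = '{"name": "list", "valueType": {"name": "int", "bitWidth": 32, "isSigned": true}}'
parsed = decode_type(json.loads(text))
print(parsed)
print(json.dumps(encode_type(parsed)))
```

Even this toy version has to be written, and kept in sync with the type
system, once per language.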

I just thought of one other requirement: the format needs to support
arbitrary byte sequences. JSON doesn't support raw byte sequences, though it's
common to base64-encode them. IMO that adds an unnecessary layer of
complexity, which is another tradeoff to consider.
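A minimal Python sketch of that extra layer: the standard-library JSON encoder
rejects `bytes`, so every producer has to base64-encode binary values and every
consumer has to decode them. The `{"value": ...}` wrapper and the
`encode_literal`/`decode_literal` helpers are hypothetical, chosen only for
illustration.

```python
import base64
import json

# A literal containing arbitrary bytes, e.g. a binary scalar in an IR expression.
payload = b"\x00\xffarrow\x80"

# The stdlib JSON encoder refuses raw bytes outright.
try:
    json.dumps({"value": payload})
except TypeError as exc:
    print("json.dumps failed:", exc)


# Workaround: base64-encode on the way out, decode on the way in.
# All producers and consumers must agree on this (hypothetical) convention.
def encode_literal(data: bytes) -> str:
    return json.dumps({"value": base64.b64encode(data).decode("ascii")})


def decode_literal(text: str) -> bytes:
    return base64.b64decode(json.loads(text)["value"])


encoded = encode_literal(payload)
print(encoded)
assert decode_literal(encoded) == payload
```

For comparison, msgpack has a native binary type (the `bin` format family), so
a msgpack-based IR could carry such values without an extra encoding
convention.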
