I think I'm +0 but lean slightly towards JSON.

In favor of binary, I would guess that most extension types are going
to have relatively simple parameterization (to the point that
protobuf/flatbuffers isn't really needed).  For example, the Substrait
consumer PR has five extension types at the moment (e.g. uuid,
varchar), and only two of them are parameterized, each by a single
int32_t.  It might be interesting to see what kinds of extension types
the geospatial community uses.

That being said, this sort of parsing isn't really on any kind of
critical path.  It's very likely that users (not Arrow developers)
will be creating and working with extension types.  These users are
likely going to default to JSON (or pickle or XML).  If our "well
known types" use JSON then it will be more easily recognizable to
users what is going on.
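
To make the trade-off concrete, here is a rough sketch of both options
for a type parameterized by a single int32 (e.g. a varchar length).
This is plain Python using only the standard library.  The helper
names and the "length" key are made up for illustration and are not
what any Arrow implementation actually does.

```python
import json
import struct


def encode_json(length: int) -> bytes:
    # Hypothetical JSON form: readable, and always valid UTF-8.
    return json.dumps({"length": length}).encode("utf-8")


def decode_json(payload: bytes) -> int:
    return json.loads(payload.decode("utf-8"))["length"]


def encode_binary(length: int) -> bytes:
    # Hypothetical binary form: one little-endian int32 (4 bytes).
    return struct.pack("<i", length)


def decode_binary(payload: bytes) -> int:
    (length,) = struct.unpack("<i", payload)
    return length


def is_valid_utf8(payload: bytes) -> bool:
    # A reader that assumes the metadata value is a string needs this to hold.
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With a length of 255 the binary form is 4 bytes but is not valid
UTF-8, while the JSON form is longer but stays readable and is safe
for readers that assume a string.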

-Weston

On Tue, Feb 8, 2022 at 8:14 AM Joris Van den Bossche
<jorisvandenboss...@gmail.com> wrote:
>
> On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> wrote:
>
> > ...
> >
> > Wrt binary, imo the challenge is:
> > * we state that backward incompatible changes to the c data interface
> > require a new spec [1]
> >
>
> Note that this discussion wouldn't change anything about the C Data
> Interface spec itself. The discussion is only about the *value* that is put
> in one of the key-value metadata fields. The C Data Interface spec defines
> how the metadata needs to be stored, but doesn't specify anything about the
> actual value of one of the key-value metadata fields.
>
>
> > * we state that the metadata is a binary string [2]
> > * a valid string is a subset of all valid byte arrays and thus removing "
> > *string*" from the spec is backward incompatible
> >
> > If we write invalid utf8 to it and a reader assumes utf8 when reading it,
> > we trigger undefined behavior.
> >
> > I was a bit surprised by ARROW-15613 - my understanding is that the c++
> > implementation is not following the spec, and if we at arrow2 were not
> > checking for utf8, we would be exposing a vulnerability (at least according
> > to Rust's standards). We just checked it out of luck (it is O(1), so why
> > not).
> >
>
> Yes, the C++ implementation is indeed not following the spec. See the
> "[DISCUSS] Binary Values in Key value pairs" thread (
> https://lists.apache.org/thread/blmj0cgv34dgdxqd3ow60ln68khnz0qr). Let's
> maybe keep this part of the discussion there?
