>
> It would be reasonable to restrict JSON to utf8, and tell people they
> need to transcode in the rare cases where some obnoxious software
> outputs utf16-encoded JSON.

+1. I think this aligns with the latest JSON RFC [1] as well.
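
For what it's worth, the transcoding step is small. Below is a rough
sketch in Python, using only the standard library. The helper name
to_utf8_json is made up for illustration, and it assumes that any
UTF-16/UTF-32 input starts with a byte order mark; input without a BOM
is treated as UTF-8.

```python
import codecs
import json


def to_utf8_json(raw: bytes) -> bytes:
    """Return the JSON text in `raw` re-encoded as UTF-8 without a BOM.

    Hypothetical helper: UTF-16/UTF-32 input is assumed to carry a BOM.
    """
    # Check the UTF-32 BOMs first: the UTF-32-LE BOM begins with the
    # same two bytes as the UTF-16-LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        text = raw.decode("utf-32")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" also strips a UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")
    # Fail early (ValueError) if the decoded text is not valid JSON.
    json.loads(text)
    return text.encode("utf-8")
```

A producer would call this once before writing values into a
utf8-backed column, so consumers never need to guess the encoding.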

> Sounds good to me too. +1 on the canonical extension type option; maybe it
> should end up as a first-class type, but I'd like to see us try it without
> first and see what that tells us about the path for having an extension
> type get promoted to being a first-class type. This is something that has
> been discussed in principle before, but I don't know we've worked out what
> it would look like in practice.

From prior discussions, we agreed that it made sense to approach JSON as an
extension type [2].  As noted previously on the thread, I don't think this
precludes having APIs in C++/Python that make the type look the same as a
natively supported type, but there might be constraints we uncover as we
move forward with implementation.  I don't think we reached an exact
conclusion on canonical extension types, but [3] was the last conversation.
I think the main question is whether there are maintainers for other
languages who want to add the extension type; I can probably find some time
for Java.
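
To make the pass-through idea concrete, here is a toy model in plain
Python. It does not use any Arrow library. The Field class, the
resolve_type function, and the extension name "example.json" are all
made up for illustration; only the two "ARROW:extension:*" metadata
keys come from the Arrow columnar format.

```python
from dataclasses import dataclass, field

# Field-metadata keys the Arrow columnar format uses for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, chosen only for this sketch.
JSON_EXT_NAME = "example.json"


@dataclass
class Field:
    """Toy stand-in for a schema field: storage type plus metadata."""
    name: str
    storage_type: str  # e.g. "utf8" or "binary"
    metadata: dict = field(default_factory=dict)


def json_field(name: str) -> Field:
    """Describe a JSON column as utf8 storage tagged with extension metadata."""
    return Field(name, "utf8", {EXT_NAME_KEY: JSON_EXT_NAME, EXT_META_KEY: ""})


def resolve_type(f: Field, known_extensions: set) -> str:
    """Return the extension name if this reader knows it, else the storage type."""
    ext = f.metadata.get(EXT_NAME_KEY)
    if ext in known_extensions:
        return ext
    # Unknown or absent extension: the data passes through as its storage type.
    return f.storage_type
```

A reader that registers "example.json" sees a JSON type, while one that
does not still reads the column as utf8 and keeps the metadata intact
for the next consumer.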


[1] https://datatracker.ietf.org/doc/html/rfc8259#section-8.1
[2] https://lists.apache.org/thread/3nls3222ggnxlrp0s46rxrcmgbyhgn8t (sorry
I still need to document the outcome of this discussion).
[3] https://lists.apache.org/thread/bd0ttt725jqn5ylsp8v006rpfymow3mn

On Sat, Jul 30, 2022 at 12:14 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 30/07/2022 at 01:02, Wes McKinney wrote:
> > I think either path:
> >
> > * Canonical extension type
> > * First-class type in the Type union in Flatbuffers
> >
> > would be OK. The canonical extension type option is the preferable
> > path here, I think, because it allows Arrow implementations without
> > any special handling for JSON to allow the data to pass through as
> > Binary or String. Implementations like C++ could see the extension
> > type metadata and construct an instance of arrow::Type::JSON /
> > JsonArray, etc., but when it gets serialized back to Parquet or Arrow
> > IPC it looks like binary/string (since JSON can be utf-16/utf-32,
> > right?) with additional field metadata.
>
> It would be reasonable to restrict JSON to utf8, and tell people they
> need to transcode in the rare cases where some obnoxious software
> outputs utf16-encoded JSON.
>
> And I agree a canonical extension type would be massively more useful
> for JSON than for UUID (which basically doesn't make sense: a UUID is an
> opaque binary string for all practical purposes).
>
> Regards
>
> Antoine.
>
