> > 2. What do we do about different non-utf8 encodings? There does not appear
> > to be a consensus yet on this point. One option is to only allow utf8
> > encoding and force implementers to convert non-utf8 to utf8. Second option
> > is to allow all encodings and capture the encoding in the metadata (I'm
> > leaning towards this option).


> Allowing non-utf8 encodings adds complexity for everyone. Disallowing
> them only adds complexity for the tiny minority of producers of non-utf8
> JSON.
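
To give a sense of how small that producer-side cost is, here is a minimal
sketch of transcoding a non-UTF-8 JSON document before handing it over. It
uses only the Python standard library; the function name `to_utf8_json` and
the assumption that the producer knows its source encoding are hypothetical,
not part of any Arrow API.

```python
import json


def to_utf8_json(raw: bytes, source_encoding: str) -> bytes:
    """Transcode one JSON document from source_encoding to UTF-8 bytes.

    Hypothetical helper for illustration; the caller is assumed to know
    the encoding of its own data.
    """
    text = raw.decode(source_encoding)
    # Parse once so that malformed JSON fails here, not in a consumer.
    json.loads(text)
    return text.encode("utf-8")


# Example: a producer holding UTF-16 encoded JSON.
utf16_doc = '{"name": "é"}'.encode("utf-16")
utf8_doc = to_utf8_json(utf16_doc, "utf-16")
```

The validation step is optional; it is included only so that bad input is
rejected at the point of conversion.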


I'd also add that if we only allow the extension on UTF-8 today, it would be a
forward- and backward-compatible change to later parameterize the extension by
encoding for a bytes storage type, if we wanted to support that in the future.
Parquet also only supports UTF-8 [1] for its logical JSON type.

[1]
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
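
A rough sketch of why the parameterization could stay compatible, assuming
the encoding were carried in the extension's serialized metadata: if an
empty or missing parameter is defined to mean UTF-8, data written today
keeps its meaning after the parameter is introduced. The function names and
the JSON metadata layout below are hypothetical and only illustrate the
idea; they are not an existing Arrow API.

```python
import json

DEFAULT_ENCODING = "utf-8"


def serialize_params(encoding: str = DEFAULT_ENCODING) -> bytes:
    """Serialize the (hypothetical) extension parameters.

    UTF-8 is written as empty metadata, which is what a UTF-8-only
    version of the extension would produce today.
    """
    if encoding == DEFAULT_ENCODING:
        return b""
    return json.dumps({"encoding": encoding}).encode("utf-8")


def deserialize_params(serialized: bytes) -> str:
    """Recover the encoding; absent metadata is read as UTF-8."""
    if not serialized:
        return DEFAULT_ENCODING
    return json.loads(serialized).get("encoding", DEFAULT_ENCODING)
```

With this layout, a newer reader interprets older, unparameterized data as
UTF-8, and UTF-8 data written by a newer producer is indistinguishable from
what an older producer would have written.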

On Mon, Aug 1, 2022 at 11:39 PM Antoine Pitrou <anto...@python.org> wrote:

>
> Le 01/08/2022 à 22:53, Pradeep Gollakota a écrit :
> > Thanks for all the great feedback.
> >
> > To proceed forward, we seem to need decisions around the following:
> >
> > 1. Whether to use arrow extensions or first class types. The consensus is
> > building towards using arrow extensions.
>
> +1
>
> > 2. What do we do about different non-utf8 encodings? There does not appear
> > to be a consensus yet on this point. One option is to only allow utf8
> > encoding and force implementers to convert non-utf8 to utf8. Second option
> > is to allow all encodings and capture the encoding in the metadata (I'm
> > leaning towards this option).
>
> Allowing non-utf8 encodings adds complexity for everyone. Disallowing
> them only adds complexity for the tiny minority of producers of non-utf8
> JSON.
>
> > 3. What do we do about the different formats of JSON (string, BSON, UBJSON,
> > etc.)?
>
> There are no "different formats of JSON". BSON etc. are unrelated formats.
>
> Regards
>
> Antoine.
>
