While I do like having a json type, adding processing functionality especially 
around compute capabilities might be limiting.

Arrow already supports nested lists and structs which can cover json structures 
while offering vectorized processing. Json should only be a logical 
representation of what arrow physically supports today.

A bad example is Snowflake semi structured data support. They have a Java 
engine for tabular data and a JavaScript engine for json data. The JS engine is 
a second class citizen that requires a lot of compute to string parse json data 
before json content can be filtered, sorted, aggregated, etc..

Sent from my iPad

> On Aug 2, 2022, at 11:38 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> External Email: Use caution with links and attachments
> 
> 
> I should add that since Parquet has JSON, BSON, and UUID types, that
> while UUID is just a simple fixed sized binary, that having the
> extension types so that the metadata flows through accurately to
> Parquet would be net beneficial:
> 
> https://urldefense.com/v3/__https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift*L342__;Iw!!KSjYCgUGsB4!aTjWsSjJoE1gN7iM84QJUDoTt3F1A9BBpaLGscg9jYN26Eohr9bN8y0ccxgI8S3zLfGUjXBV2ewE9myKupUW$
> 
> Implementing JSON (and BSON and UUID if we want them) as extension
> types and restricting JSON to UTF-8 sounds good to me.
> 
>> On Tue, Aug 2, 2022 at 12:43 AM Micah Kornfield <emkornfi...@gmail.com> 
>> wrote:
>> 
>>> 
>>>> 2. What do we do about different non-utf8 encodings? There does not
>>> appear
>>>> to be a consensus yet on this point. One option is to only allow utf8
>>>> encoding and force implementers to convert non-utf8 to utf8. Second
>>> option
>>>> is to allow all encodings and capture the encoding in the metadata (I'm
>>>> leaning towards this option).
>> 
>> 
>> Allowing non-utf8 encodings adds complexity for everyone. Disallowing
>>> them only adds complexity for the tiny minority of producers of non-utf8
>>> JSON.
>> 
>> 
>> I'd also add that if we only allow extension on utf8 today, it would be a
>> forward/backward compatible change to allow parameterizing the extension
>> for bytes type by encoding if we wanted to support it in the future.
>> Parquet also only supports UTF-8 [1] for its logical JSON type.
>> 
>> [1]
>> https://urldefense.com/v3/__https://github.com/apache/parquet-format/blob/master/LogicalTypes.md*json__;Iw!!KSjYCgUGsB4!aTjWsSjJoE1gN7iM84QJUDoTt3F1A9BBpaLGscg9jYN26Eohr9bN8y0ccxgI8S3zLfGUjXBV2ewE9sNlK7dP$
>> 
>>> On Mon, Aug 1, 2022 at 11:39 PM Antoine Pitrou <anto...@python.org> wrote:
>>> 
>>> 
>>> Le 01/08/2022 à 22:53, Pradeep Gollakota a écrit :
>>>> Thanks for all the great feedback.
>>>> 
>>>> To proceed forward, we seem to need decisions around the following:
>>>> 
>>>> 1. Whether to use arrow extensions or first class types. The consensus is
>>>> building towards using arrow extensions.
>>> 
>>> +1
>>> 
>>>> 2. What do we do about different non-utf8 encodings? There does not
>>> appear
>>>> to be a consensus yet on this point. One option is to only allow utf8
>>>> encoding and force implementers to convert non-utf8 to utf8. Second
>>> option
>>>> is to allow all encodings and capture the encoding in the metadata (I'm
>>>> leaning towards this option).
>>> 
>>> Allowing non-utf8 encodings adds complexity for everyone. Disallowing
>>> them only adds complexity for the tiny minority of producers of non-utf8
>>> JSON.
>>> 
>>>> 3. What do we do about the different formats of JSON (string, BSON,
>>> UBJSON,
>>>> etc.)?
>>> 
>>> There are no "different formats of JSON". BSON etc. are unrelated formats.
>>> 
>>> Regards
>>> 
>>> Antoine.
>>> 


This message may contain information that is confidential or privileged. If you 
are not the intended recipient, please advise the sender immediately and delete 
this message. See 
http://www.blackrock.com/corporate/compliance/email-disclaimers for further 
information.  Please refer to 
http://www.blackrock.com/corporate/compliance/privacy-policy for more 
information about BlackRock’s Privacy Policy.


For a list of BlackRock's office addresses worldwide, see 
http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2022 BlackRock, Inc. All rights reserved.

Reply via email to