Can a logical extension be based on another logical extension?
HOCON support might be nice.
-----Original Message-----
From: Micah Kornfield <emkornfi...@gmail.com>
Sent: Monday, November 28, 2022 11:50 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] JSON Canonical Extension Type
This seems like a reasonable definition to me. Since there hasn't been
much feedback, I think maybe following through an implementation + this
description in a PR would be the next steps. If there isn't further
feedback on this, once the PR is up we can try to vote (which might
bring up some more feedback, but hopefully wouldn't cause too much
implementation churn).
Thanks,
Micah
On Thu, Nov 17, 2022 at 3:58 PM Pradeep Gollakota
<pgollak...@google.com.invalid> wrote:
Hi folks!
I put together this specification for canonicalizing the JSON type in
Arrow.
## Introduction
JSON is a widely used text based data interchange format. There are
many use cases where a user has a column whose contents are a JSON
encoded string. BigQuery's [JSON Type][1] and Parquet’s [JSON Logical
Type][2] are two such examples.
The JSON specification is defined in [RFC-8259][3]. However, many of
the most popular parsers support non-standard extensions. Examples of
non-standard extensions to JSON include comments, unquoted keys,
trailing commas, etc.
## Extension Specification
* The name of the extension is `arrow.json`
* The storage type of the extension is `utf8`
* The extension type has no parameters
* The metadata MUST be either empty or a valid JSON object
  - There is no canonical metadata
  - Implementations MAY include implementation-specific metadata by
    using a namespaced key. For example:
    `{"google.bigquery": {"my": "metadata"}}`
* Implementations...
  - MUST produce valid UTF-8 encoded text
  - SHOULD produce valid standard JSON
  - MAY produce valid non-standard JSON
  - MUST support parsing standard JSON
  - MAY support parsing non-standard JSON
  - SHOULD pass through contents that they do not understand
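
As a rough illustration of the metadata and parsing rules above, here is a minimal Python sketch that uses only the standard library. The helper names `validate_extension_metadata` and `parse_standard_json` are hypothetical and not part of any Arrow API, and rejecting `NaN`/`Infinity` is just one reading of "standard JSON" under RFC 8259.

```python
import json


def validate_extension_metadata(serialized: str) -> dict:
    """Return the metadata as a dict; it must be empty or a JSON object."""
    if serialized == "":
        return {}
    parsed = json.loads(serialized)  # raises ValueError on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError("arrow.json metadata must be empty or a JSON object")
    return parsed


def _reject_constant(name: str):
    # Python's json module accepts NaN/Infinity by default; RFC 8259 does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_standard_json(value: str):
    """Parse one column value as standard JSON (hypothetical helper)."""
    return json.loads(value, parse_constant=_reject_constant)
```

A consumer that only supports standard JSON could call `parse_standard_json` per value and pass the raw string through unchanged when parsing fails, in line with the "SHOULD pass through" rule.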
## Forward compatibility
In the future we might allow this logical type to annotate a byte
storage type with a different text encoding. Implementations
consuming JSON logical types should therefore verify that the storage
type is `utf8` rather than assume it.
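
One possible shape for that check, sketched in plain Python. The function name `decode_json_storage` and the use of a string to name the storage type are assumptions made for illustration; a real implementation would inspect the storage type through its Arrow library.

```python
def decode_json_storage(storage_type: str, raw: bytes) -> str:
    """Decode one stored value, refusing storage types this reader does not know."""
    if storage_type != "utf8":
        # A future revision might permit other storage types or encodings.
        raise NotImplementedError(f"unsupported arrow.json storage type: {storage_type}")
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    return raw.decode("utf-8")
```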
[1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
[2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
[3]: https://datatracker.ietf.org/doc/html/rfc8259