emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868952437
########## VariantEncoding.md: ########## @@ -416,14 +444,36 @@ Field names are case-sensitive. Field names are required to be unique for each object. It is an error for an object to contain two fields with the same name, whether or not they have distinct dictionary IDs. -# Versions and extensions +## Versions and extensions An implementation is not expected to parse a Variant value whose metadata version is higher than the version supported by the implementation. However, new types may be added to the specification without incrementing the version ID. In such a situation, an implementation should be able to read the rest of the Variant value if desired. -# Shredding +## Shredding A single Variant object may have poor read performance when only a small subset of fields are needed. A better approach is to create separate columns for individual fields, referred to as shredding or subcolumnarization. [VariantShredding.md](VariantShredding.md) describes the Variant shredding specification in Parquet. + +## Conversion to JSON + +Values stored in the Variant encoding are a superset of JSON values. +For example, a Variant value can be a date that has no equivalent type in JSON. +To maximize compatibility with readers that can process JSON but not Variant, the following conversions should be used when producing JSON from a Variant: + +| Variant type | JSON type | Representation requirements | Example | +|---------------|-----------|----------------------------------------------------------|--------------------------------------| +| Null type | null | `null` | `null` | +| Boolean | boolean | `true` or `false` | `true` | +| Exact Numeric | number | Digits in fraction must match scale, no exponent | `34`, 34.00 | Review Comment: > Why would we require an engine to produce a normalized value? At least for me, I don't think it is about "requiring" and engine to produce a normalized value first. I think if an engine is reading variant and converting it to JSON, it is possibly doing so through an internal representation so it can still apply operators on top of the JSON value and possibly even storing it as an internal representation. Conversion to a string is really only an end-user visible thing. So when I read this it seems to be requiring an engine to NOT normalize which could be hard to implement for some engines. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org For additional commands, e-mail: issues-h...@parquet.apache.org