rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305572805
##########
VariantShredding.md:
##########
@@ -51,32 +75,17 @@ required group measurement (VARIANT) {
}
```
-The Parquet columns used to store variant metadata and values must be accessed by name, not by position.
-
The series of measurements `34, null, "n/a", 100` would be stored as:
| Value | `metadata` | `value` | `typed_value` |
|---------|------------------|-----------------------|---------------|
-| 34 | `01 00` v1/empty | null | `34` |
-| null | `01 00` v1/empty | `00` (null) | null |
-| "n/a" | `01 00` v1/empty | `13 6E 2F 61` (`n/a`) | null |
-| 100 | `01 00` v1/empty | null | `100` |
-
-Both `value` and `typed_value` are optional fields used together to encode a single value.
-Values in the two fields must be interpreted according to the following table:
-
-| `value`  | `typed_value` | Meaning |
-|----------|---------------|-------------------------------------------------------------|
-| null     | null          | The value is missing; only valid for shredded object fields |
-| non-null | null          | The value is present and may be any type, including null |
-| null     | non-null      | The value is present and is the shredded type |
-| non-null | non-null      | The value is present and is a partially shredded object |
-
-An object is _partially shredded_ when the `value` is an object and the `typed_value` is a shredded object.
-Writers must not produce data where both `value` and `typed_value` are non-null, unless the Variant value is an object.
+| 34 | `01 00` v1/empty | NULL | `34` |
+| `null` | `01 00` v1/empty | `00` (null) | NULL |
+| "n/a" | `01 00` v1/empty | `13 6E 2F 61` (`n/a`) | NULL |
+| 100 | `01 00` v1/empty | NULL | `100` |
-If a Variant is missing in a context where a value is required, readers must return a Variant null (`00`): basic type 0 (primitive) and physical type 0 (null).
-For example, if a Variant is required (like `measurement` above) and both `value` and `typed_value` are null, the returned `value` must be `00` (Variant null).
+NOTE: If the `measurement` group were `optional` instead of `required`, then rows with missing
+values (SQL NULL) would be encoded by the entire group having missing values for those rows.
Review Comment:
This is not allowed by the spec, which is why it was not included. I also
think it is obvious that if the group is null then the columns should be
treated as null when interpreting the value.
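For readers following the thread, the reader-side rules in the removed lines (the `value`/`typed_value` table and the Variant-null rule for required contexts) can be sketched in Python. This is a minimal, hypothetical sketch: `interpret`, its `required` flag, and the returned tags are illustrative names, not part of the spec or of any Parquet library, and `typed_value` is assumed to already be decoded into a Python value (for example a `dict` for a shredded object). The object check reads the Variant basic type from the low 2 bits of the value header byte, where 2 means object.

```python
from typing import Any, Optional, Tuple

# Variant null: basic type 0 (primitive), physical type 0 (null).
VARIANT_NULL = b"\x00"

# Basic type is stored in the low 2 bits of a Variant value's header byte;
# 2 denotes an object in the Variant encoding.
OBJECT_BASIC_TYPE = 2


def interpret(value: Optional[bytes], typed_value: Any, *, required: bool) -> Tuple[str, Any]:
    """Classify one (value, typed_value) pair following the table in the diff.

    Hypothetical helper: None stands for a null column, and the returned
    tags ("variant", "missing", "shredded", "partial_object") are illustrative.
    """
    if value is None and typed_value is None:
        # Missing value: legitimate only for shredded object fields, but
        # where a value is required the reader must return Variant null.
        if required:
            return ("variant", VARIANT_NULL)
        return ("missing", None)
    if typed_value is None:
        # Unshredded: `value` may hold any Variant, including Variant null.
        return ("variant", value)
    if value is None:
        # Fully shredded: the value is the shredded type.
        return ("shredded", typed_value)
    # Both non-null: only valid when the Variant is a partially shredded object.
    if value[0] & 0x03 != OBJECT_BASIC_TYPE:
        raise ValueError("value and typed_value are both non-null for a non-object")
    return ("partial_object", (value, typed_value))
```

For the `34, null, "n/a", 100` series, rows 1 and 4 classify as `shredded` (only `typed_value` is set) and rows 2 and 3 as `variant` (only `value` is set).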
--
This is an automated message from the Apache Git Service.