rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305558780
##########
VariantShredding.md:
##########
@@ -142,26 +155,25 @@ optional group tags (VARIANT) {
}
```
-All elements of an array must be present (not missing) because the `array` Variant encoding does not allow missing elements.
-That is, either `typed_value` or `value` (but not both) must be non-null.
-Null elements must be encoded in `value` as Variant null: basic type 0 (primitive) and physical type 0 (null).
+All elements of a variant array must be present (not missing) because the `array` Variant arrays cannot encode missing (NULL) elements.
+That is, at least one of `typed_value` or `value` must be present (possibly both, if the elements are partially shredded objects).
The series of `tags` arrays `["comedy", "drama"], ["horror", null], ["comedy", "drama", "romance"], null` would be stored as:
-| Array | `value` | `typed_value `| `typed_value...value` | `typed_value...typed_value` |
+| Array | `value` | `typed_value `| `value...value` | `typed_value...typed_value` |
Review Comment:
The table header was correct before this change. It follows the example
schema above, which is helpful for understanding the structure:
```
optional group tags (VARIANT) {
  required binary metadata;
  optional binary value;
  optional group typed_value (LIST) {   # must be optional to allow a null list
    repeated group list {
      required group element {          # shredded element
        optional binary value;
        optional binary typed_value (STRING);
      }
    }
  }
}
```
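The top-level `value` column is binary and is used for any value that is not
an array. When the value is an array, `typed_value` is used instead. The `...`
in the table header is a placeholder that shortens the full column paths:
`typed_value...value` stands for `typed_value.list.element.value`, and
`typed_value...typed_value` stands for `typed_value.list.element.typed_value`.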
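
To make the column paths concrete, here is a minimal Python sketch of how the `tags` series from the example could map onto those columns. The names `shred_tags`, `element_column`, and `VARIANT_NULL` are hypothetical and not part of any Parquet API. The sketch assumes only string or null elements, and it treats the final `null` in the series as a Variant null stored in the top-level `value`, which is my reading of the example.

```python
# Illustrative sketch only: these names are hypothetical, not a Parquet API.
# Variant null is basic type 0 (primitive) and physical type 0 (null): byte 0x00.
VARIANT_NULL = b"\x00"


def shred_tags(tags):
    """Map one `tags` value onto the columns of the example schema."""
    if tags is None:
        # Not an array: the top-level `value` column holds the Variant null.
        return {"value": VARIANT_NULL, "typed_value": None}
    if not isinstance(tags, list):
        # Other non-array values would need a real Variant encoder.
        raise NotImplementedError("only arrays and null are sketched here")
    elements = []
    for item in tags:
        if item is None:
            # A null element is still present; it goes in the element's `value`.
            elements.append({"value": VARIANT_NULL, "typed_value": None})
        elif isinstance(item, str):
            # A string matches the shredded type (STRING).
            elements.append({"value": None, "typed_value": item})
        else:
            raise NotImplementedError("non-string elements need a Variant encoder")
    return {"value": None, "typed_value": elements}


def element_column(row, leaf):
    """Read typed_value.list.element.<leaf> for one row (None if the list is null)."""
    if row["typed_value"] is None:
        return None
    return [element[leaf] for element in row["typed_value"]]


series = [["comedy", "drama"], ["horror", None], ["comedy", "drama", "romance"], None]
for row in map(shred_tags, series):
    print(
        row["value"],
        element_column(row, "value"),        # typed_value...value
        element_column(row, "typed_value"),  # typed_value...typed_value
    )
```

Each printed row corresponds to one row of the table: the top-level `value`, then the two element-level columns that the `...` headers abbreviate.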