rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305563376


##########
VariantShredding.md:
##########
@@ -142,26 +155,25 @@ optional group tags (VARIANT) {
 }
 ```
 
-All elements of an array must be present (not missing) because the `array` Variant encoding does not allow missing elements.
-That is, either `typed_value` or `value` (but not both) must be non-null.
-Null elements must be encoded in `value` as Variant null: basic type 0 (primitive) and physical type 0 (null).
+All elements of a variant array must be present (not missing) because the `array` Variant arrays cannot encode missing (NULL) elements.
+That is, at least one of `typed_value` or `value` must be present (possibly both, if the elements are partially shredded objects).
 
 The series of `tags` arrays `["comedy", "drama"], ["horror", null], ["comedy", "drama", "romance"], null` would be stored as:
 
-| Array                            | `value`     | `typed_value `| `typed_value...value` | `typed_value...typed_value`    |
+| Array                            | `value`     | `typed_value `| `value...value`       | `typed_value...typed_value`    |
 |----------------------------------|-------------|---------------|-----------------------|--------------------------------|
-| `["comedy", "drama"]`            | null        | non-null      | [null, null]          | [`comedy`, `drama`]            |
-| `["horror", null]`               | null        | non-null      | [null, `00`]          | [`horror`, null]               |
-| `["comedy", "drama", "romance"]` | null        | non-null      | [null, null, null]    | [`comedy`, `drama`, `romance`] |
-| null                             | `00` (null) | null          |                       |                                |
+| `["comedy", "drama"]`            | NULL        | non-NULL      | [NULL, NULL]          | [`comedy`, `drama`]            |
+| `["horror", null]`               | NULL        | non-NULL      | [NULL, `00`]          | [`horror`, NULL]            |
+| `["comedy", "drama", "romance"]` | NULL        | non-NULL      | [NULL, NULL, NULL]    | [`comedy`, `drama`, `romance`] |
+| NULL                             | `00` (null) | NULL          |                       |                                |
 
 #### Objects
 
 Fields of an object can be shredded using a Parquet group for `typed_value` that contains shredded fields.
 
-If the value is an object, `typed_value` must be non-null.
-If the value is not an object, `typed_value` must be null.
-Readers can assume that a value is not an object if `typed_value` is null and that `typed_value` field values are correct; that is, readers do not need to read the `value` column if `typed_value` fields satisfy the required fields.
+If the value is an object, `typed_value` must be present.
+If the value is not an object, `typed_value` must be NULL.
+Readers can assume that a value is not an object if `typed_value` is NULL and that `typed_value` field values are correct when present; that is, readers do not need to read the `value` column if `typed_value` fields satisfy the required fields.

Review Comment:
   The addition of "when present" here is not correct. The aim is to make it 
clear that readers can project just the shredded fields when the other fields 
in the object are not needed. For instance, 
`variant_get(var_col, '$.type', 'string')` only needs to project 
`typed_value.type.value` (or confirm from column statistics that it is all 
null) and `typed_value.type.typed_value`. If both columns are null for a row, 
the reader can assume that the field is missing from the object.
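
   A minimal sketch of that per-row decision, assuming the two projected 
columns have already been read into Python values. The names `MISSING` and 
`resolve_shredded_field` are hypothetical and not part of any Parquet library, 
and decoding of the Variant-encoded `value` bytes is left out:

```python
from typing import Optional, Union

# Hypothetical sentinel meaning "the field is not present in the object".
MISSING = object()


def resolve_shredded_field(
    value: Optional[bytes], typed_value: Optional[str]
) -> Union[str, bytes, object]:
    """Resolve one shredded field from its two projected columns.

    value:       contents of typed_value.<field>.value (Variant-encoded bytes)
    typed_value: contents of typed_value.<field>.typed_value (shredded string)
    """
    if typed_value is not None:
        # The field was shredded as the requested type; use it directly.
        return typed_value
    if value is not None:
        # The field exists but was not shredded as a string; the caller
        # would still need to decode these Variant bytes (not shown here).
        return value
    # Both columns are null: the field is missing from the object, so the
    # object-level `value` column never has to be read for this path.
    return MISSING


# Illustrative rows of (value, typed_value) for a field named `type`.
rows = [(None, "comedy"), (b"\x00", None), (None, None)]
resolved = [resolve_shredded_field(v, t) for v, t in rows]
```

   Here the second row carries `00`, the Variant null encoding, so it is an 
explicit null rather than a missing field; only the third row is treated as 
missing.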



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
