rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305558780


##########
VariantShredding.md:
##########
@@ -142,26 +155,25 @@ optional group tags (VARIANT) {
 }
 ```
 
-All elements of an array must be present (not missing) because the `array` 
Variant encoding does not allow missing elements.
-That is, either `typed_value` or `value` (but not both) must be non-null.
-Null elements must be encoded in `value` as Variant null: basic type 0 
(primitive) and physical type 0 (null).
+All elements of a variant array must be present (not missing) because the 
`array` Variant arrays cannot encode missing (NULL) elements.
+That is, at least one of `typed_value` or `value` must be present (possibly 
both, if the elements are partially shredded objects).
 
 The series of `tags` arrays `["comedy", "drama"], ["horror", null], ["comedy", 
"drama", "romance"], null` would be stored as:
 
-| Array                            | `value`     | `typed_value `| 
`typed_value...value` | `typed_value...typed_value`    |
+| Array                            | `value`     | `typed_value `| 
`value...value`       | `typed_value...typed_value`    |

Review Comment:
   This was correct before. This uses the example schema above, which is 
helpful for understanding the structure:
   
   ```
   optional group tags (VARIANT) {
     required binary metadata;
     optional binary value;
     optional group typed_value (LIST) {   # must be optional to allow a null list
       repeated group list {
         required group element {          # shredded element
           optional binary value;
           optional binary typed_value (STRING);
         }
       }
     }
   }
   ```
   
   The top-level `value` column is binary and holds any value that is not an 
array. When the value is an array, `typed_value` is used instead. In the table 
header, `...` is a placeholder that shortens `typed_value.list.element.value` 
and `typed_value.list.element.typed_value` to `typed_value...value` and 
`typed_value...typed_value`.
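
To make the column layout concrete, here is a rough Python sketch of how the example `tags` series could be split across those four columns. It is an illustration only, not the reference implementation: `shred_tags` and `element_column` are hypothetical helper names, Variant null is assumed to be the single byte `0x00` (basic type 0, physical type 0), and a null array is assumed to be written as Variant null in the top-level `value`.

```python
# Assumption: Variant null is one header byte, basic type 0 and physical type 0.
VARIANT_NULL = b"\x00"


def shred_tags(tags):
    """Split one `tags` value into top-level `value` and `typed_value`.

    Only string elements, null elements, and a null array are handled here.
    """
    if tags is None:
        # Not an array: stored in the top-level binary `value` column.
        return {"value": VARIANT_NULL, "typed_value": None}
    if not isinstance(tags, list):
        raise NotImplementedError("sketch only covers arrays and null")

    elements = []
    for item in tags:
        if isinstance(item, str):
            # Matches the shredded type: typed_value.list.element.typed_value
            elements.append({"value": None, "typed_value": item})
        elif item is None:
            # Null element: Variant null in typed_value.list.element.value
            elements.append({"value": VARIANT_NULL, "typed_value": None})
        else:
            raise NotImplementedError("sketch only covers strings and null")
    return {"value": None, "typed_value": elements}


def element_column(row, field):
    """Project `typed_value.list.element.<field>` for one row."""
    if row["typed_value"] is None:
        return None
    return [element[field] for element in row["typed_value"]]


series = [
    ["comedy", "drama"],
    ["horror", None],
    ["comedy", "drama", "romance"],
    None,
]
rows = [shred_tags(tags) for tags in series]

for tags, row in zip(series, rows):
    print(
        tags,
        row["value"],
        element_column(row, "value"),        # typed_value...value
        element_column(row, "typed_value"),  # typed_value...typed_value
    )
```

In this sketch every array element has exactly one of `value` or `typed_value` set, because the elements are strings or nulls rather than partially shredded objects.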



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

