rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305569326


##########
VariantShredding.md:
##########
@@ -192,31 +204,32 @@ optional group event (VARIANT) {
 
 The group for each named field must use repetition level `required`.
 
-A field's `value` and `typed_value` are set to null (missing) to indicate that 
the field does not exist in the variant.
-To encode a field that is present with a null value, the `value` must contain 
a Variant null: basic type 0 (primitive) and physical type 0 (null).
+At least one of `value` and `typed_value` should be NULL, unless the field is 
a partially shredded object. If both of them are NULL, the variant does not 
contain that field at all.
+
+To encode a field that is present with a `null` value, the `value` must 
contain a Variant null: basic type 0 (primitive) and physical type 0 (null).
 
-When both `value` and `typed_value` for a field are non-null, engines should 
fail.
-If engines choose to read in such cases, then the `typed_value` column must be 
used.
-Readers may always assume that data is written correctly and that only `value` 
or `typed_value` is defined.
-As a result, reads when both `value` and `typed_value` are defined may be 
inconsistent with optimized reads that require only one of the columns.
+Readers may always assume that shredded data is written correctly, and that 
only one of `value` or `typed_value` is present (unless the field is a 
partially shredded object).
+In particular, if a reader determines, based on the shredding schema, that a 
query needs only one of the two columns, the reader is not required to validate 
the other column.

Review Comment:
   We had a lot of discussion about this and compromised on the wording 
"engines should fail".
   
   I think this update is okay since it fits the intent that readers can 
project just one column and move on, but I'd like the people from the original 
discussion to sign off on changing it.
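
For reference, the field states described in the updated text can be sketched roughly as below. This is only an illustration, not part of the spec or any reader implementation: `classify_field`, `FieldState`, and the `typed_value_is_object` flag are hypothetical names, and representing a Variant null as the single byte `0x00` assumes the header byte carries basic type 0 and physical type 0 with nothing else set.

```python
from enum import Enum
from typing import Optional

# Assumption: a Variant null is a single header byte with
# basic type 0 (primitive) and physical type 0 (null).
VARIANT_NULL = b"\x00"


class FieldState(Enum):
    MISSING = "missing"                      # field absent from the variant
    VARIANT_NULL = "variant_null"            # field present, value is null
    UNSHREDDED = "unshredded"                # only `value` is set
    SHREDDED = "shredded"                    # only `typed_value` is set
    PARTIALLY_SHREDDED_OBJECT = "partial"    # both set, allowed for objects
    INVALID = "invalid"                      # both set, not an object


def classify_field(
    value: Optional[bytes],
    typed_value: Optional[object],
    typed_value_is_object: bool = False,
) -> FieldState:
    """Classify a shredded field from its `value` / `typed_value` pair.

    `None` stands in for a NULL (missing) column value. The
    `typed_value_is_object` flag is a hypothetical stand-in for what a
    reader would learn from the shredding schema.
    """
    if value is None and typed_value is None:
        return FieldState.MISSING
    if typed_value is None:
        if value == VARIANT_NULL:
            return FieldState.VARIANT_NULL
        return FieldState.UNSHREDDED
    if value is None:
        return FieldState.SHREDDED
    # Both columns are non-null: only valid for a partially shredded object.
    if typed_value_is_object:
        return FieldState.PARTIALLY_SHREDDED_OBJECT
    return FieldState.INVALID
```

A reader that projects only one of the two columns never reaches the last branch, which is the case the new wording leaves unvalidated.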



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

