rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305549518


##########
VariantShredding.md:
##########
@@ -110,18 +119,22 @@ Shredded values must use the following Parquet types:
 
 Primitive values can be shredded using the equivalent Parquet primitive type 
from the table above for `typed_value`.
 
-Unless the value is shredded as an object (see [Objects](#objects)), 
`typed_value` or `value` (but not both) must be non-null.
+Unless the value is shredded as an object (see [Objects](#objects)), exactly 
one of `typed_value` or `value` must be present in each row.
+
+NULL values in an optional group are encoded in the usual way, with a 
definition level that excludes
+that group. For example, if `measurement` above were a top-level column in the 
parquet file, NULL
+measurements would have definition level 0.
 
 #### Arrays
 
 Arrays can be shredded by using a 3-level Parquet list for `typed_value`.
 
-If the value is not an array, `typed_value` must be null.
-If the value is an array, `value` must be null.
+If the value is not an array, `typed_value` must be NULL.
+If the value is an array, `value` must be NULL.
 
 The list `element` must be a required group.
 The `element` group can contain `value` and `typed_value` fields.
-The element's `value` field stores the element as Variant-encoded `binary` 
when the `typed_value` is not present or cannot represent it.
+The element's `value` field stores the element as Variant-encoded `binary` 
when the `typed_value` field is missing or cannot represent it.

Review Comment:
   I think this was using "not present" instead of "missing" to call out that 
the column was not in the physical Parquet schema. A "missing" object field can 
still be physically present in the Parquet schema because the schema is based 
on the shredded fields across Variant values. Objects encode missing as a 
Parquet null value for both `value` and `typed_value` Parquet columns.
   
   In addition, this doc attempts to use "field" when referring to an field of 
a Variant object and "column" to refer to a physical Parquet column or field in 
a Parquet group. It's fine to state what is absent, but for consistency I think 
it should be "column" here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to