rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305572805


##########
VariantShredding.md:
##########
@@ -51,32 +75,17 @@ required group measurement (VARIANT) {
 }
 ```
 
-The Parquet columns used to store variant metadata and values must be accessed by name, not by position.
-
 The series of measurements `34, null, "n/a", 100` would be stored as:
 
 | Value   | `metadata`       | `value`               | `typed_value` |
 |---------|------------------|-----------------------|---------------|
-| 34      | `01 00` v1/empty | null                  | `34`          |
-| null    | `01 00` v1/empty | `00` (null)           | null          |
-| "n/a"   | `01 00` v1/empty | `13 6E 2F 61` (`n/a`) | null          |
-| 100     | `01 00` v1/empty | null                  | `100`         |
-
-Both `value` and `typed_value` are optional fields used together to encode a single value.
-Values in the two fields must be interpreted according to the following table:
-
-| `value`  | `typed_value` | Meaning                                                     |
-|----------|---------------|-------------------------------------------------------------|
-| null     | null          | The value is missing; only valid for shredded object fields |
-| non-null | null          | The value is present and may be any type, including null    |
-| null     | non-null      | The value is present and is the shredded type               |
-| non-null | non-null      | The value is present and is a partially shredded object     |
-
-An object is _partially shredded_ when the `value` is an object and the `typed_value` is a shredded object.
-Writers must not produce data where both `value` and `typed_value` are non-null, unless the Variant value is an object.
+| 34      | `01 00` v1/empty | NULL                  | `34`          |
+| `null`  | `01 00` v1/empty | `00` (null)           | NULL          |
+| "n/a"   | `01 00` v1/empty | `13 6E 2F 61` (`n/a`) | NULL          |
+| 100     | `01 00` v1/empty | NULL                  | `100`         |
 
-If a Variant is missing in a context where a value is required, readers must return a Variant null (`00`): basic type 0 (primitive) and physical type 0 (null).
-For example, if a Variant is required (like `measurement` above) and both `value` and `typed_value` are null, the returned `value` must be `00` (Variant null).
+NOTE: If the `measurement` group were `optional` instead of `required`, then rows with missing
+values (SQL NULL) would be encoded by the entire group having missing values for those rows.

Review Comment:
   This is not allowed by the spec, which is why it was not included. I also 
think it is obvious that if the group is null then the columns should be 
treated as null when interpreting the value.
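
For readers following the thread, the `value` / `typed_value` interpretation table quoted in the hunk above can be sketched in Python. This is only an illustration of the four cases as worded in the quoted text, not code from the spec or from any Parquet implementation; the names `classify`, `read_required`, and `VARIANT_NULL` are hypothetical, and `None` stands in for a null Parquet column value.

```python
from typing import Any, Optional

# Variant null as described in the quoted text: the single byte `00`
# (basic type 0, physical type 0). The constant name is hypothetical.
VARIANT_NULL = b"\x00"


def classify(value: Optional[bytes], typed_value: Optional[Any]) -> str:
    """Map a (value, typed_value) pair to its meaning per the quoted table."""
    if value is None and typed_value is None:
        # Only valid for shredded object fields.
        return "missing"
    if typed_value is None:
        # Present; may be any type, including Variant null (b"\x00").
        return "unshredded"
    if value is None:
        # Present and stored as the shredded type.
        return "shredded"
    # Both non-null: only allowed when the Variant value is an object.
    return "partially shredded object"


def read_required(value: Optional[bytes], typed_value: Optional[Any]) -> Any:
    """Hypothetical reader helper for a context where a value is required."""
    if classify(value, typed_value) == "missing":
        # The quoted text says readers must return Variant null here.
        return VARIANT_NULL
    return typed_value if value is None else value
```

Applied to the `34, null, "n/a", 100` series from the hunk, `classify(None, 34)` falls in the shredded case, while `classify(b"\x00", None)` is an unshredded value that happens to be Variant null.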



