rdblue commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2305567188


##########
VariantShredding.md:
##########
@@ -192,31 +204,32 @@ optional group event (VARIANT) {
 
 The group for each named field must use repetition level `required`.
 
-A field's `value` and `typed_value` are set to null (missing) to indicate that the field does not exist in the variant.
-To encode a field that is present with a null value, the `value` must contain a Variant null: basic type 0 (primitive) and physical type 0 (null).
+At least one of `value` and `typed_value` should be NULL, unless the field is a partially shredded object. If both of them are NULL, the variant does not contain that field at all.
+
+To encode a field that is present with a `null` value, the `value` must contain a Variant null: basic type 0 (primitive) and physical type 0 (null).
 
-When both `value` and `typed_value` for a field are non-null, engines should fail.
-If engines choose to read in such cases, then the `typed_value` column must be used.
-Readers may always assume that data is written correctly and that only `value` or `typed_value` is defined.
-As a result, reads when both `value` and `typed_value` are defined may be inconsistent with optimized reads that require only one of the columns.
+Readers may always assume that shredded data is written correctly, and that only one of `value` or `typed_value` is present (unless the field is a partially shredded object).
+In particular, if a reader determines, based on the shredding schema, that a query needs only one of the two columns, the reader is not required to validate the other column.
+A reader that accesses both `value` and `typed_value` columns should fail if they are both non-NULL and the value is not a partially shredded object.
+If readers choose to tolerate such cases, then the `typed_value` column must be used.
 
 The table below shows how the series of objects in the first column would be stored:
 
 | Event object | `value` | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes |
 |--------------|---------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|-------|
-| `{"event_type": "noop", "event_ts": 1729794114937}` | null | non-null | null | `noop` | null | 1729794114937 | Fully shredded object |

Review Comment:
   I think the original intent was to avoid quotes and to present just the 
string contents. We should probably _remove_ the quotes around the date.
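
As a rough illustration of the reader rules in the hunk above (not part of the PR), here is a minimal Python sketch. The names `Source`, `resolve_field`, `partially_shredded`, and `strict` are hypothetical, and Python `None` stands in for a NULL column value. The caller is assumed to know from the shredding schema whether the field is a partially shredded object.

```python
from enum import Enum


class Source(Enum):
    """Where a shredded field's data comes from (hypothetical labels)."""
    MISSING = "missing"
    VALUE = "value"
    TYPED_VALUE = "typed_value"
    PARTIAL_OBJECT = "partially shredded object"


def resolve_field(value, typed_value, partially_shredded=False, strict=True):
    """Pick the column(s) that supply a shredded field.

    None models a NULL column entry. A Variant null stored in `value`
    is a non-None payload, so it counts as a present field.
    """
    if value is None and typed_value is None:
        # Both NULL: the variant does not contain the field at all.
        return Source.MISSING, None
    if typed_value is None:
        return Source.VALUE, value
    if value is None:
        return Source.TYPED_VALUE, typed_value
    # Both columns are non-NULL from here on.
    if partially_shredded:
        # Allowed case: `value` holds the fields that were not shredded.
        return Source.PARTIAL_OBJECT, (value, typed_value)
    if strict:
        # A reader that accesses both columns should fail.
        raise ValueError("both value and typed_value are non-NULL")
    # A tolerant reader must use typed_value.
    return Source.TYPED_VALUE, typed_value
```

With `strict=False` the sketch models a reader that tolerates the invalid case, where the hunk requires `typed_value` to win.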



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

