scovich commented on code in PR #520:
URL: https://github.com/apache/parquet-format/pull/520#discussion_r2307463855


##########
VariantShredding.md:
##########
@@ -42,7 +42,31 @@ Variant values are stored in Parquet fields named `value`.
 Each `value` field may have an associated shredded field named `typed_value` that stores the value when it matches a specific type.
 When `typed_value` is present, readers **must** reconstruct shredded values according to this specification.
 
-For example, a Variant field, `measurement` may be shredded as long values by adding `typed_value` with type `int64`:
+The Parquet columns used to store variant metadata and values must be accessed by name, not by position.
+
+In order to avoid ambiguity, this specification always uses the term "`null`" to mean the variant
+null value (binary encoding: `00`). The phrase "missing" or "NULL" (all caps) always refers to an
+`optional` value that is not present (= SQL NULL).

Review Comment:
   The parquet spec for [nullability](https://github.com/apache/parquet-format?tab=readme-ov-file#nulls) calls them "NULL", not "null":
   > Nullity is encoded in the definition levels (which is run-length encoded). NULL values are not encoded in the data. For example, in a non-nested schema, a column with 1000 NULLs would be encoded with run-length encoding (0, 1000 times) for the definition levels and nothing else.
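The quoted passage can be made concrete with a minimal Python sketch showing how a flat (non-nested) optional column splits into definition levels and data. This is a simplified model and not Parquet's actual encoder: runs appear as `(level, count)` pairs instead of the RLE/bit-packing hybrid bytes Parquet writes. The function names `run_length_encode` and `encode_optional_column` are hypothetical.

```python
from itertools import groupby


def run_length_encode(levels):
    """Collapse a sequence of levels into (level, run_length) pairs.

    Simplified model: real Parquet uses an RLE/bit-packing hybrid encoding.
    """
    return [(level, len(list(run))) for level, run in groupby(levels)]


def encode_optional_column(values):
    """Split a flat optional column into definition levels and non-NULL data.

    For a non-nested optional column the max definition level is 1:
    level 1 means the value is present, level 0 means it is NULL.
    NULLs contribute only a definition level; nothing goes into the data.
    """
    def_levels = [0 if v is None else 1 for v in values]
    data = [v for v in values if v is not None]
    return run_length_encode(def_levels), data


levels, data = encode_optional_column([None] * 1000)
print(levels)  # [(0, 1000)]
print(data)    # []
```

A column of 1000 NULLs yields `[(0, 1000)]` for the definition levels and no data, matching the quoted example. A mixed column such as `[3, None, None, 7]` gives `([(1, 1), (0, 2), (1, 1)], [3, 7])`.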
   
   IMO using "NULL" to mean parquet null and "`null`" to mean variant null would be the clearest, because the two are quite visually distinct.
   
   ... but unfortunately the [logical types spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) uses lowercase "null" whenever it refers to parquet nulls (and it also uses "null" to refer to variant null).
   
   So perhaps another solution is to use (lowercase) "null" to mean "parquet null" and "`null`" to mean "variant null", but that's less visually distinct.
   
   A third possibility could be to always say "parquet null" or "variant null" to disambiguate, but that will become wordy fast.
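The distinction under discussion can be sketched in Python. The sketch models one row of a shredded variant group as a dict keyed by field name, which also reflects the rule in the diff that these columns are accessed by name, not by position. A key that is absent or maps to `None` stands for a missing value (parquet NULL). The single byte `b"\x00"` in `value` stands for the variant `null`. The function `describe_shredded` and its return labels are hypothetical. The sketch also ignores any restrictions the shredding spec places on where each combination may appear.

```python
from typing import Any, Optional

# Variant binary encoding of the variant `null` value: one header byte 0x00.
VARIANT_NULL = b"\x00"


def describe_shredded(group: dict) -> str:
    """Classify one row of a shredded variant group (hypothetical helper).

    `group` stands in for a Parquet group. A key that is absent or maps to
    None models a parquet NULL (a missing optional value). Fields are looked
    up by name, never by position.
    """
    value: Optional[bytes] = group.get("value")
    typed_value: Any = group.get("typed_value")

    if value is None and typed_value is None:
        return "missing"  # parquet NULL in both columns
    if value is not None and typed_value is None:
        if value == VARIANT_NULL:
            return "variant null"  # present, and equal to the variant `null`
        return "variant value (unshredded)"
    if value is None:
        return "shredded typed_value"
    return "partially shredded object"


print(describe_shredded({"value": b"\x00"}))  # variant null
print(describe_shredded({}))                  # missing
```

The first call describes a value that is present and happens to be the variant `null`. The second describes a value that is absent altogether. Field order in the dict does not matter, because lookup is by name.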


