rdblue commented on code in PR #461:
URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859086423


##########
VariantShredding.md:
##########
@@ -25,290 +25,316 @@
 The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous values.
 Query engines encode each Variant value in a self-describing format, and store it as a group containing `value` and `metadata` binary fields in Parquet.
 Since data is often partially homogenous, it can be beneficial to extract certain fields into separate Parquet columns to further improve performance.
-We refer to this process as **shredding**.
-Each Parquet file remains fully self-describing, with no additional metadata required to read or fully reconstruct the Variant data from the file.
-Combining shredding with a binary residual provides the flexibility to represent complex, evolving data with an unbounded number of unique fields while limiting the size of file schemas, and retaining the performance benefits of a columnar format.
+This process is **shredding**.
 
-This document focuses on the shredding semantics, Parquet representation, implications for readers and writers, as well as the Variant reconstruction.
-For now, it does not discuss which fields to shred, user-facing API changes, or any engine-specific considerations like how to use shredded columns.
-The approach builds upon the [Variant Binary Encoding](VariantEncoding.md), and leverages the existing Parquet specification.
+Shredding enables the use of Parquet's columnar representation for more compact data encoding, column statistics for data skipping, and partial projections.

Review Comment:
   I think JSON makes it more confusing because these objects are not JSON and 
contain typed values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


