rahil-c commented on code in PR #18274:
URL: https://github.com/apache/hudi/pull/18274#discussion_r2907259450
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -209,4 +209,299 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
-The main implementation change would require replacing the Avro schema references with the new type system.
+The main implementation change would require replacing the Avro schema references with the new type system.
+
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which provides first-class support for semi-structured data (e.g., JSON). The Variant type is implemented following Spark 4.0's native VariantType specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data efficiently. It is particularly useful for:
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific adapters:
+
+```
+┌────────────────────────────────────────────────────┐
+│ Application Layer (Spark SQL)                      │
+│ SELECT parse_json('{"a": 1}') as data              │
+└────────────────────────────────────────────────────┘
+                          │
+                          ▼
+┌────────────────────────────────────────────────────┐
+│ Spark Version Adapters                             │
+│ ┌───────────────────┐ ┌────────────────────────┐   │
+│ │ BaseSpark3Adapter │ │ BaseSpark4Adapter      │   │
+│ │ (No Variant)      │ │ (Full Variant)         │   │
+│ └───────────────────┘ └────────────────────────┘   │
+└────────────────────────────────────────────────────┘
+                          │
+                          ▼
+┌────────────────────────────────────────────────────┐
+│ HoodieSchema.Variant                               │
+│ (Avro Logical Type + Record Schema)                │
+└────────────────────────────────────────────────────┘
+                          │
+                          ▼
+┌────────────────────────────────────────────────────┐
+│ Parquet Storage                                    │
+│ GROUP { value: BINARY, metadata: BINARY }          │
+└────────────────────────────────────────────────────┘
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+  private static final String VARIANT_METADATA_FIELD = "metadata";
+  private static final String VARIANT_VALUE_FIELD = "value";
+  private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+  private final boolean isShredded;
Review Comment:
I am wondering if, before we go into the VARIANT schema definition, we could add a paragraph explaining what shredded vs. unshredded means?
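For reference, here is a minimal sketch of the distinction. It assumes the Variant column layout described in the Apache Parquet Variant shredding specification. The class `VariantLayoutSketch` and its methods are hypothetical illustrations, not Hudi or Parquet APIs.

In the unshredded form, the whole value is stored as variant-encoded binary in `value`, and the field-name dictionary needed to decode it is stored in `metadata`. In the shredded form, a `typed_value` column with a concrete Parquet type holds values that match that type, so readers can treat it like an ordinary column. The `value` field becomes optional and only carries data that could not be shredded.

```java
/**
 * Hypothetical sketch, not a Hudi or Parquet API: renders the Parquet group
 * a Variant column would use in its unshredded and shredded forms.
 */
public class VariantLayoutSketch {

  // Unshredded: the entire value is variant-encoded binary; metadata holds
  // the field-name dictionary needed to decode it. Both fields are required.
  static String unshredded(String column) {
    return "optional group " + column + " {\n"
        + "  required binary metadata;\n"
        + "  required binary value;\n"
        + "}";
  }

  // Shredded: typed_value stores values matching the chosen Parquet type
  // (a primitive such as int64 in this sketch); value becomes optional and
  // only carries data that could not be written to typed_value.
  static String shredded(String column, String typedValueType) {
    return "optional group " + column + " {\n"
        + "  required binary metadata;\n"
        + "  optional binary value;\n"
        + "  optional " + typedValueType + " typed_value;\n"
        + "}";
  }

  public static void main(String[] args) {
    System.out.println(unshredded("data"));
    System.out.println(shredded("data", "int64"));
  }
}
```

Running `main` prints both group definitions. For shredded objects or arrays, `typed_value` would be a nested group rather than a primitive; this sketch leaves that case out.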
--
This is an automated message from the Apache Git Service.