yihua commented on code in PR #18274:
URL: https://github.com/apache/hudi/pull/18274#discussion_r3036400524
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
Review Comment:
π€ The motivation section is quite brief. It might be worth adding a sentence
or two about why the Variant type matters specifically for Hudi (e.g., how it
interacts with Hudi's merge/compaction semantics, or why storing JSON-as-string
is insufficient for Hudi's indexing and data skipping). Right now it reads more
like a general Variant primer than a Hudi-specific design rationale.
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
Review Comment:
π€ The Avro mapping changed from `string or union` to `record of 2 byte
fields + variant logical type`. Have you considered what happens when existing
tables that stored Variant as `string` (under the old mapping) need to be read
with the new code path? Is there a compatibility concern here, or was the old
mapping never implemented?
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
+This will allow users to work with semi-structured data more easily.
+
+The variant type is now formally defined in Parquet and engines like Spark
have full support for this type.
+Users with semi-structured data are otherwise forced to use strings or byte
arrays to store this data.
+
+### What is the VARIANT Type?
+
+The `VARIANT` type is a new data type designed to store semi-structured data
(like JSON) efficiently.
+Unlike storing JSON as a plain string, `VARIANT` uses an optimized binary
encoding that allows for fast navigation
+and element extraction without needing to parse the entire document.
+It offers the flexibility of a schema-less design (like JSON) with performance
closer to structured columns.
+
+### Storage Modes: Shredded vs. Unshredded
+
+- **Unshredded** (Binary Blob):
+ - The entire JSON structure is encoded into binary metadata and value
blobs.
+ - **Pros**: Fast write speed; handles completely dynamic/random schemas
easily.
+ - **Cons**: To read a single field (e.g., `user.id`), the engine must load
the entire binary blob.
+- **Shredded** (Columnar Optimization):
+ - The engine identifies common paths in the data (e.g., `v.a` or `v.c`)
and extracts them into separate, native
+ Parquet columns (e.g., Int32, Decimal).
+ - **Pros**: Massive performance gain for queries. If you query `SELECT
v:a`, the engine reads only the specific
+ Int32 column and skips the rest of the binary data (Columnar Pruning).
+ - **Cons**: Higher write overhead to analyze and split the data, prone to
read jitter if there are large variation
+ - in shredding output.
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
+ private final Option<HoodieSchema> typedValueSchema;
+}
+```
+
+#### Two Storage Modes
+
+1. **Unshredded Variant** (Default):
+ - Created with: `HoodieSchema.createVariant()`
+ - Structure: Record with two REQUIRED binary fields
+ - Fields: `metadata` (BYTES, REQUIRED), `value` (BYTES, REQUIRED)
+ - Use case: Simple semi-structured data storage
+
+2. **Shredded Variant**:
+ - Created with: `HoodieSchema.createVariantShredded(typedValueSchema)`
+ - Structure: Record with `typed_value` group containing extracted typed
columns
+ - Fields: `value` (BYTES, OPTIONAL), `metadata` (BYTES, REQUIRED),
`typed_value` (GROUP, OPTIONAL)
+ - Use case: Frequently accessed fields are extracted into native typed
columns for columnar reads
+
+#### How a Reader Distinguishes the Two Modes
+
+Both modes use the same `VARIANT` logical type annotation in the Parquet
footer β the annotation does not change.
+The reader inspects the Parquet file schema to determine which mode was used:
+
+- If the Variant group contains only `metadata` and `value`, it is
**unshredded**.
+- If the Variant group also contains a `typed_value` child group, it is
**shredded**.
+
+In the shredded case, `value` becomes OPTIONAL because when all fields are
fully shredded, the binary blob
+may be `null` β the content is entirely represented in the typed columns. The
reader falls back to `value`
+only for fields that were not shredded.
+
+#### Custom Avro Logical Type
+
+Variant uses a custom Avro logical type for identification:
+
+```java
+public static class VariantLogicalType extends LogicalType {
+ private static final String VARIANT_LOGICAL_TYPE_NAME = "variant";
+}
+```
+
+### On-Disk Representation (Parquet)
+
+Hudi's on-disk Variant representation intentionally aligns with the
+[Parquet Variant
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md).
+The spec defines a Variant as a group annotated with the `VARIANT` logical
type. Hudi writes Variant columns using
+this exact layout in both modes:
+
+#### Unshredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+The entire JSON value is encoded into the `value` blob. Both fields are
REQUIRED β every row carries the full binary
+representation.
+
+#### Shredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional group typed_value {
+ optional group a {
+ optional binary value;
+ optional int32 typed_value;
+ }
+ optional group b {
+ optional binary value;
+ optional binary typed_value (STRING);
+ }
+ optional group c {
+ optional binary value;
+ optional int64 typed_value;
+ }
+ }
+}
+```
+
+Each child under `typed_value` corresponds to a shredded field. The child's
`typed_value` column holds the extracted
+native-typed value (e.g., `int32`, `string`, `int64`), while the child's
`value` column is a fallback binary blob for
+rows where the field's type does not match the shredded type. The top-level
`value` becomes OPTIONAL β it is `null`
+when all fields in the row are fully covered by shredded columns, and
populated only for fields that were not shredded.
+
+**How a reader uses this**: The Parquet file schema in the footer tells the
reader which mode was used. If the Variant
+group contains only `metadata` and `value`, it is unshredded. If a
`typed_value` child group is present, the reader
+knows which fields have been shredded and can read them as native typed
columns β enabling column pruning, predicate
+pushdown, and data skipping without touching the binary blob.
+
+The `VARIANT` annotation is what allows readers to recognize the column as a
Variant and expose semantic operations
+(e.g., `variant_get`, JSON path access). Without it, the group is
indistinguishable from an ordinary
+`Struct<metadata: Binary, value: Binary>`.
+
+**Alignment with the Parquet spec**: Hudi does not diverge from the Parquet
Variant spec. The annotation, field names,
+field types, and repetition levels all follow the spec exactly. This means any
Parquet reader that implements the
+Variant spec can read Hudi-written Variant columns natively, and vice versa.
+
+**Reader compatibility**: Engines that do not yet implement the Parquet
Variant spec (e.g., Spark 3.5) will ignore the
+`VARIANT` annotation and fall back to reading the column as a plain struct of
binary fields.
+The data remains physically accessible, but users lose Variant query functions
and must manually parse
+the binary encoding. See [variant-appendix.md](variant-appendix.md) for
detailed backward compatibility findings.
+
+**Migration path**: If a future Parquet spec revision changes the Variant
physical layout, Hudi can introduce a
+table property (e.g., `hoodie.variant.parquet.format.version`) to control
which format is written, and the reader
+can inspect the Parquet footer to determine which layout to expect. (This is
outside the scope of this RFC for now)
+
+#### Binary Format
+
+The Variant binary encoding follows the
+[Parquet Variant Binary Encoding
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md):
+
+| Component | Description
|
+|--------------|---------------------------------------------------------------------|
+| **metadata** | Dictionary of field names and type information for efficient
access |
+| **value** | Binary encoding of the actual data (scalars, objects, arrays)
|
+
+Example for `{"updated": true, "new_field": 123}`:
+
+```
+Metadata Bytes: [0x01, 0x02, 0x00, 0x07, 0x10, "updated", "new_field"]
+Value Bytes: [0x02, 0x02, 0x01, 0x00, 0x01, 0x00, 0x03, 0x04, 0x0C, 0x7B]
+```
+
+The metadata contains a dictionary of all field names, while the value
contains references to these fields plus the
+actual data values.
+
+### Schema Evolution Support
+
+Variant types provide **schema-on-read** flexibility:
+
+| Aspect | Behavior
|
+|--------------------------|-------------------------------------------------------------------|
+| Adding new fields | β
Supported - New JSON fields can be added
without schema changes |
+| Removing fields | β
Supported - Missing fields return null on read
|
+| Type changes within JSON | β
Supported - Variant can store any
JSON-compatible type |
+| Table schema evolution | β
Supported - Variant column can be added to
existing tables |
+| Hudi schema evolution | β
Supported - Works with Hudi's standard schema
evolution |
+
+**Important**: The schema flexibility is within the Variant column itself. The
table-level schema (including the Variant
+column definition) still follows Hudi's standard schema evolution rules.
+
+### Column Statistics and Indexing
+
+Variant statistics and indexing capabilities depend on whether a field is
accessed from the unshredded binary blob
+or from a shredded (extracted) typed column. With shredding, extracted fields
become regular typed Parquet columns
+that automatically leverage all existing Hudi metadata infrastructure.
+
+#### Unshredded Variant Column
+
+| Feature | Status | Notes
|
+|--------------------|--------|--------------------------------------------------------------------|
+| Value/null counts | β
| Standard column-level counts via MDT
`column_stats` partition |
+| Min/max bounds | β | Binary blob has no meaningful sort order
|
+| Data skipping | β | Requires comparable min/max bounds
|
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β | Variant columns are not used as partition keys
|
+| Predicate pushdown | β οΈ | Structural predicates only (`IS NULL`, `IS NOT
NULL`) |
+
+#### Shredded Variant Fields
+
+| Feature | Status | Notes
|
+|--------------------|--------|------------------------------------------------------------------------------------------------------------------------|
+| Min/max bounds | β
| Shredded columns are regular typed columns;
`HoodieTableMetadataUtil.isColumnTypeSupported()` accepts all primitives |
+| Data skipping | β
| `DataSkippingUtils` translates filters on
shredded fields to min/max range checks |
+| Expression index | β
| Index over `variant_get()` expressions enables
stats collection without full shredding |
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
Review Comment:
π€ Could you elaborate on how Variant columns interact with Hudi's record
merging in MOR tables? The Avro serialization section shows the schema, but
doesn't discuss merge semantics β e.g., when a log record updates only some
fields inside the Variant, is the entire binary blob replaced, or is there
field-level merging? This seems important for users to understand the write
amplification trade-offs.
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
+This will allow users to work with semi-structured data more easily.
+
+The variant type is now formally defined in Parquet and engines like Spark
have full support for this type.
+Users with semi-structured data are otherwise forced to use strings or byte
arrays to store this data.
+
+### What is the VARIANT Type?
+
+The `VARIANT` type is a new data type designed to store semi-structured data
(like JSON) efficiently.
+Unlike storing JSON as a plain string, `VARIANT` uses an optimized binary
encoding that allows for fast navigation
+and element extraction without needing to parse the entire document.
+It offers the flexibility of a schema-less design (like JSON) with performance
closer to structured columns.
+
+### Storage Modes: Shredded vs. Unshredded
+
+- **Unshredded** (Binary Blob):
+ - The entire JSON structure is encoded into binary metadata and value
blobs.
+ - **Pros**: Fast write speed; handles completely dynamic/random schemas
easily.
+ - **Cons**: To read a single field (e.g., `user.id`), the engine must load
the entire binary blob.
+- **Shredded** (Columnar Optimization):
+ - The engine identifies common paths in the data (e.g., `v.a` or `v.c`)
and extracts them into separate, native
+ Parquet columns (e.g., Int32, Decimal).
+ - **Pros**: Massive performance gain for queries. If you query `SELECT
v:a`, the engine reads only the specific
+ Int32 column and skips the rest of the binary data (Columnar Pruning).
+ - **Cons**: Higher write overhead to analyze and split the data, prone to
read jitter if there are large variation
+ - in shredding output.
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
+ private final Option<HoodieSchema> typedValueSchema;
+}
+```
+
+#### Two Storage Modes
+
+1. **Unshredded Variant** (Default):
+ - Created with: `HoodieSchema.createVariant()`
+ - Structure: Record with two REQUIRED binary fields
+ - Fields: `metadata` (BYTES, REQUIRED), `value` (BYTES, REQUIRED)
+ - Use case: Simple semi-structured data storage
+
+2. **Shredded Variant**:
+ - Created with: `HoodieSchema.createVariantShredded(typedValueSchema)`
+ - Structure: Record with `typed_value` group containing extracted typed
columns
+ - Fields: `value` (BYTES, OPTIONAL), `metadata` (BYTES, REQUIRED),
`typed_value` (GROUP, OPTIONAL)
+ - Use case: Frequently accessed fields are extracted into native typed
columns for columnar reads
+
+#### How a Reader Distinguishes the Two Modes
+
+Both modes use the same `VARIANT` logical type annotation in the Parquet
footer β the annotation does not change.
+The reader inspects the Parquet file schema to determine which mode was used:
+
+- If the Variant group contains only `metadata` and `value`, it is
**unshredded**.
+- If the Variant group also contains a `typed_value` child group, it is
**shredded**.
+
+In the shredded case, `value` becomes OPTIONAL because when all fields are
fully shredded, the binary blob
+may be `null` β the content is entirely represented in the typed columns. The
reader falls back to `value`
+only for fields that were not shredded.
+
+#### Custom Avro Logical Type
+
+Variant uses a custom Avro logical type for identification:
+
+```java
+public static class VariantLogicalType extends LogicalType {
+ private static final String VARIANT_LOGICAL_TYPE_NAME = "variant";
+}
+```
+
+### On-Disk Representation (Parquet)
+
+Hudi's on-disk Variant representation intentionally aligns with the
+[Parquet Variant
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md).
+The spec defines a Variant as a group annotated with the `VARIANT` logical
type. Hudi writes Variant columns using
+this exact layout in both modes:
+
+#### Unshredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+The entire JSON value is encoded into the `value` blob. Both fields are
REQUIRED β every row carries the full binary
+representation.
+
+#### Shredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional group typed_value {
+ optional group a {
+ optional binary value;
+ optional int32 typed_value;
+ }
+ optional group b {
+ optional binary value;
+ optional binary typed_value (STRING);
+ }
+ optional group c {
+ optional binary value;
+ optional int64 typed_value;
+ }
+ }
+}
+```
+
+Each child under `typed_value` corresponds to a shredded field. The child's
`typed_value` column holds the extracted
+native-typed value (e.g., `int32`, `string`, `int64`), while the child's
`value` column is a fallback binary blob for
+rows where the field's type does not match the shredded type. The top-level
`value` becomes OPTIONAL β it is `null`
+when all fields in the row are fully covered by shredded columns, and
populated only for fields that were not shredded.
+
+**How a reader uses this**: The Parquet file schema in the footer tells the
reader which mode was used. If the Variant
+group contains only `metadata` and `value`, it is unshredded. If a
`typed_value` child group is present, the reader
+knows which fields have been shredded and can read them as native typed
columns β enabling column pruning, predicate
+pushdown, and data skipping without touching the binary blob.
+
+The `VARIANT` annotation is what allows readers to recognize the column as a
Variant and expose semantic operations
+(e.g., `variant_get`, JSON path access). Without it, the group is
indistinguishable from an ordinary
+`Struct<metadata: Binary, value: Binary>`.
+
+**Alignment with the Parquet spec**: Hudi does not diverge from the Parquet
Variant spec. The annotation, field names,
+field types, and repetition levels all follow the spec exactly. This means any
Parquet reader that implements the
+Variant spec can read Hudi-written Variant columns natively, and vice versa.
+
+**Reader compatibility**: Engines that do not yet implement the Parquet
Variant spec (e.g., Spark 3.5) will ignore the
+`VARIANT` annotation and fall back to reading the column as a plain struct of
binary fields.
+The data remains physically accessible, but users lose Variant query functions
and must manually parse
+the binary encoding. See [variant-appendix.md](variant-appendix.md) for
detailed backward compatibility findings.
+
+**Migration path**: If a future Parquet spec revision changes the Variant
physical layout, Hudi can introduce a
+table property (e.g., `hoodie.variant.parquet.format.version`) to control
which format is written, and the reader
+can inspect the Parquet footer to determine which layout to expect. (This is
outside the scope of this RFC for now)
+
+#### Binary Format
+
+The Variant binary encoding follows the
+[Parquet Variant Binary Encoding
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md):
+
+| Component | Description
|
+|--------------|---------------------------------------------------------------------|
+| **metadata** | Dictionary of field names and type information for efficient
access |
+| **value** | Binary encoding of the actual data (scalars, objects, arrays)
|
+
+Example for `{"updated": true, "new_field": 123}`:
+
+```
+Metadata Bytes: [0x01, 0x02, 0x00, 0x07, 0x10, "updated", "new_field"]
+Value Bytes: [0x02, 0x02, 0x01, 0x00, 0x01, 0x00, 0x03, 0x04, 0x0C, 0x7B]
+```
+
+The metadata contains a dictionary of all field names, while the value
contains references to these fields plus the
+actual data values.
+
+### Schema Evolution Support
+
+Variant types provide **schema-on-read** flexibility:
+
+| Aspect | Behavior
|
+|--------------------------|-------------------------------------------------------------------|
+| Adding new fields | β
Supported - New JSON fields can be added
without schema changes |
+| Removing fields | β
Supported - Missing fields return null on read
|
+| Type changes within JSON | β
Supported - Variant can store any
JSON-compatible type |
+| Table schema evolution | β
Supported - Variant column can be added to
existing tables |
+| Hudi schema evolution | β
Supported - Works with Hudi's standard schema
evolution |
+
+**Important**: The schema flexibility is within the Variant column itself. The
table-level schema (including the Variant
+column definition) still follows Hudi's standard schema evolution rules.
+
+### Column Statistics and Indexing
+
+Variant statistics and indexing capabilities depend on whether a field is
accessed from the unshredded binary blob
+or from a shredded (extracted) typed column. With shredding, extracted fields
become regular typed Parquet columns
+that automatically leverage all existing Hudi metadata infrastructure.
+
+#### Unshredded Variant Column
+
+| Feature | Status | Notes
|
+|--------------------|--------|--------------------------------------------------------------------|
+| Value/null counts | β
| Standard column-level counts via MDT
`column_stats` partition |
+| Min/max bounds | β | Binary blob has no meaningful sort order
|
+| Data skipping | β | Requires comparable min/max bounds
|
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β | Variant columns are not used as partition keys
|
+| Predicate pushdown | β οΈ | Structural predicates only (`IS NULL`, `IS NOT
NULL`) |
+
+#### Shredded Variant Fields
+
+| Feature | Status | Notes
|
+|--------------------|--------|------------------------------------------------------------------------------------------------------------------------|
+| Min/max bounds | β
| Shredded columns are regular typed columns;
`HoodieTableMetadataUtil.isColumnTypeSupported()` accepts all primitives |
+| Data skipping | β
| `DataSkippingUtils` translates filters on
shredded fields to min/max range checks |
+| Expression index | β
| Index over `variant_get()` expressions enables
stats collection without full shredding |
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β
| Applicable if a shredded field is used as a
partition key |
+| Predicate pushdown | β
| Full pushdown support: equality, range, and
IN-list predicates |
+| Column pruning | β
| Read only the shredded columns needed; skip
the binary blob entirely |
+
+**Recommendation**: Enable shredding for frequently accessed fields to unlock
full MDT column stats, data skipping,
+and predicate pushdown. For lighter-weight optimization, use expression
indexes over `variant_get()` expressions to
+collect per-field statistics without materializing shredded columns.
+
+### Usage Guide
+
+#### Spark 4.0+ (Native Support)
+
+```sql
+-- Create table with Variant column
+CREATE TABLE events
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+OPTIONS (
+ primaryKey = 'id',
+ preCombineField = 'ts'
+);
+
+-- Insert with parse_json
+INSERT INTO events
+VALUES ('1', current_timestamp(), parse_json('{"event": "click", "page":
"/home"}')),
+ ('2', current_timestamp(), parse_json('{"event": "purchase", "amount":
99.99}'));
+
+-- Query Variant data
+SELECT id, payload:event, payload:amount
+FROM events;
+
+-- Update Variant column
+UPDATE events
+SET payload = parse_json('{"event": "click", "page": "/products"}')
+WHERE id = '1';
+
+-- Works with both COW and MOR tables
+CREATE TABLE events_mor
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+TBLPROPERTIES (
+ 'type' = 'mor',
+ 'primaryKey' = 'id',
+ 'preCombineField' = 'ts'
+);
+```
+
+#### Spark 3.x (Backward Compatibility Read)
+
+Spark 3.x does not support VariantType natively, but can read Variant tables
as struct:
+
+```sql
+-- Reading Spark 4.0 Variant table in Spark 3.x
+-- Variant column appears as: STRUCT<value: BINARY, metadata: BINARY>
+
+SELECT id, cast(payload.value as string)
+FROM events;
+```
+
+**Limitations in Spark 3.x**:
+
+- Cannot write Variant data
+- Variant column reads as raw struct with binary fields
+- No helper functions like `parse_json()` available
+
+### Cross-Engine Compatibility
+
+Engines that do not support the Variant type can still read all non-Variant
columns in the table. The Variant column's
+on-disk layout is a plain Parquet struct (`value BINARY, metadata BINARY`), so
engines that lack Variant-aware logic
+can either project it as a struct of binary fields or simply exclude it from
their column projection. This means adding
+a Variant column to a table does not break reads from older or third-party
engines β they continue to access every
+other column as before.
+
+#### Flink Integration
+
+| Operation | Support |
+|--------------------------------------|------------------------------------|
+| Reading Spark-written Variant tables | β
Supported |
+| Variant representation in Flink | `ROW<value BYTES, metadata BYTES>` |
+| Writing Variant from Flink | β Not yet implemented |
+
+Example Flink query reading Variant data:
+
+```sql
+-- Flink sees Variant as ROW type
+SELECT id, variant_col.value, variant_col.metadata
+FROM hudi_variant_table;
+```
+
+#### Avro Serialization (MOR Tables)
+
+For MOR tables, Variant data is serialized to Avro for log files:
+
+```
+{
+ "type": "record",
+ "logicalType": "variant",
+ "fields": [
+ {"name": "value", "type": "bytes"},
+ {"name": "metadata", "type": "bytes"}
+ ]
+}
+```
+
+### Backward Compatibility
+
+The implementation ensures backward compatibility through:
+
+1. **Storage Format**: On-disk layout is a plain Parquet struct (`value
BINARY, metadata BINARY`), readable by any Parquet-compatible engine
+2. **Logical Type Annotation**: Avro logical type allows newer versions to
recognize Variant semantics
+3. **Graceful Degradation**: Older readers see Variant as `STRUCT<value:
BINARY, metadata: BINARY>`
+
+| Scenario | Behavior |
+|-----------------------------------|---------------------------|
+| Spark 4.0 writes, Spark 4.0 reads | Full Variant support |
+| Spark 4.0 writes, Spark 3.x reads | Struct with binary fields |
+| Spark 4.0 writes, Flink reads | ROW with binary fields |
+| Spark 3.x writes Variant | β Not supported |
+
+### Limitations and Constraints
+
+1. **Spark Version Dependency**:
+ - Write support requires Spark 4.0+
+ - Spark 3.x limited to read-only access with degraded experience
+
+2. **Storage Overhead**:
+ - Metadata stored redundantly per row
+ - No column-level compression optimizations for Variant content
+
+3. **Query Performance**:
+ - Predicate pushdown on unshredded Variant content is limited to
structural predicates (`IS NULL`, `IS NOT NULL`)
+ - Accessing fields from the unshredded blob requires loading the full
binary value
+ - Shredded fields and expression indexes enable full predicate pushdown,
data skipping, and column pruning
+
+4. **Shredded Variant**:
+ - `typed_value` field is defined for extracted typed columns within the
Variant group
+ - Shredded fields unlock full MDT column stats and data skipping via
`DataSkippingUtils`
+
+5. **Functions**:
+ - `parse_json()` - Spark 4.0+ only
+ - JSON path access (`payload:field`) - Spark 4.0+ only
+ - No UDFs for Variant manipulation in Spark 3.x
+
+### Key Implementation Files
+
+| File | Description
|
+|-----------------------------------------------------------|--------------------------------------------------|
+| `hudi-common/.../HoodieSchema.java` | Core Variant
schema definition with logical type |
+| `hudi-spark3-common/.../BaseSpark3Adapter.scala` | Spark 3 adapter
(no Variant support) |
+| `hudi-spark4-common/.../BaseSpark4Adapter.scala` | Spark 4 adapter
with full Variant API |
+| `hudi-spark-client/.../HoodieRowParquetWriteSupport.java` | Variant Parquet
writing |
+| `hudi-spark-client/.../HoodieSparkSchemaConverters.scala` | Schema
conversion for Variant |
+| `hudi-spark4.0.x/.../AvroSerializer.scala` | Spark to Avro
Variant conversion |
+| `hudi-spark4.0.x/.../AvroDeserializer.scala` | Avro to Spark
Variant conversion |
+
+### Test Coverage
+
+| Test | Description
|
+|-------------------------------------------|------------------------------------------------------|
Review Comment:
π€ The RFC doesn't have an explicit "Alternatives Considered" section. It
would strengthen the proposal to briefly discuss why a custom Avro logical type
was chosen over alternatives (e.g., reusing an existing union type, or a raw
bytes approach with metadata in table properties). Even a short paragraph would
help future readers understand the design rationale.
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
+This will allow users to work with semi-structured data more easily.
+
+The variant type is now formally defined in Parquet and engines like Spark
have full support for this type.
+Users with semi-structured data are otherwise forced to use strings or byte
arrays to store this data.
+
+### What is the VARIANT Type?
+
+The `VARIANT` type is a new data type designed to store semi-structured data
(like JSON) efficiently.
+Unlike storing JSON as a plain string, `VARIANT` uses an optimized binary
encoding that allows for fast navigation
+and element extraction without needing to parse the entire document.
+It offers the flexibility of a schema-less design (like JSON) with performance
closer to structured columns.
+
+### Storage Modes: Shredded vs. Unshredded
+
+- **Unshredded** (Binary Blob):
+ - The entire JSON structure is encoded into binary metadata and value
blobs.
+ - **Pros**: Fast write speed; handles completely dynamic/random schemas
easily.
+ - **Cons**: To read a single field (e.g., `user.id`), the engine must load
the entire binary blob.
+- **Shredded** (Columnar Optimization):
+ - The engine identifies common paths in the data (e.g., `v.a` or `v.c`)
and extracts them into separate, native
+ Parquet columns (e.g., Int32, Decimal).
+ - **Pros**: Massive performance gain for queries. If you query `SELECT
v:a`, the engine reads only the specific
+ Int32 column and skips the rest of the binary data (Columnar Pruning).
+ - **Cons**: Higher write overhead to analyze and split the data, prone to
read jitter if there are large variation
+ - in shredding output.
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
+ private final Option<HoodieSchema> typedValueSchema;
+}
+```
+
+#### Two Storage Modes
+
+1. **Unshredded Variant** (Default):
+ - Created with: `HoodieSchema.createVariant()`
+ - Structure: Record with two REQUIRED binary fields
+ - Fields: `metadata` (BYTES, REQUIRED), `value` (BYTES, REQUIRED)
+ - Use case: Simple semi-structured data storage
+
+2. **Shredded Variant**:
+ - Created with: `HoodieSchema.createVariantShredded(typedValueSchema)`
+ - Structure: Record with `typed_value` group containing extracted typed
columns
+ - Fields: `value` (BYTES, OPTIONAL), `metadata` (BYTES, REQUIRED),
`typed_value` (GROUP, OPTIONAL)
+ - Use case: Frequently accessed fields are extracted into native typed
columns for columnar reads
+
+#### How a Reader Distinguishes the Two Modes
+
+Both modes use the same `VARIANT` logical type annotation in the Parquet
footer β the annotation does not change.
+The reader inspects the Parquet file schema to determine which mode was used:
+
+- If the Variant group contains only `metadata` and `value`, it is
**unshredded**.
+- If the Variant group also contains a `typed_value` child group, it is
**shredded**.
+
+In the shredded case, `value` becomes OPTIONAL because when all fields are
fully shredded, the binary blob
+may be `null` β the content is entirely represented in the typed columns. The
reader falls back to `value`
+only for fields that were not shredded.
+
+#### Custom Avro Logical Type
+
+Variant uses a custom Avro logical type for identification:
+
+```java
+public static class VariantLogicalType extends LogicalType {
+ private static final String VARIANT_LOGICAL_TYPE_NAME = "variant";
+}
+```
+
+### On-Disk Representation (Parquet)
+
+Hudi's on-disk Variant representation intentionally aligns with the
+[Parquet Variant
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md).
+The spec defines a Variant as a group annotated with the `VARIANT` logical
type. Hudi writes Variant columns using
+this exact layout in both modes:
+
+#### Unshredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+The entire JSON value is encoded into the `value` blob. Both fields are
REQUIRED β every row carries the full binary
+representation.
+
+#### Shredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional group typed_value {
+ optional group a {
+ optional binary value;
+ optional int32 typed_value;
+ }
+ optional group b {
+ optional binary value;
+ optional binary typed_value (STRING);
+ }
+ optional group c {
+ optional binary value;
+ optional int64 typed_value;
+ }
+ }
+}
+```
+
+Each child under `typed_value` corresponds to a shredded field. The child's
`typed_value` column holds the extracted
+native-typed value (e.g., `int32`, `string`, `int64`), while the child's
`value` column is a fallback binary blob for
+rows where the field's type does not match the shredded type. The top-level
`value` becomes OPTIONAL β it is `null`
+when all fields in the row are fully covered by shredded columns, and
populated only for fields that were not shredded.
+
+**How a reader uses this**: The Parquet file schema in the footer tells the
reader which mode was used. If the Variant
+group contains only `metadata` and `value`, it is unshredded. If a
`typed_value` child group is present, the reader
+knows which fields have been shredded and can read them as native typed
columns β enabling column pruning, predicate
+pushdown, and data skipping without touching the binary blob.
+
+The `VARIANT` annotation is what allows readers to recognize the column as a
Variant and expose semantic operations
+(e.g., `variant_get`, JSON path access). Without it, the group is
indistinguishable from an ordinary
+`Struct<metadata: Binary, value: Binary>`.
+
+**Alignment with the Parquet spec**: Hudi does not diverge from the Parquet
Variant spec. The annotation, field names,
+field types, and repetition levels all follow the spec exactly. This means any
Parquet reader that implements the
+Variant spec can read Hudi-written Variant columns natively, and vice versa.
+
+**Reader compatibility**: Engines that do not yet implement the Parquet
Variant spec (e.g., Spark 3.5) will ignore the
+`VARIANT` annotation and fall back to reading the column as a plain struct of
binary fields.
+The data remains physically accessible, but users lose Variant query functions
and must manually parse
+the binary encoding. See [variant-appendix.md](variant-appendix.md) for
detailed backward compatibility findings.
+
+**Migration path**: If a future Parquet spec revision changes the Variant
physical layout, Hudi can introduce a
+table property (e.g., `hoodie.variant.parquet.format.version`) to control
which format is written, and the reader
+can inspect the Parquet footer to determine which layout to expect. (This is
outside the scope of this RFC for now)
+
+#### Binary Format
+
+The Variant binary encoding follows the
+[Parquet Variant Binary Encoding
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md):
+
+| Component | Description
|
+|--------------|---------------------------------------------------------------------|
+| **metadata** | Dictionary of field names and type information for efficient
access |
+| **value** | Binary encoding of the actual data (scalars, objects, arrays)
|
+
+Example for `{"updated": true, "new_field": 123}`:
+
+```
+Metadata Bytes: [0x01, 0x02, 0x00, 0x07, 0x10, "updated", "new_field"]
+Value Bytes: [0x02, 0x02, 0x01, 0x00, 0x01, 0x00, 0x03, 0x04, 0x0C, 0x7B]
+```
+
+The metadata contains a dictionary of all field names, while the value
contains references to these fields plus the
+actual data values.
+
+### Schema Evolution Support
+
+Variant types provide **schema-on-read** flexibility:
+
+| Aspect | Behavior
|
+|--------------------------|-------------------------------------------------------------------|
+| Adding new fields | β
Supported - New JSON fields can be added
without schema changes |
+| Removing fields | β
Supported - Missing fields return null on read
|
+| Type changes within JSON | β
Supported - Variant can store any
JSON-compatible type |
+| Table schema evolution | β
Supported - Variant column can be added to
existing tables |
+| Hudi schema evolution | β
Supported - Works with Hudi's standard schema
evolution |
+
+**Important**: The schema flexibility is within the Variant column itself. The
table-level schema (including the Variant
+column definition) still follows Hudi's standard schema evolution rules.
+
+### Column Statistics and Indexing
+
+Variant statistics and indexing capabilities depend on whether a field is
accessed from the unshredded binary blob
+or from a shredded (extracted) typed column. With shredding, extracted fields
become regular typed Parquet columns
+that automatically leverage all existing Hudi metadata infrastructure.
+
+#### Unshredded Variant Column
+
+| Feature | Status | Notes
|
+|--------------------|--------|--------------------------------------------------------------------|
+| Value/null counts | β
| Standard column-level counts via MDT
`column_stats` partition |
+| Min/max bounds | β | Binary blob has no meaningful sort order
|
+| Data skipping | β | Requires comparable min/max bounds
|
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β | Variant columns are not used as partition keys
|
+| Predicate pushdown | β οΈ | Structural predicates only (`IS NULL`, `IS NOT
NULL`) |
+
+#### Shredded Variant Fields
+
+| Feature | Status | Notes
|
+|--------------------|--------|------------------------------------------------------------------------------------------------------------------------|
+| Min/max bounds | β
| Shredded columns are regular typed columns;
`HoodieTableMetadataUtil.isColumnTypeSupported()` accepts all primitives |
+| Data skipping | β
| `DataSkippingUtils` translates filters on
shredded fields to min/max range checks |
+| Expression index | β
| Index over `variant_get()` expressions enables
stats collection without full shredding |
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β
| Applicable if a shredded field is used as a
partition key |
+| Predicate pushdown | β
| Full pushdown support: equality, range, and
IN-list predicates |
+| Column pruning | β
| Read only the shredded columns needed; skip
the binary blob entirely |
+
+**Recommendation**: Enable shredding for frequently accessed fields to unlock
full MDT column stats, data skipping,
+and predicate pushdown. For lighter-weight optimization, use expression
indexes over `variant_get()` expressions to
+collect per-field statistics without materializing shredded columns.
+
+### Usage Guide
+
+#### Spark 4.0+ (Native Support)
+
+```sql
+-- Create table with Variant column
+CREATE TABLE events
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+OPTIONS (
+ primaryKey = 'id',
+ preCombineField = 'ts'
+);
+
+-- Insert with parse_json
+INSERT INTO events
+VALUES ('1', current_timestamp(), parse_json('{"event": "click", "page":
"/home"}')),
+ ('2', current_timestamp(), parse_json('{"event": "purchase", "amount":
99.99}'));
+
+-- Query Variant data
+SELECT id, payload:event, payload:amount
+FROM events;
+
+-- Update Variant column
+UPDATE events
+SET payload = parse_json('{"event": "click", "page": "/products"}')
+WHERE id = '1';
+
+-- Works with both COW and MOR tables
+CREATE TABLE events_mor
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+TBLPROPERTIES (
+ 'type' = 'mor',
+ 'primaryKey' = 'id',
+ 'preCombineField' = 'ts'
+);
+```
+
+#### Spark 3.x (Backward Compatibility Read)
+
+Spark 3.x does not support VariantType natively, but can read Variant tables
as struct:
+
+```sql
+-- Reading Spark 4.0 Variant table in Spark 3.x
+-- Variant column appears as: STRUCT<value: BINARY, metadata: BINARY>
+
+SELECT id, cast(payload.value as string)
+FROM events;
+```
+
+**Limitations in Spark 3.x**:
+
+- Cannot write Variant data
+- Variant column reads as raw struct with binary fields
+- No helper functions like `parse_json()` available
+
+### Cross-Engine Compatibility
+
+Engines that do not support the Variant type can still read all non-Variant
columns in the table. The Variant column's
+on-disk layout is a plain Parquet struct (`value BINARY, metadata BINARY`), so
engines that lack Variant-aware logic
+can either project it as a struct of binary fields or simply exclude it from
their column projection. This means adding
+a Variant column to a table does not break reads from older or third-party
engines β they continue to access every
+other column as before.
+
+#### Flink Integration
+
+| Operation | Support |
+|--------------------------------------|------------------------------------|
+| Reading Spark-written Variant tables | β
Supported |
+| Variant representation in Flink | `ROW<value BYTES, metadata BYTES>` |
+| Writing Variant from Flink | β Not yet implemented |
+
+Example Flink query reading Variant data:
+
+```sql
+-- Flink sees Variant as ROW type
+SELECT id, variant_col.value, variant_col.metadata
+FROM hudi_variant_table;
+```
+
+#### Avro Serialization (MOR Tables)
+
+For MOR tables, Variant data is serialized to Avro for log files:
+
+```
+{
+ "type": "record",
+ "logicalType": "variant",
+ "fields": [
+ {"name": "value", "type": "bytes"},
+ {"name": "metadata", "type": "bytes"}
+ ]
+}
+```
+
+### Backward Compatibility
+
+The implementation ensures backward compatibility through:
+
+1. **Storage Format**: On-disk layout is a plain Parquet struct (`value
BINARY, metadata BINARY`), readable by any Parquet-compatible engine
+2. **Logical Type Annotation**: Avro logical type allows newer versions to
recognize Variant semantics
+3. **Graceful Degradation**: Older readers see Variant as `STRUCT<value:
BINARY, metadata: BINARY>`
+
Review Comment:
π€ The "Limitations and Constraints" section mentions metadata is stored
redundantly per row but doesn't discuss the storage overhead implications
quantitatively. For tables with millions of rows where the Variant schema is
stable, this could be significant. Have you considered whether Hudi could
deduplicate metadata at the file group or row group level, or is this strictly
a Parquet-spec constraint that Hudi can't optimize around?
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
+This will allow users to work with semi-structured data more easily.
+
+The variant type is now formally defined in Parquet and engines like Spark
have full support for this type.
+Users with semi-structured data are otherwise forced to use strings or byte
arrays to store this data.
+
+### What is the VARIANT Type?
+
+The `VARIANT` type is a new data type designed to store semi-structured data
(like JSON) efficiently.
+Unlike storing JSON as a plain string, `VARIANT` uses an optimized binary
encoding that allows for fast navigation
+and element extraction without needing to parse the entire document.
+It offers the flexibility of a schema-less design (like JSON) with performance
closer to structured columns.
+
+### Storage Modes: Shredded vs. Unshredded
+
+- **Unshredded** (Binary Blob):
+ - The entire JSON structure is encoded into binary metadata and value
blobs.
+ - **Pros**: Fast write speed; handles completely dynamic/random schemas
easily.
+ - **Cons**: To read a single field (e.g., `user.id`), the engine must load
the entire binary blob.
+- **Shredded** (Columnar Optimization):
+ - The engine identifies common paths in the data (e.g., `v.a` or `v.c`)
and extracts them into separate, native
+ Parquet columns (e.g., Int32, Decimal).
+ - **Pros**: Massive performance gain for queries. If you query `SELECT
v:a`, the engine reads only the specific
+ Int32 column and skips the rest of the binary data (Columnar Pruning).
+ - **Cons**: Higher write overhead to analyze and split the data, prone to
read jitter if there are large variation
+ - in shredding output.
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
+ private final Option<HoodieSchema> typedValueSchema;
+}
+```
+
+#### Two Storage Modes
+
+1. **Unshredded Variant** (Default):
+ - Created with: `HoodieSchema.createVariant()`
+ - Structure: Record with two REQUIRED binary fields
+ - Fields: `metadata` (BYTES, REQUIRED), `value` (BYTES, REQUIRED)
+ - Use case: Simple semi-structured data storage
+
+2. **Shredded Variant**:
+ - Created with: `HoodieSchema.createVariantShredded(typedValueSchema)`
+ - Structure: Record with `typed_value` group containing extracted typed
columns
+ - Fields: `value` (BYTES, OPTIONAL), `metadata` (BYTES, REQUIRED),
`typed_value` (GROUP, OPTIONAL)
+ - Use case: Frequently accessed fields are extracted into native typed
columns for columnar reads
+
+#### How a Reader Distinguishes the Two Modes
+
+Both modes use the same `VARIANT` logical type annotation in the Parquet
footer β the annotation does not change.
+The reader inspects the Parquet file schema to determine which mode was used:
+
+- If the Variant group contains only `metadata` and `value`, it is
**unshredded**.
+- If the Variant group also contains a `typed_value` child group, it is
**shredded**.
+
+In the shredded case, `value` becomes OPTIONAL because when all fields are
fully shredded, the binary blob
+may be `null` β the content is entirely represented in the typed columns. The
reader falls back to `value`
+only for fields that were not shredded.
+
+#### Custom Avro Logical Type
+
+Variant uses a custom Avro logical type for identification:
+
+```java
+public static class VariantLogicalType extends LogicalType {
+ private static final String VARIANT_LOGICAL_TYPE_NAME = "variant";
+}
+```
+
+### On-Disk Representation (Parquet)
+
+Hudi's on-disk Variant representation intentionally aligns with the
+[Parquet Variant
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md).
+The spec defines a Variant as a group annotated with the `VARIANT` logical
type. Hudi writes Variant columns using
+this exact layout in both modes:
+
+#### Unshredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+The entire JSON value is encoded into the `value` blob. Both fields are
REQUIRED β every row carries the full binary
+representation.
+
+#### Shredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional group typed_value {
+ optional group a {
+ optional binary value;
+ optional int32 typed_value;
+ }
+ optional group b {
+ optional binary value;
+ optional binary typed_value (STRING);
+ }
+ optional group c {
+ optional binary value;
+ optional int64 typed_value;
+ }
+ }
+}
+```
+
+Each child under `typed_value` corresponds to a shredded field. The child's
`typed_value` column holds the extracted
+native-typed value (e.g., `int32`, `string`, `int64`), while the child's
`value` column is a fallback binary blob for
+rows where the field's type does not match the shredded type. The top-level
`value` becomes OPTIONAL β it is `null`
+when all fields in the row are fully covered by shredded columns, and
populated only for fields that were not shredded.
+
+**How a reader uses this**: The Parquet file schema in the footer tells the
reader which mode was used. If the Variant
+group contains only `metadata` and `value`, it is unshredded. If a
`typed_value` child group is present, the reader
+knows which fields have been shredded and can read them as native typed
columns β enabling column pruning, predicate
+pushdown, and data skipping without touching the binary blob.
+
+The `VARIANT` annotation is what allows readers to recognize the column as a
Variant and expose semantic operations
+(e.g., `variant_get`, JSON path access). Without it, the group is
indistinguishable from an ordinary
+`Struct<metadata: Binary, value: Binary>`.
+
+**Alignment with the Parquet spec**: Hudi does not diverge from the Parquet
Variant spec. The annotation, field names,
+field types, and repetition levels all follow the spec exactly. This means any
Parquet reader that implements the
+Variant spec can read Hudi-written Variant columns natively, and vice versa.
+
+**Reader compatibility**: Engines that do not yet implement the Parquet
Variant spec (e.g., Spark 3.5) will ignore the
+`VARIANT` annotation and fall back to reading the column as a plain struct of
binary fields.
+The data remains physically accessible, but users lose Variant query functions
and must manually parse
+the binary encoding. See [variant-appendix.md](variant-appendix.md) for
detailed backward compatibility findings.
+
+**Migration path**: If a future Parquet spec revision changes the Variant
physical layout, Hudi can introduce a
+table property (e.g., `hoodie.variant.parquet.format.version`) to control
which format is written, and the reader
+can inspect the Parquet footer to determine which layout to expect. (This is
outside the scope of this RFC for now)
+
+#### Binary Format
+
+The Variant binary encoding follows the
+[Parquet Variant Binary Encoding
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md):
+
+| Component | Description
|
+|--------------|---------------------------------------------------------------------|
+| **metadata** | Dictionary of field names and type information for efficient
access |
+| **value** | Binary encoding of the actual data (scalars, objects, arrays)
|
+
+Example for `{"updated": true, "new_field": 123}`:
+
+```
+Metadata Bytes: [0x01, 0x02, 0x00, 0x07, 0x10, "updated", "new_field"]
+Value Bytes: [0x02, 0x02, 0x01, 0x00, 0x01, 0x00, 0x03, 0x04, 0x0C, 0x7B]
+```
+
+The metadata contains a dictionary of all field names, while the value
contains references to these fields plus the
+actual data values.
+
+### Schema Evolution Support
+
+Variant types provide **schema-on-read** flexibility:
+
+| Aspect | Behavior
|
+|--------------------------|-------------------------------------------------------------------|
+| Adding new fields | β
Supported - New JSON fields can be added
without schema changes |
+| Removing fields | β
Supported - Missing fields return null on read
|
+| Type changes within JSON | β
Supported - Variant can store any
JSON-compatible type |
+| Table schema evolution | β
Supported - Variant column can be added to
existing tables |
+| Hudi schema evolution | β
Supported - Works with Hudi's standard schema
evolution |
+
+**Important**: The schema flexibility is within the Variant column itself. The
table-level schema (including the Variant
+column definition) still follows Hudi's standard schema evolution rules.
+
+### Column Statistics and Indexing
+
+Variant statistics and indexing capabilities depend on whether a field is
accessed from the unshredded binary blob
+or from a shredded (extracted) typed column. With shredding, extracted fields
become regular typed Parquet columns
+that automatically leverage all existing Hudi metadata infrastructure.
+
+#### Unshredded Variant Column
+
+| Feature | Status | Notes
|
+|--------------------|--------|--------------------------------------------------------------------|
+| Value/null counts | β
| Standard column-level counts via MDT
`column_stats` partition |
+| Min/max bounds | β | Binary blob has no meaningful sort order
|
+| Data skipping | β | Requires comparable min/max bounds
|
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β | Variant columns are not used as partition keys
|
+| Predicate pushdown | β οΈ | Structural predicates only (`IS NULL`, `IS NOT
NULL`) |
+
+#### Shredded Variant Fields
+
+| Feature | Status | Notes
|
+|--------------------|--------|------------------------------------------------------------------------------------------------------------------------|
+| Min/max bounds | β
| Shredded columns are regular typed columns;
`HoodieTableMetadataUtil.isColumnTypeSupported()` accepts all primitives |
+| Data skipping | β
| `DataSkippingUtils` translates filters on
shredded fields to min/max range checks |
+| Expression index | β
| Index over `variant_get()` expressions enables
stats collection without full shredding |
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β
| Applicable if a shredded field is used as a
partition key |
+| Predicate pushdown | β
| Full pushdown support: equality, range, and
IN-list predicates |
+| Column pruning | β
| Read only the shredded columns needed; skip
the binary blob entirely |
+
+**Recommendation**: Enable shredding for frequently accessed fields to unlock
full MDT column stats, data skipping,
+and predicate pushdown. For lighter-weight optimization, use expression
indexes over `variant_get()` expressions to
+collect per-field statistics without materializing shredded columns.
+
+### Usage Guide
+
+#### Spark 4.0+ (Native Support)
+
+```sql
+-- Create table with Variant column
+CREATE TABLE events
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+OPTIONS (
+ primaryKey = 'id',
+ preCombineField = 'ts'
+);
+
+-- Insert with parse_json
+INSERT INTO events
+VALUES ('1', current_timestamp(), parse_json('{"event": "click", "page":
"/home"}')),
+ ('2', current_timestamp(), parse_json('{"event": "purchase", "amount":
99.99}'));
+
+-- Query Variant data
+SELECT id, payload:event, payload:amount
+FROM events;
+
+-- Update Variant column
+UPDATE events
+SET payload = parse_json('{"event": "click", "page": "/products"}')
+WHERE id = '1';
+
+-- Works with both COW and MOR tables
+CREATE TABLE events_mor
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+TBLPROPERTIES (
+ 'type' = 'mor',
+ 'primaryKey' = 'id',
+ 'preCombineField' = 'ts'
+);
+```
+
+#### Spark 3.x (Backward Compatibility Read)
+
+Spark 3.x does not support VariantType natively, but can read Variant tables
as struct:
+
+```sql
+-- Reading Spark 4.0 Variant table in Spark 3.x
+-- Variant column appears as: STRUCT<value: BINARY, metadata: BINARY>
+
+SELECT id, cast(payload.value as string)
+FROM events;
+```
+
+**Limitations in Spark 3.x**:
+
+- Cannot write Variant data
+- Variant column reads as raw struct with binary fields
+- No helper functions like `parse_json()` available
+
+### Cross-Engine Compatibility
+
+Engines that do not support the Variant type can still read all non-Variant
columns in the table. The Variant column's
+on-disk layout is a plain Parquet struct (`value BINARY, metadata BINARY`), so
engines that lack Variant-aware logic
+can either project it as a struct of binary fields or simply exclude it from
their column projection. This means adding
+a Variant column to a table does not break reads from older or third-party
engines β they continue to access every
+other column as before.
+
+#### Flink Integration
+
+| Operation | Support |
+|--------------------------------------|------------------------------------|
+| Reading Spark-written Variant tables | β
Supported |
+| Variant representation in Flink | `ROW<value BYTES, metadata BYTES>` |
+| Writing Variant from Flink | β Not yet implemented |
+
+Example Flink query reading Variant data:
+
+```sql
+-- Flink sees Variant as ROW type
+SELECT id, variant_col.value, variant_col.metadata
+FROM hudi_variant_table;
+```
+
+#### Avro Serialization (MOR Tables)
+
+For MOR tables, Variant data is serialized to Avro for log files:
+
+```
+{
+ "type": "record",
+ "logicalType": "variant",
+ "fields": [
+ {"name": "value", "type": "bytes"},
+ {"name": "metadata", "type": "bytes"}
+ ]
+}
+```
+
+### Backward Compatibility
+
+The implementation ensures backward compatibility through:
+
+1. **Storage Format**: On-disk layout is a plain Parquet struct (`value
BINARY, metadata BINARY`), readable by any Parquet-compatible engine
+2. **Logical Type Annotation**: Avro logical type allows newer versions to
recognize Variant semantics
+3. **Graceful Degradation**: Older readers see Variant as `STRUCT<value:
BINARY, metadata: BINARY>`
+
+| Scenario | Behavior |
+|-----------------------------------|---------------------------|
+| Spark 4.0 writes, Spark 4.0 reads | Full Variant support |
+| Spark 4.0 writes, Spark 3.x reads | Struct with binary fields |
+| Spark 4.0 writes, Flink reads | ROW with binary fields |
+| Spark 3.x writes Variant | β Not supported |
+
+### Limitations and Constraints
+
+1. **Spark Version Dependency**:
+ - Write support requires Spark 4.0+
+ - Spark 3.x limited to read-only access with degraded experience
+
+2. **Storage Overhead**:
+ - Metadata stored redundantly per row
+ - No column-level compression optimizations for Variant content
+
+3. **Query Performance**:
+ - Predicate pushdown on unshredded Variant content is limited to
structural predicates (`IS NULL`, `IS NOT NULL`)
+ - Accessing fields from the unshredded blob requires loading the full
binary value
+ - Shredded fields and expression indexes enable full predicate pushdown,
data skipping, and column pruning
+
+4. **Shredded Variant**:
+ - `typed_value` field is defined for extracted typed columns within the
Variant group
+ - Shredded fields unlock full MDT column stats and data skipping via
`DataSkippingUtils`
+
+5. **Functions**:
+ - `parse_json()` - Spark 4.0+ only
+ - JSON path access (`payload:field`) - Spark 4.0+ only
+ - No UDFs for Variant manipulation in Spark 3.x
+
+### Key Implementation Files
+
+| File | Description
|
+|-----------------------------------------------------------|--------------------------------------------------|
+| `hudi-common/.../HoodieSchema.java` | Core Variant
schema definition with logical type |
+| `hudi-spark3-common/.../BaseSpark3Adapter.scala` | Spark 3 adapter
(no Variant support) |
Review Comment:
π€ I notice there's no discussion of how Variant interacts with Hudi's
clustering and compaction strategies. For shredded Variants, does clustering
need to be aware of the shredding schema to maintain optimal column
co-location? And during compaction of MOR tables, are there any special
considerations for merging Variant log records?
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
+This will allow users to work with semi-structured data more easily.
+
+The variant type is now formally defined in Parquet and engines like Spark
have full support for this type.
+Users with semi-structured data are otherwise forced to use strings or byte
arrays to store this data.
+
+### What is the VARIANT Type?
+
+The `VARIANT` type is a new data type designed to store semi-structured data
(like JSON) efficiently.
+Unlike storing JSON as a plain string, `VARIANT` uses an optimized binary
encoding that allows for fast navigation
+and element extraction without needing to parse the entire document.
+It offers the flexibility of a schema-less design (like JSON) with performance
closer to structured columns.
+
+### Storage Modes: Shredded vs. Unshredded
+
+- **Unshredded** (Binary Blob):
+ - The entire JSON structure is encoded into binary metadata and value
blobs.
+ - **Pros**: Fast write speed; handles completely dynamic/random schemas
easily.
+ - **Cons**: To read a single field (e.g., `user.id`), the engine must load
the entire binary blob.
+- **Shredded** (Columnar Optimization):
+ - The engine identifies common paths in the data (e.g., `v.a` or `v.c`)
and extracts them into separate, native
+ Parquet columns (e.g., Int32, Decimal).
+ - **Pros**: Massive performance gain for queries. If you query `SELECT
v:a`, the engine reads only the specific
+ Int32 column and skips the rest of the binary data (Columnar Pruning).
+ - **Cons**: Higher write overhead to analyze and split the data, prone to
read jitter if there are large variation
+ - in shredding output.
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
+ private final Option<HoodieSchema> typedValueSchema;
+}
+```
+
+#### Two Storage Modes
+
+1. **Unshredded Variant** (Default):
+ - Created with: `HoodieSchema.createVariant()`
+ - Structure: Record with two REQUIRED binary fields
+ - Fields: `metadata` (BYTES, REQUIRED), `value` (BYTES, REQUIRED)
+ - Use case: Simple semi-structured data storage
+
+2. **Shredded Variant**:
+ - Created with: `HoodieSchema.createVariantShredded(typedValueSchema)`
+ - Structure: Record with `typed_value` group containing extracted typed
columns
+ - Fields: `value` (BYTES, OPTIONAL), `metadata` (BYTES, REQUIRED),
`typed_value` (GROUP, OPTIONAL)
+ - Use case: Frequently accessed fields are extracted into native typed
columns for columnar reads
+
+#### How a Reader Distinguishes the Two Modes
+
+Both modes use the same `VARIANT` logical type annotation in the Parquet
footer β the annotation does not change.
+The reader inspects the Parquet file schema to determine which mode was used:
+
+- If the Variant group contains only `metadata` and `value`, it is
**unshredded**.
+- If the Variant group also contains a `typed_value` child group, it is
**shredded**.
+
+In the shredded case, `value` becomes OPTIONAL because when all fields are
fully shredded, the binary blob
+may be `null` β the content is entirely represented in the typed columns. The
reader falls back to `value`
+only for fields that were not shredded.
+
+#### Custom Avro Logical Type
+
+Variant uses a custom Avro logical type for identification:
+
+```java
+public static class VariantLogicalType extends LogicalType {
+ private static final String VARIANT_LOGICAL_TYPE_NAME = "variant";
+}
+```
+
+### On-Disk Representation (Parquet)
+
+Hudi's on-disk Variant representation intentionally aligns with the
+[Parquet Variant
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md).
+The spec defines a Variant as a group annotated with the `VARIANT` logical
type. Hudi writes Variant columns using
+this exact layout in both modes:
+
+#### Unshredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+The entire JSON value is encoded into the `value` blob. Both fields are
REQUIRED β every row carries the full binary
+representation.
+
+#### Shredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional group typed_value {
+ optional group a {
+ optional binary value;
+ optional int32 typed_value;
+ }
+ optional group b {
+ optional binary value;
+ optional binary typed_value (STRING);
+ }
+ optional group c {
+ optional binary value;
+ optional int64 typed_value;
+ }
+ }
+}
+```
+
+Each child under `typed_value` corresponds to a shredded field. The child's
`typed_value` column holds the extracted
+native-typed value (e.g., `int32`, `string`, `int64`), while the child's
`value` column is a fallback binary blob for
+rows where the field's type does not match the shredded type. The top-level
`value` becomes OPTIONAL β it is `null`
+when all fields in the row are fully covered by shredded columns, and
populated only for fields that were not shredded.
+
+**How a reader uses this**: The Parquet file schema in the footer tells the
reader which mode was used. If the Variant
+group contains only `metadata` and `value`, it is unshredded. If a
`typed_value` child group is present, the reader
+knows which fields have been shredded and can read them as native typed
columns β enabling column pruning, predicate
+pushdown, and data skipping without touching the binary blob.
+
+The `VARIANT` annotation is what allows readers to recognize the column as a
Variant and expose semantic operations
+(e.g., `variant_get`, JSON path access). Without it, the group is
indistinguishable from an ordinary
+`Struct<metadata: Binary, value: Binary>`.
+
+**Alignment with the Parquet spec**: Hudi does not diverge from the Parquet
Variant spec. The annotation, field names,
+field types, and repetition levels all follow the spec exactly. This means any
Parquet reader that implements the
+Variant spec can read Hudi-written Variant columns natively, and vice versa.
+
+**Reader compatibility**: Engines that do not yet implement the Parquet
Variant spec (e.g., Spark 3.5) will ignore the
+`VARIANT` annotation and fall back to reading the column as a plain struct of
binary fields.
+The data remains physically accessible, but users lose Variant query functions
and must manually parse
+the binary encoding. See [variant-appendix.md](variant-appendix.md) for
detailed backward compatibility findings.
+
+**Migration path**: If a future Parquet spec revision changes the Variant
physical layout, Hudi can introduce a
+table property (e.g., `hoodie.variant.parquet.format.version`) to control
which format is written, and the reader
+can inspect the Parquet footer to determine which layout to expect. (This is
outside the scope of this RFC for now)
+
+#### Binary Format
+
+The Variant binary encoding follows the
+[Parquet Variant Binary Encoding
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md):
+
+| Component | Description
|
+|--------------|---------------------------------------------------------------------|
+| **metadata** | Dictionary of field names and type information for efficient
access |
+| **value** | Binary encoding of the actual data (scalars, objects, arrays)
|
+
+Example for `{"updated": true, "new_field": 123}`:
+
+```
+Metadata Bytes: [0x01, 0x02, 0x00, 0x07, 0x10, "updated", "new_field"]
+Value Bytes: [0x02, 0x02, 0x01, 0x00, 0x01, 0x00, 0x03, 0x04, 0x0C, 0x7B]
+```
+
+The metadata contains a dictionary of all field names, while the value
contains references to these fields plus the
+actual data values.
+
+### Schema Evolution Support
+
+Variant types provide **schema-on-read** flexibility:
+
+| Aspect | Behavior
|
+|--------------------------|-------------------------------------------------------------------|
+| Adding new fields | β
Supported - New JSON fields can be added
without schema changes |
+| Removing fields | β
Supported - Missing fields return null on read
|
+| Type changes within JSON | β
Supported - Variant can store any
JSON-compatible type |
+| Table schema evolution | β
Supported - Variant column can be added to
existing tables |
+| Hudi schema evolution | β
Supported - Works with Hudi's standard schema
evolution |
+
+**Important**: The schema flexibility is within the Variant column itself. The
table-level schema (including the Variant
+column definition) still follows Hudi's standard schema evolution rules.
+
+### Column Statistics and Indexing
+
+Variant statistics and indexing capabilities depend on whether a field is
accessed from the unshredded binary blob
+or from a shredded (extracted) typed column. With shredding, extracted fields
become regular typed Parquet columns
+that automatically leverage all existing Hudi metadata infrastructure.
+
+#### Unshredded Variant Column
+
+| Feature | Status | Notes
|
+|--------------------|--------|--------------------------------------------------------------------|
+| Value/null counts | β
| Standard column-level counts via MDT
`column_stats` partition |
+| Min/max bounds | β | Binary blob has no meaningful sort order
|
+| Data skipping | β | Requires comparable min/max bounds
|
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β | Variant columns are not used as partition keys
|
+| Predicate pushdown | β οΈ | Structural predicates only (`IS NULL`, `IS NOT
NULL`) |
+
+#### Shredded Variant Fields
+
+| Feature | Status | Notes
|
+|--------------------|--------|------------------------------------------------------------------------------------------------------------------------|
+| Min/max bounds | β
| Shredded columns are regular typed columns;
`HoodieTableMetadataUtil.isColumnTypeSupported()` accepts all primitives |
+| Data skipping | β
| `DataSkippingUtils` translates filters on
shredded fields to min/max range checks |
+| Expression index | β
| Index over `variant_get()` expressions enables
stats collection without full shredding |
+| Bloom filter | β | Bloom filters are used for record key lookups
only |
+| Partition stats | β
| Applicable if a shredded field is used as a
partition key |
+| Predicate pushdown | β
| Full pushdown support: equality, range, and
IN-list predicates |
+| Column pruning | β
| Read only the shredded columns needed; skip
the binary blob entirely |
+
+**Recommendation**: Enable shredding for frequently accessed fields to unlock
full MDT column stats, data skipping,
+and predicate pushdown. For lighter-weight optimization, use expression
indexes over `variant_get()` expressions to
+collect per-field statistics without materializing shredded columns.
+
+### Usage Guide
+
+#### Spark 4.0+ (Native Support)
+
+```sql
+-- Create table with Variant column
+CREATE TABLE events
+(
+ id STRING,
+ ts TIMESTAMP,
+ payload VARIANT
+) USING hudi
+OPTIONS (
+ primaryKey = 'id',
+ preCombineField = 'ts'
+);
+
Review Comment:
π€ For Flink integration, reading is supported but writing is not. Is there a
risk that Flink jobs reading Variant as `ROW<value BYTES, metadata BYTES>`
could inadvertently write those rows back (e.g., in a streaming upsert
pipeline) and corrupt the Variant semantics? Should there be a guard or
validation to prevent this?
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -209,4 +209,299 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
-The main implementation change would require replacing the Avro schema
references with the new type system.
+The main implementation change would require replacing the Avro schema
references with the new type system.
+
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for semi-structured data (e.g., JSON). The Variant
type is implemented following Spark 4.0's native VariantType specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
Review Comment:
π€ Good question. Based on the RFC text, readers distinguish shredded vs
unshredded by inspecting the per-file Parquet schema (presence of `typed_value`
child group). Since each file is self-describing, a table could in theory
contain a mix of shredded and unshredded files without breaking reads. However,
the table-level Avro schema (used for MOR log files and schema evolution
tracking) would need to change, so it's worth calling out explicitly whether
Hudi treats this as a compatible or incompatible schema change β and what the
migration path looks like.
##########
rfc/rfc-99/rfc-99.md:
##########
@@ -194,5 +195,444 @@ SQL Extensions needs to be added to define the table in a
hudi type native way.
TODO: There is an open question regarding the need to maintain type ids to
track schema evolution and how it would interplay with NBCC.
+---
+
+## Variant Type Implementation
+
+This section documents the implementation of the VARIANT type in Hudi, which
provides first-class support for
+semi-structured data (e.g., JSON). The Variant type is implemented following
Spark 4.0's native VariantType
+specification.
+
+### Overview
+
+The Variant type enables Hudi to store and query semi-structured data
efficiently. It is particularly useful for:
+
+- Schema-on-read flexibility for evolving data structures
+- Storing JSON-like data without requiring predefined schemas
+
+### Motivation
+
+Hudi readers and writers should be able to handle datasets with variant types.
+This will allow users to work with semi-structured data more easily.
+
+The variant type is now formally defined in Parquet and engines like Spark
have full support for this type.
+Users with semi-structured data are otherwise forced to use strings or byte
arrays to store this data.
+
+### What is the VARIANT Type?
+
+The `VARIANT` type is a new data type designed to store semi-structured data
(like JSON) efficiently.
+Unlike storing JSON as a plain string, `VARIANT` uses an optimized binary
encoding that allows for fast navigation
+and element extraction without needing to parse the entire document.
+It offers the flexibility of a schema-less design (like JSON) with performance
closer to structured columns.
+
+### Storage Modes: Shredded vs. Unshredded
+
+- **Unshredded** (Binary Blob):
+ - The entire JSON structure is encoded into binary metadata and value
blobs.
+ - **Pros**: Fast write speed; handles completely dynamic/random schemas
easily.
+ - **Cons**: To read a single field (e.g., `user.id`), the engine must load
the entire binary blob.
+- **Shredded** (Columnar Optimization):
+ - The engine identifies common paths in the data (e.g., `v.a` or `v.c`)
and extracts them into separate, native
+ Parquet columns (e.g., Int32, Decimal).
+ - **Pros**: Massive performance gain for queries. If you query `SELECT
v:a`, the engine reads only the specific
+ Int32 column and skips the rest of the binary data (Columnar Pruning).
+ - **Cons**: Higher write overhead to analyze and split the data, prone to
read jitter if there are large variation
+ - in shredding output.
+
+### Architecture
+
+Variant support is built on a **layered architecture** with version-specific
adapters:
+
+```
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Application Layer (Spark SQL) β
+β SELECT parse_json('{"a": 1}') as data β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Spark Version Adapters β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+β β BaseSpark3Adapterβ β BaseSpark4Adapter β β
+β β (No Variant) β β (Full Variant) β β
+β ββββββββββββββββββββ ββββββββββββββββββββββββββ β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β HoodieSchema.Variant β
+β (Avro Logical Type + Record Schema) β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+ β
+ βΌ
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+β Parquet Storage β
+β GROUP { value: BINARY, metadata: BINARY } β
+ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
+```
+
+### Variant Schema Definition
+
+The `HoodieSchema.Variant` class in `hudi-common` defines the Variant type:
+
+```java
+public static class Variant extends HoodieSchema {
+ private static final String VARIANT_METADATA_FIELD = "metadata";
+ private static final String VARIANT_VALUE_FIELD = "value";
+ private static final String VARIANT_TYPED_VALUE_FIELD = "typed_value";
+
+ private final boolean isShredded;
+ private final Option<HoodieSchema> typedValueSchema;
+}
+```
+
+#### Two Storage Modes
+
+1. **Unshredded Variant** (Default):
+ - Created with: `HoodieSchema.createVariant()`
+ - Structure: Record with two REQUIRED binary fields
+ - Fields: `metadata` (BYTES, REQUIRED), `value` (BYTES, REQUIRED)
+ - Use case: Simple semi-structured data storage
+
+2. **Shredded Variant**:
+ - Created with: `HoodieSchema.createVariantShredded(typedValueSchema)`
+ - Structure: Record with `typed_value` group containing extracted typed
columns
+ - Fields: `value` (BYTES, OPTIONAL), `metadata` (BYTES, REQUIRED),
`typed_value` (GROUP, OPTIONAL)
+ - Use case: Frequently accessed fields are extracted into native typed
columns for columnar reads
+
+#### How a Reader Distinguishes the Two Modes
+
+Both modes use the same `VARIANT` logical type annotation in the Parquet
footer β the annotation does not change.
+The reader inspects the Parquet file schema to determine which mode was used:
+
+- If the Variant group contains only `metadata` and `value`, it is
**unshredded**.
+- If the Variant group also contains a `typed_value` child group, it is
**shredded**.
+
+In the shredded case, `value` becomes OPTIONAL because when all fields are
fully shredded, the binary blob
+may be `null` β the content is entirely represented in the typed columns. The
reader falls back to `value`
+only for fields that were not shredded.
+
+#### Custom Avro Logical Type
+
+Variant uses a custom Avro logical type for identification:
+
+```java
+public static class VariantLogicalType extends LogicalType {
+ private static final String VARIANT_LOGICAL_TYPE_NAME = "variant";
+}
+```
+
+### On-Disk Representation (Parquet)
+
+Hudi's on-disk Variant representation intentionally aligns with the
+[Parquet Variant
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md).
+The spec defines a Variant as a group annotated with the `VARIANT` logical
type. Hudi writes Variant columns using
+this exact layout in both modes:
+
+#### Unshredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+The entire JSON value is encoded into the `value` blob. Both fields are
REQUIRED β every row carries the full binary
+representation.
+
+#### Shredded
+
+```
+optional group variant_column (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional group typed_value {
+ optional group a {
+ optional binary value;
+ optional int32 typed_value;
+ }
+ optional group b {
+ optional binary value;
+ optional binary typed_value (STRING);
+ }
+ optional group c {
+ optional binary value;
+ optional int64 typed_value;
+ }
+ }
+}
+```
+
+Each child under `typed_value` corresponds to a shredded field. The child's
`typed_value` column holds the extracted
+native-typed value (e.g., `int32`, `string`, `int64`), while the child's
`value` column is a fallback binary blob for
+rows where the field's type does not match the shredded type. The top-level
`value` becomes OPTIONAL β it is `null`
+when all fields in the row are fully covered by shredded columns, and
populated only for fields that were not shredded.
Review Comment:
π€ The schema evolution table covers flexibility *within* the Variant column,
but doesn't address evolving between unshredded and shredded modes for the same
column. Balaji's review comment raised this β it would be good to document
explicitly whether switching modes is a backward-compatible operation and what
the migration path looks like (e.g., does it require rewriting all file
groups?).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]