This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 9101edde046 [HUDI-6958] Simplify Out Of Box Schema Evolution Functionality - DOCS (#9881)
9101edde046 is described below
commit 9101edde04616d3759bbe1561582518c436793ac
Author: Jon Vexler <[email protected]>
AuthorDate: Mon Nov 27 10:39:16 2023 -0500
    [HUDI-6958] Simplify Out Of Box Schema Evolution Functionality - DOCS (#9881)
---------
Co-authored-by: Jonathan Vexler <=>
---
website/docs/schema_evolution.md | 48 +++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 15 deletions(-)
diff --git a/website/docs/schema_evolution.md b/website/docs/schema_evolution.md
index 8fe04d65238..6597bb31253 100755
--- a/website/docs/schema_evolution.md
+++ b/website/docs/schema_evolution.md
@@ -22,21 +22,39 @@ the previous schema (e.g., renaming a column).
 Furthermore, the evolved schema is queryable across high-performance engines like Presto and Spark SQL without additional overhead for column ID translations or
 type reconciliations. The following table summarizes the schema changes compatible with different Hudi table types.
-| Schema Change                                                                     | COW | MOR | Remarks |
-|:----------------------------------------------------------------------------------|:----|:----|:--------|
-| Add a new nullable column at root level at the end                                | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
-| Add a new nullable column to inner struct (at the end)                            | Yes | Yes | |
-| Add a new complex type field with default (map and array)                         | Yes | Yes | |
-| Add a new nullable column and change the ordering of fields                       | No  | No  | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
-| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`                   | Yes | Yes | |
-| Promote datatype from `int` to `long` for a field at root level                   | Yes | Yes | For other types, Hudi supports promotion as specified in [Avro schema resolution](http://avro.apache.org/docs/current/spec#Schema+Resolution). |
-| Promote datatype from `int` to `long` for a nested field                          | Yes | Yes | |
-| Promote datatype from `int` to `long` for a complex type (value of map or array)  | Yes | Yes | |
-| Add a new non-nullable column at root level at the end                            | No  | No  | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
-| Add a new non-nullable column to inner struct (at the end)                        | No  | No  | |
-| Change datatype from `long` to `int` for a nested field                           | No  | No  | |
-| Change datatype from `long` to `int` for a complex type (value of map or array)   | No  | No  | |
-
+The incoming schema will automatically have missing columns added with null values from the table schema.
+For this, you need to enable the config `hoodie.write.handle.missing.cols.with.lossless.type.promotion`; otherwise the pipeline will fail. Note: this config will also make a best effort to handle some backward-incompatible type promotions, e.g., `long` to `int`.
+
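The null-filling behavior described above can be illustrated with a plain-Python sketch (the dict-based record shape and the `fill_missing_columns` helper are hypothetical, for illustration only; Hudi performs this schema reconciliation internally during the write):

```python
# Hypothetical sketch of reconciling an incoming record against the table
# schema by filling missing columns with nulls. This is NOT Hudi's internal
# API; it only illustrates the documented behavior.

table_schema = ["id", "name", "price", "ts"]

def fill_missing_columns(record: dict, schema: list) -> dict:
    """Return a record with every table-schema column present;
    columns absent from the incoming record are set to None (null)."""
    return {col: record.get(col) for col in schema}

incoming = {"id": 1, "name": "widget"}  # missing 'price' and 'ts'
reconciled = fill_missing_columns(incoming, table_schema)
print(reconciled)  # {'id': 1, 'name': 'widget', 'price': None, 'ts': None}
```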
+| Schema Change                                                    | COW | MOR | Remarks |
+|:-----------------------------------------------------------------|:----|:----|:--------|
+| Add a new nullable column at root level at the end               | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
+| Add a new nullable column to inner struct (at the end)           | Yes | Yes | |
+| Add a new complex type field with default (map and array)        | Yes | Yes | |
+| Add a new nullable column and change the ordering of fields      | No  | No  | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
+| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`  | Yes | Yes | |
+| Promote datatype for a field at root level                       | Yes | Yes | |
+| Promote datatype for a nested field                              | Yes | Yes | |
+| Promote datatype for a complex type (value of map or array)      | Yes | Yes | |
+| Add a new non-nullable column at root level at the end           | No  | No  | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
+| Add a new non-nullable column to inner struct (at the end)       | No  | No  | |
+| Demote datatype for a field at root level                        | No  | No  | |
+| Demote datatype for a nested field                               | No  | No  | |
+| Demote datatype for a complex type (value of map or array)       | No  | No  | |
+
+### Type Promotions
+
+The incoming schema will automatically have types promoted to match the table schema.
+
+| Incoming Schema \ Table Schema | int | long | float | double | string | bytes |
+|---------------------------------|-----|------|-------|--------|--------|-------|
+| int                             | Y   | Y    | Y     | Y      | Y      | N     |
+| long                            | N   | Y    | Y     | Y      | Y      | N     |
+| float                           | N   | N    | Y     | Y      | Y      | N     |
+| double                          | N   | N    | N     | Y      | Y      | N     |
+| string                          | N   | N    | N     | N      | Y      | Y     |
+| bytes                           | N   | N    | N     | N      | Y      | Y     |
## Schema Evolution on read
