This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 9101edde046 [HUDI-6958] Simplify Out Of Box Schema Evolution Functionality - DOCS (#9881)
9101edde046 is described below
commit 9101edde04616d3759bbe1561582518c436793ac
Author: Jon Vexler <[email protected]>
AuthorDate: Mon Nov 27 10:39:16 2023 -0500
    [HUDI-6958] Simplify Out Of Box Schema Evolution Functionality - DOCS (#9881)
---------
Co-authored-by: Jonathan Vexler <=>
---
website/docs/schema_evolution.md | 48 +++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 15 deletions(-)
diff --git a/website/docs/schema_evolution.md b/website/docs/schema_evolution.md
index 8fe04d65238..6597bb31253 100755
--- a/website/docs/schema_evolution.md
+++ b/website/docs/schema_evolution.md
@@ -22,21 +22,39 @@ the previous schema (e.g., renaming a column).
 Furthermore, the evolved schema is queryable across high-performance engines like Presto and Spark SQL without additional overhead for column ID translations or
 type reconciliations. The following table summarizes the schema changes compatible with different Hudi table types.
-| Schema Change                                                                     | COW | MOR | Remarks |
-|:----------------------------------------------------------------------------------|:----|:----|:--------|
-| Add a new nullable column at root level at the end                                | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
-| Add a new nullable column to inner struct (at the end)                            | Yes | Yes | |
-| Add a new complex type field with default (map and array)                         | Yes | Yes | |
-| Add a new nullable column and change the ordering of fields                       | No  | No  | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
-| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`                   | Yes | Yes | |
-| Promote datatype from `int` to `long` for a field at root level                   | Yes | Yes | For other types, Hudi supports promotion as specified in [Avro schema resolution](http://avro.apache.org/docs/current/spec#Schema+Resolution). |
-| Promote datatype from `int` to `long` for a nested field                          | Yes | Yes | |
-| Promote datatype from `int` to `long` for a complex type (value of map or array)  | Yes | Yes | |
-| Add a new non-nullable column at root level at the end                            | No  | No  | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
-| Add a new non-nullable column to inner struct (at the end)                        | No  | No  | |
-| Change datatype from `long` to `int` for a nested field                           | No  | No  | |
-| Change datatype from `long` to `int` for a complex type (value of map or array)   | No  | No  | |
-
+The incoming schema will automatically have missing columns added with null values from the table schema.
+For this, you need to enable the config `hoodie.write.handle.missing.cols.with.lossless.type.promotion`; otherwise the pipeline will fail. Note: this config will also make a best effort to handle some backward-incompatible type promotions, e.g., `long` to `int`.
+
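The null-filling behavior described above can be illustrated with a plain-Python sketch (the dict-based record shape and the `fill_missing_columns` helper are hypothetical, for illustration only; Hudi performs this schema reconciliation internally during the write):

```python
# Hypothetical sketch of reconciling an incoming record against the table
# schema by filling missing columns with nulls. This is NOT Hudi's internal
# API; it only illustrates the documented behavior.

table_schema = ["id", "name", "price", "ts"]

def fill_missing_columns(record: dict, schema: list) -> dict:
    """Return a record with every table-schema column present;
    columns absent from the incoming record are set to None (null)."""
    return {col: record.get(col) for col in schema}

incoming = {"id": 1, "name": "widget"}  # missing 'price' and 'ts'
reconciled = fill_missing_columns(incoming, table_schema)
print(reconciled)  # {'id': 1, 'name': 'widget', 'price': None, 'ts': None}
```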
+| Schema Change                                                    | COW | MOR | Remarks |
+|:-----------------------------------------------------------------|:----|:----|:--------|
+| Add a new nullable column at root level at the end               | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
+| Add a new nullable column to inner struct (at the end)           | Yes | Yes | |
+| Add a new complex type field with default (map and array)        | Yes | Yes | |
+| Add a new nullable column and change the ordering of fields      | No  | No  | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
+| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`  | Yes | Yes | |
+| Promote datatype for a field at root level                       | Yes | Yes | |
+| Promote datatype for a nested field                              | Yes | Yes | |
+| Promote datatype for a complex type (value of map or array)      | Yes | Yes | |
+| Add a new non-nullable column at root level at the end           | No  | No  | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
+| Add a new non-nullable column to inner struct (at the end)       | No  | No  | |
+| Demote datatype for a field at root level                        | No  | No  | |
+| Demote datatype for a nested field                               | No  | No  | |
+| Demote datatype for a complex type (value of map or array)       | No  | No  | |
+
+### Type Promotions
+
+The incoming schema will automatically have types promoted to match the table schema.
+
+| Incoming Schema \ Table Schema | int | long | float | double | string | bytes |
+|---------------------------------|-----|------|-------|--------|--------|-------|
+| int                             | Y   | Y    | Y     | Y      | Y      | N     |
+| long                            | N   | Y    | Y     | Y      | Y      | N     |
+| float                           | N   | N    | Y     | Y      | Y      | N     |
+| double                          | N   | N    | N     | Y      | Y      | N     |
+| string                          | N   | N    | N     | N      | Y      | Y     |
+| bytes                           | N   | N    | N     | N      | Y      | Y     |
## Schema Evolution on read
