[GitHub] [hudi] parisni opened a new issue #4525: [SUPPORT] Spark metastore schema evolution broken

GitBox Thu, 06 Jan 2022 07:23:43 -0800


parisni opened a new issue #4525:
URL: https://github.com/apache/hudi/issues/4525



   From my experiments, when columns are added to a Hudi table, every read 
path picks up the change except Spark reading through the metastore:
   
   - Hive read from metastore -> new column present
   - Spark read from Hudi path -> new column present
   - Spark read from metastore (spark.table("database.hudi_table")) -> new 
column missing
   
   I have looked at the Hive metastore content, and apparently the columns are 
stored in two tables:
   - COLUMNS_V2 (one row per column)
   - TABLE_PARAMS (a key/value table with a Spark JSON schema in it)
   
   After hive-sync, only the first HMS table (COLUMNS_V2) gets updated with the 
new column. The Spark JSON schema in TABLE_PARAMS is not updated.
   If I purge the TABLE_PARAMS table, then Spark picks up the new column in 
the schema.
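
   To illustrate the mismatch, here is a minimal sketch in plain Python (no 
metastore connection). The column names and values are hypothetical, and the 
spark.sql.sources.schema.* property names are my assumption about how Spark 
lays out its JSON schema in TABLE_PARAMS; check them against your own 
metastore before relying on this.

```python
import json

# Hypothetical snapshot of COLUMNS_V2 after hive-sync added "new_col".
columns_v2 = ["id", "name", "new_col"]

# Hypothetical snapshot of TABLE_PARAMS: the Spark JSON schema, still in its
# pre-evolution state. The key names below are an assumption (Spark is
# believed to split the schema JSON into numbered parts under these keys).
old_schema = json.dumps({"type": "struct", "fields": [
    {"name": "id", "type": "long", "nullable": True, "metadata": {}},
    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
]})
half = len(old_schema) // 2
table_params = {
    "spark.sql.sources.schema.numParts": "2",
    "spark.sql.sources.schema.part.0": old_schema[:half],
    "spark.sql.sources.schema.part.1": old_schema[half:],
}


def spark_schema_columns(params):
    """Reassemble the JSON schema from its parts and return the field names."""
    num_parts = int(params["spark.sql.sources.schema.numParts"])
    raw = "".join(
        params[f"spark.sql.sources.schema.part.{i}"] for i in range(num_parts)
    )
    return [field["name"] for field in json.loads(raw)["fields"]]


def missing_in_spark_schema(hive_columns, params):
    """Columns present in COLUMNS_V2 but absent from the Spark JSON schema."""
    known = {name.lower() for name in spark_schema_columns(params)}
    return [col for col in hive_columns if col.lower() not in known]


print(missing_in_spark_schema(columns_v2, table_params))  # ['new_col']
```

   A non-empty result is the situation described above: Hive sees the column 
through COLUMNS_V2, while Spark reads the stale JSON schema and does not.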
   
   So I think the problem is on the Spark or Hive metastore side (not Hudi), 
which stores its columns in an alternative table that does not get modified.
   
   As a result, though, Hudi schema evolution is effectively broken on the 
Spark side: people who read the table through the metastore won't see the new 
columns.


