parisni opened a new issue #4525: URL: https://github.com/apache/hudi/issues/4525
From my experiments, when columns are added to a Hudi table, everything works except reading through the metastore from Spark:

- Hive read from the metastore -> new column present
- Spark read from the Hudi path -> new column present
- Spark read from the metastore (spark.table("database.hudi_table")) -> new column missing

I looked at the Hive metastore content, and the columns are apparently stored in two tables:

- COLUMNS_V2 (one row per column)
- TABLE_PARAMS (a key/value table that holds a Spark JSON schema)

After hive sync, only the first HMS table (COLUMNS_V2) is updated with the new column. The Spark JSON schema is not. If I purge the TABLE_PARAMS table, Spark then sees the new column in the schema.

So I think the problem is on the Spark or Hive metastore side (not Hudi): the columns are also stored in an alternative table, which does not get updated. As a result, though, Hudi schema evolution is effectively broken on the Spark side: people who read the table through the metastore won't see the new columns.
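
To check whether a table is in this state, the column list from COLUMNS_V2 can be compared with the Spark JSON schema held in TABLE_PARAMS. The following is a minimal Python sketch, not Hudi or Spark code. It assumes the schema is stored under the keys spark.sql.sources.schema.numParts and spark.sql.sources.schema.part.N, which is the layout Spark commonly uses and should be verified against your own metastore. The function names, column names, and sample values are hypothetical.

```python
import json


def spark_schema_columns(table_params):
    """Return the field names of the Spark JSON schema stored in TABLE_PARAMS.

    Assumption: the schema is split across keys named
    spark.sql.sources.schema.part.0 .. part.(numParts - 1).
    """
    num_parts = int(table_params.get("spark.sql.sources.schema.numParts", 0))
    schema_json = "".join(
        table_params[f"spark.sql.sources.schema.part.{i}"] for i in range(num_parts)
    )
    if not schema_json:
        return []
    return [field["name"] for field in json.loads(schema_json)["fields"]]


def columns_missing_from_spark_schema(hive_columns, table_params):
    """Return COLUMNS_V2 column names that are absent from the Spark JSON schema."""
    spark_columns = {name.lower() for name in spark_schema_columns(table_params)}
    return [name for name in hive_columns if name.lower() not in spark_columns]


# Hypothetical metastore state after hive sync: COLUMNS_V2 has the new column,
# but the Spark JSON schema in TABLE_PARAMS still describes the old table.
hive_columns = ["id", "name", "new_col"]
table_params = {
    "spark.sql.sources.schema.numParts": "1",
    "spark.sql.sources.schema.part.0": json.dumps(
        {
            "type": "struct",
            "fields": [
                {"name": "id", "type": "long", "nullable": True, "metadata": {}},
                {"name": "name", "type": "string", "nullable": True, "metadata": {}},
            ],
        }
    ),
}

missing = columns_missing_from_spark_schema(hive_columns, table_params)
print(missing)  # ['new_col']
```

A non-empty result means the two schema sources have drifted apart, which matches the behaviour described above where Spark reads through the metastore do not show the new column.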