prashantwason commented on issue #2331: URL: https://github.com/apache/hudi/issues/2331#issuecomment-748228679
That's correct. HUDI does not have a full schema management system. The schema is supplied by the writer at write time, and we validate that the schema used for the current write is compatible with the existing schema from previous writes. HUDI's schema management is therefore much simpler than what the documentation you referred to describes.

In producer-consumer systems, schema compatibility is an easier problem. You upgrade the producer and consumer code to the newer schema, all new data is generated with a schema that both sides understand, and there is no historical data in the older schema version left to process.

Within HUDI, however, data written with older schema versions is always retained. To keep supporting features like incremental reads (which read data over a time range) and updates (where old data can be changed), we have to restrict how the schema can be modified.
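The write-time check described above can be sketched roughly as follows. This is a minimal, hypothetical Python model and not HUDI's implementation, since HUDI validates Avro schemas. The `Field` type, the `check_write_schema` function, and the three rules it enforces are assumptions chosen for illustration. The rules are: no existing field may be dropped, types may only widen, and a newly added field must have a default.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified field model (NOT HUDI's API; HUDI works on Avro schemas).
@dataclass(frozen=True)
class Field:
    name: str
    type: str                  # e.g. "int", "long", "string"
    has_default: bool = False  # True if older records can be filled with a default


# Assumed widening promotions, modeled on a subset of Avro's schema-resolution rules.
PROMOTIONS = {
    ("int", "long"), ("int", "float"), ("int", "double"),
    ("long", "float"), ("long", "double"), ("float", "double"),
}


def can_read(old_type: str, new_type: str) -> bool:
    """True if values written as old_type can be read as new_type."""
    return old_type == new_type or (old_type, new_type) in PROMOTIONS


def check_write_schema(existing: List[Field], incoming: List[Field]) -> List[str]:
    """Return the reasons an incoming write schema is rejected; empty means compatible."""
    old = {f.name: f for f in existing}
    new = {f.name: f for f in incoming}
    problems = []
    for name, old_field in old.items():
        if name not in new:
            # Assumed rule: older files still hold this column, and rewriting
            # them on update with the new schema would lose it.
            problems.append(f"field '{name}' was dropped")
        elif not can_read(old_field.type, new[name].type):
            problems.append(
                f"field '{name}' changed {old_field.type} -> {new[name].type}"
            )
    for name, new_field in new.items():
        if name not in old and not new_field.has_default:
            # Records written earlier have no value for this field.
            problems.append(f"new field '{name}' has no default")
    return problems


v1 = [Field("id", "string"), Field("amount", "int")]
v2_ok = [Field("id", "string"), Field("amount", "long"),
         Field("note", "string", has_default=True)]
v2_bad = [Field("id", "string"), Field("note", "string")]

print(check_write_schema(v1, v2_ok))   # []
print(check_write_schema(v1, v2_bad))  # 'amount' dropped, 'note' has no default
```

The check runs in one direction only. The new schema must be able to read every record already in the table. A producer-consumer pipeline can avoid this constraint by discarding old data, but a table that serves incremental reads and updates over historical records cannot.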
