saLeox commented on PR #21149: URL: https://github.com/apache/flink/pull/21149#issuecomment-1343829748
Hi @luoyuxia, glad to hear your feedback and the idea behind the design! A single partition ends up containing files with different schemas when we use the date as the partition key in the ODS layer of our data warehouse. When upstream products change their schema, we have to adapt to the change as well, and we would rather not overwrite the schema of the existing historical data. Furthermore, our Flink job consumes the ODS data and treats it as unbounded data in streaming execution mode by calling `monitorContinuously`. `ParquetReader` is the underlying implementation used to read data in that mode, so it is expected to handle the schema evolution. Hope this makes more sense, thanks!
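To illustrate what "handling schema evolution" means for a partition that mixes old and new files, here is a minimal sketch. It is hypothetical and is not Flink's `ParquetReader` API: the class `SchemaEvolutionSketch` and its methods `resolve` and `project` are illustrative names. The sketch assumes columns are matched by name and that a column missing from an older file should be read as null rather than failing the read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink's ParquetReader: resolve the requested
// (newest) schema against the schema of one particular file, by field name.
public final class SchemaEvolutionSketch {

    private SchemaEvolutionSketch() {}

    // For each requested field, returns its position in the file schema,
    // or -1 if that file was written before the field existed.
    static int[] resolve(List<String> requestedFields, List<String> fileFields) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < fileFields.size(); i++) {
            positions.put(fileFields.get(i), i);
        }
        int[] mapping = new int[requestedFields.size()];
        for (int i = 0; i < requestedFields.size(); i++) {
            mapping[i] = positions.getOrDefault(requestedFields.get(i), -1);
        }
        return mapping;
    }

    // Projects one row read with the file's schema onto the requested schema.
    static Object[] project(int[] mapping, Object[] fileRow) {
        Object[] out = new Object[mapping.length];
        for (int i = 0; i < mapping.length; i++) {
            // A missing column (-1) becomes null instead of failing the read.
            out[i] = mapping[i] >= 0 ? fileRow[mapping[i]] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("id", "name", "email");
        // An older file in the same date partition, written before "email" was added.
        int[] mapping = resolve(requested, List.of("id", "name"));
        Object[] row = project(mapping, new Object[] {1, "alice"});
        System.out.println(Arrays.toString(row)); // prints [1, alice, null]
    }
}
```

Matching by name rather than by position also tolerates reordered columns. This sketch deliberately ignores type changes (e.g. `int` widened to `long`), which a real reader would need to handle separately.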