[GitHub] [flink] saLeox commented on pull request #21149: [FLINK-29527][formats/parquet] Make unknownFieldsIndices work for single ParquetReader

GitBox Thu, 08 Dec 2022 20:22:39 -0800


saLeox commented on PR #21149:
URL: https://github.com/apache/flink/pull/21149#issuecomment-1343829748


   Hi @luoyuxia, glad to hear the feedback and the idea behind the design!
   Files with different schemas end up mixed in a single partition when we 
use the date as the partition key in the ODS layer of the data warehouse. When 
the upstream products change their schema, we have to adapt to the schema 
change as well, and it is better not to overwrite the schema of existing 
historical data.
   Furthermore, our Flink job consumes the ODS data and treats it as 
unbounded data in streaming execution mode by chaining the 
`monitorContinuously` call.
   The `ParquetReader` is the underlying implementation used to read data 
in that mode, and it is expected to handle the schema evolution.
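
   To illustrate the behaviour I would expect, here is a minimal plain-Java sketch, not the actual Flink code. The class `UnknownFieldsSketch` and its methods are hypothetical names made up for this example, and schemas are simplified to lists of field names. Fields that the job requests but that an older file does not contain are recorded as unknown indices and read back as null, so old and new files in the same partition can be read with one requested schema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: schemas are reduced to lists of field names.
public class UnknownFieldsSketch {

    /** Returns the positions of requested fields that the file schema lacks. */
    public static Set<Integer> unknownFieldIndices(
            List<String> requested, List<String> fileFields) {
        Set<Integer> unknown = new TreeSet<>();
        for (int i = 0; i < requested.size(); i++) {
            if (!fileFields.contains(requested.get(i))) {
                unknown.add(i);
            }
        }
        return unknown;
    }

    /** Maps one row of a file onto the requested schema, null for unknown fields. */
    public static Object[] project(
            List<String> requested, List<String> fileFields, Object[] fileRow) {
        Set<Integer> unknown = unknownFieldIndices(requested, fileFields);
        Object[] out = new Object[requested.size()];
        for (int i = 0; i < requested.size(); i++) {
            // Known fields are looked up by name, so column order may differ per file.
            out[i] = unknown.contains(i)
                    ? null
                    : fileRow[fileFields.indexOf(requested.get(i))];
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList("id", "name", "email");
        // An older file written before "email" was added upstream.
        List<String> oldFile = Arrays.asList("id", "name");
        Object[] row = project(requested, oldFile, new Object[] {1, "a"});
        System.out.println(Arrays.toString(row)); // prints [1, a, null]
    }
}
```

   The lookup is by field name rather than by position, which is the assumption that lets files written before and after an upstream schema change share one partition.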
   Hope that makes more sense, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
