AHeise commented on pull request #15725: URL: https://github.com/apache/flink/pull/15725#issuecomment-848745186
Same as in the other PR, please rebase onto 1.13 and update the target branch accordingly.

A high-level question to get me up to speed faster:
- In Avro, we have a reader and a writer schema. If the schema evolves, the writer schema of each record updates, and through schema compatibility I still get the equivalent record in the reader schema automatically. So for Avro, I'd usually specify an additional schema to make sure that my application is forward and backward compatible.
- Now it seems like Parquet has a similar concept (I haven't checked the details yet). Having a particular reader schema is even more important there, as it allows us to skip reading large chunks of the file if a specific column is not needed, thanks to the columnar layout of the file.
- Is your change now effectively disabling the reader schema? Or can it just be omitted and assumed to be the writer schema?
- How would it work when I read 2 Parquet files with different schemas that can both be mapped to the same reader schema? For example, consider a schema evolution case where file 1 is written by pipeline v1 and file 2 is written by pipeline v2 with an additional column that is ignored in the consuming Flink application.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
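For illustration, the schema-evolution scenario in the last bullet can be sketched with a toy resolver. This is a minimal Python sketch of the reader/writer-schema resolution idea, not the real Avro or Parquet API: the `resolve` function, the dict-based schema representation, and the field names are all hypothetical.

```python
# Illustrative sketch of reader-schema resolution: records written under
# different writer schemas are projected onto one reader schema.
def resolve(record, reader_schema):
    """Project a record onto the reader schema: drop fields the reader
    does not know, fill missing fields from declared defaults."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]
        elif default is not None:
            out[field] = default  # schema evolution: new reader field with default
        else:
            raise ValueError(f"no value or default for field {field!r}")
    return out

# Hypothetical reader schema of the consuming application: field -> default.
reader_schema = {"id": None, "name": ""}

# File 1 written by pipeline v1; file 2 by pipeline v2 with an extra column.
record_v1 = {"id": 1, "name": "a"}
record_v2 = {"id": 2, "name": "b", "extra": 42}

print(resolve(record_v1, reader_schema))  # {'id': 1, 'name': 'a'}
print(resolve(record_v2, reader_schema))  # {'id': 2, 'name': 'b'} -- extra column ignored
```

Both records resolve to the same reader-schema shape, which is the behavior the question is probing: whether the source still accepts such a user-supplied reader schema, or always assumes the writer schema of each file.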