echauchot opened a new pull request #15156: URL: https://github.com/apache/flink/pull/15156
## What is the purpose of the change

Implement `ParquetAvroInputFormat`.

## Verifying this change

This change added tests and can be verified as follows:

- The new `ParquetAvroInputFormatTest` covers both reading a simple record and reading projected fields of a simple record.
- The test sets `parquet.avro.write-old-list-structure` to `true`, because `AvroRowSerializationSchema` does not work when it is set to `false`.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes, it adds a new API (a new input format)
- The serializers: not really, it only makes `AvroRowSerializationSchema#convertRowToAvroRecord` public
- The runtime per-record code paths (performance sensitive): yes, as `ParquetInputFormat#convert` is implemented
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? Javadoc with a code example

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org