echauchot opened a new pull request #15156:
URL: https://github.com/apache/flink/pull/15156


   
   ## What is the purpose of the change
   
   
   
   This pull request implements `ParquetAvroInputFormat`, a new input format.
   
   ## Verifying this change
   
   
   This change added tests and can be verified as follows:
   The new `ParquetAvroInputFormatTest` tests both a simple record and projected fields on a simple record. The test sets `parquet.avro.write-old-list-structure` to `true` because `AvroRowSerializationSchema` does not work when `parquet.avro.write-old-list-structure` is set to `false`.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes, adds a new API (a new input format)
     - The serializers: not really; this only makes `AvroRowSerializationSchema#convertRowToAvroRecord` public
     - The runtime per-record code paths (performance sensitive): yes, as `ParquetInputFormat#convert` is implemented
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
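   The per-record path mentioned above is the `convert` step: each row read from Parquet is turned into the output type, presumably by delegating to `AvroRowSerializationSchema#convertRowToAvroRecord`, which is why that method becomes public. The sketch below only illustrates the idea of that step (positional row in, named record out, including the projected case where just a subset of fields is read). `RowSketch`, `RecordSchema` and `convert` are hypothetical stand-ins written with the JDK alone; they are not Flink's `Row`, Avro's `Schema`/`GenericRecord`, or the code in this PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowToRecordSketch {

    /** Hypothetical positional row: values are addressed by index only (stand-in for a Flink Row). */
    record RowSketch(Object... fields) {
        int arity() { return fields.length; }
        Object getField(int pos) { return fields[pos]; }
    }

    /** Hypothetical ordered field names, playing the role of an Avro record schema. */
    record RecordSchema(List<String> fieldNames) {}

    /**
     * Converts one positional row into a named record: the i-th row value is
     * stored under the i-th schema field name, preserving schema order.
     */
    static Map<String, Object> convert(RecordSchema schema, RowSketch row) {
        List<String> names = schema.fieldNames();
        if (names.size() != row.arity()) {
            throw new IllegalArgumentException(
                "schema has " + names.size() + " fields but row has " + row.arity());
        }
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            record.put(names.get(i), row.getField(i));
        }
        return record;
    }

    public static void main(String[] args) {
        // Projected read: only "foo" and "bar" were requested, so both the
        // row and the (projected) schema describe just those two fields.
        RecordSchema projected = new RecordSchema(List.of("foo", "bar"));
        Map<String, Object> rec = convert(projected, new RowSketch(42L, "hello"));
        System.out.println(rec); // {foo=42, bar=hello}
    }
}
```

   Because this runs once per record, a real implementation would want to resolve the schema once per split rather than per row; the sketch keeps that concern out for brevity.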
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? Javadoc with a code example
   

