Flink supports Hadoop (MapReduce) OutputFormats in Flink jobs[1]. You can give Parquet's ParquetOutputFormat a try[2].
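For the DataSet API, a rough sketch could look like the following (untested; it assumes Avro GenericRecords, parquet-avro on the classpath, and a parquet-mr version where AvroParquetOutputFormat is generic; the output path and the "schema"/"records" variables are placeholders):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.parquet.avro.AvroParquetOutputFormat;

    Job job = Job.getInstance();
    FileOutputFormat.setOutputPath(job, new Path("hdfs:///tmp/out")); // placeholder path
    AvroParquetOutputFormat.setSchema(job, schema); // your Avro schema

    // Wrap the Hadoop OutputFormat so the DataSet API can use it.
    HadoopOutputFormat<Void, GenericRecord> parquetFormat =
        new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

    // Hadoop OutputFormats consume key/value pairs; Parquet ignores the key.
    // "records" is the DataSet<GenericRecord> read from your avro files.
    records
        .map(new MapFunction<GenericRecord, Tuple2<Void, GenericRecord>>() {
            @Override
            public Tuple2<Void, GenericRecord> map(GenericRecord r) {
                return new Tuple2<>(null, r);
            }
        })
        .output(parquetFormat);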
And if you can switch to the DataStream API, StreamingFileSink + a Parquet BulkWriter meets your requirement[3][4] (a sketch follows at the end of this mail).

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-hadoop-compatibility/src/test/java/org/apache/flink/test/hadoopcompatibility/mapreduce/example/WordCount.java
[2] https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
[3] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.java
[4] https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetBulkWriter.java

*Best Regards,*
*Zhenghua Gao*

On Fri, Aug 16, 2019 at 1:04 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Hi,
>
> I am using the Flink 1.8.1 DataSet API for a batch processing job. The data
> source is avro files and I want to output the result into parquet.
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/
> has no related information. What's the recommended way to do this?
> Do I need to write adapters? Appreciate your help!
>
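P.S. For the DataStream route, flink-parquet already ships Avro writer factories (ParquetAvroWriters), so a minimal sketch could look like this (untested; the output path and the "schema"/"stream" variables are placeholders):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    // "stream" is your DataStream<GenericRecord>.
    StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
            new Path("hdfs:///tmp/out"),                  // placeholder output path
            ParquetAvroWriters.forGenericRecord(schema))  // your Avro schema
        .build();
    stream.addSink(sink);

Keep in mind that bulk-encoded sinks roll files on checkpoint, so checkpointing must be enabled for the part files to be finalized.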