Thanks. Which API (DataSet or DataStream) is recommended for file handling when no window operations are required?
We have a similar scenario for real-time processing. Might it make sense to use the DataStream API for both batch and real-time, for uniformity?

> On Aug 16, 2019, at 00:38, Zhenghua Gao <doc...@gmail.com> wrote:
>
> Flink allows Hadoop (mapreduce) OutputFormats in Flink jobs [1]. You can
> try the Parquet OutputFormat [2].
>
> And if you can turn to the DataStream API, StreamingFileSink +
> ParquetBulkWriter meets your requirement [3][4].
>
> [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-hadoop-compatibility/src/test/java/org/apache/flink/test/hadoopcompatibility/mapreduce/example/WordCount.java
> [2] https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
> [3] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.java
> [4] https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetBulkWriter.java
>
> Best Regards,
> Zhenghua Gao
>
>> On Fri, Aug 16, 2019 at 1:04 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>> Hi,
>>
>> I am using the Flink 1.8.1 DataSet API for a batch job. The data source is
>> Avro files and I want to output the result as Parquet.
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/ has
>> no related information on this. What's the recommended way of doing this?
>> Do I need to write adapters? Appreciate your help!
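
For reference, a minimal sketch of the StreamingFileSink + Parquet route from [3][4], against Flink 1.8.x. It goes through ParquetAvroWriters (the public bulk-writer factory in flink-parquet) rather than ParquetBulkWriter directly; the Event POJO, the element source, and the output path are placeholders, and you would need flink-parquet plus parquet-avro on the classpath:

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class AvroToParquetSketch {

    // Hypothetical record type; Avro reflection derives a schema from its fields.
    public static class Event {
        public String id;
        public long ts;
        public Event() {}
        public Event(String id, long ts) { this.id = id; this.ts = ts; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bulk formats such as Parquet roll files on checkpoints; without
        // checkpointing the output stays in in-progress files.
        env.enableCheckpointing(60_000);

        // Placeholder source; in practice this would be your Avro input.
        DataStream<Event> events = env.fromElements(new Event("a", 1L), new Event("b", 2L));

        StreamingFileSink<Event> sink = StreamingFileSink
                .forBulkFormat(new Path("/tmp/parquet-out"),
                        ParquetAvroWriters.forReflectRecord(Event.class))
                .build();

        events.addSink(sink);
        env.execute("avro-to-parquet-sketch");
    }
}

If you stay on the DataSet API instead, the corresponding route is the Hadoop compatibility wrapper shown in [1] around Parquet's ParquetOutputFormat [2].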