The DataStream API should fully subsume the DataSet API (through bounded
streams) in the long run [1].
You can also consider using the Table/SQL API in your project.
[1]
https://flink.apache.org/roadmap.html#analytics-applications-and-the-roles-of-datastream-dataset-and-table-api
Best Regards,
Zhenghu
Thanks. Which api (dataset or datastream) is recommended for file handling (no
window operation required)?
We have similar scenario for real-time processing. May it make sense to use
datastream api for both batch and real-time for uniformity?
> On Aug 16, 2019, at 00:38, Zh
Flink allows Hadoop (MapReduce) OutputFormats in Flink jobs [1]. You can
try the Parquet OutputFormat [2].
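A minimal sketch of that approach, wrapping parquet-avro's `AvroParquetOutputFormat` in Flink's Hadoop compatibility `HadoopOutputFormat` on the DataSet API. The record type `GenericRecord`, the schema variable, and the output path are illustrative assumptions, not from this thread:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class ParquetSinkSketch {
    // dataSet: the Avro records read from the source; schema: their Avro schema.
    static void writeParquet(DataSet<GenericRecord> dataSet, Schema schema,
                             String outputPath) throws Exception {
        Job job = Job.getInstance();
        // Hadoop OutputFormats are keyed; Parquet ignores the key, so use Void.
        HadoopOutputFormat<Void, GenericRecord> hadoopOF =
                new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        AvroParquetOutputFormat.setSchema(job, schema);

        // Hadoop compatibility expects Tuple2<key, value> elements.
        dataSet.map(record -> Tuple2.of((Void) null, record))
               .output(hadoopOF);
    }
}
```

This needs `flink-hadoop-compatibility` and `parquet-avro` on the classpath; treat it as a starting point under those assumptions rather than a verified recipe.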
And if you can switch to the DataStream API,
StreamingFileSink + ParquetBulkWriter meets your requirement [3][4].
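On the DataStream side, a minimal sketch using `StreamingFileSink.forBulkFormat` with the `ParquetAvroWriters` factory from `flink-parquet`. The record class `MyAvroRecord` and the output directory are assumptions for illustration:

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class StreamingParquetSketch {
    // stream: a DataStream of Avro SpecificRecord instances.
    static void attachParquetSink(DataStream<MyAvroRecord> stream) {
        StreamingFileSink<MyAvroRecord> sink = StreamingFileSink
                // Bulk formats roll files on every checkpoint, so enable checkpointing.
                .forBulkFormat(
                        new Path("/tmp/parquet-out"),              // assumed output dir
                        ParquetAvroWriters.forSpecificRecord(MyAvroRecord.class))
                .build();
        stream.addSink(sink);
    }
}
```

Note that with bulk formats the sink finalizes part files only on checkpoints, so a bounded job should run with checkpointing enabled to get complete output.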
[1]
https://github.com/apache/flink/blob/master/flink-connectors/flink-hadoop
Hi,
I am using Flink 1.8.1 DataSet for a batch processing. The data source is
avro files and I want to output the result into parquet.
The batch documentation at
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/
has no related information. What's the recommended way of doing this? Do I
need to wri