Hi Rafi,

At the moment I do not see any support for Parquet in the DataSet API other than the HadoopOutputFormat approach mentioned in the Stack Overflow question. I have cc’ed Fabian and Aljoscha, maybe they could provide more information.
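In the meantime, here is a rough, untested sketch of one possible workaround along those lines: collect the distinct event days first, then run one filtered write per day through the hadoop-compatibility HadoopOutputFormat wrapping parquet-avro's AvroParquetOutputFormat. The "timestamp" field name, the partitionOf() helper and the base path are placeholders for your own schema, and each filter branch re-scans the aggregated DataSet, so this only pays off for a modest number of partitions:

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class PartitionedParquetWrite {

    private static final DateTimeFormatter PARTITION_FORMAT =
        DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd")
            .withZone(ZoneOffset.UTC);

    // Hypothetical helper: derives "year=YYYY/month=MM/day=DD" from an
    // assumed epoch-millis "timestamp" field in the record.
    private static String partitionOf(GenericRecord record) {
        long eventTime = (Long) record.get("timestamp");
        return PARTITION_FORMAT.format(Instant.ofEpochMilli(eventTime));
    }

    // Writes each day's records into its own sub-directory under basePath;
    // the caller runs env.execute() afterwards to fire the writes.
    public static void writePartitioned(
            DataSet<Tuple2<Void, GenericRecord>> records,
            Schema schema,
            String basePath) throws Exception {

        // First job: materialize the distinct partition keys.
        List<String> partitions = records
            .map(t -> partitionOf(t.f1))
            .returns(String.class)
            .distinct()
            .collect();

        // One filtered Parquet write per partition; each write needs its
        // own Hadoop Job so it can point at its own output directory.
        for (String partition : partitions) {
            Job job = Job.getInstance();
            FileOutputFormat.setOutputPath(job, new Path(basePath + "/" + partition));
            AvroParquetOutputFormat.setSchema(job, schema);

            records
                .filter(t -> partitionOf(t.f1).equals(partition))
                .output(new HadoopOutputFormat<Void, GenericRecord>(
                    new AvroParquetOutputFormat<GenericRecord>(), job));
        }
    }
}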
Best,
Andrey

> On 25 Oct 2018, at 13:08, Rafi Aroch <rafi.ar...@gmail.com> wrote:
>
> Hi,
>
> I'm writing a Batch job which reads Parquet, does some aggregations and
> writes back as Parquet files.
> I would like the output to be partitioned by year, month, day by event time,
> similarly to the functionality of the BucketingSink.
>
> I was able to achieve the reading/writing to/from Parquet by using the
> hadoop-compatibility features.
> I couldn't find a way to partition the data by year, month, day to create a
> folder hierarchy accordingly. Everything is written to a single directory.
>
> I could find an unanswered question about this issue:
> https://stackoverflow.com/questions/52204034/apache-flink-does-dataset-api-support-writing-output-to-individual-file-partit
>
> Can anyone suggest a way to achieve this? Maybe there's a way to integrate
> the BucketingSink with the DataSet API? Another solution?
>
> Rafi