Thanks Vino. I am able to write data in Parquet now. But now the issue is how to write a dataset to multiple output paths based on a timestamp partition. I want to partition the data date-wise.
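As a hedged sketch (the `dt=` bucket layout and the helper name are assumptions, not from this thread), partitioning date-wise first needs a daily partition path derived from each event's epoch-millis timestamp, e.g. `<outputDirectory>/dt=2019-12-23`:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionPath {
    // Formats an epoch-millis timestamp as a Hive-style daily partition
    // directory, e.g. "dt=2019-12-23". UTC is assumed here; pick the zone
    // your hourly S3 buckets are written in.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    static String dailyPartition(String outputDirectory, long epochMillis) {
        return outputDirectory + "/dt=" + DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // 2019-12-23T12:59:00Z in epoch millis.
        long ts = 1577105940000L;
        System.out.println(dailyPartition("s3://events", ts));
        // prints: s3://events/dt=2019-12-23
    }
}
```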
I am currently writing like this, which writes to a single output path:

    DataSet<Tuple2<Void, GenericRecord>> df =
        allEvents.flatMap(new EventMapProcessor(schema.toString()))
                 .withParameters(configuration);

    Job job = Job.getInstance();
    AvroParquetOutputFormat.setSchema(job, book_bike.getClassSchema());
    HadoopOutputFormat<Void, GenericRecord> parquetFormat =
        new HadoopOutputFormat<>(new AvroParquetOutputFormat(), job);
    FileOutputFormat.setOutputPath(job, new Path(outputDirectory));

    df.output(parquetFormat);
    env.execute();

Please suggest.

Thanks,
Anuj

On Mon, Dec 23, 2019 at 12:59 PM vino yang <yanghua1...@gmail.com> wrote:

> Hi Anuj,
>
> After searching on GitHub, I found a demo repository about how to use
> Parquet with Flink. [1]
>
> You can have a look. I cannot be sure whether it is helpful or not.
>
> [1]: https://github.com/FelixNeutatz/parquet-flinktacular
>
> Best,
> Vino
>
> aj <ajainje...@gmail.com> wrote on Sat, Dec 21, 2019 at 7:03 PM:
>
>> Hello All,
>>
>> I am getting a set of events in JSON that I am dumping into an hourly
>> bucket in S3.
>> I am reading this hourly bucket and have created a DataSet<String>.
>>
>> I want to write this dataset as Parquet, but I am not able to figure
>> it out. Can somebody help me with this?
>>
>> Thanks,
>> Anuj
>>
>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>

--
Thanks & Regards,
Anuj Jain
Mob.: +91-8588817877
Skype: anuj.jain07
<http://www.oracle.com/>
<http://www.cse.iitm.ac.in/%7Eanujjain/>
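One hedged workaround in the DataSet API (a sketch, not verified against this exact setup) is to first collect the distinct dates, then for each date `filter` the dataset and attach a separate `HadoopOutputFormat` sink whose `FileOutputFormat.setOutputPath` points at that date's directory, before a single `env.execute()`. The splitting logic itself can be illustrated with plain Java collections; the `Event` record and its `date` key are hypothetical stand-ins for whatever the Flink job would derive from the timestamp field:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitByDate {
    // Hypothetical event type carrying a pre-computed date key.
    record Event(String date, String payload) {}

    // Groups events into one bucket per date. Each map entry corresponds
    // to one per-date filter + sink pair in the Flink job, i.e. one
    // output path such as outputDirectory + "/dt=" + date.
    static Map<String, List<Event>> splitByDate(List<Event> events) {
        Map<String, List<Event>> buckets = new LinkedHashMap<>();
        for (Event e : events) {
            buckets.computeIfAbsent(e.date(), d -> new ArrayList<>()).add(e);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("2019-12-21", "a"),
                new Event("2019-12-22", "b"),
                new Event("2019-12-21", "c"));
        splitByDate(events).forEach((date, bucket) ->
                System.out.println("dt=" + date + " -> " + bucket.size() + " events"));
    }
}
```

Note that collecting the distinct dates up front (e.g. via `collect()`) triggers an extra Flink job before the main write, so this pattern trades an additional pass over the data for per-date output paths.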