Re: CSV sink partitioning and bucketing

2017-02-17 Thread Fabian Hueske
Hi Flavio, Flink does not come with an OutputFormat that creates buckets. It should not be too hard to implement this in Flink, though. However, if you want a quick solution, I would try the following approach: 1) Search for a Hadoop OutputFormat that buckets Strings based on a key. 2) Implement …
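
A minimal sketch of that approach (not from the original thread): Hadoop's mapred MultipleTextOutputFormat derives the output file name from each record's key, and Flink's HadoopOutputFormat wrapper lets a DataSet write through it. The output path and the Text-encoded CSV rows are illustrative assumptions.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class BucketingCsvSink {

    // Hadoop OutputFormat that writes each record into a subfolder named
    // after its key (the bucketing attribute).
    public static class KeyBasedOutput extends MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value, String name) {
            return key.toString() + "/" + name; // e.g. <outputDir>/<columnValue>/part-00000
        }
        @Override
        protected Text generateActualKey(Text key, Text value) {
            return null; // write only the CSV line, not the key
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // (bucketing column, full row rendered as one CSV line) -- illustrative data
        DataSet<Tuple2<Text, Text>> rows = env.fromElements(
            Tuple2.of(new Text("2017-02"), new Text("a,b,c")),
            Tuple2.of(new Text("2017-03"), new Text("d,e,f")));

        JobConf jobConf = new JobConf();
        FileOutputFormat.setOutputPath(jobConf, new Path("hdfs:///output/csv")); // assumed path

        rows.output(new HadoopOutputFormat<Text, Text>(new KeyBasedOutput(), jobConf));
        env.execute("bucketed CSV sink");
    }
}

generateActualKey returns null so that only the CSV line is written (TextOutputFormat omits a null key), and generateFileNameForKeyValue turns each key into a subfolder, which matches the per-column-value layout asked about in the original message below.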

CSV sink partitioning and bucketing

2017-02-17 Thread Flavio Pompermaier
Hi to all, in my use case I'd need to output my Row objects into an output folder as CSV on HDFS, creating/overwriting new subfolders based on an attribute (for example, creating a subfolder for each value of a specified column). Then, it could be interesting to bucket the data inside those folders …

Re: Sink partitioning

2016-04-13 Thread Konstantin Knauf
Hi, calling DataStream.partitionCustom() with the respective arguments before the sink should do the trick, I think. Cheers, Konstantin. On 14.04.2016 01:22, neo21 zerro wrote: …
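
A minimal sketch of that suggestion (not from the original thread); the (userId, payload) tuple layout and the print() stand-in for the Elasticsearch sink are illustrative assumptions.

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PartitionedSinkExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (userId, payload) events; a real job would read these from a source.
        DataStream<Tuple2<Long, String>> events = env.fromElements(
            Tuple2.of(1L, "click"), Tuple2.of(2L, "view"), Tuple2.of(1L, "buy"));

        // Custom partitioner: every event of a given user id goes to the same
        // parallel instance of the downstream sink.
        DataStream<Tuple2<Long, String>> partitioned = events.partitionCustom(
            new Partitioner<Long>() {
                @Override
                public int partition(Long userId, int numPartitions) {
                    return (int) Math.floorMod(userId, (long) numPartitions);
                }
            },
            0); // key on tuple field 0, the user id

        // Stand-in for the Elasticsearch sink: each parallel instance now
        // sees a disjoint set of user ids.
        partitioned.print();

        env.execute("partitioned sink");
    }
}

Note that partitionCustom() only controls which parallel sink subtask handles an event; routing documents to a particular Elasticsearch shard would additionally require the indexing side to use the same key.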

Sink partitioning

2016-04-13 Thread neo21 zerro
Hello everybody, I have an Elasticsearch sink in my Flink topology. My requirement is to write the data in a partitioned fashion to my sink. For example, I have a Tuple which contains a user id. I want to group all events by user id and partition all events for one particular user to the same Es…