
Re: CSV sink partitioning and bucketing

2017-02-17 Thread Fabian Hueske

Hi Flavio, Flink does not come with an OutputFormat that creates buckets. It should not be too hard to implement this in Flink though. However, if you want a solution fast, I would try the following approach: 1) Search for a Hadoop OutputFormat that buckets Strings based on a key (). 2) Implement
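The core of step 1 is routing each record to an output path derived from its key. One Hadoop class of this kind is `MultipleTextOutputFormat` (package `org.apache.hadoop.mapred.lib`), whose `generateFileNameForKeyValue(key, value, name)` is commonly overridden to return something like `key + "/" + name`. Whether that is the format meant above is an assumption. The sketch below uses only the JDK and imitates that routing in memory. `KeyedPathRouter`, `pathFor` and `route` are hypothetical names, not Flink or Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of key-based file routing, in the spirit of overriding
 * generateFileNameForKeyValue in a Hadoop MultipleOutputFormat subclass.
 * This is plain JDK code, not Flink or Hadoop code.
 */
public class KeyedPathRouter {

    /** Builds a relative output path: one subfolder per key, fixed leaf file name. */
    public static String pathFor(String key, String leafName) {
        return key + "/" + leafName;
    }

    /** Groups (key, csvLine) records by the path they would be written to. */
    public static Map<String, List<String>> route(List<String[]> records, String leafName) {
        Map<String, List<String>> byPath = new LinkedHashMap<>();
        for (String[] rec : records) {
            // rec[0] is the routing key, rec[1] the already-formatted CSV line
            byPath.computeIfAbsent(pathFor(rec[0], leafName), p -> new ArrayList<>())
                  .add(rec[1]);
        }
        return byPath;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"IT", "1,IT,Rome"},
            new String[] {"DE", "2,DE,Berlin"},
            new String[] {"IT", "3,IT,Milan"});
        // Print each target path with the lines that would land in it
        route(records, "part-00000").forEach((path, lines) ->
            System.out.println(path + " -> " + lines));
    }
}
```

In a real job the routing would happen inside the OutputFormat, which Flink can run through its Hadoop-compatibility wrappers. The in-memory map here only shows which file each record ends up in.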

CSV sink partitioning and bucketing

2017-02-17 Thread Flavio Pompermaier

Hi to all, in my use case I'd need to output my Row objects into an output folder as CSV on HDFS, while creating/overwriting new subfolders based on an attribute (for example, creating a subfolder for each value of a specified column). Then, it could be interesting to bucket the data inside those fold
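The layout requested above (one subfolder per value of a column, existing subfolders overwritten, rows bucketed inside each folder) can be sketched with the JDK alone on a local file system. Several things here are assumptions rather than anything from the thread: the class and method names are hypothetical, buckets are chosen by `hashCode()` of a second column, file names follow a made-up `bucket-<n>.csv` pattern, and fields are joined with commas without CSV quoting. HDFS and Flink are not involved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-disk sketch: partition rows into folders, bucket inside each folder. */
public class PartitionedCsvWriter {

    /** Picks a bucket in [0, numBuckets) from the hash of the bucketing value. */
    public static int bucketOf(String value, int numBuckets) {
        return Math.floorMod(value.hashCode(), numBuckets);
    }

    /** Writes rows to root/<value of partCol>/bucket-<n>.csv, replacing each touched folder. */
    public static void write(Path root, List<String[]> rows, int partCol, int bucketCol,
                             int numBuckets) throws IOException {
        // Group CSV lines by target file (partition folder + bucket file)
        Map<Path, List<String>> files = new LinkedHashMap<>();
        for (String[] row : rows) {
            Path dir = root.resolve(row[partCol]);
            Path file = dir.resolve("bucket-" + bucketOf(row[bucketCol], numBuckets) + ".csv");
            // Naive CSV: assumes no field contains a comma, quote or newline
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(String.join(",", row));
        }
        // Overwrite semantics: clear each partition folder this batch writes to
        List<Path> dirs = files.keySet().stream()
            .map(Path::getParent).distinct().collect(Collectors.toList());
        for (Path dir : dirs) {
            deleteRecursively(dir);
            Files.createDirectories(dir);
        }
        for (Map.Entry<Path, List<String>> e : files.entrySet()) {
            Files.write(e.getKey(), e.getValue(), StandardCharsets.UTF_8);
        }
    }

    /** Deletes a directory tree, children before parents; no-op if it does not exist. */
    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("csv-out");
        List<String[]> rows = List.of(
            new String[] {"1", "IT", "Rome"},
            new String[] {"2", "DE", "Berlin"},
            new String[] {"3", "IT", "Milan"});
        // Partition on column 1 (country), bucket on column 0 (id) into 2 buckets
        write(root, rows, 1, 0, 2);
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(Files::isRegularFile).sorted()
                .forEach(p -> System.out.println(root.relativize(p)));
        }
    }
}
```

Only the folders that a batch actually writes to are replaced. Partitions absent from the batch are left alone, which is one reasonable reading of "creating/overwriting".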