Re: writeAsCSV with partitionBy

2016-05-25 Thread Aljoscha Krettek
Hi,

The RollingSink can only be used with streaming. Adding support for dynamic paths based on element contents is certainly interesting. I imagine it can be tricky, though, to figure out when to close/flush the buckets.

Cheers,
Aljoscha

Re: writeAsCSV with partitionBy

2016-05-24 Thread KirstiLaurila
Maybe, I don't know, but that is with streaming. How about batch?

Re: writeAsCSV with partitionBy

2016-05-24 Thread Juho Autio
RollingSink is part of the Flink Streaming API. Can it be used in Flink Batch jobs, too? As implied in FLINK-2672, RollingSink doesn't support dynamic bucket paths based on the tuple fields. The path must be given when creating the RollingSink instance, i.e. before deploying the job. Yes, a custom Bucketer…
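
For reference, a minimal sketch of the Bucketer contract as it stands (Flink 1.0-era interface; the class name HourlyBucketer is made up for illustration). Neither callback receives the record, which is exactly the limitation discussed here:

  import org.apache.flink.core.fs.Path
  import org.apache.flink.streaming.connectors.fs.Bucketer

  // Mirrors the built-in DateTimeBucketer, with an explicit hourly format.
  class HourlyBucketer extends Bucketer {
    private val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd--HH")

    // Only paths are available here -- no element, so no field-based buckets.
    override def shouldStartNewBucket(basePath: Path, currentBucketPath: Path): Boolean =
      getNextBucketPath(basePath) != currentBucketPath

    override def getNextBucketPath(basePath: Path): Path =
      new Path(basePath, fmt.format(new java.util.Date()))
  }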

Re: writeAsCSV with partitionBy

2016-05-24 Thread Srikanth
Isn't this related to https://issues.apache.org/jira/browse/FLINK-2672 ?

This can be achieved with a RollingSink [1] & custom Bucketer, probably.

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html

Srikanth
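
For what it's worth, a minimal usage sketch of [1] with the built-in DateTimeBucketer; the input, base path, and job name are illustrative:

  import org.apache.flink.streaming.api.scala._
  import org.apache.flink.streaming.connectors.fs.{DateTimeBucketer, RollingSink}

  val env = StreamExecutionEnvironment.getExecutionEnvironment
  val stream: DataStream[String] = env.fromElements("a,1", "b,2") // stand-in input

  // Buckets are derived from processing time, not from the elements.
  val sink = new RollingSink[String]("/tmp/rolling-output")
    .setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"))

  stream.addSink(sink)
  env.execute("rolling-sink-sketch")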

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Yeah, created this one: https://issues.apache.org/jira/browse/FLINK-3961

Re: writeAsCSV with partitionBy

2016-05-23 Thread Fabian Hueske
Hi Kirsti,

I'm not aware of anybody working on this issue. Would you like to create a JIRA issue for it?

Best,
Fabian

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Are there any plans to implement this kind of feature (the possibility to write data to specified partitions) in the near future?

Re: writeAsCSV with partitionBy

2016-02-16 Thread Fabian Hueske
Yes, you're right, I did not understand your question correctly. Right now, Flink does not feature an output format that writes records to different output files depending on a key attribute. You would need to implement such an output format yourself and append it as follows:

val data = ...
data.partitionB…
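
A rough sketch of what such a self-written output format could look like; KeyedCsvOutputFormat, its (String, String) record type, and the path layout are all made up for illustration and are not part of Flink's API:

  import java.io.{BufferedWriter, OutputStreamWriter}
  import scala.collection.mutable

  import org.apache.flink.api.common.io.OutputFormat
  import org.apache.flink.configuration.Configuration
  import org.apache.flink.core.fs.Path

  class KeyedCsvOutputFormat(basePath: String) extends OutputFormat[(String, String)] {
    @transient private var writers: mutable.Map[String, BufferedWriter] = _
    @transient private var subtask: Int = _

    override def configure(parameters: Configuration): Unit = ()

    override def open(taskNumber: Int, numTasks: Int): Unit = {
      writers = mutable.Map.empty
      subtask = taskNumber
    }

    override def writeRecord(record: (String, String)): Unit = {
      // One file per key value and parallel subtask: <base>/<key>/part-<subtask>
      val writer = writers.getOrElseUpdate(record._1, {
        val path = new Path(s"$basePath/${record._1}/part-$subtask")
        new BufferedWriter(new OutputStreamWriter(path.getFileSystem.create(path, true)))
      })
      writer.write(s"${record._1},${record._2}")
      writer.newLine()
    }

    override def close(): Unit = if (writers != null) writers.values.foreach(_.close())
  }

Appending it after a hash partitioning, e.g. data.partitionByHash(0).output(new KeyedCsvOutputFormat("/out")), keeps each key in a single subtask, so each key's file has exactly one writer.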

Re: writeAsCSV with partitionBy

2016-02-16 Thread Srikanth
Fabian,

Not sure if we are on the same page. If I do something like the code below, it will group by field 0 and each task will write a separate part file in parallel.

val sink = data1.join(data2)
  .where(1).equalTo(0) { (l, r) => (l._3, r._3) }
  .partitionByHash(0)
  .writeAsCsv(pathBa…
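
A runnable reconstruction of that snippet, assuming both inputs are 3-field tuples; the inputs and the output path are made up:

  import org.apache.flink.api.scala._

  val env = ExecutionEnvironment.getExecutionEnvironment

  // Stand-ins for data1 and data2 from the snippet above.
  val data1 = env.fromElements((1, 10, "left-a"), (2, 20, "left-b"))
  val data2 = env.fromElements((10, 1, "right-a"), (20, 2, "right-b"))

  data1.join(data2)
    .where(1).equalTo(0) { (l, r) => (l._3, r._3) }
    .partitionByHash(0)
    .writeAsCsv("/tmp/join-output")

  env.execute()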

Re: writeAsCSV with partitionBy

2016-02-15 Thread Fabian Hueske
Hi Srikanth,

DataSet.partitionBy() will partition the data on the declared partition fields. If you append a DataSink with the same parallelism as the partition operator, the data will be written out with the defined partitioning. It should be possible to achieve the behavior you described using D…
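
A minimal sketch of that setup (input and path are illustrative). Note that each parallel subtask writes one file holding all keys hashed to it, rather than one file per key:

  import org.apache.flink.api.scala._

  val env = ExecutionEnvironment.getExecutionEnvironment
  val data = env.fromElements((1, "a"), (2, "b"), (1, "c")) // stand-in input

  data.partitionByHash(0)
    .writeAsCsv("/tmp/partitioned-output")
    .setParallelism(env.getParallelism) // sink parallelism matches the partitioning

  env.execute()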