Re: writeAsCSV with partitionBy

2016-05-25 Thread Aljoscha Krettek
Hi, the RollingSink can only be used with streaming. Adding support for dynamic paths based on element contents is certainly interesting. I imagine it can be tricky, though, to figure out when to close/flush the buckets. Cheers, Aljoscha

Re: writeAsCSV with partitionBy

2016-05-24 Thread KirstiLaurila
Maybe, I don't know, but that is for streaming. How about batch?

Re: writeAsCSV with partitionBy

2016-05-24 Thread Juho Autio
RollingSink is part of the Flink Streaming API. Can it be used in Flink Batch jobs, too? As implied in FLINK-2672, RollingSink doesn't support dynamic bucket paths based on the tuple fields. The path must be given when creating the RollingSink instance, i.e. before deploying the job. Yes, a custom Buck…
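
For reference, the reason element-based paths don't work: the Bucketer interface that RollingSink used at the time only sees paths, never the record. A paraphrased sketch of the 1.0-era contract (see the javadoc linked in Srikanth's message below; treat the exact signatures as approximate):

    import java.io.Serializable;
    import org.apache.hadoop.fs.Path;

    // Paraphrased sketch of the RollingSink-era Bucketer: both methods receive
    // only paths, never the element, so a bucket cannot depend on tuple fields.
    public interface Bucketer extends Serializable {

        // Decide whether the sink should roll over to a new bucket directory.
        boolean shouldStartNewBucket(Path basePath, Path currentBucketPath);

        // Compute the next bucket directory under the base path.
        Path getNextBucketPath(Path basePath);
    }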

Re: writeAsCSV with partitionBy

2016-05-24 Thread Srikanth
Isn't this related to https://issues.apache.org/jira/browse/FLINK-2672? This can probably be achieved with a RollingSink [1] and a custom Bucketer. [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html Srikanth
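
For anyone following along, a minimal sketch of the RollingSink wiring suggested here (Flink 1.0-era streaming API; the source, paths, and format string are placeholders):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.DateTimeBucketer;
    import org.apache.flink.streaming.connectors.fs.RollingSink;

    public class RollingSinkExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> lines = env.socketTextStream("localhost", 9999);

            // Buckets are chosen by the Bucketer; DateTimeBucketer derives them
            // from the current time, not from the elements themselves.
            RollingSink<String> sink = new RollingSink<>("hdfs:///tmp/rolling-output");
            sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"));

            lines.addSink(sink);
            env.execute("rolling sink example");
        }
    }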

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Yeah, created this one https://issues.apache.org/jira/browse/FLINK-3961

Re: writeAsCSV with partitionBy

2016-05-23 Thread Fabian Hueske
Hi Kirsti, I'm not aware of anybody working on this issue. Would you like to create a JIRA issue for it? Best, Fabian

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Are there any plans to implement this kind of feature (the possibility to write to partitions specified by the data) in the near future?

Re: writeAsCSV with partitionBy

2016-02-16 Thread Fabian Hueske
Yes, you're right. I did not understand your question correctly. Right now, Flink does not feature an output format that writes records to output files depending on a key attribute. You would need to implement such an output format yourself and append it as follows: val data = ... data.partitionB…
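
A minimal sketch of what such a custom output format could look like, assuming Tuple2<String, Integer> records; KeyPartitioningCsvOutputFormat is a hypothetical name for illustration, not a built-in Flink class:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    // Hypothetical sketch: routes each record to a sub-directory derived from its key field.
    public class KeyPartitioningCsvOutputFormat implements OutputFormat<Tuple2<String, Integer>> {

        private final String basePath;
        private transient Map<String, BufferedWriter> writers;
        private transient int taskNumber;

        public KeyPartitioningCsvOutputFormat(String basePath) {
            this.basePath = basePath;
        }

        @Override
        public void configure(Configuration parameters) {
            // no parameters needed for this sketch
        }

        @Override
        public void open(int taskNumber, int numTasks) {
            this.taskNumber = taskNumber;
            this.writers = new HashMap<>();
        }

        @Override
        public void writeRecord(Tuple2<String, Integer> record) throws IOException {
            BufferedWriter writer = writers.get(record.f0);
            if (writer == null) {
                // one file per key value and parallel task, e.g. /base/key=foo/part-3
                Path file = new Path(basePath + "/key=" + record.f0 + "/part-" + taskNumber);
                FileSystem fs = file.getFileSystem();
                writer = new BufferedWriter(new OutputStreamWriter(fs.create(file, true)));
                writers.put(record.f0, writer);
            }
            writer.write(record.f0 + ";" + record.f1);
            writer.newLine();
        }

        @Override
        public void close() throws IOException {
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
    }

Appending it would then look like data.partitionByHash(0).output(new KeyPartitioningCsvOutputFormat("/tmp/partitioned")); the preceding hash partitioning ensures all records with the same key are handled by one task and therefore land in one directory.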

Re: writeAsCSV with partitionBy

2016-02-16 Thread Srikanth
Fabian, Not sure if we are on the same page. If I do something like the code below, it will group by field 0 and each task will write a separate part file in parallel. val sink = data1.join(data2).where(1).equalTo(0) { (l, r) => (l._3, r._3) }.partitionByHash(0).writeAsCsv(pathBa…

Re: writeAsCSV with partitionBy

2016-02-15 Thread Fabian Hueske
Hi Srikanth, DataSet.partitionBy() will partition the data on the declared partition fields. If you append a DataSink with the same parallelism as the partition operator, the data will be written out with the defined partitioning. It should be possible to achieve the behavior you described using D…
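
A runnable sketch of that suggestion (note, as Srikanth points out above, this splits files by hash partition; it does not create one directory per key value):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.core.fs.FileSystem.WriteMode;

    public class PartitionThenWrite {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Tuple2<String, Integer>> data = env.fromElements(
                    new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3));

            // The sink inherits the partitioning: each parallel part file holds
            // one hash partition of field 0, but file names do not encode keys.
            data.partitionByHash(0)
                .writeAsCsv("file:///tmp/partitioned-csv", WriteMode.OVERWRITE);

            env.execute("partitionByHash before writeAsCsv");
        }
    }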

Re: writeAsCsv

2015-10-07 Thread Lydia Ickler
ok, thanks! :) I will try that! > On 07.10.2015 at 21:35, Lydia Ickler wrote: > > Hi, > > stupid question: Why is this not saved to file? > I want to transform an array to a DataSet but the Graph stops at collect(). > > // Transform Spectrum to DataSet …

Re: writeAsCsv

2015-10-07 Thread Robert Schmidtke
Hi, as far as I know only collect, print and execute actually trigger the execution. What you're missing is env.execute() after the writeAsCsv call. Hope this helps.
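
In other words, a minimal sketch (the file path is hypothetical):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class ExecuteAfterWriteAsCsv {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Tuple2<Integer, String>> data =
                    env.fromElements(new Tuple2<>(1, "a"), new Tuple2<>(2, "b"));

            // writeAsCsv only declares the sink; nothing has run yet
            data.writeAsCsv("file:///tmp/out.csv");

            // file sinks need an explicit trigger (collect/print trigger implicitly)
            env.execute("write csv");
        }
    }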

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-03 Thread Maximilian Michels
You're welcome. I'm glad I could help out :) Cheers, Max

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Mihail Vieru
I've implemented the alternating 2 files solution and everything works now. Thanks a lot! You saved my day :) Cheers, Mihail
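
For readers hitting the same issue, a minimal sketch of the alternating-two-paths idea, under the assumption that each iteration runs as its own batch job (the paths and the per-iteration logic are placeholders):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.core.fs.FileSystem.WriteMode;

    public class AlternatingPathsLoop {
        public static void main(String[] args) throws Exception {
            // two scratch paths (hypothetical); the initial input is staged at paths[0]
            String[] paths = {"hdfs:///tmp/iter-a", "hdfs:///tmp/iter-b"};

            for (int i = 0; i < 10; i++) {
                String input = paths[i % 2];
                String output = paths[(i + 1) % 2];

                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.readTextFile(input)
                    .map(new MapFunction<String, String>() {
                        @Override
                        public String map(String value) {
                            return value; // the real iteration logic goes here
                        }
                    })
                    // never OVERWRITE the path that is being read in the same job
                    .writeAsText(output, WriteMode.OVERWRITE);
                env.execute("iteration " + i);
            }
        }
    }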

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Maximilian Michels
The problem is that your input and output path are the same. Because Flink executes in a pipelined fashion, all the operators will come up at once. When you set WriteMode.OVERWRITE for the sink, it will delete the path before writing anything. That means that when your DataSource reads the input, t…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Maximilian Michels
Hi Mihail, Thanks for the code. I'm trying to reproduce the problem now. On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru wrote: > Hi Max, > > thank you for your reply. I wanted to revise and dismiss all other factors > before writing back. I've attached my code and sample input data. > > I ru…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Maximilian Michels
Hi Mihail, Thank you for your question. Do you have a short example that reproduces the problem? It is hard to find the cause without an error message or some example code. I wonder how your loop works without WriteMode.OVERWRITE, because it should throw an exception in this case. Or do you change…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
I think my problem is related to a loop in my job. Before the loop, the writeAsCsv method works fine, even in overwrite mode. In the loop, in the first iteration, it writes an empty folder containing empty files to HDFS, even though the DataSet it is supposed to write contains elements. Need…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
Hi Till, thank you for your reply. I have the following code snippet: intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath, "\n", ";", WriteMode.OVERWRITE); When I remove the WriteMode parameter, it works. So I can reason that the DataSet contains data elements. Cheers, Mihail

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Till Rohrmann
Hi Mihail, have you checked that the DataSet you want to write to HDFS actually contains data elements? You can try calling collect, which retrieves the data to your client, to see what's in there. Cheers, Till
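
A minimal sketch of that sanity check (types and values hypothetical; collect() ships the whole DataSet to the client, so use it only on small data):

    import java.util.List;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class CollectSanityCheck {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<String> data = env.fromElements("a", "b", "c");

            // collect() runs the program and ships the result to the client
            List<String> elements = data.collect();
            System.out.println("element count: " + elements.size());
        }
    }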

Re: writeAsCsv on HDFS

2015-06-25 Thread Hawin Jiang
Hi Flavio, here is the example from Marton. You can use the writeAsText method directly: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.addSource(new PersistentKafkaSource(..)) .map(/* do your operations */) .writeAsText("hdfs://<host>:<port>/path/to/your/file…

Re: writeAsCsv on HDFS

2015-06-25 Thread Stephan Ewen
You could also just qualify the HDFS URL, if that is simpler (put host and port of the namenode in there): "hdfs://myhost:40010/path/to/file"

Re: writeAsCsv on HDFS

2015-06-25 Thread Robert Metzger
You have to put it on all machines.

Re: writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Do I have to put the hadoop conf file on each task manager or just on the job-manager?

Re: writeAsCsv on HDFS

2015-06-25 Thread Chiwan Park
It represents the folder containing the hadoop config files. :) Regards, Chiwan Park

Re: writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Does fs.hdfs.hadoopconf represent the folder containing the hadoop config files (*-site.xml) or just one specific hadoop config file (e.g. core-site.xml or hdfs-site.xml)?

Re: writeAsCsv on HDFS

2015-06-25 Thread Robert Metzger
Hi Flavio, there is a file called "conf/flink-conf.yaml". Add a new line to the file with the following contents: fs.hdfs.hadoopconf: /path/to/your/hadoop/config This should fix the problem. Flink cannot load the configuration file from the jar containing the user code, because the file system i…
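
Concretely, the resulting entry in conf/flink-conf.yaml would look like this (the path is a placeholder; as clarified earlier in the thread, it should be the directory holding the *-site.xml files):

    # conf/flink-conf.yaml — point Flink at the directory with core-site.xml / hdfs-site.xml
    fs.hdfs.hadoopconf: /etc/hadoop/conf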

Re: writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Could you describe it better with an example, please? Why doesn't Flink automatically load the properties of the hadoop conf files within the jar?

Re: writeAsCsv on HDFS

2015-06-25 Thread Robert Metzger
Hi, Flink is not loading the Hadoop configuration from the classloader. You have to specify the path to the Hadoop configuration in the Flink configuration key "fs.hdfs.hadoopconf". On Thu, Jun 25, 2015 at 2:50 PM, Flavio Pompermaier wrote: > Hi to all, > I'm experiencing some problems in writing a f…