Re: Datastream - writeAsCsv creates empty File

2017-01-27 Thread Timo Walther
Hi Nico, writeAsCsv has limited functionality in this case. I recommend using the Bucketing File Sink [1], where you can specify a flush interval and a batch size. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html#bucketing-file-sink
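For reference, a minimal sketch of the suggested sink, assuming Flink 1.2's flink-connector-filesystem module; the bucketer, writer, and batch size shown are illustrative choices (see the linked docs for the flush-interval settings):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.fs.StringWriter;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    DataStream<String> stream = ...; // CSV-formatted records

    BucketingSink<String> sink = new BucketingSink<>("file:///path/to/output");
    sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm")); // one bucket per minute
    sink.setWriter(new StringWriter<String>());                         // write records as plain text lines
    sink.setBatchSize(1024 * 1024 * 64);                                // roll the part file at ~64 MB

    stream.addSink(sink);

Part files are rolled once they reach the configured batch size, so output becomes visible while the job is still running.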

Datastream - writeAsCsv creates empty File

2017-01-27 Thread Nico
Hi, I am running my Flink job in the local IDE and want to write the results to a CSV file using: stream.writeAsCsv("...", FileSystem.WriteMode.OVERWRITE).setParallelism(1) The file is created, but it is empty. writeAsText, however, works. I have checked the CsvOutputFormat and I think…

Re: writeAsCSV with partitionBy

2016-05-25 Thread Aljoscha Krettek
[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html

Re: writeAsCSV with partitionBy

2016-05-24 Thread KirstiLaurila
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html

Re: writeAsCSV with partitionBy

2016-05-24 Thread Juho Autio
https://issues.apache.org/jira/browse/FLINK-3961

Re: writeAsCSV with partitionBy

2016-05-24 Thread Srikanth
On Tue, May 24, 2016 at 1:07 AM, KirstiLaurila wrote: > Yeah, created this one: https://issues.apache.org/jira/browse/FLINK-3961

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Yeah, created this one: https://issues.apache.org/jira/browse/FLINK-3961

Re: writeAsCSV with partitionBy

2016-05-23 Thread Fabian Hueske

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Are there any plans to implement this kind of feature (the possibility to write data to specified partitions) in the near future?

Re: writeAsCSV with partitionBy

2016-02-16 Thread Fabian Hueske
val sink = data1.join(data2)
  .where(1).equalTo(0) { (l, r) => (l._3, r._3) }
  .partitionByHash(0)
  .writeAsCsv(pathBase + "output/test", rowDelimiter = "\n", fieldDelimiter = "\t", WriteMode.OVERWRITE)

This will create f…

Re: writeAsCSV with partitionBy

2016-02-16 Thread Srikanth
Fabian, Not sure if we are on the same page. If I do something like the code below, it will group by field 0 and each task will write a separate part file in parallel.

val sink = data1.join(data2)
  .where(1).equalTo(0) { (l, r) => (l._3, r._3) }
  .partitionByHash(0)
  .writeAs…

Re: writeAsCSV with partitionBy

2016-02-15 Thread Fabian Hueske
Hi Srikanth, DataSet.partitionBy() will partition the data on the declared partition fields. If you append a DataSink with the same parallelism as the partition operator, the data will be written out with the defined partitioning. It should be possible to achieve the behavior you described using D…
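For reference, a minimal sketch of this pattern, assuming the Java DataSet API with hypothetical tuple data and paths; each parallel sink task then writes one part file holding exactly one hash partition:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.core.fs.FileSystem.WriteMode;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Integer>> data = ...; // e.g. (key, value) records

    data.partitionByHash(0)                      // partition on field 0
        .writeAsCsv("hdfs:///path/to/output", "\n", ",", WriteMode.OVERWRITE)
        .setParallelism(env.getParallelism());   // sink parallelism = partition parallelism

    env.execute("partitioned csv write");

Note that this yields one part file per parallel task rather than Hive-style date=/hour= directories, which is the gap FLINK-3961 tracks.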

writeAsCSV with partitionBy

2016-02-12 Thread Srikanth
Hello, Is there a Hive (or Spark dataframe) partitionBy equivalent in Flink? I'm looking to save output as CSV files partitioned by two columns (date and hour). The partitionBy DataSet API is more about partitioning the data on a column for further processing. I'm thinking there is no direct…

Re: Flink writeAsCsv

2016-02-04 Thread Fabian Hueske
> …custom created evictor. In the window and in the evictor you have access to all data and you can create specific files for each window triggered.

RE: Flink writeAsCsv

2016-02-04 Thread Radu Tudoran
…for each window triggered. From: Radu Prodan [mailto:raduprod...@gmail.com] Sent: Thursday, February 04, 2016 11:58 AM To: user@flink.apache.org Subject: Re: Flink writeAsCsv Hi Marton, thanks to your comment I managed to get it working. At least it outputs the results. However, what I need is to output each…

Re: Flink writeAsCsv

2016-02-04 Thread Radu Prodan

Re: Flink writeAsCsv

2016-02-04 Thread Márton Balassi
Hey Radu, As you are using the streaming API I assume that you call env.execute() in both cases. Is that the case? Do you see any errors appearing? My first guess would be that if your data type is not a tuple type, writeAsCsv does not work by default. Best, Marton
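For reference, a minimal sketch of both points, assuming a hypothetical non-tuple MyEvent POJO with getId()/getValue() getters: map the stream to a tuple type before writeAsCsv, and call env.execute() so the sink actually runs.

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<MyEvent> events = ...; // hypothetical non-tuple input

    events.map(new MapFunction<MyEvent, Tuple2<String, Double>>() {
              @Override
              public Tuple2<String, Double> map(MyEvent e) {
                  return new Tuple2<>(e.getId(), e.getValue()); // assumed getters
              }
          })
          .writeAsCsv("file:///tmp/out.csv");

    env.execute("csv job"); // nothing is written until the job runs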

Flink writeAsCsv

2016-02-04 Thread Radu Prodan
Hi all, I am new to Flink. I wrote a simple program and I want it to output to a CSV file:

timeWindowAll(Time.of(3, TimeUnit.MINUTES))
  .apply(new Function1())
  .writeAsCsv("file:///user/someuser/Documents/somefile.csv");

When I change the sink to .print(), it works and outputs some results…

Re: writeAsCsv

2015-10-07 Thread Lydia Ickler
> List<Tuple2<Double, Double>> dataList = new LinkedList<Tuple2<Double, Double>>();
> double[][] arr = filteredSpectrum.getAs2DDoubleArray();
> for (int i = 0; i < arr[0].length; i++) {
>     dataList.add(new Tuple2<Double, Double>(arr[0][i], arr[1][i]));
> }
> env.fromCollection(dataList).writeAsCsv(output);
> Best regards,
> Lydia

Re: writeAsCsv

2015-10-07 Thread Robert Schmidtke
Hi, as far as I know only collect, print and execute actually trigger the execution. What you're missing is env.execute() after the writeAsCsv call. Hope this helps. On Wed, Oct 7, 2015 at 9:35 PM, Lydia Ickler wrote: > Hi, > > stupid question: Why is this not saved to file?
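For reference, a minimal sketch of the fix, assuming the DataSet API from this thread with placeholder data and output path:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    List<Tuple2<Double, Double>> dataList =
        Arrays.asList(new Tuple2<>(1.0, 2.0), new Tuple2<>(3.0, 4.0));

    env.fromCollection(dataList).writeAsCsv("file:///tmp/output.csv");
    env.execute("write csv"); // writeAsCsv only declares the sink; execute() runs the job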

writeAsCsv

2015-10-07 Thread Lydia Ickler
Hi, stupid question: Why is this not saved to file? I want to transform an array to a DataSet but the graph stops at collect().

// Transform Spectrum to DataSet
List<Tuple2<Double, Double>> dataList = new LinkedList<Tuple2<Double, Double>>();
double[][] arr = filteredSpectrum.getAs2DDoubleArray();
for (int i = 0; i < arr[0].length; i++) {
    dataList.add(new Tuple2<Double, Double>(arr[0][i], arr[1][i]));
}
env.fromCollection(dataList).writeAsCsv(output);

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-03 Thread Maximilian Michels

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Mihail Vieru
0 100 hdfs://path/to/vertices-test-100 hdfs://path/to/edges-test-100 hdfs://path/to/tempgraph 10 0.5 hdfs://path/to/output-apsp 9 I was wrong, I originally thought that the first writeAsCsv call (line 50) doesn't work. An exception is thrown without…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Maximilian Michels

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Maximilian Michels

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Maximilian Michels
…change the file names on every write? Cheers, Max On Tue, Jun 30, 2015 at 3:47 PM, Mihail Vieru wrote: > I think my problem is related to a loop in my job.

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
I think my problem is related to a loop in my job. Before the loop, the writeAsCsv method works fine, even in overwrite mode. In the loop, in the first iteration, it writes an empty folder containing empty files to HDFS, even though the DataSet it is supposed to write contains elements…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
Hi Till, thank you for your reply. I have the following code snippet:

intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath, "\n", ";", WriteMode.OVERWRITE);

When I remove the WriteMode parameter, it works. So I can reason that the DataSet contains data…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Till Rohrmann

writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
Hi, the writeAsCsv method is not writing anything to HDFS (version 1.2.1) when the WriteMode is set to OVERWRITE. A file is created but it's empty, and there is no trace of errors in the Flink or Hadoop logs on any node in the cluster. What could cause this issue? I really, really need this feature…

Re: writeAsCsv on HDFS

2015-06-25 Thread Hawin Jiang
Hi Flavio, here is the example from Marton. You can use the writeAsText method directly:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.addSource(new PersistentKafkaSource(..))
   .map(/* your operations */)
   .writeAsText("hdfs://<host>:<port>/path/to/your/file");

Re: writeAsCsv on HDFS

2015-06-25 Thread Stephan Ewen
You could also just qualify the HDFS URL, if that is simpler (put host and port of the namenode in there): "hdfs://myhost:40010/path/to/file"

Re: writeAsCsv on HDFS

2015-06-25 Thread Robert Metzger
You have to put it on all machines. On Thu, Jun 25, 2015 at 3:17 PM, Flavio Pompermaier wrote: > Do I have to put the hadoop conf file on each task manager or just on the job-manager?

Re: writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Do I have to put the hadoop conf file on each task manager or just on the job-manager? On Thu, Jun 25, 2015 at 3:12 PM, Chiwan Park wrote: > It represents the folder containing the hadoop config files. :)

Re: writeAsCsv on HDFS

2015-06-25 Thread Chiwan Park
It represents the folder containing the hadoop config files. :) Regards, Chiwan Park

Re: writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Does *fs.hdfs.hadoopconf* represent the folder containing the hadoop config files (*-site.xml), or just one specific hadoop config file (e.g. core-site.xml or hdfs-site.xml)?

Re: writeAsCsv on HDFS

2015-06-25 Thread Robert Metzger
Hi Flavio, there is a file called "conf/flink-conf.yaml". Add a new line to the file with the following contents: fs.hdfs.hadoopconf: /path/to/your/hadoop/config This should fix the problem. Flink cannot load the configuration file from the jar containing the user code, because the file system is…
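For reference, the resulting entry in conf/flink-conf.yaml; the path shown is only an assumed example and should point at the directory holding your cluster's *-site.xml files:

    # conf/flink-conf.yaml
    fs.hdfs.hadoopconf: /etc/hadoop/conf   # assumed path; use your cluster's config dir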

Re: writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Could you describe it better with an example, please? Why doesn't Flink automatically load the properties of the hadoop conf files within the jar?

Re: writeAsCsv on HDFS

2015-06-25 Thread Robert Metzger
Hi, Flink is not loading the Hadoop configuration from the classloader. You have to specify the path to the Hadoop configuration in the flink configuration "fs.hdfs.hadoopconf".

writeAsCsv on HDFS

2015-06-25 Thread Flavio Pompermaier
Hi to all, I'm experiencing a problem writing a file as CSV on HDFS with Flink 0.9.0. The code I use is: myDataset.writeAsCsv(new Path("hdfs:///tmp", "myFile.csv").toString()); If I run the job from Eclipse everything works fine, but when I deploy the job on the cluster (Cloudera 5.1.3) I ob…