Re: Dataset output callback

2015-05-26 Thread Flavio Pompermaier
Any insight about this? On Tue, May 26, 2015 at 5:58 PM, Flavio Pompermaier wrote: > Hi to all, > > In my program I'd like to infer from a MySQL table the list of directories I > have to output to HDFS (status=0). > Once my job finishes, I want to clean each directory and update the status value in > my SQL

Re: Hash join failing

2015-05-26 Thread Sebastian
Switching the sides worked (I tried that shortly after sending the mail). Thanks for the fast response :) On 26.05.2015 22:26, Stephan Ewen wrote: If you have this case, giving more memory is fighting a symptom, rather than a cause. If you really have that many duplicates in the data set (and

Hash join failing

2015-05-26 Thread Sebastian
Hi, What can I do to give Flink more memory when running it from my IDE? I'm getting the following exception: Caused by: java.lang.RuntimeException: Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident. Probably cause: Too many duplicate k
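For the literal question of giving the local environment more memory, a sketch (an assumption on my side, not from the thread; the "taskmanager.memory.size" key is the 0.9-era name and should be checked against your Flink version):

```java
// Sketch: raising managed memory for Flink's embedded local runtime
// when running a job from the IDE. The key name and the value (MB)
// are assumptions to verify against your version's ConfigConstants.
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class LocalWithMoreMemory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInteger("taskmanager.memory.size", 2048); // MB of managed memory
        ExecutionEnvironment env =
                ExecutionEnvironment.createLocalEnvironment(conf);
        // ... build the job on env, then env.execute(); ...
    }
}
```

As Stephan notes below in the thread, though, more memory treats the symptom; reversing the join sides is the actual fix.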

Re: Hash join failing

2015-05-26 Thread Stephan Ewen
If you have this case, giving more memory is fighting a symptom, rather than a cause. If you really have that many duplicates in the data set (and you have not just a bad implementation of "hashCode()"), then try the following: 1) Reverse hash join sides. Duplicates hurt only on the build-side, n
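Point 1 can be sketched with a join hint (input names are placeholders; check that your Flink version exposes JoinHint on DataSet.join):

```java
// Sketch: choosing the hash-join build side explicitly so the
// duplicate-heavy input stays on the probe side. HASH_SECOND builds
// the table from the second input, streaming the first against it.
import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;

// withManyDuplicates: keys repeat heavily -> probe side (first input)
// mostlyUniqueKeys:   keys mostly unique -> build side (second input)
withManyDuplicates
        .join(mostlyUniqueKeys, JoinHint.REPARTITION_HASH_SECOND)
        .where(0)
        .equalTo(0);
```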

RE: bug? open() not getting called with RichWindowMapFunction

2015-05-26 Thread Emmanuel
Great, thanks... Is there a roadmap somewhere that can be consulted, with a timeline for the next RC or milestone? Thanks. Date: Sat, 23 May 2015 23:10:59 +0200 Subject: Re: bug? open() not getting called with RichWindowMapFunction From: gyf...@apache.org To: user@flink.apache.org Hi, This was a k

Re: Recursive directory reading error

2015-05-26 Thread Maximilian Michels
Pushed a fix to the master and will open a PR to fix this programmatically. On Tue, May 26, 2015 at 4:22 PM, Flavio Pompermaier wrote: > Yep, that definitely solves the problem! Could you make a PR to fix > that? > > Thank you in advance, > Flavio > > On Tue, May 26, 2015 at 3:20 PM, Maximi

Dataset output callback

2015-05-26 Thread Flavio Pompermaier
Hi to all, In my program I'd like to infer from a MySQL table the list of directories I have to output to HDFS (status=0). Once my job finishes, I want to clean each directory and update the status value in my SQL table. How can I do that in Flink? Is there any callback for when dataset.output() finishes? Best,
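A common workaround (an assumption on my side, not confirmed in the thread): the DataSet API has no per-sink callback, but ExecutionEnvironment.execute() blocks until every sink has finished, so cleanup can simply run after it. Table and column names below are invented for illustration:

```java
// Sketch: run the Flink job, then do the MySQL status update once
// execute() returns, i.e. once all output() sinks have completed.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.flink.api.java.ExecutionEnvironment;

public class OutputThenCleanup {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // ... read the directory list, write each dataset with output(...) ...
        env.execute("write directories to hdfs");

        // reached only after all sinks finished successfully
        try (Connection c = DriverManager.getConnection(
                     "jdbc:mysql://...", "user", "pw");
             PreparedStatement ps = c.prepareStatement(
                     "UPDATE dirs SET status = 1 WHERE status = 0")) {
            ps.executeUpdate();
        }
    }
}
```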

Re: Visibility of FileInputFormat constants

2015-05-26 Thread Flavio Pompermaier
Thanks ;) On Tue, May 26, 2015 at 5:08 PM, Fabian Hueske wrote: > Done > > 2015-05-26 16:59 GMT+02:00 Flavio Pompermaier : > >> Could you do that so I can avoid to make a PR just for that, please? >> >> On Tue, May 26, 2015 at 4:58 PM, Fabian Hueske wrote: >> >>> Definitely! Much better than us

Re: Visibility of FileInputFormat constants

2015-05-26 Thread Fabian Hueske
Done. 2015-05-26 16:59 GMT+02:00 Flavio Pompermaier : > Could you do that so I can avoid making a PR just for that, please? > > On Tue, May 26, 2015 at 4:58 PM, Fabian Hueske wrote: > >> Definitely! Much better than using the String value. >> >> 2015-05-26 16:55 GMT+02:00 Flavio Pompermaier : >>

Re: Visibility of FileInputFormat constants

2015-05-26 Thread Flavio Pompermaier
Could you do that so I can avoid making a PR just for that, please? On Tue, May 26, 2015 at 4:58 PM, Fabian Hueske wrote: > Definitely! Much better than using the String value. > > 2015-05-26 16:55 GMT+02:00 Flavio Pompermaier : > >> Hi to all, >> >> in my program I need to set "recursive.file.

Re: Visibility of FileInputFormat constants

2015-05-26 Thread Fabian Hueske
Definitely! Much better than using the String value. 2015-05-26 16:55 GMT+02:00 Flavio Pompermaier : > Hi to all, > > in my program I need to set "recursive.file.enumeration" to true and I > discovered that there's a constant for that variable in FileInputFormat > (ENUMERATE_NESTED_FILES_FLAG) bu

Visibility of FileInputFormat constants

2015-05-26 Thread Flavio Pompermaier
Hi to all, in my program I need to set "recursive.file.enumeration" to true and I discovered that there's a constant for that variable in FileInputFormat (ENUMERATE_NESTED_FILES_FLAG) but it's private. Do you think it would be a good idea to change its visibility to public? Best, Flavio

Re: Recursive directory reading error

2015-05-26 Thread Flavio Pompermaier
Yep, that definitely solves the problem! Could you make a PR to fix that? Thank you in advance, Flavio On Tue, May 26, 2015 at 3:20 PM, Maximilian Michels wrote: > Yes, there is a loop to recursively search for files in a directory but that > should be ok. The code fails when adding a new In

count the k-means iteration

2015-05-26 Thread Pa Rö
Hi community, my k-means works fine now, thanks a lot for your help. Now I want to test something: what is the best way in Flink to count the iterations? Best regards, Paul
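One way to observe the iteration count (a sketch, assuming the k-means is built with a bulk iteration via DataSet.iterate()): rich functions running inside the iteration can read the current superstep number from their iteration runtime context:

```java
// Sketch: log the 1-based superstep number from inside a bulk iteration.
// Attaching this map as the last operator of the step function prints
// one line per iteration, so the last line tells how many actually ran.
import org.apache.flink.api.common.functions.RichMapFunction;

public class SuperstepLogger<T> extends RichMapFunction<T, T> {
    @Override
    public T map(T value) {
        int step = getIterationRuntimeContext().getSuperstepNumber();
        System.out.println("k-means superstep: " + step);
        return value;
    }
}
```

Usage would be something like `newCentroids.map(new SuperstepLogger<Centroid>())` inside the iteration body; outside an iteration, getIterationRuntimeContext() is not available.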

Re: HBase Connection in cluster

2015-05-26 Thread Flavio Pompermaier
In the src/main/resources folder of the project. On Tue, May 26, 2015 at 2:42 PM, Hilmi Yildirim wrote: > Where do you put the hbase-site.xml? In the resource folder of the > project or on the cluster? > > > On 26.05.2015 at 14:12, Flavio Pompermaier wrote: > > I usually put those connection pa

Re: Recursive directory reading error

2015-05-26 Thread Maximilian Michels
Yes, there is a loop to recursively search for files in a directory, but that should be ok. The code fails when adding a new InputSplit to an ArrayList, which is a standard operation. Oh, I think I found a bug in `addNestedFiles`: it does not pick up the length of the recursively found files in line 5

Re: HBase Connection in cluster

2015-05-26 Thread Hilmi Yildirim
Where do you put the hbase-site.xml? In the resource folder of the project or on the cluster? On 26.05.2015 at 14:12, Flavio Pompermaier wrote: I usually put those connection params inside the hbase-site.xml that will be included in the generated jar.. On Tue, May 26, 2015 at 2:07 PM, Hilmi

Re: Recursive directory reading error

2015-05-26 Thread Flavio Pompermaier
I have 10 files. I debugged the code and it seems that there's a loop in the FileInputFormat when files are nested far away from the root directory of the scan. On Tue, May 26, 2015 at 2:14 PM, Robert Metzger wrote: > Hi Flavio, > > how many files are in the directory? > You can count with "find

Re: Recursive directory reading error

2015-05-26 Thread Robert Metzger
Hi Flavio, how many files are in the directory? You can count with "find /tmp/myDir | wc -l" Flink running out of memory while creating input splits indicates to me that there are a lot of files in there. On Tue, May 26, 2015 at 2:10 PM, Flavio Pompermaier wrote: > Hi to all, > > I'm trying to

Re: HBase Connection in cluster

2015-05-26 Thread Flavio Pompermaier
I usually put those connection params inside the hbase-site.xml that will be included in the generated jar. On Tue, May 26, 2015 at 2:07 PM, Hilmi Yildirim wrote: > I want to add that it is strange that the client wants to establish a > connection to localhost but I have defined another machine

Recursive directory reading error

2015-05-26 Thread Flavio Pompermaier
Hi to all, I'm trying to recursively read a directory but it seems that the totalLength value in FileInputFormat.createInputSplits() is not computed correctly. I have files organized as: /tmp/myDir/A/B/cunk-1.txt /tmp/myDir/A/B/cunk-2.txt .. If I try to do the following: Configuration
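For reference, the recursive read being described looks roughly like this (a reconstruction of the truncated message, not its actual code; the key name is the one used elsewhere in this digest):

```java
// Sketch: enable recursive file enumeration on a text-file source
// and pass the parameter to the input format via withParameters.
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Configuration parameters = new Configuration();
parameters.setBoolean("recursive.file.enumeration", true);

DataSet<String> lines = env
        .readTextFile("/tmp/myDir")
        .withParameters(parameters);
```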

Re: HBase Connection in cluster

2015-05-26 Thread Hilmi Yildirim
I want to add that it is strange that the client wants to establish a connection to localhost although I have defined another machine. On 26.05.2015 at 14:05, Hilmi Yildirim wrote: Hi, I implemented a job which reads data from HBase with the following code (I replaced the real address by m1.example.

HBase Connection in cluster

2015-05-26 Thread Hilmi Yildirim
Hi, I implemented a job which reads data from HBase with the following code (I replaced the real address with m1.example.com): protected Scan getScanner() { Scan scan = new Scan(); Configuration conf = HBaseConfiguration.create(); conf.set("hbase.zookeeper.q
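If an hbase-site.xml is not reliably on the classpath, the ZooKeeper quorum can also be set programmatically (a sketch; host name and port are placeholders, and the property keys are standard HBase client settings):

```java
// Sketch: configure the HBase client connection directly instead of
// relying on an hbase-site.xml being found on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "m1.example.com");           // placeholder host
conf.set("hbase.zookeeper.property.clientPort", "2181");        // default ZK port
Scan scan = new Scan();
```

If the client still tries localhost, it usually means HBaseConfiguration.create() silently fell back to defaults because no site file was found.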

Re: org.apache.flink.addons.hbase.TableInputFormat

2015-05-26 Thread Hilmi Yildirim
Thank you very much. On 26.05.2015 at 11:26, Robert Metzger wrote: Hi, you need to include dependencies that are not part of the main Flink jar in your usercode jar file. Therefore, you have to build a "fat jar". The easiest way to do that is like this: http://stackoverflow.com/questions/574594/how

Re: k means - waiting for dataset

2015-05-26 Thread Fabian Hueske
Sure, here's some pseudo code:

public class CentroidMover implements GroupReduceFunction {
  public void reduce(Iterable points, Collector out) {
    int cnt = 0;
    Centroid sum = new Centroid(0, 0);
    for (Point p : points) {
      sum = sum + p; // (your addition logic goes here)
      cnt++;
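Completing that sketch as plain, self-contained Java (class and method names follow the pseudo code; in Flink the same loop would sit inside GroupReduceFunction.reduce(Iterable&lt;Point&gt;, Collector&lt;Centroid&gt;) and emit the result through the collector):

```java
// Plain-Java version of the centroid-moving reduce: average a group
// of 2-D points into their new centroid. Points are {x, y} pairs.
public class CentroidMover {
    /** Returns the centroid {x, y} of the given (x, y) points. */
    public static double[] average(double[][] points) {
        double sumX = 0, sumY = 0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        int cnt = points.length;
        return new double[] { sumX / cnt, sumY / cnt };
    }
}
```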

Re: k means - waiting for dataset

2015-05-26 Thread Pa Rö
Thanks for your message. Maybe you can give me an example for the GroupReduceFunction? 2015-05-22 23:29 GMT+02:00 Fabian Hueske : > There are two ways to do that: > > 1) You use a GroupReduceFunction, which gives you an iterator over all > points, similar to Hadoop's ReduceFunction. > 2) You use

Re: org.apache.flink.addons.hbase.TableInputFormat

2015-05-26 Thread Robert Metzger
Hi, you need to include dependencies that are not part of the main Flink jar in your usercode jar file. Therefore, you have to build a "fat jar". The easiest way to do that is like this: http://stackoverflow.com/questions/574594/how-can-i-create-an-executable-jar-with-dependencies-using-maven But that ap

org.apache.flink.addons.hbase.TableInputFormat

2015-05-26 Thread Hilmi Yildirim
Hi, I want to deploy my job on a cluster. Unfortunately, the job does not run because I used org.apache.flink.addons.hbase.TableInputFormat which is not included in the library of the flink folder. Therefore, I wanted to build my own flink version with mvn package -DskipTests. Again, the libra

Re: how Flink Optimizer work and what is process do it?

2015-05-26 Thread Fabian Hueske
Optimization happens automatically when you submit a batch program (DataSet API). 2015-05-26 10:27 GMT+02:00 hagersaleh : > Is the optimizer automatic, or do I have to configure it myself? > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/how-Flin

Re: how Flink Optimizer work and what is process do it?

2015-05-26 Thread hagersaleh
Is the optimizer automatic, or do I have to configure it myself? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/how-Flink-Optimizer-work-and-what-is-process-do-it-tp1359p1363.html Sent from the Apache Flink User Mailing List archive. mailing list archiv

Re: how Flink Optimizer work and what is process do it?

2015-05-26 Thread Fabian Hueske
It's similar to a query optimizer in a relational database system. 2015-05-26 10:10 GMT+02:00 hagersaleh : > Thanks a lot. > What is the optimizer in Flink? > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/how-Flink-Optimizer-work-and

Re: how Flink Optimizer work and what is process do it?

2015-05-26 Thread hagersaleh
Thanks a lot. What is the optimizer in Flink? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/how-Flink-Optimizer-work-and-what-is-process-do-it-tp1359p1361.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Na