Re: Connecting the channel failed: Connection refused

2015-06-23 Thread Aaron Jackson
Yes, the task manager continues running. I have put together a test app to demonstrate the problem and, in doing so, noticed some oddities. The problem manifests itself on a simple join (I originally believed it was the distinct; I was wrong). - When the source is generated via fromCollection()

Documentation Error

2015-06-23 Thread Maximilian Alber
Hi Flinksters, just a minor thing: in the second code sample at http://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html it should be

./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar

instead of:

./bin/flink -m yarn-clu

Re: Best way to write data to HDFS by Flink

2015-06-23 Thread Márton Balassi
Dear Hawin, We do not have out-of-the-box support for that; it is something you would need to implement yourself in a custom SinkFunction. Best, Marton On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang wrote: > Hi Marton > > if we received a huge data from kafka and wrote to HDFS immediately. W
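
The core idea behind such a custom sink is usually to buffer records and flush them to the file system in batches rather than once per incoming record. A minimal Python sketch of that buffering logic — the class name, the `capacity` parameter, and the file-like target are illustrative assumptions, not Flink's actual SinkFunction API:

```python
import io

class BufferedSink:
    """Sketch of the buffering logic one might put inside a custom sink:
    collect records in memory and flush them to the target (here a
    file-like object standing in for an HDFS stream) in batches,
    instead of issuing one write per record."""

    def __init__(self, target, capacity=1000):
        self.target = target      # file-like object (stand-in for an HDFS output stream)
        self.capacity = capacity  # number of records to buffer before flushing
        self.buffer = []

    def invoke(self, record):
        # Called once per incoming record (analogous to a sink's per-record hook).
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            self._flush()

    def _flush(self):
        self.target.write("\n".join(self.buffer) + "\n")
        self.buffer.clear()

    def close(self):
        # Flush any remaining records when the sink shuts down.
        if self.buffer:
            self._flush()


out = io.StringIO()
sink = BufferedSink(out, capacity=2)
for rec in ["a", "b", "c"]:
    sink.invoke(rec)
sink.close()
print(out.getvalue())  # batch "a\nb\n" flushed first, then "c\n" on close
```

Tuning `capacity` (or flushing on a timer as well) trades latency against the number of small writes hitting HDFS.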

Re: Kafka0.8.2.1 + Flink0.9.0 issue

2015-06-23 Thread Márton Balassi
Dear Hawin, Sorry, I have managed to link to a pom that has been changed in the meantime. But we have added a section to our doc clarifying your question. [1] Since then Stephan has proposed an even nicer solution that did not make it into the doc yet, namely if you start from our quickstart pom a

Re: Building Yarn-Version of Flink

2015-06-23 Thread Stephan Ewen
I think it is a valid concern that the POM should be written such that the given command is valid. It should be only a simple change on the root POM, to make sure the Hadoop 2 profile is not only activated in the absence of a profile selection, but also when the selected profile is "2". Stephan On T

Re: Building Yarn-Version of Flink

2015-06-23 Thread Maximilian Alber
Thanks! I'm still used to 0.7 :) Cheers, Max On Tue, Jun 23, 2015 at 2:18 PM, Maximilian Michels wrote: > Hi Max! > > Nowadays, the default target when building from source is Hadoop 2. So a > simple mvn clean package -DskipTests should do it. You only need the flag > when you build for Hadoo

Re: memory flush on cluster

2015-06-23 Thread Ufuk Celebi
On 23 Jun 2015, at 13:53, Stephan Ewen wrote: > Currently, Flink does not cache anything across runs, except JAR files on the > workers. > > The reason the first run is slower may be: > - Because in the first run, code is distributed in the cluster. In > subsequent runs, the JAR files need n

Re: Building Yarn-Version of Flink

2015-06-23 Thread Maximilian Michels
Hi Max! Nowadays, the default target when building from source is Hadoop 2. So a simple mvn clean package -DskipTests should do it. You only need the flag when you build for Hadoop 1: -Dhadoop.profile=1. Cheers, The other Max On Tue, Jun 23, 2015 at 2:03 PM, Maximilian Alber < alber.maximil...@g

Building Yarn-Version of Flink

2015-06-23 Thread Maximilian Alber
Hi Flinksters, I just tried to build the current YARN version of Flink. The second error is probably because Maven is an older version. But the first one seems to be a real error. albermax@hadoop1:~/bumpboost/working/programs/flink/incubator-flink$ mvn clean package -DskipTests -Dhadoop.profile=

Re: memory flush on cluster

2015-06-23 Thread Stephan Ewen
Currently, Flink does not cache anything across runs, except JAR files on the workers. The reason the first run is slower may be: - Because in the first run, code is distributed in the cluster. In subsequent runs, the JAR files need not be redistributed. - Because the JIT takes a bit to kick in

Re: Random Shuffling

2015-06-23 Thread Maximilian Alber
Thank you! Still, I cannot guarantee the size of each partition, or can I? Something like randomSplit in Spark. Cheers, Max On Mon, Jun 15, 2015 at 5:46 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Hi, > > using partitionCustom, the data distribution depends only on your > proba

Re: Random Selection

2015-06-23 Thread Maximilian Alber
No clue. I used the current branch aka 0.9-SNAPSHOT. Or is this something related to Scala? On Mon, Jun 22, 2015 at 4:45 PM, Stephan Ewen wrote: > Actually, the closure cleaner is supposed to take care of the "anonymous > inner class" situation. > > Did you deactivate that one, by any chance? >

memory flush on cluster

2015-06-23 Thread Pa Rö
Hi Flink community, currently I am testing my Flink app with a benchmark on a Hadoop cluster (Flink on YARN). My results show that Flink needs more time for the first round than for all the other rounds. Maybe Flink caches something in memory? And if I run the benchmark for 100 rounds, my system freezes. I think the me