Re: Resolving dependencies when using sbt

2015-09-13 Thread Daniel Blazevski
Never mind on this issue. Based on this (a whole different issue, with Kafka): https://issues.apache.org/jira/browse/FLINK-2408 I saw the following (used instead of 0.10-SNAPSHOT in build.sbt), and the error message went away: val flinkVersion = "0.9.0" libraryDependencies ++= Seq("org.apache.flink" % "
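The quoted build.sbt is cut off above. A minimal sketch of what such a dependency block could look like (the flink-scala and flink-clients artifact names are an assumption; only the groupId and version survive the truncation):

    // build.sbt (sketch, assuming the batch Scala API plus a client to submit jobs)
    val flinkVersion = "0.9.0"

    libraryDependencies ++= Seq(
      "org.apache.flink" % "flink-scala"   % flinkVersion,  // Scala DataSet API
      "org.apache.flink" % "flink-clients" % flinkVersion   // needed to run and submit programs
    )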

Resolving dependencies when using sbt

2015-09-13 Thread Daniel Blazevski
Hello, Earlier today I was able to get a Flink cluster running and successfully ran the WordCount jar file in the examples folder. I then tried to compile the WordCount example using sbt, following the guide here: https://ci.apache.org/projects/flink/flink-docs-master/quickstart/scala_api_quickstart.html#alte
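For readers who do not want to open the quickstart page, the Scala WordCount it builds is roughly the following. This is a minimal sketch, not Daniel's exact project:

    import org.apache.flink.api.scala._

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Obtain the execution environment (local or cluster, depending on how the job is run).
        val env = ExecutionEnvironment.getExecutionEnvironment

        // Hard-coded input lines; a real job would use env.readTextFile(path).
        val text = env.fromElements("to be or not to be", "that is the question")

        val counts = text
          .flatMap(_.toLowerCase.split("\\W+")) // split each line into words
          .filter(_.nonEmpty)                   // drop empty tokens
          .map((_, 1))                          // pair every word with a count of 1
          .groupBy(0)                           // group by the word (field 0)
          .sum(1)                               // sum the counts (field 1)

        counts.print()
      }
    }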

Re: Distribute DataSet to subset of nodes

2015-09-13 Thread Sachin Goel
Of course, someone else might have better ideas regarding the partitioner. :) On Sep 14, 2015 1:12 AM, "Sachin Goel" wrote: > Hi Stefan > Just a clarification: the output corresponding to an element based on the > whole data will be a union of the outputs based on the two halves. Is this > what you'

Re: "Not enough free slots available to run the job" for word count example

2015-09-13 Thread Sachin Goel
Hi Daniel I assume your problem did get solved. As for the -p flag, it determines the default parallelism of operators at runtime. If you specify a value greater than the number of slots available, that's an issue. Hope that helped. Cheers Sachin On Sep 13, 2015 9:13 PM, "Daniel Blazevski" wrote:
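A note on the numbers here: the total capacity is (registered task managers) x taskmanager.numberOfTaskSlots, and whatever -p is set to must not exceed it. The parallelism can also be set inside the program instead of on the command line; a minimal sketch (the value 2 is only an example):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    // Must not exceed (task managers) x (slots per task manager), otherwise the job
    // fails with "Not enough free slots available to run the job".
    env.setParallelism(2)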

Re: Distribute DataSet to subset of nodes

2015-09-13 Thread Sachin Goel
Hi Stefan Just a clarification: the output corresponding to an element based on the whole data will be a union of the outputs based on the two halves. Is this what you're trying to achieve? [It appears that way, since every flatMap task will independently produce outputs.] In that case, one solu
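The partitioner idea mentioned in the follow-up could look roughly like the sketch below: route elements to one of two groups of parallel tasks with partitionCustom, so that each group only needs one half of the static data. The routing rule (even/odd keys) and the key extractor are placeholders, nothing settled in this thread, and this assumes a Flink version where DataSet.partitionCustom is available:

    import org.apache.flink.api.common.functions.Partitioner
    import org.apache.flink.api.scala._

    // Hypothetical partitioner: even keys go to the first half of the parallel tasks,
    // odd keys to the second half.
    class TwoGroupPartitioner extends Partitioner[Int] {
      override def partition(key: Int, numPartitions: Int): Int = {
        val k = key & Int.MaxValue                   // drop the sign bit so k is never negative
        if (numPartitions < 2) {
          0
        } else {
          val half = numPartitions / 2
          if (k % 2 == 0) k % half                   // "even" keys -> partitions [0, half)
          else half + (k % (numPartitions - half))   // "odd" keys  -> partitions [half, numPartitions)
        }
      }
    }

    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("a", "b", "c")

    // The key extractor (hashCode) is only an illustration.
    val routed = data.partitionCustom(new TwoGroupPartitioner, (s: String) => s.hashCode)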

Re: "Not enough free slots available to run the job" for word count example

2015-09-13 Thread Daniel Blazevski
Hello, I am not sure if I can give updates to an email I sent to the user list before getting any response, but here is a quick update: I tried to run using one processor: ./bin/flink run -p 1 ./examples/flink-java-examples-0.9.1-WordCount.jar and that worked. It seems to be an issue with confi

"Not enough free slots available to run the job" for word count example

2015-09-13 Thread Daniel Blazevski
Hello, I am new to Flink. I set up a Flink cluster on 4 m4.large Amazon EC2 instances and set the following in flink-conf.yaml: jobmanager.heap.mb: 4000 taskmanager.heap.mb: 5000 taskmanager.numberOfTaskSlots: 2 parallelism.default: 8 In the 8081 dashboard, it shows 4 for Task Manager and 5 for
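For reference, the arithmetic behind the error: available slots = (registered task managers) x taskmanager.numberOfTaskSlots = 4 x 2 = 8, so parallelism.default: 8 only fits if all four task managers have actually registered with the job manager.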

Distribute DataSet to subset of nodes

2015-09-13 Thread Stefan Bunk
Hi! I have the following problem: I have 10 nodes on which I want to execute a flatMap operator on a DataSet. In the open method of the operator, some data is read from disk and preprocessed, which the operator needs. The problem is that the data does not fit in memory on one node; however, half of the da
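For context, the open()-loads-data pattern described here looks roughly like the sketch below. The file path, element types, and enrichment logic are placeholders, not taken from the original mail:

    import org.apache.flink.api.common.functions.RichFlatMapFunction
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.util.Collector
    import scala.io.Source

    // Every parallel flatMap instance loads and preprocesses some static data in open()
    // before it starts processing elements.
    class EnrichingFlatMap extends RichFlatMapFunction[String, (String, Int)] {
      private var sideData: Map[String, Int] = _

      override def open(parameters: Configuration): Unit = {
        // Read and preprocess the static data from local disk on the worker node.
        val src = Source.fromFile("/path/to/static-data.txt")   // placeholder path
        try {
          sideData = src.getLines().map { line =>
            val Array(k, v) = line.split('\t')
            k -> v.toInt
          }.toMap
        } finally {
          src.close()
        }
      }

      override def flatMap(value: String, out: Collector[(String, Int)]): Unit = {
        // Emit an output only for elements that appear in the loaded data.
        sideData.get(value).foreach(v => out.collect((value, v)))
      }
    }

It would be applied with data.flatMap(new EnrichingFlatMap); each parallel instance runs open() once before processing its share of the DataSet, which is why every node currently needs the whole side data in memory.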