Flink cluster dev environment in Docker

2015-03-16 Thread Emmanuel
FYI: posted my dev cluster deployment in Docker here: https://github.com/streamnsight/docker-flink Still need to work on aggregating the logs, but I hope it can get people started easily. Cheers

Re: RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-16 Thread Mihail Vieru
And the correct SSSPUnweighted attached.

RuntimeException Gelly API: Memory ran out. Compaction failed.

2015-03-16 Thread Mihail Vieru
Hi, I'm getting the following RuntimeException for an adaptation of the SingleSourceShortestPaths example using the Gelly API (see attachment). It's been adapted for unweighted graphs having vertices with Long values. As an input graph I'm using the social network graph (~200MB unpacked) from …
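The adaptation itself lives in an attachment the archive does not carry. As a rough sketch of the general idea, assuming the Scala DataSet API and illustrative edge data (none of these names are from the thread): give every edge a constant weight of 1.0 so weighted SSSP logic can be reused unchanged.

    import org.apache.flink.api.scala._

    object UnitWeightEdges {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // unweighted edges as (source, target) pairs
        val rawEdges: DataSet[(Long, Long)] =
          env.fromElements((1L, 2L), (2L, 3L), (1L, 3L))
        // assign a constant unit weight so a weighted SSSP can run on them
        val weighted: DataSet[(Long, Long, Double)] =
          rawEdges.map(e => (e._1, e._2, 1.0))
        weighted.print()
      }
    }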

Re: using BroadcastSet in Join/CoGroup/Cross

2015-03-16 Thread Stephan Ewen
Sure, that is totally possible. They can be used with any function.
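A minimal sketch of what Stephan confirms, assuming the Scala API (the datasets and the "factors" broadcast set name are illustrative): .withBroadcastSet is attached to the join operator, and the broadcast variable is read inside a rich function.

    import org.apache.flink.api.common.functions.RichJoinFunction
    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.Configuration

    object BroadcastJoin {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        val left    = env.fromElements((1, "a"), (2, "b"))
        val right   = env.fromElements((1, 10.0), (2, 20.0))
        val factors = env.fromElements(2.0) // the broadcast set

        val joined = left.join(right).where(0).equalTo(0)
          .apply(new RichJoinFunction[(Int, String), (Int, Double), (Int, String, Double)] {
            private var factor: Double = 0.0
            override def open(parameters: Configuration): Unit = {
              // read the broadcast variable registered below
              factor = getRuntimeContext.getBroadcastVariable[Double]("factors").get(0)
            }
            override def join(l: (Int, String), r: (Int, Double)): (Int, String, Double) =
              (l._1, l._2, r._2 * factor)
          })
          .withBroadcastSet(factors, "factors") // on the join, not only on a map

        joined.print()
      }
    }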

using BroadcastSet in Join/CoGroup/Cross

2015-03-16 Thread Vinh June
hello, Is it possible to use .withBroadcastSet in other operations than Map, say Join for example?

Re: Most convenient data structure for unspecified length objects

2015-03-16 Thread Vinh June
I ran into the same problem. I think it depends on the input data; in my case it is a CSV of unknown size. My solution is to read as text, then process each line and add them into a Map or Array of type Any.
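A sketch of that approach, assuming the Scala API and illustrative input (a List[String] is used here for the variable-length part, in line with the suggestion below, instead of a collection of Any): read the input as plain lines, split each line, and keep the remainder as a list.

    import org.apache.flink.api.scala._

    case class Record(id: Long, fields: List[String])

    object VariableWidthCsv {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // stand-in for env.readTextFile(path): lines with 0..n extra columns
        val lines = env.fromElements("1,a,b,c", "2", "3,x")

        val records = lines.map { line =>
          val parts = line.split(",")
          Record(parts(0).toLong, parts.drop(1).toList) // 0..n trailing fields
        }
        records.print()
      }
    }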

Re: Most convenient data structure for unspecified length objects

2015-03-16 Thread Stephan Ewen
Ah, okay. Then how about using a List of Strings?

Re: Most convenient data structure for unspecified length objects

2015-03-16 Thread pietro
Hi Stephan, thanks for the reply! My problem is that I cannot know whether I will have 0, 1, 2, ... or more strings. Then, Option is not gonna help in my case :(

Re: Most convenient data structure for unspecified length objects

2015-03-16 Thread Stephan Ewen
Hi! If you are programming in Scala, you can always use "Option[String]" for an optional String field. Stephan
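For a single field that may or may not be present, that looks roughly like this (a sketch; the record shape and values are illustrative):

    import org.apache.flink.api.scala._

    case class User(name: String, email: Option[String])

    object OptionalField {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val users = env.fromElements(
          User("alice", Some("alice@example.org")),
          User("bob", None)) // a missing field stays None instead of null

        users.map(u => (u.name, u.email.getOrElse("n/a"))).print()
      }
    }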

Most convenient data structure for unspecified length objects

2015-03-16 Thread pietro
I have to implement a program based on Flink that processes some records. The peculiarity of those records is that it is not possible to know at compile time how many fields they contain. Therefore, I cannot use a simple TupleN data type. The solution I came up with is to use a tuple with this structure …

Re: Scaling a Flink cluster

2015-03-16 Thread Stephan Ewen
Hi Emmanuel! Flink does not yet include JobManager failover, but we have this on the list for the mid-term future (middle to second half of the year). At this point, when the JobManager dies, the job is cancelled. Greetings, Stephan

RE: Scaling a Flink cluster

2015-03-16 Thread Emmanuel
I see... Because of the start-cluster script, I was under the impression that the jobmanager had to connect to each node upon start-up, which would make scaling an issue without restarting the job manager, but it makes sense now. Thanks for the clarification. Side question: what happens if the jobmanager dies?

Re: Scaling a Flink cluster

2015-03-16 Thread Stephan Ewen
Hi Emmanuel! The slaves file is not needed on every node. It is only used by the "start-cluster.sh" script, which makes an ssh call to every host in that file to start a taskmanager. You can add a taskmanager to an existing Flink cluster by simply calling "taskmanager.sh start" on that machine …

Re: Scaling a Flink cluster

2015-03-16 Thread Ufuk Celebi
The slaves file is only used for the startup script when using the bin/start-cluster.sh …

Re: Sort tuple dataset

2015-03-16 Thread Stephan Ewen
I think that depends on your use case. If you want to work on the entire dataset as a whole anyway, you can assign a dummy key (like 0) to all elements, group by that key, and sort the group on the actual value. What exactly is your use case? Does the above solution work there?
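A sketch of the dummy-key trick Stephan describes, assuming the Scala API (data illustrative): every element gets key 0, so a single group holds the whole dataset, which can then be ordered with sortGroup.

    import org.apache.flink.api.common.operators.Order
    import org.apache.flink.api.scala._
    import org.apache.flink.util.Collector

    object SortWholeDataSet {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val data = env.fromElements(3, 1, 2)

        val sorted = data
          .map(x => (0, x))              // constant dummy key for every element
          .groupBy(0)                    // one group containing the whole dataset
          .sortGroup(1, Order.ASCENDING) // sort inside that group by value
          .reduceGroup { (it: Iterator[(Int, Int)], out: Collector[Int]) =>
            it.foreach(t => out.collect(t._2)) // drop the dummy key again
          }

        sorted.print()
      }
    }

Note that this funnels the entire dataset through a single group, so it only makes sense when a total order over all elements is really needed.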

Scaling a Flink cluster

2015-03-16 Thread Emmanuel
Hello, In my understanding, the flink-conf.yaml is the one config file to configure a cluster. The slaves file lists the slave nodes. They must both be on every node. I'm trying to understand what is the best strategy to scale a Flink cluster, since:
- adding a node means adding an entry to the slaves file …
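For reference, a sketch of the two files under discussion (host names and values are illustrative; key names as in the Flink distribution's conf/ directory). As the replies above clarify, only flink-conf.yaml is needed on every node; the slaves file is read only by bin/start-cluster.sh.

    # conf/flink-conf.yaml -- read locally by each jobmanager/taskmanager
    jobmanager.rpc.address: master-host
    jobmanager.rpc.port: 6123
    taskmanager.numberOfTaskSlots: 4

    # conf/slaves -- only used by bin/start-cluster.sh, one taskmanager host per line
    worker-1
    worker-2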