data flow example on cluster

2015-09-29 Thread Lydia Ickler
Hi all, I want to run the data-flow Wordcount example on a Flink cluster. Local execution with "mvn exec:exec -Dinput=kinglear.txt -Doutput=wordcounts.txt" is already working. What is the command to execute it on the cluster? Best regards, Lydia

For each element in a dataset, do something with another dataset

2015-09-29 Thread Pieter Hameete
Good day everyone, I am looking for a good way to do the following: I have dataset A and dataset B, and for each element in dataset A I would like to filter dataset B and obtain the size of the result. In short: *for each element a in A -> B.filter( _ < a.propertyx).count* Currently I am
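As an illustration of the computation described in this message (for each a in A, count the elements of B that are smaller than a's property), here is a minimal sketch on plain Java arrays. It is not Flink API code and not the poster's solution. It assumes that B fits in memory and that the property is an int; the class and method names are made up for this example. Sorting B once and using a binary search per element of A avoids re-filtering B for every a.

```java
import java.util.Arrays;

public class CountSmaller {
    // Number of values in sortedB that are strictly less than threshold
    // (binary search for the first index whose value is >= threshold).
    public static int countLessThan(int[] sortedB, int threshold) {
        int lo = 0;
        int hi = sortedB.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedB[mid] < threshold) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // For each value in a, count the elements of b that are smaller than it.
    public static int[] countsFor(int[] a, int[] b) {
        int[] sortedB = b.clone(); // assumption: B is small enough to sort in memory
        Arrays.sort(sortedB);
        int[] counts = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            counts[i] = countLessThan(sortedB, a[i]);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] a = {3, 10, 1};
        int[] b = {5, 1, 2, 8, 3};
        System.out.println(Arrays.toString(countsFor(a, b))); // prints [2, 5, 0]
    }
}
```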

Re: FLink Streaming - Parallelize ArrayList

2015-09-29 Thread defstat
Is there any way to "broadcast" the internal list so that all processing nodes "know" the list, and use it either inside the connect function or in a fold operation? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FLink-Streaming-Parallelize-

Re: FLink Streaming - Parallelize ArrayList

2015-09-29 Thread Aljoscha Krettek
No, each operator would have its own local list. In a distributed environment it is very tricky to keep global state across all instances of operations (Flink does not support anything in this direction). If you really need it then the only way is to set the parallelism of the operator to 1. This

Re: FLink Streaming - Parallelize ArrayList

2015-09-29 Thread defstat
Will that keep a global list for the whole execution environment? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FLink-Streaming-Parallelize-ArrayList-tp2957p2959.html

Re: FLink Streaming - Parallelize ArrayList

2015-09-29 Thread Aljoscha Krettek
Hi, I think a connected FlatMap would be best suited for what you are trying to do. Using this, you can have an operation that has two input streams: input1 = env.socketStream(...) input2 = env.socketStream(...) result = input1.connect(input2) .flatMap(new CoFlatMapFunction { void flatMap1(.
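To make the two-input idea in this message concrete, here is a rough sketch in plain Java. It does not use Flink: the TwoInputFlatMap interface is a local, hypothetical stand-in with one callback per input, and ListMatcher is an invented example operator. A single operator instance keeps its own local list, filled from the first input and consulted for values arriving on the second input, which corresponds to the parallelism-1 case discussed earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ConnectedSketch {
    // Hypothetical stand-in for a two-input flat-map operator (not Flink's interface).
    interface TwoInputFlatMap<A, B, OUT> {
        void flatMap1(A value, Consumer<OUT> out); // called for elements of input 1
        void flatMap2(B value, Consumer<OUT> out); // called for elements of input 2
    }

    // Stores values from input 1 in a local list; emits input-2 values found in that list.
    static class ListMatcher implements TwoInputFlatMap<String, String, String> {
        private final List<String> known = new ArrayList<>();

        @Override
        public void flatMap1(String value, Consumer<String> out) {
            known.add(value);
        }

        @Override
        public void flatMap2(String value, Consumer<String> out) {
            if (known.contains(value)) {
                out.accept(value);
            }
        }
    }

    // Feeds a few hand-written elements through one operator instance.
    public static List<String> run() {
        ListMatcher op = new ListMatcher();
        List<String> result = new ArrayList<>();
        op.flatMap1("a", result::add);
        op.flatMap1("b", result::add);
        op.flatMap2("b", result::add); // "b" is in the list, so it is emitted
        op.flatMap2("c", result::add); // "c" is not in the list, nothing is emitted
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [b]
    }
}
```

With more than one parallel instance, each instance would hold a separate copy of the list, as noted in the earlier reply.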