Hi all,
I want to run the Dataflow WordCount example on a Flink cluster.
The local execution with "mvn exec:exec -Dinput=kinglear.txt
-Doutput=wordcounts.txt" is already working.
What is the command to execute it on the cluster?
Best regards,
Lydia
Good day everyone,
I am looking for a good way to do the following:
I have dataset A and dataset B, and for each element in dataset A I would
like to filter dataset B and obtain the size of the result. In short:
*for each element a in A -> B.filter( _ < a.propertyx).count*
Currently I am
Is there any way to "broadcast" the internal list so that all processing
nodes "know" the list, and use it either inside the connect function or in a
fold operation?
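To make the intended logic concrete, here is a minimal plain-Java sketch of the per-element filter-and-count (not Flink code; the element values and the `countBelow` helper are made up for illustration). In Flink's DataSet API, the list would typically be shipped to all parallel instances with `withBroadcastSet(...)` and read via `getRuntimeContext().getBroadcastVariable(...)` inside a RichMapFunction, with `countBelow` running per record:

```java
import java.util.List;

public class FilterCountSketch {
    // For one element a of dataset A, count how many elements of B
    // fall below it -- the body of "B.filter(_ < a.propertyx).count".
    static long countBelow(List<Integer> b, int threshold) {
        return b.stream().filter(y -> y < threshold).count();
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(3, 5, 8);    // dataset A (made-up values)
        List<Integer> b = List.of(1, 4, 6, 7); // dataset B (made-up values)
        // "for each element a in A -> B.filter(_ < a).count"
        for (int x : a) {
            System.out.println(x + " -> " + countBelow(b, x));
        }
    }
}
```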
--
View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FLink-Streaming-Parallelize-
No, each operator instance would have its own local list.
In a distributed environment it is very tricky to keep global state across
all parallel instances of an operator (Flink does not support anything in
this direction). If you really need it, the only way is to set the
parallelism of the operator to 1.
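To illustrate why the list is not global, here is a small self-contained Java sketch (not Flink code) where each simulated parallel instance keeps its own ArrayList; with parallelism 1 there is only one instance, so its list effectively holds everything:

```java
import java.util.ArrayList;
import java.util.List;

public class LocalStateSketch {
    // Stand-in for one parallel instance of a streaming operator:
    // any list it builds up is local to that instance only.
    static class OperatorInstance {
        final List<String> localList = new ArrayList<>();
        void process(String element) { localList.add(element); }
    }

    public static void main(String[] args) {
        // Parallelism 2: elements are partitioned across two instances,
        // so neither instance ever sees the complete list.
        OperatorInstance i1 = new OperatorInstance();
        OperatorInstance i2 = new OperatorInstance();
        i1.process("a");
        i2.process("b");
        System.out.println(i1.localList); // [a]
        System.out.println(i2.localList); // [b]

        // Parallelism 1: a single instance receives every element,
        // so its local list is effectively global.
        OperatorInstance single = new OperatorInstance();
        single.process("a");
        single.process("b");
        System.out.println(single.localList); // [a, b]
    }
}
```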
Will that keep a global list for the whole execution environment?
Hi,
I think for what you are trying to do, a connected FlatMap would be best suited.
Using this, you can have an operation that has two input streams:
input1 = env.socketStream(...)
input2 = env.socketStream(...)
result = input1.connect(input2)
.flatMap(new CoFlatMapFunction<String, String, String>() {
    void flatMap1(String value, Collector<String> out) { ... }
    void flatMap2(String value, Collector<String> out) { ... }
})