Hi. I am struggling the past few days to find a solution on the following problem, using Apache Flink:
I have a stream of vectors, represented by files in a local folder. After a new text file is located using DataStream<String> text = env.readFileStream(...), I transform (flatMap), the Input into a SingleOutputStreamOperator<Tuple2<String, Integer>, ?>, with the Integer being the score coming from a scoring function. I want to persist a global HashMap containing the top-k vectors, using their scores. I approached the problem using a statefull transformation. 1. The first problem I have is that the HashMap retains per-sink data (so for each thread of workers, one HashMap of data). How can I make that a Global collection 2. Using Apache Spark, I made that possible by JavaPairDStream<String, Integer> stateDstream = tuples.updateStateByKey(updateFunction); and then making transformations on the stateDstream. Is there a way I can get the same functionality using FLink? Thanks in advance! -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Statefull-computation-tp2494.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.