Hi. I am struggling the past few days to find a solution on the following
problem, using Apache Flink: 

I have a stream of vectors, represented by files in a local folder. After a
new text file is located using DataStream<String> text =
env.readFileStream(...), I transform (flatMap), the Input into a
SingleOutputStreamOperator<Tuple2&lt;String, Integer>, ?>, with the Integer
being the score coming from a scoring function. 

I want to persist a global HashMap containing the top-k vectors, using their
scores. I approached the problem using a statefull transformation. 
1. The first problem I have is that the HashMap retains per-sink data (so
for each thread of workers, one HashMap of data). How can I make that a
Global collection 

2. Using Apache Spark, I made that possible by 
JavaPairDStream<String, Integer> stateDstream =
tuples.updateStateByKey(updateFunction); 

and then making transformations on the stateDstream. Is there a way I can
get the same functionality using FLink? 

Thanks in advance! 



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Statefull-computation-tp2494.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to