Re: Statefull computation

2015-08-23 Thread Aljoscha Krettek
Hi, In the example the result is not correct because the values for a, b, c and d are never forwarded from instance 2, even though they would modify the global top-k result. It works, though, if you partition by the key field (tuple field 0, in this case) before doing the summation and local top-k. I
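
The point about partitioning by key can be illustrated with a small plain-Java simulation. This is only a sketch, not Flink code: the class `TopKMergeDemo`, its helper methods, and the sample data (k = 2, two instances) are all hypothetical and are not the numbers from the thread. It compares merging local top-k results after an arbitrary split of the records with merging them after the records were partitioned by key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopKMergeDemo {

    // Helper to build one (key, value) record. Hypothetical, for illustration only.
    static Map.Entry<String, Integer> rec(String key, int value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    // Sums the values per key for the records seen by one instance.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return sums;
    }

    // Keeps the k entries with the largest sums, largest first (ties broken by key).
    static Map<String, Integer> topK(Map<String, Integer> sums, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(sums.entrySet());
        entries.sort((x, y) -> {
            int byValue = Integer.compare(y.getValue(), x.getValue());
            return byValue != 0 ? byValue : x.getKey().compareTo(y.getKey());
        });
        Map<String, Integer> top = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.put(entries.get(i).getKey(), entries.get(i).getValue());
        }
        return top;
    }

    // Each instance computes its local top-k; the local results are then
    // summed per key and reduced to a single top-k list of keys.
    static List<String> mergedTopK(List<List<Map.Entry<String, Integer>>> instances, int k) {
        Map<String, Integer> merged = new HashMap<>();
        for (List<Map.Entry<String, Integer>> instance : instances) {
            topK(sumByKey(instance), k).forEach((key, sum) -> merged.merge(key, sum, Integer::sum));
        }
        return new ArrayList<>(topK(merged, k).keySet());
    }

    // Routes every record with the same key to the same instance.
    static List<List<Map.Entry<String, Integer>>> partitionByKey(
            List<Map.Entry<String, Integer>> records, int parallelism) {
        List<List<Map.Entry<String, Integer>>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> r : records) {
            instances.get(Math.floorMod(r.getKey().hashCode(), parallelism)).add(r);
        }
        return instances;
    }

    // Hypothetical data: key "c" is split across both instances and is
    // never in either local top-2, although its total (8) is the largest.
    static List<List<Map.Entry<String, Integer>>> sampleSplit() {
        return Arrays.asList(
                Arrays.asList(rec("a", 7), rec("b", 5), rec("c", 4)),
                Arrays.asList(rec("d", 6), rec("e", 5), rec("c", 4)));
    }

    // Concatenates the records of all instances into one list.
    static List<Map.Entry<String, Integer>> flatten(List<List<Map.Entry<String, Integer>>> instances) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        instances.forEach(all::addAll);
        return all;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> split = sampleSplit();
        System.out.println("arbitrary split:    " + mergedTopK(split, 2));
        System.out.println("partitioned by key: " + mergedTopK(partitionByKey(flatten(split), 2), 2));
    }
}
```

With the arbitrary split, the merged result misses "c" because neither instance ever forwards it. After partitioning by key, each key's full sum lives on exactly one instance, so every member of the global top-k is also in its instance's local top-k and survives the merge.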

Re: Statefull computation

2015-08-23 Thread Gyula Fóra
Hey, I am not sure I get it. Why aren't the results correct? You don't instantly get the global top-k, but you are always updating it with the new local results. Gyula Aljoscha Krettek wrote (on Sun, Aug 23, 2015, 22:58): > Hi, > I wanted to post something along the same lines but

Re: Statefull computation

2015-08-23 Thread Aljoscha Krettek
Hi, I wanted to post something along the same lines, but now I don't think the approach with local top-ks and merging works. For example, suppose you want to get the top-4 and you do the pre-processing in two parallel instances. This input data would lead to incorrect results: 1. Instance: a 6 b 5 c 4 d 3

Re: Statefull computation

2015-08-23 Thread Gyula Fóra
Hey! What you are trying to do here is a global rolling aggregation, which is inherently a DOP 1 (degree of parallelism 1) operation. Your observation is correct: if you want to use a simple stateful sink, you need to set the parallelism to 1 in order to get correct results. What you can do is to ke
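
Why a single instance gives correct results can be sketched in plain Java. This is not Flink code and not the setup from the thread: `RollingTopK` is a hypothetical class that stands in for a stateful operator running with parallelism 1. Because it sees every record, its running sums are always complete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stateful operator with parallelism 1.
class RollingTopK {
    private final int k;
    // Running sum per key; complete because this one instance sees all records.
    private final Map<String, Integer> sums = new HashMap<>();

    RollingTopK(int k) {
        this.k = k;
    }

    // Adds one record to the running sums and returns the current
    // top-k keys, largest sum first (ties broken by key).
    List<String> update(String key, int value) {
        sums.merge(key, value, Integer::sum);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(sums.entrySet());
        entries.sort((x, y) -> {
            int byValue = Integer.compare(y.getValue(), x.getValue());
            return byValue != 0 ? byValue : x.getKey().compareTo(y.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        RollingTopK rolling = new RollingTopK(2);
        rolling.update("a", 7);
        rolling.update("c", 4);
        rolling.update("d", 6);
        // The second record for "c" raises its running sum and changes the ranking.
        System.out.println(rolling.update("c", 4));
    }
}
```

Re-sorting all sums on every record keeps the sketch short. It is a simplification for illustration, not a recommendation for large key spaces.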

Statefull computation

2015-08-23 Thread defstat
Hi. I have been struggling for the past few days to find a solution to the following problem, using Apache Flink: I have a stream of vectors, represented by files in a local folder. After a new text file is located using DataStream text = env.readFileStream(...), I transform (flatMap) the input into a Sing