Hi,
In the example the result is not correct because the values for a,b,c and d
are never forwarded from instance 2 even though they would modify the
global top-k result. It works, though, if you partition by the key field
(tuple field 0, in this case) before doing the summation and local top-k. I
Hey,
I am not sure if I get it, why aren't the results correct?
You don't instantly get the global top-k, but you are always updating it
with the new local results.
Gyula
Aljoscha Krettek ezt írta (időpont: 2015. aug. 23.,
V, 22:58):
> Hi,
> I wanted to post something along the same lines but
Hi,
I wanted to post something along the same lines but now I don't think the
approach with local top-ks and merging works. For example, if you want to
get top-4 and you do the pre-processing in two parallel instances. This
input data would lead to incorrect results:
1. Instance:
a 6
b 5
c 4
d 3
Hey!
What you are trying to do here is a global rolling aggregation, which is
inherently a DOP 1 operation. Your observation is correct that if you want
to use a simple stateful sink, you need to make sure that you set the
parallelism to 1 in order to get correct results.
What you can do is to ke
Hi. I am struggling the past few days to find a solution on the following
problem, using Apache Flink:
I have a stream of vectors, represented by files in a local folder. After a
new text file is located using DataStream text =
env.readFileStream(...), I transform (flatMap), the Input into a
Sing