Hi Vasia,
You are right about topDistance: it is the dataset that holds only one
double value. I have already looked at the Aggregator, but I can only get the
value of an aggregator in the next iteration. However, my problem is a bit
tricky because topDistance controls how newSeeds is calculated
Hi Truong,
I guess the problem is that you want to use topDistance as a broadcast set
inside the iteration? If I understand correctly, this is a dataset with a
single value, right? Could you maybe compute it with an aggregator instead?
-Vasia.
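To make the limitation in Truong's reply concrete, here is a minimal plain-Java sketch (no Flink dependency) of the bulk-synchronous semantics behind iteration aggregators. Values aggregated during superstep i are only published at the barrier, so the step function can read them in superstep i + 1 at the earliest. `SuperstepAggregatorDemo`, `MaxAggregator` and the sample distances are hypothetical. The class only mimics the role of Flink's iteration aggregators (read via `getPreviousIterationAggregate` in the DataSet API) and is not Flink code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java simulation (no Flink dependency) of bulk-synchronous superstep
 * semantics: a value aggregated during superstep i is only published after
 * the superstep barrier, so it can be read in superstep i + 1 at the earliest.
 */
public class SuperstepAggregatorDemo {

    /** Hypothetical max-aggregator; it only mimics an iteration aggregator. */
    static final class MaxAggregator {
        private double current = Double.NEGATIVE_INFINITY; // built during this superstep
        private double previous = Double.NaN;              // published at the last barrier

        void aggregate(double v) {
            current = Math.max(current, v);
        }

        /** Called at the barrier between two supersteps. */
        void publish() {
            previous = current;
            current = Double.NEGATIVE_INFINITY;
        }

        /** Only the value from the previous superstep is visible. */
        double getPrevious() {
            return previous;
        }
    }

    /**
     * Runs one superstep per row of distances and records, for each superstep,
     * the aggregated value the step function was able to see.
     */
    static List<Double> run(double[][] distancesPerStep) {
        MaxAggregator topDistance = new MaxAggregator();
        List<Double> visible = new ArrayList<>();
        for (double[] distances : distancesPerStep) {
            // Inside the superstep: only last superstep's value can be read.
            visible.add(topDistance.getPrevious());
            for (double d : distances) {
                topDistance.aggregate(d);
            }
            topDistance.publish(); // barrier
        }
        return visible;
    }

    public static void main(String[] args) {
        double[][] steps = { {1.0, 4.0, 2.0}, {3.0, 0.5}, {7.0} };
        System.out.println(run(steps)); // prints [NaN, 4.0, 3.0]
    }
}
```

The first superstep sees no value at all, and every later superstep sees the maximum from the step before it, never its own; the 7.0 aggregated in the last superstep is never read by any step.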
On 5 July 2016 at 21:48, Nguyen Xuan Truong wrote:
Hi Vasia,
Thank you very much for your explanation :). When running with a small
maxIteration, the job graph that Flink executed was optimal. However, when
maxIterations was large, Flink took a very long time to generate the job
graph. The actual time to execute the jobs was very short, but the time t
Hi Truong,
I'm afraid what you're experiencing is to be expected. Currently, for loops
do not perform well in Flink since there is no support for caching
intermediate results yet. This has been a frequently requested feature
lately, so maybe it will be added soon :)
Until then, I suggest you try
Hi,
I have a Flink program that is similar to the K-means algorithm. I use a
normal iteration (a for loop) because Flink iterations do not allow
computing intermediate results (in this case, topDistance) within one
iteration. The problem is that my program only runs when maxIteration is
small. When