Hi,
I have a Flink program that outputs wrong results once I set the
parallelism to a value larger than 1.
If I run the program with parallelism 1, everything works fine.
The algorithm works on one input dataset, which is iteratively split
until the desired number of output splits is reached.
How to split the cluster in each iteration is itself determined
iteratively.
Pseudocode:
val input = DataSet
for (currentSplitNumber <- 1 to numberOfSplits) { // split until the desired #splits is reached
  // Iteratively compute the best split
  val determinedSplit = iteration involving input
  // Split the dataset into 2 smaller ones
  val tmpDataSet1 = determinedSplit.filter(x == 1) ...
  val tmpDataSet2 = determinedSplit.filter(x == 0) ...
  // These counts are necessary to store the size of each split
  tmpDataSet1.count()
  tmpDataSet2.count()
  // Store tmpDataSet1 and tmpDataSet2, as they are needed in one of the
  // next loop passes (as the dataset to be split)
  ...
}
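
To make that a bit more concrete, this is roughly what the driver-side
loop looks like in the Scala DataSet API. The (id, value) tuples, the
threshold-based labelling, and keeping only one of the two splits are
just placeholders for the real split computation:

import org.apache.flink.api.scala._

object SplitLoopSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder input: (id, value) pairs standing in for the real dataset
    val input: DataSet[(Long, Double)] =
      env.fromElements((1L, 0.3), (2L, 0.7), (3L, 0.1), (4L, 0.9))

    val numberOfSplits = 2
    var currentSplit: DataSet[(Long, Double)] = input

    for (currentSplitNumber <- 1 to numberOfSplits) {
      // Placeholder for the iterative "best split" computation: every
      // element is simply labelled 0 or 1 by a made-up threshold
      val determinedSplit =
        currentSplit.map(x => (x._1, x._2, if (x._2 > 0.5) 1 else 0))

      val tmpDataSet1 = determinedSplit.filter(_._3 == 1).map(x => (x._1, x._2))
      val tmpDataSet2 = determinedSplit.filter(_._3 == 0).map(x => (x._1, x._2))

      // Each count() triggers a separate job execution
      val size1 = tmpDataSet1.count()
      val size2 = tmpDataSet2.count()

      // Keep one of the splits as the dataset to be split in a later pass
      // (the real program stores both)
      currentSplit = if (size1 >= size2) tmpDataSet1 else tmpDataSet2
    }
  }
}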
It all comes down to 2 nested loops, one of which can be replaced by an
iteration.
As nested iterations are not supported yet, I do not know how to avoid
the outer loop.
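
For completeness, the inner part (the iterative split computation) would
look roughly like this as a bulk iteration, again with a placeholder step
function and the same (id, value) tuples as in the sketch above:

// Sketch only: 10 supersteps and the threshold-based relabelling are
// placeholders for the real split computation
val determinedSplit: DataSet[(Long, Double, Int)] =
  currentSplit
    .map(x => (x._1, x._2, 0))    // initial labelling
    .iterate(10) { candidate =>
      candidate.map(x => (x._1, x._2, if (x._2 > 0.5) 1 else 0))
    }

Turning the outer for loop into a second Flink iteration would nest this
iterate call inside it, which is what is not supported.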
Is this a known problem, and if so, what would be a solution?
Best,
Adrian