Hi All,
We're running into a memory management issue when using the iterateWithTermination function. With a small amount of data, everything works fine. However, as soon as main memory fills up on a worker, nothing seems to happen any more. The CPU load of any worker whose memory is full drops to a minimum (<5%), while its memory stays full with no apparent garbage collection. All tasks within the iteration are marked as started, yet none of them does anything measurable.

Runs with slightly less data (so that all intermediate results barely fit into main memory) finished within minutes, whereas runs where the data no longer fit ran for days with no results in sight. With fewer workers, or when running the algorithm locally, the issue already appears with smaller amounts of data that the larger cluster (with more combined memory) could still handle.

Our code can be found at [1].

Best regards
Ricarda

[1]: https://github.com/DBDA15/graph-mining/tree/master/graph-mining-flink