Hi everyone, I compared Flink and Spark using PageRank. I expected Flink to beat Spark, or at least to perform on the same level, but Spark is up to 4x faster than Flink. I hope I made a mistake somewhere, so please help me improve the performance of my cluster and configuration.
The cluster has 4 computers:
One JobManager (quad core with Hyper-Threading -> 8 cores, and 16 GB (jobmanager.heap.mb))
Three TaskManagers (each quad core with Hyper-Threading -> 8 cores, and 16 GB (taskmanager.heap.mb))
In total 24 cores / task slots.

I ran PageRank as vertex-centric (Pregel), scatter-gather (SG), gather-sum-apply (GSA), and with bulk iteration. The parallelism was 24.

Runtime in ms:
Pregel: 90,000 ms
SG: 64,000 ms
GSA: 80,000 ms
Bulk: 53,000 ms

Spark with Pregel ran in 23,000 ms.

The input file was: https://snap.stanford.edu/data/wiki-topcats.html

Thanks for helping!
Marc
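
P.S. To make the "up to 4x" figure concrete, here is a small sketch that works out how much longer each Flink variant took than the Spark Pregel run. The runtimes are the ones reported above; the variable and function names are only illustrative. Note that it compares every Flink variant against the single Spark measurement, so only the Pregel row is a like-for-like comparison.

```python
# Runtimes in milliseconds, copied from the measurements reported above.
flink_runtimes_ms = {
    "Pregel (vertex-centric)": 90_000,
    "Scatter-gather": 64_000,
    "Gather-sum-apply": 80_000,
    "Bulk iteration": 53_000,
}
spark_pregel_ms = 23_000


def slowdown_vs_spark(flink_ms, spark_ms=spark_pregel_ms):
    """Return how many times longer the Flink run took than the Spark run."""
    return flink_ms / spark_ms


# Slowdown factor per Flink iteration model (illustrative helper names).
slowdowns = {name: slowdown_vs_spark(ms) for name, ms in flink_runtimes_ms.items()}

for name, factor in slowdowns.items():
    print(f"{name}: {factor:.2f}x the Spark runtime")
```

The factors range from roughly 2.3x (bulk iteration) to roughly 3.9x (Pregel), which is where the "up to 4x" comes from.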