Hello squirrels, I want to discuss with you a few concerns I have about our current vertex-centric model implementation, Spargel, now fully subsumed by Gelly.
Spargel is our implementation of Pregel [1], but it violates some fundamental properties of the model, as described in the paper and as implemented in e.g. Giraph, GPS, Hama. I often find myself confused both when trying to explain it to current Giraph users and when porting my Giraph algorithms to it. More specifically: - in the Pregel model, messages produced in superstep n, are received in superstep n+1. In Spargel, they are produced and consumed in the same iteration. - in Pregel, vertices are active during a superstep, if they have received a message in the previous superstep. In Spargel, a vertex is active during a superstep if it has changed its value. These two differences require a lot of rethinking when porting applications and can easily cause bugs. The most important problem however is that we require the user to split the computation in 2 phases (2 UDFs): - messaging: has access to the vertex state and can produce messages - update: has access to incoming messages and can update the vertex value Pregel/Giraph only expose one UDF to the user: - compute: has access to both the vertex state and the incoming messages, can produce messages and update the vertex value. This might not seem like a big deal, but except from forcing the user to split their program logic into 2 phases, Spargel also makes some common computation patterns non-intuitive or impossible to write. A very simple example is propagating a message based on its value or sender ID. To do this with Spargel, one has to store all the incoming messages in the vertex value (might be of different type btw) during the messaging phase, so that they can be accessed during the update phase. So, my first question is, when implementing Spargel, were other alternatives considered and maybe rejected in favor of performance or because of some other reason? If someone knows, I would love to hear about them! Second, I wrote a prototype implementation [2] that only exposes one UDF, compute(), by keeping the vertex state in the solution set and the messages in the workset. This way all previously mentioned limitations go away and the API (see "SSSPComputeFunction" in the example [3]) looks a lot more like Giraph (see [4]). I have not run any experiments yet and the prototype has some ugly hacks, but if you think any of this makes sense, then I'd be willing to follow up and try to optimize it. If we see that it performs well, we can consider either replacing Spargel or adding it as an alternative. Thanks for reading this long e-mail and looking forward to your input! Cheers, -Vasia. [1]: https://kowshik.github.io/JPregel/pregel_paper.pdf [2]: https://github.com/vasia/flink/tree/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew [3]: https://github.com/vasia/flink/blob/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew/example/SSSPCompute.java [4]: https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SingleSourceShortestPaths.java