Hello squirrels,

I want to discuss with you a few concerns I have about our current
vertex-centric model implementation, Spargel, now fully subsumed by Gelly.

Spargel is our implementation of Pregel [1], but it violates some
fundamental properties of the model, as described in the paper and as
implemented in e.g. Giraph, GPS, Hama. I often find myself confused both
when trying to explain it to current Giraph users and when porting my
Giraph algorithms to it.

More specifically:
- in the Pregel model, messages produced in superstep n, are received in
superstep n+1. In Spargel, they are produced and consumed in the same
iteration.
- in Pregel, vertices are active during a superstep, if they have received
a message in the previous superstep. In Spargel, a vertex is active during
a superstep if it has changed its value.

These two differences require a lot of rethinking when porting applications
and can easily cause bugs.

The most important problem however is that we require the user to split the
computation in 2 phases (2 UDFs):
- messaging: has access to the vertex state and can produce messages
- update: has access to incoming messages and can update the vertex value

Pregel/Giraph only expose one UDF to the user:
- compute: has access to both the vertex state and the incoming messages,
can produce messages and update the vertex value.

This might not seem like a big deal, but except from forcing the user to
split their program logic into 2 phases, Spargel also makes some common
computation patterns non-intuitive or impossible to write. A very simple
example is propagating a message based on its value or sender ID. To do
this with Spargel, one has to store all the incoming messages in the vertex
value (might be of different type btw) during the messaging phase, so that
they can be accessed during the update phase.

So, my first question is, when implementing Spargel, were other
alternatives considered and maybe rejected in favor of performance or
because of some other reason? If someone knows, I would love to hear about
them!

Second, I wrote a prototype implementation [2] that only exposes one UDF,
compute(), by keeping the vertex state in the solution set and the messages
in the workset. This way all previously mentioned limitations go away and
the API (see "SSSPComputeFunction" in the example [3]) looks a lot more
like Giraph (see [4]).

I have not run any experiments yet and the prototype has some ugly hacks,
but if you think any of this makes sense, then I'd be willing to follow up
and try to optimize it. If we see that it performs well, we can consider
either replacing Spargel or adding it as an alternative.

Thanks for reading this long e-mail and looking forward to your input!

Cheers,
-Vasia.

[1]: https://kowshik.github.io/JPregel/pregel_paper.pdf
[2]:
https://github.com/vasia/flink/tree/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew
[3]:
https://github.com/vasia/flink/blob/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew/example/SSSPCompute.java
[4]:
https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SingleSourceShortestPaths.java

Reply via email to