@Martin: thanks for your input! If you ran into any other issues that I didn't mention, please let us know. Obviously, even with my proposal, there are still features we cannot support, e.g. updating edge values and graph mutations. We'll need to re-think the underlying iteration and/or graph representation for those.
@Fabian: thanks a lot, no rush :) Let me give you some more information that might make it easier to reason about performance: Currently, in Spargel the SolutionSet (SS) keeps the vertex state and the workset (WS) keeps the active vertices. The iteration is composed of 2 coGroups. The first one takes the WS and the edges and produces messages. The second one takes the messages and the SS and produced the new WS and the SS-delta. In my proposal, the SS has the vertex state and the WS has <vertexId, MessageIterator> pairs, i.e. the inbox of each vertex. The plan is more complicated because compute() needs to have two iterators: over the edges and over the messages. First, I join SS and WS to get the active vertices (have received a msg) and their current state. Then I coGroup the result with the edges to access the neighbors. Now the main problem is that this coGroup needs to have 2 outputs: the new messages and the new vertex value. I couldn't really find a nice way to do this, so I'm emitting a Tuple that contains both types and I have a flag to separate them later with 2 flatMaps. From the vertex flatMap, I crete the SS-delta and from the messaged flatMap I apply a reduce to group the messages by vertex and send them to the new WS. One optimization would be to expose a combiner here to reduce message size. tl;dr: 1. 2 coGroups vs. Join + coGroup + flatMap + reduce 2. how can we efficiently emit 2 different types of records from a coGroup? 3. does it make any difference if we group/combine the messages before updating the workset or after? Cheers, -Vasia. On 27 October 2015 at 18:39, Fabian Hueske <fhue...@gmail.com> wrote: > I'll try to have a look at the proposal from a performance point of view in > the next days. > Please ping me, if I don't follow up this thread. > > Cheers, Fabian > > 2015-10-27 18:28 GMT+01:00 Martin Junghanns <m.jungha...@mailbox.org>: > > > Hi, > > > > At our group, we also moved several algorithms from Giraph to Gelly and > > ran into some confusing issues (first in understanding, second during > > implementation) caused by the conceptional differences you described. > > > > If there are no concrete advantages (performance mainly) in the Spargel > > implementation, we would be very happy to see the Gelly API be aligned to > > Pregel-like systems. > > > > Your SSSP example speaks for itself. Straightforward, if the reader is > > familiar with Pregel/Giraph/... > > > > Best, > > Martin > > > > > > On 27.10.2015 17:40, Vasiliki Kalavri wrote: > > > >> Hello squirrels, > >> > >> I want to discuss with you a few concerns I have about our current > >> vertex-centric model implementation, Spargel, now fully subsumed by > Gelly. > >> > >> Spargel is our implementation of Pregel [1], but it violates some > >> fundamental properties of the model, as described in the paper and as > >> implemented in e.g. Giraph, GPS, Hama. I often find myself confused both > >> when trying to explain it to current Giraph users and when porting my > >> Giraph algorithms to it. > >> > >> More specifically: > >> - in the Pregel model, messages produced in superstep n, are received in > >> superstep n+1. In Spargel, they are produced and consumed in the same > >> iteration. > >> - in Pregel, vertices are active during a superstep, if they have > received > >> a message in the previous superstep. In Spargel, a vertex is active > during > >> a superstep if it has changed its value. > >> > >> These two differences require a lot of rethinking when porting > >> applications > >> and can easily cause bugs. > >> > >> The most important problem however is that we require the user to split > >> the > >> computation in 2 phases (2 UDFs): > >> - messaging: has access to the vertex state and can produce messages > >> - update: has access to incoming messages and can update the vertex > value > >> > >> Pregel/Giraph only expose one UDF to the user: > >> - compute: has access to both the vertex state and the incoming > messages, > >> can produce messages and update the vertex value. > >> > >> This might not seem like a big deal, but except from forcing the user to > >> split their program logic into 2 phases, Spargel also makes some common > >> computation patterns non-intuitive or impossible to write. A very simple > >> example is propagating a message based on its value or sender ID. To do > >> this with Spargel, one has to store all the incoming messages in the > >> vertex > >> value (might be of different type btw) during the messaging phase, so > that > >> they can be accessed during the update phase. > >> > >> So, my first question is, when implementing Spargel, were other > >> alternatives considered and maybe rejected in favor of performance or > >> because of some other reason? If someone knows, I would love to hear > about > >> them! > >> > >> Second, I wrote a prototype implementation [2] that only exposes one > UDF, > >> compute(), by keeping the vertex state in the solution set and the > >> messages > >> in the workset. This way all previously mentioned limitations go away > and > >> the API (see "SSSPComputeFunction" in the example [3]) looks a lot more > >> like Giraph (see [4]). > >> > >> I have not run any experiments yet and the prototype has some ugly > hacks, > >> but if you think any of this makes sense, then I'd be willing to follow > up > >> and try to optimize it. If we see that it performs well, we can consider > >> either replacing Spargel or adding it as an alternative. > >> > >> Thanks for reading this long e-mail and looking forward to your input! > >> > >> Cheers, > >> -Vasia. > >> > >> [1]: https://kowshik.github.io/JPregel/pregel_paper.pdf > >> [2]: > >> > >> > https://github.com/vasia/flink/tree/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew > >> [3]: > >> > >> > https://github.com/vasia/flink/blob/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew/example/SSSPCompute.java > >> [4]: > >> > >> > https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SingleSourceShortestPaths.java > >> > >> >