Re: [DISCUSS] Gelly iteration abstractions

Stephan Ewen Mon, 23 Feb 2015 07:21:10 -0800

Closed-loop iterations are much more efficient right now. Long for loops
suffer from memory fragmentation (an issue that is in the list to fix).


Also, closed loops can be stateful (delta iterations) and do not require
task re-deployment.

On Mon, Feb 23, 2015 at 4:15 PM, Vasiliki Kalavri <vasilikikala...@gmail.com
> wrote:

> I see that's cool :-)
> So, what is the advantage of closed-loop versus for-loop iterations?
> Custom convergence criteria / aggregators and more efficient execution
> plans?
>
> On 23 February 2015 at 15:01, Stephan Ewen <se...@apache.org> wrote:
>
> > For loops are basically rolled out - they yield long execution plans.
> >
> > On Mon, Feb 23, 2015 at 2:44 PM, Vasiliki Kalavri <
> > vasilikikala...@gmail.com
> > > wrote:
> >
> > > for-loop iterations could cover some cases, I guess, when the number of
> > > iterations is known beforehand.
> > > Are there currently any restrictions on what can be used inside a
> > for-loop?
> > > How are they translated into execution plans?
> > >
> > > On 23 February 2015 at 13:08, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Some things may not work well as "closed-loop" iterations.
> > > >
> > > > Is it possible to express those as for-loop iterations?
> > > >
> > > > On Mon, Feb 23, 2015 at 1:03 PM, Vasiliki Kalavri <
> > > > vasilikikala...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi Stephan,
> > > > >
> > > > > yes, this would work for the cases where an algorithm only updates
> > the
> > > > > vertex values or only updates the edge values.
> > > > >
> > > > > What we would like to also support is
> > > > > (a) algorithms where both vertices and edges are updated in one
> > > iteration
> > > > > (b) algorithms where the graph structure changes from one iteration
> > to
> > > > the
> > > > > next and
> > > > > (c) branching inside an iteration, i.e. executing a different
> > > "iteration
> > > > > body" based on some condition.
> > > > >
> > > > > We can still currently implement those with regular Flink iteration
> > > > > operators, but the resulting code is not that nice or efficient.
> > > > > For example, if we want to update both edges and vertex values, we
> > can
> > > > > still create a solution set where the vertex values are attached to
> > > each
> > > > > edge.
> > > > > Regarding branching inside an iteration, we can use an aggregator
> > that
> > > > > tracks the iteration phase, but then we need to somehow make the
> > > > different
> > > > > phases to consist of the same operators and also check the
> branching
> > > > > condition inside each UDF.
> > > > >
> > > > > Cheers,
> > > > > V.
> > > > >
> > > > >
> > > > > On 23 February 2015 at 11:05, Stephan Ewen <se...@apache.org>
> wrote:
> > > > >
> > > > > > As a workaround, it should always work to get the Edge and Vertex
> > > data
> > > > > set
> > > > > > from the graph and use the regular Fink iteration operators?
> > > > > >
> > > > > >
> > > > > > On Sun, Feb 22, 2015 at 4:53 PM, Vasiliki Kalavri <
> > > > > > vasilikikala...@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > yes, I was referring to the parallel Boruvka algorithm. There
> are
> > > > > several
> > > > > > > ways to implement this one in Flink and I believe that the one
> > > > > described
> > > > > > in
> > > > > > > the paper (vertex-centric) is not the most elegant one :)
> > > > > > >
> > > > > > > Andra is now working on an idea that uses the delta iteration
> > > > > abstraction
> > > > > > > and we believe that it will be both more efficient and easier
> to
> > > > > > > understand. It has the edges in the solution set and the
> vertices
> > > in
> > > > > the
> > > > > > > workset, so it follows the pattern I describe in (2) in my
> > previous
> > > > > > e-mail.
> > > > > > > As a next step, we would like to see how having an iteration
> > > operator
> > > > > > that
> > > > > > > could update the whole graph -what I describe as (3)- would
> make
> > > this
> > > > > > even
> > > > > > > nicer.
> > > > > > >
> > > > > > > Any ideas are highly welcome!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > V.
> > > > > > >
> > > > > > > On 22 February 2015 at 16:32, Andra Lungu <
> lungu.an...@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Alex,
> > > > > > > >
> > > > > > > > Vasia is talking about the second version(presented Friday)
> of
> > > > > Parallel
> > > > > > > > Boruvka, which can be found here:
> > > > > > > > https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59
> > > > > > > >
> > > > > > > > I will propose the third, non-Pregel like approach directly
> to
> > > > Gelly
> > > > > > > soon.
> > > > > > > >
> > > > > > > > If you have additional questions, I will be happy to answer
> > them.
> > > > > > > >
> > > > > > > > Andra
> > > > > > > >
> > > > > > > > On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov <
> > > > > > > > alexander.s.alexand...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Vasia,
> > > > > > > > >
> > > > > > > > > I am trying to look at the problem in more detail. Which
> > > version
> > > > of
> > > > > > the
> > > > > > > > MST
> > > > > > > > > are you talking about?
> > > > > > > > >
> > > > > > > > > Right now in the Gelly repository I can only find the SSSP
> > > > example
> > > > > > > > > (parallel Bellman-Ford) from Section 4.2 in [1].
> > > > > > > > >
> > > > > > > > > However, it seems that the issues encountered by Andra are
> > > > related
> > > > > to
> > > > > > > the
> > > > > > > > > implementation of Parallel Boruvka (Section 3.2 in [2]). Is
> > > that
> > > > > > > correct?
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > A.
> > > > > > > > >
> > > > > > > > > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf
> > > > > > > > > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf
> > > > > > > > >
> > > > > > > > > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <
> > > > > > vasilikikala...@gmail.com
> > > > > > > >:
> > > > > > > > >
> > > > > > > > > > Hello beautiful Flink people,
> > > > > > > > > >
> > > > > > > > > > during the past few days, Andra and I have been
> discussing
> > > > about
> > > > > > how
> > > > > > > to
> > > > > > > > > > extend Gelly's iteration methods.
> > > > > > > > > >
> > > > > > > > > > Alexander's course (and his awesome students) has made it
> > > > obvious
> > > > > > > that
> > > > > > > > > > vertex-centric iterations are not the best fit for
> > algorithms
> > > > > which
> > > > > > > > don't
> > > > > > > > > > follow the common "propagate-update" pattern. For
> example,
> > > > Andra
> > > > > is
> > > > > > > > > working
> > > > > > > > > > on an implementation of Minimum Spanning Tree, which
> > requires
> > > > > > > branching
> > > > > > > > > > inside an iteration and also requires a convergence check
> > of
> > > an
> > > > > > > > internal
> > > > > > > > > > iteration. Others also reported similar issues [1, 2].
> > Trying
> > > > to
> > > > > > fit
> > > > > > > > such
> > > > > > > > > > algorithms to the vertex-centric model leads to long and
> > ugly
> > > > > code,
> > > > > > > > e.g.
> > > > > > > > > > aggregators to keep track of algorithm phases,
> duplicating
> > > > data,
> > > > > > etc.
> > > > > > > > > >
> > > > > > > > > > One limitation of the vertex-centric and the upcoming GAS
> > > model
> > > > > is
> > > > > > > that
> > > > > > > > > > they both only allow the vertex values to be updated in
> > each
> > > > > > > iteration.
> > > > > > > > > > However, for some algorithms we need to update the edge
> > > values
> > > > > and
> > > > > > in
> > > > > > > > > > others we need to update both. In even more complex
> > > situations
> > > > > > (like
> > > > > > > > > > Andra's MST) in some iterations we need to update the
> > vertex
> > > > > values
> > > > > > > and
> > > > > > > > > in
> > > > > > > > > > some iterations we need to update the edge values.
> > > > > > > > > > Another problem is that we currently don't have a way to
> > > allow
> > > > > > > > different
> > > > > > > > > > computational phases inside an iteration. This is
> something
> > > > that
> > > > > > > Giraph
> > > > > > > > > > solves with master compute, a function that is executed
> > once
> > > > > before
> > > > > > > > each
> > > > > > > > > > superstep and sets the computation function.
> > > > > > > > > >
> > > > > > > > > > All that said, I believe that we can solve most of these
> > > issues
> > > > > if
> > > > > > we
> > > > > > > > > > nicely expose Flink's iteration operators in Gelly. I can
> > see
> > > > the
> > > > > > > > > following
> > > > > > > > > > cases:
> > > > > > > > > >
> > > > > > > > > > 1. Bulk & delta iterations where the solution set is the
> > > vertex
> > > > > > > > dataset:
> > > > > > > > > > this will be similar to vertex-centric and GAS, but will
> > > allow
> > > > > more
> > > > > > > > > > flexible dataflows inside the iteration.
> > > > > > > > > > 2. Bulk & delta iterations where the solution set is the
> > edge
> > > > > > > dataset:
> > > > > > > > > for
> > > > > > > > > > the cases where we need to update edge values.
> > > > > > > > > > 3. Bulk & delta iterations where the solution set is the
> > > Graph:
> > > > > > this
> > > > > > > > will
> > > > > > > > > > cover more complex cases, where the algorithm updates
> both
> > > > > vertices
> > > > > > > and
> > > > > > > > > > edges or even adds/removes vertices/edges, i.e. updates
> the
> > > > whole
> > > > > > > > Graph.
> > > > > > > > > >
> > > > > > > > > > What do you think? I can see 1 & 2 being very easy to
> > > > implement,
> > > > > > but
> > > > > > > I
> > > > > > > > > > suspect 3 won't be that easy (but so awesome to have ^^).
> > > > > > > > > > Would it work the way a Graph is represented now, i.e.
> > with 2
> > > > > > > DataSets?
> > > > > > > > > >
> > > > > > > > > > Any comment, idea, pointer would be much appreciated!
> Thank
> > > you
> > > > > ^^
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > -V.
> > > > > > > > > >
> > > > > > > > > > [1]:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html
> > > > > > > > > > [2]:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Gelly iteration abstractions

Reply via email to