A quick fix would be to take the first join and give it a "JoinHint.REPARTITION_HASH_SECOND" hint, so that the hash table is built from the second input.
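
Roughly like this, as a minimal sketch against the Scala DataSet API. It assumes the hint meant here is the DataSet API's JoinHint.REPARTITION_HASH_SECOND. The data sets, field positions and object name are made up for illustration and are not taken from Ricarda's program, and the join is shown outside of an iteration to keep it short:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint

// Hypothetical stand-alone job; in the real program the join sits inside the bulk iteration.
object JoinHintSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Made-up inputs: (vertex1, vertex2) edges and (vertex, count) pairs.
    val edges: DataSet[(Int, Int)] = env.fromElements((1, 2), (2, 3), (1, 3))
    val counts: DataSet[(Int, Int)] = env.fromElements((1, 10), (2, 20))

    // Repartition both inputs on the join key and build the hash table
    // from the second input (counts); the first input (edges) probes it.
    val joined = edges
      .join(counts, JoinHint.REPARTITION_HASH_SECOND)
      .where(0)
      .equalTo(0) { (edge, count) => (edge._1, edge._2, count._2) }

    joined.print()
  }
}
```

The only change compared to an unhinted join is the second argument to join(); the key selection and the join function stay as they are.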
The best thing would be to have batch exchanges for iterations.

The second best thing would be to recognize in the optimizer that a batch exchange cannot happen (if inside an iteration) and instead set the receiver task to break the pipeline (set TempMode.makePipelineBreaker()).

On Tue, Sep 8, 2015 at 12:43 PM, Ufuk Celebi <u...@apache.org> wrote:

>
> > On 08 Sep 2015, at 10:12, Schueler, Ricarda <ricarda.schue...@student.hpi.uni-potsdam.de> wrote:
> >
> > Hi,
> >
> > we tested it with the version 0.9.1, but unfortunately the issue persists.
>
> Thanks for helping me out debugging this Ricarda! :)
>
> From what I can tell, this is not a deadlock in the network runtime, but a
> join deadlock within an iteration.
>
> https://gist.github.com/uce/3fd5ca45383402ed1b16
>
> @Stephan, Fabian: What’s the best way to fix this for good?
>
> @Ricarda: you can work your way around this by providing
> JoinHint.REPARTITION_SORT_MERGE as a join hint in the bulk iteration, i.e.
>
> joinedtriangles = joinedtriangles.join(graph, JoinHint.REPARTITION_SORT_MERGE)
>   .where({triangle => (triangle.edge3.vertex1, triangle.edge3.vertex2)})
>   .equalTo("vertex1", "vertex2"){
>     (triangle, edge) =>
>       triangle.edge3.triangleCount = edge.triangleCount
>       triangle
>   }.name("third triangle edge join")
>
> I saw that you were benchmarking this for a project. This should impact
> the runtime of your program, so you might need to re-run the experiments.
>
> – Ufuk
>