Your arguments are perfectly valid. So, what I suggest is to keep the functions as they are now, e.g. groupReduceOnNeighbors, and to add an overloaded groupReduceOnNeighbors(blablaSameArguments, boolean useJoinHints). That way, the user can decide whether they'd like to trade speed for a program that actually finishes :).
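For concreteness, here's a rough sketch of what such an overload could look like (this would live inside Gelly's Graph class; the applyNeighborsFunction helper is made up to keep the sketch short, and the real method would of course thread the hint through all of its internal joins):

    // Hypothetical overload: same arguments as today, plus a flag that
    // lets the caller opt in to broadcasting the vertex set.
    public <T> DataSet<T> groupReduceOnNeighbors(
            NeighborsFunctionWithVertexValue<K, VV, EV, T> neighborsFunction,
            EdgeDirection direction,
            boolean useJoinHints) {

        // Broadcast the (typically much smaller) vertex set only when
        // asked to; otherwise keep the current behavior and let the
        // optimizer pick the join strategy.
        JoinOperatorBase.JoinHint hint = useJoinHints
                ? JoinOperatorBase.JoinHint.BROADCAST_HASH_SECOND
                : JoinOperatorBase.JoinHint.OPTIMIZER_CHOOSES;

        DataSet<Tuple2<Edge<K, EV>, Vertex<K, VV>>> edgesWithSources =
                edges.join(this.vertices, hint)
                     .where(0).equalTo(0);

        return applyNeighborsFunction(edgesWithSources, neighborsFunction, direction);
    }

    // The existing signature just delegates, so current callers keep
    // the optimizer-chosen behavior and nothing breaks.
    public <T> DataSet<T> groupReduceOnNeighbors(
            NeighborsFunctionWithVertexValue<K, VV, EV, T> neighborsFunction,
            EdgeDirection direction) {
        return groupReduceOnNeighbors(neighborsFunction, direction, false);
    }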
On Sat, Aug 22, 2015 at 11:28 AM, Martin Junghanns <m.jungha...@mailbox.org> wrote:

> Hi,
>
> I guess enforcing a join strategy by default is not the best option, since
> you can't assume what the user did before actually calling the Gelly
> functions or what the data looks like (maybe it's one of the 1% of graphs
> where the relation is the other way around, or the vertex data set is very
> large); maybe the data sets are already sorted / partitioned. Another
> solution could be overloading the Gelly functions that use joins and
> letting the users decide whether to give hints or not?
>
> As an example, I am currently benchmarking graphs with up to 700M vertices
> and 3B edges on a YARN cluster, and at one point in the job I need to join
> vertices and edges. I also tried to give the broadcast-hash-second
> (vertices) hint, and the job performed significantly slower than letting
> the system decide.
>
> Best,
> Martin
>
>
> On 22.08.2015 09:51, Andra Lungu wrote:
>
>> Hey everyone,
>>
>> When coding for my thesis, I observed that half of the current Gelly
>> functions (the ones that use join operators) fail in a cluster
>> environment with the following exception:
>>
>> java.lang.IllegalArgumentException: Too few memory segments provided.
>> Hash Join needs at least 33 memory segments.
>>
>> This is because, in 99% of the cases, the vertex data set is significantly
>> smaller than the edge data set. What I did to get rid of the error was the
>> following:
>>
>> DataSet<Tuple2<Edge<K, EV>, Vertex<K, VV>>> edgesWithSources = edges
>>     .join(this.vertices, JoinOperatorBase.JoinHint.BROADCAST_HASH_SECOND)
>>     .where(0).equalTo(0);
>>
>> In short, I added join hints. I believe this should also be in Gelly, in
>> case someone bumps into the same problem somewhere in the future.
>>
>> What do you think?