Re: Join hints for the Gelly functions

2015-08-22 Thread Stephan Ewen
This is an interesting issue, because, quite frankly, the join hint you passed simply reversed the sides of the join. The algorithm is still the same and has the same minimum memory requirements. The fact that it made a difference is quite curious. The only thing I can imagine is that this hint ch

Re: Join hints for the Gelly functions

2015-08-22 Thread Andra Lungu
Your arguments are perfectly valid. So, what I suggest is to have the functions as they are now, e.g. groupReduceOnNeighbors and to add a groupReduceOnNeighbors(blablaSameArguments, boolean useJoinHints). That way, the user can decide whether they'd like to trade speed for a program that actually

Re: Join hints for the Gelly functions

2015-08-22 Thread Vasiliki Kalavri
Hey, I agree with Martin on this. It's the optimizer's job to decide the join strategy. Maybe the join hint worked in 99% of your cases, but we can't simply generalize this for all datasets and algorithms and hard-code a join hint that assumes that the vertex set is always much smaller than the

Re: Join hints for the Gelly functions

2015-08-22 Thread Martin Junghanns
Hi, I guess enforcing a join strategy by default is not the best option, since you can't assume what the user did before actually calling the Gelly functions or what the data looks like (maybe it's one of the 1% of graphs where the relation is the other way around or the vertex data set is very la

Join hints for the Gelly functions

2015-08-22 Thread Andra Lungu
Hey everyone, When coding for my thesis, I observed that half of the current Gelly functions (the ones that use join operators) fail in a cluster environment with the following exception: java.lang.IllegalArgumentException: Too few memory segments provided. Hash Join needs at least 33 memory segm