Hey everyone,

While coding for my thesis, I observed that half of the current Gelly functions (the ones that use join operators) fail in a cluster environment with the following exception:

    java.lang.IllegalArgumentException: Too few memory segments provided. Hash Join needs at least 33 memory segments.

This is because, in 99% of cases, the vertex data set is significantly smaller than the edge data set. To get rid of the error, I did the following:

    DataSet<Tuple2<Edge<K, EV>, Vertex<K, VV>>> edgesWithSources = edges
        .join(this.vertices, JoinOperatorBase.JoinHint.BROADCAST_HASH_SECOND)
        .where(0).equalTo(0);

In short, I added a join hint that tells the optimizer to broadcast and hash the second input (the smaller vertex data set). I believe this should also be in Gelly, in case someone runs into the same problem in the future.

What do you think?
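
P.S. For anyone unfamiliar with the hint, here is a rough plain-Java sketch of the idea behind a broadcast hash join: the smaller input (the vertices) is turned into a hash table, and the larger input (the edges) is streamed past it. This is only an illustration, not Flink's actual implementation. The class name, the method name, and the simplified types (a `long[]` pair for an edge, a `String` for a vertex value) are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastHashJoinSketch {

    // Hypothetical helper: joins each edge {sourceId, targetId} with the
    // value of its source vertex, mimicking edges.join(vertices).where(0).equalTo(0).
    static List<String> joinEdgesWithSources(List<long[]> edges, Map<Long, String> vertices) {
        // Build phase: the small side (vertices) is held entirely in a hash table.
        // In a distributed setting, this table is what would be broadcast to every worker.
        Map<Long, String> buildSide = new HashMap<>(vertices);

        // Probe phase: the large side (edges) is streamed once and looked up by source id.
        List<String> result = new ArrayList<>();
        for (long[] edge : edges) {
            String sourceValue = buildSide.get(edge[0]);
            if (sourceValue != null) {
                // Inner-join semantics: edges without a matching source vertex are dropped.
                result.add(edge[0] + "->" + edge[1] + " (source=" + sourceValue + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> edges = new ArrayList<>();
        edges.add(new long[] {1L, 2L});
        edges.add(new long[] {2L, 3L});
        edges.add(new long[] {9L, 1L}); // source vertex 9 does not exist

        Map<Long, String> vertices = new HashMap<>();
        vertices.put(1L, "a");
        vertices.put(2L, "b");
        vertices.put(3L, "c");

        for (String row : joinEdgesWithSources(edges, vertices)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is that only the vertex side needs to fit in a hash table, which is why hinting at the smaller input matters when the edge set is much larger.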