Hi everyone,

Here's the Scala code for generating the EdgeRDD, VertexRDD, and Graph:

import org.apache.spark.graphx.{Edge, Graph}

// Generate a mapping of vertex names to VertexIds
val vertexNameToIdRDD = rawEdgeRDD
  .flatMap(x => Seq(x._1.src, x._1.dst))
  .distinct
  .zipWithUniqueId
  .cache

// Generate VertexRDD with vertex data (in my case, a custom VertexValue object)
val vertexRDD = vertexNameToIdRDD.map { case (x, id) => (id, new VertexValue(x)) }

// Replace the vertex names in each edge with the corresponding VertexId values
// and insert the custom EdgeValue ev
val edgeRDD = rawEdgeRDD
  .map { case (e, ev) => (e.src, (e.dst, ev)) }
  .join(vertexNameToIdRDD)                                   // look up the source id
  .map { case (src, ((dst, ev), sid)) => (dst, (sid, ev)) }
  .join(vertexNameToIdRDD)                                   // look up the destination id
  .map { case (dst, ((sid, ev), did)) => new Edge(sid, did, ev) }

val graph = Graph(vertexRDD, edgeRDD)

Somewhere in here there appear to be one or more issues that result in a Graph with non-deterministic Graph.triplets and Graph.collectNeighbors(EdgeDirection.Either) output.

Please let me know if you can think of any errors in this approach.

Thanks
--John

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Non-Deterministic-Graph-Building-tp22638p22647.html