Hi Everyone,

Here's the Scala code for generating the EdgeRDD, VertexRDD, and Graph:

import org.apache.spark.graphx.{Edge, Graph}

// Generate a mapping of vertex names (taken from the edge endpoints) to VertexIds
val vertexNameToIdRDD = rawEdgeRDD
  .flatMap(x => Seq(x._1.src, x._1.dst))
  .distinct
  .zipWithUniqueId
  .cache

// Generate the VertexRDD with vertex data (in my case, a custom VertexValue object)
val vertexRDD = vertexNameToIdRDD.map { case (x, id) => (id, new VertexValue(x)) }

// Replace the vertex names in each edge with the corresponding VertexId values
// and attach the custom EdgeValue ev
val edgeRDD = rawEdgeRDD
  .map { case (e, ev) => (e.src, (e.dst, ev)) }
  .join(vertexNameToIdRDD)
  .map { case (src, ((dst, ev), sid)) => (dst, (sid, ev)) }
  .join(vertexNameToIdRDD)
  .map { case (dst, ((sid, ev), did)) => new Edge(sid, did, ev) }

val graph = Graph(vertexRDD,edgeRDD)
                    
Somewhere in here there appear to be one or more issues that result in a
Graph whose Graph.triplets and Graph.collectNeighbors(EdgeDirection.Either)
output is non-deterministic.
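
One thing that may be worth checking (an assumption on my part, not a
confirmed diagnosis): the ids produced by zipWithUniqueId depend on each
element's partition and its position within that partition, and the element
order after the shuffle in distinct is not guaranteed to repeat if the RDD is
recomputed, for example when cached partitions are evicted. Below is a minimal
sketch that derives the ids from the sorted names instead. It assumes the
vertex names are Strings, and stableVertexIds is a hypothetical helper name,
not part of GraphX:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId

// Hypothetical helper (not a GraphX API). Assumes vertex names are Strings.
// Sorting before zipWithIndex ties each id to the name's position in the
// sorted order, so recomputing the RDD over the same input yields the same
// name -> id mapping.
def stableVertexIds(names: RDD[String]): RDD[(String, VertexId)] =
  names
    .distinct()
    .sortBy(name => name) // global, deterministic order over the names
    .zipWithIndex()       // id = index in that order (VertexId is a Long)
```

It could replace the first step as
stableVertexIds(rawEdgeRDD.flatMap(x => Seq(x._1.src, x._1.dst))).cache, at
the cost of an extra sort.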

Please let me know if you can think of any errors in this approach.

Thanks

--John



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Non-Deterministic-Graph-Building-tp22638p22647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.