Yifan LI <iamyifa...@gmail.com> writes:
> Maybe you could get the vertex, for instance, which id is 80, by using:
>
>     graph.vertices.filter{case(id, _) => id==80}.collect
>
> but I am not sure this is the exactly efficient way.(it will scan the whole
> table? if it can not get benefit from index of VertexRDD table)

Until IndexedRDD is merged, a scan and collect is the best officially
supported way. PairRDDFunctions.lookup does this under the hood as well.

However, it's possible to use the VertexRDD's hash index to do a much more
efficient lookup. Note that these APIs may change, since VertexPartitionBase
and its subclasses are private[graphx].

You can access the partitions of a VertexRDD using VertexRDD#partitionsRDD,
and each partition has VertexPartitionBase#isDefined and
VertexPartitionBase#apply. Putting it all together:

    val verts: VertexRDD[_] = ...
    val targetVid: VertexId = 80L
    val result = verts.partitionsRDD.flatMap { part =>
      if (part.isDefined(targetVid)) Some(part(targetVid))
      else None
    }.collect.head

Once IndexedRDD [1] is merged, it will provide this functionality using
verts.get(targetVid). Its implementation of get also uses the hash
partitioner to run only one task [2].

Ankur

[1] https://issues.apache.org/jira/browse/SPARK-2365
[2] https://github.com/ankurdave/spark/blob/IndexedRDD/core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala#L89
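
To make the difference between the three lookup strategies above concrete, here is a minimal, Spark-free sketch. It is only a toy model and not GraphX or IndexedRDD code: the names IndexedLookupSketch, Partition, partitionFor, build, scanLookup, probeAll and get are all hypothetical, and it assumes vertices are assigned to partitions by the hash of their id, with each partition holding its own hash index.

```scala
// Hypothetical toy model of a hash-partitioned, hash-indexed vertex table.
// None of these names are GraphX or IndexedRDD APIs.
object IndexedLookupSketch {
  type VertexId = Long

  // Each partition keeps its own hash index from vertex id to attribute.
  final case class Partition[V](index: Map[VertexId, V]) {
    def isDefined(vid: VertexId): Boolean = index.contains(vid)
    def apply(vid: VertexId): V = index(vid)
  }

  // Non-negative modulus, so negative hash codes still map to a valid partition.
  def partitionFor(vid: VertexId, numPartitions: Int): Int = {
    val raw = vid.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Assign every vertex to the partition chosen by its id's hash.
  def build[V](verts: Seq[(VertexId, V)], numPartitions: Int): Vector[Partition[V]] = {
    val grouped = verts.groupBy { case (vid, _) => partitionFor(vid, numPartitions) }
    Vector.tabulate(numPartitions) { i =>
      Partition[V](grouped.getOrElse(i, Seq.empty[(VertexId, V)]).toMap)
    }
  }

  // Strategy 1: filter-style scan that compares against every vertex.
  def scanLookup[V](parts: Vector[Partition[V]], target: VertexId): Option[V] =
    parts.iterator
      .flatMap(_.index.iterator)
      .collectFirst { case (vid, v) if vid == target => v }

  // Strategy 2: visit every partition, but do one index probe in each.
  def probeAll[V](parts: Vector[Partition[V]], target: VertexId): Option[V] =
    parts.collectFirst { case p if p.isDefined(target) => p(target) }

  // Strategy 3: use the partitioning function to probe a single partition.
  def get[V](parts: Vector[Partition[V]], target: VertexId): Option[V] =
    parts(partitionFor(target, parts.length)).index.get(target)
}
```

All three functions return the same answer for a given id. What differs is the work done: scanLookup compares against every vertex, probeAll does one hash probe per partition (roughly the partitionsRDD snippet above), and get touches exactly one partition (roughly the single-task behaviour described for IndexedRDD). Returning an Option also avoids the exception that .head would throw when the id is absent.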