Yifan LI <iamyifa...@gmail.com> writes:
> Maybe you could get the vertex whose id is 80, for instance, by using:
>
> graph.vertices.filter{case(id, _) => id==80}.collect
>
> but I am not sure this is the most efficient way. (Will it scan the whole
> table, if it cannot benefit from the index of the VertexRDD table?)

Until IndexedRDD is merged, a scan and collect is the best officially supported 
way. PairRDDFunctions.lookup does this under the hood as well.
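
For reference, here is a minimal sketch of the two officially supported
lookups on a small graph. The graph contents, the object name
VertexLookupExample, and the local[2] master are assumptions made up for
illustration. lookup is available on a VertexRDD because it is an RDD of
(VertexId, attribute) pairs. Its scaladoc notes that it searches only the
partition the key maps to when the RDD has a known partitioner, but it still
iterates over that partition's entries rather than using the hash index.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
// On Spark versions before 1.3, lookup also needs:
//   import org.apache.spark.SparkContext._

// Hypothetical example object; the graph data is made up for illustration.
object VertexLookupExample {
  def run(): (Seq[(VertexId, String)], Seq[String]) = {
    val conf = new SparkConf().setAppName("vertex-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val vertices = sc.parallelize(Seq((79L, "a"), (80L, "b"), (81L, "c")))
      val edges = sc.parallelize(Seq(Edge(79L, 80L, 1), Edge(80L, 81L, 1)))
      val graph = Graph(vertices, edges)
      val targetVid: VertexId = 80L

      // Option 1: filter every vertex, then collect the matches.
      val byScan = graph.vertices
        .filter { case (id, _) => id == targetVid }
        .collect()
        .toSeq

      // Option 2: PairRDDFunctions.lookup, which returns only the values.
      val byLookup = graph.vertices.lookup(targetVid)

      (byScan, byLookup)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (byScan, byLookup) = run()
    println(byScan)
    println(byLookup)
  }
}
```

Both calls go through the public API only, so they should keep working even
if the private[graphx] classes change.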

However, it's possible to use the VertexRDD's hash index to do a much more 
efficient lookup. Note that these APIs may change, since VertexPartitionBase 
and its subclasses are private[graphx].

You can access the partitions of a VertexRDD using VertexRDD#partitionsRDD, and 
each partition has VertexPartitionBase#isDefined and VertexPartitionBase#apply. 
Putting it all together:

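    // The partition classes are private[graphx], so this code has to live
    // in package org.apache.spark.graphx.
    val verts: VertexRDD[_] = ...
    val targetVid: VertexId = 80L
    // Probe the hash index of every partition; only the partition that
    // holds targetVid yields a value.
    val result = verts.partitionsRDD.flatMap { part =>
      if (part.isDefined(targetVid)) Some(part(targetVid))
      else None
    }.collect.head  // head throws if targetVid is absent; headOption is safer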

Once IndexedRDD [1] is merged, it will provide this functionality using 
verts.get(targetVid). Its implementation of get also uses the hash partitioner 
to run only one task [2].
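
To illustrate why routing through the hash partitioner needs only one task,
here is a toy model in plain Scala. It is not Spark or IndexedRDD code: the
names HashRoutedLookup, Partition, partitionOf, probeAll and get are all
hypothetical, and a Map stands in for each partition's hash index. probeAll
mirrors the snippet above (every partition is asked), while get first
computes the owning partition and probes only that one.

```scala
// Toy model, not Spark code: keys are hash-partitioned and every partition
// keeps its own hash index (modeled here as an immutable Map).
object HashRoutedLookup {
  type VertexId = Long

  // Hypothetical stand-in for a vertex partition with a local hash index.
  final case class Partition[V](index: Map[VertexId, V])

  // Non-negative modulo, so negative hash codes still map to a valid slot.
  def partitionOf(id: VertexId, numPartitions: Int): Int = {
    val raw = id.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Distribute the entries over numPartitions partitions by key hash.
  def build[V](entries: Seq[(VertexId, V)], numPartitions: Int): Vector[Partition[V]] = {
    val grouped = entries.groupBy { case (id, _) => partitionOf(id, numPartitions) }
    Vector.tabulate(numPartitions) { i =>
      Partition[V](grouped.getOrElse(i, Seq.empty).toMap)
    }
  }

  // Ask every partition: one index probe per partition.
  def probeAll[V](parts: Vector[Partition[V]], id: VertexId): Option[V] =
    parts.flatMap(_.index.get(id)).headOption

  // Route to the owning partition and probe only that one.
  def get[V](parts: Vector[Partition[V]], id: VertexId): Option[V] =
    parts(partitionOf(id, parts.length)).index.get(id)

  def main(args: Array[String]): Unit = {
    val parts = build(Seq(79L -> "a", 80L -> "b", 81L -> "c"), numPartitions = 4)
    println(get(parts, 80L))      // prints Some(b)
    println(get(parts, 99L))      // prints None
    println(probeAll(parts, 80L)) // prints Some(b)
  }
}
```

Both functions return the same answer. The difference is that get touches
one partition regardless of how many exist, which is the property the
single-task get relies on.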

Ankur

[1] https://issues.apache.org/jira/browse/SPARK-2365
[2] https://github.com/ankurdave/spark/blob/IndexedRDD/core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala#L89
