Oh, I was almost sure that lookup was optimized using the partition info.
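
To make concrete what I mean by "using the partition info", here is a minimal plain-Scala sketch with no Spark involved. It contrasts a full scan (what filter + collect amounts to) with a lookup that first picks the single partition the key hashes to and then uses that partition's local index. Every name in it (PartitionLookupSketch, partitionOf, build, scanLookup, partitionLookup) is hypothetical, and the modulo-of-hashCode partitioning rule is an assumption made for illustration. It is not a description of how Spark or GraphX actually implements this.

```scala
// Plain-Scala illustration only; all names here are hypothetical, not Spark APIs.
object PartitionLookupSketch {
  type VertexId = Long

  // Each "partition" is a map from vertex id to attribute,
  // standing in for a partition that carries its own local index.
  type Partition[A] = Map[VertexId, A]

  // Assumed partitioning rule: non-negative modulo of the id's hash code.
  def partitionOf(id: VertexId, numPartitions: Int): Int = {
    val mod = id.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Distribute (id, attribute) pairs across numPartitions partitions.
  def build[A](pairs: Seq[(VertexId, A)], numPartitions: Int): Vector[Partition[A]] = {
    val grouped = pairs.groupBy { case (id, _) => partitionOf(id, numPartitions) }
    Vector.tabulate(numPartitions) { i =>
      grouped.getOrElse(i, Seq.empty[(VertexId, A)]).toMap
    }
  }

  // Full scan: test every element of every partition, like filter + collect.
  def scanLookup[A](parts: Vector[Partition[A]], target: VertexId): Option[A] =
    parts.iterator
      .flatMap(_.iterator)
      .collectFirst { case (id, attr) if id == target => attr }

  // Partition-aware lookup: visit only the partition the id hashes to,
  // then consult that partition's local index.
  def partitionLookup[A](parts: Vector[Partition[A]], target: VertexId): Option[A] =
    parts(partitionOf(target, parts.length)).get(target)

  def main(args: Array[String]): Unit = {
    val verts = (1L to 100L).map(id => id -> s"vertex-$id")
    val parts = build(verts, numPartitions = 4)
    println(scanLookup(parts, 80L))      // touches elements across partitions
    println(partitionLookup(parts, 80L)) // touches exactly one partition
  }
}
```

Both functions return the same answer; the difference is only in how much data they touch. The second form is cheap only because the partitioning rule is known up front, which is why I assumed a known partitioner would make the same shortcut possible.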
On 29 Jul 2014 21:25, "Ankur Dave" <ankurd...@gmail.com> wrote:

> Yifan LI <iamyifa...@gmail.com> writes:
> > Maybe you could get the vertex, for instance, which id is 80, by using:
> >
> > graph.vertices.filter{case(id, _) => id==80}.collect
> >
> > but I am not sure this is the most efficient way. (Will it scan the whole
> > table if it cannot benefit from the index of the VertexRDD?)
>
> Until IndexedRDD is merged, a scan and collect is the best officially
> supported way. PairRDDFunctions.lookup does this under the hood as well.
>
> However, it's possible to use the VertexRDD's hash index to do a much more
> efficient lookup. Note that these APIs may change, since
> VertexPartitionBase and its subclasses are private[graphx].
>
> You can access the partitions of a VertexRDD using
> VertexRDD#partitionsRDD, and each partition has
> VertexPartitionBase#isDefined and VertexPartitionBase#apply. Putting it all
> together:
>
>     val verts: VertexRDD[_] = ...
>     val targetVid: VertexId = 80L
>     val result = verts.partitionsRDD.flatMap { part =>
>       if (part.isDefined(targetVid)) Some(part(targetVid))
>       else None
>     }.collect.head
>
> Once IndexedRDD [1] is merged, it will provide this functionality using
> verts.get(targetVid). Its implementation of get also uses the hash
> partitioner to run only one task [2].
>
> Ankur
>
> [1] https://issues.apache.org/jira/browse/SPARK-2365
> [2] https://github.com/ankurdave/spark/blob/IndexedRDD/core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala#L89
>
