Oh, I was almost sure that lookup was optimized using the partition info.

On 29 Jul 2014 at 21:25, "Ankur Dave" <ankurd...@gmail.com> wrote:
> Yifan LI <iamyifa...@gmail.com> writes:
> > Maybe you could get the vertex, for instance, which id is 80, by using:
> >
> >     graph.vertices.filter{case(id, _) => id==80}.collect
> >
> > but I am not sure this is the exactly efficient way.(it will scan the
> > whole table? if it can not get benefit from index of VertexRDD table)
>
> Until IndexedRDD is merged, a scan and collect is the best officially
> supported way. PairRDDFunctions.lookup does this under the hood as well.
>
> However, it's possible to use the VertexRDD's hash index to do a much more
> efficient lookup. Note that these APIs may change, since
> VertexPartitionBase and its subclasses are private[graphx].
>
> You can access the partitions of a VertexRDD using
> VertexRDD#partitionsRDD, and each partition has
> VertexPartitionBase#isDefined and VertexPartitionBase#apply. Putting it all
> together:
>
>     val verts: VertexRDD[_] = ...
>     val targetVid: VertexId = 80L
>     val result = verts.partitionsRDD.flatMap { part =>
>       if (part.isDefined(targetVid)) Some(part(targetVid))
>       else None
>     }.collect.head
>
> Once IndexedRDD [1] is merged, it will provide this functionality using
> verts.get(targetVid). Its implementation of get also uses the hash
> partitioner to run only one task [2].
>
> Ankur
>
> [1] https://issues.apache.org/jira/browse/SPARK-2365
> [2] https://github.com/ankurdave/spark/blob/IndexedRDD/core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala#L89
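
To make concrete what I mean by "optimized using the partition info", here is a minimal sketch in plain Scala, with no Spark involved. The object PartitionedLookup and its methods partitionFor, partition and lookup are hypothetical names made up for illustration, not Spark or GraphX APIs. It only shows the idea: when keys are hash-partitioned, the key's hash identifies the single partition that can hold it, so only that partition needs to be scanned. Whether lookup in a given Spark version actually does this would need to be checked against the source.

```scala
object PartitionedLookup {
  type VertexId = Long

  // Map a key to a partition index in [0, numPartitions), the way a
  // hash partitioner would (hypothetical helper, not a Spark call).
  def partitionFor(key: VertexId, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Distribute the pairs into numPartitions buckets by key hash.
  def partition[V](pairs: Seq[(VertexId, V)],
                   numPartitions: Int): Vector[Seq[(VertexId, V)]] = {
    val grouped = pairs.groupBy { case (id, _) => partitionFor(id, numPartitions) }
    Vector.tabulate(numPartitions)(i => grouped.getOrElse(i, Seq.empty[(VertexId, V)]))
  }

  // Scan only the one partition that can contain the key,
  // instead of filtering every partition.
  def lookup[V](parts: Vector[Seq[(VertexId, V)]], key: VertexId): Seq[V] =
    parts(partitionFor(key, parts.length)).collect { case (id, v) if id == key => v }

  def main(args: Array[String]): Unit = {
    val verts = (1L to 100L).map(id => (id, s"v$id"))
    val parts = partition(verts, 4)
    println(lookup(parts, 80L))
  }
}
```

With 4 partitions, looking up id 80 touches one bucket of roughly 25 entries rather than all 100, which is the saving I was expecting from lookup on a partitioned RDD.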