andy petrella <[email protected]> writes:
> Oh I was almost sure that lookup was optimized using the partition info

It does use the partitioner to run only one task, but within that task it has to scan the entire partition:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L710

Ankur
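
To make the cost concrete, here is a rough sketch in plain Python. It is a toy model, not Spark's code: the names partition_index, build_partitions, and lookup are made up for illustration, and the "partitioner" is just hash(key) modulo the partition count. It shows that picking the partition is cheap, while finding the key inside it is a linear scan.

```python
# Toy model of a hash-partitioned key/value dataset. This is NOT Spark's
# implementation; all names here are hypothetical and for illustration only.

def partition_index(key, num_partitions):
    # Stand-in for a hash partitioner: map a key to one partition index.
    return hash(key) % num_partitions

def build_partitions(pairs, num_partitions):
    # Place every (key, value) pair into the partition chosen for its key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return partitions

def lookup(partitions, key):
    # Step 1: the partitioner narrows the search to a single partition,
    # so only one "task" would be needed.
    target = partitions[partition_index(key, len(partitions))]
    # Step 2: there is no index inside the partition, so every record in it
    # is examined, including records that belong to other keys.
    scanned = 0
    values = []
    for k, v in target:
        scanned += 1
        if k == key:
            values.append(v)
    return values, scanned

# 32 records over 8 integer keys, spread across 4 partitions.
pairs = [(i % 8, "v%d" % i) for i in range(32)]
partitions = build_partitions(pairs, 4)
values, scanned = lookup(partitions, 5)
print(values, scanned)
```

In this model the lookup for key 5 touches one partition out of four, but still reads all the records in that partition, including the ones stored under key 1 that happen to hash to the same place.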
