In that case, why not call collectAsMap() and hold the whole result as a simple Map in memory? Then lookups are trivial. RDDs aren't distributed maps.
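
To make that concrete, here is a minimal sketch of what I mean, not a drop-in solution. It assumes the RDD is a small `RDD[(String, Array[String])]` as described in the thread; the object name `LookupSketch`, the helper `toLocalMap`, the local master setting, and the sample data are all made up for illustration. Note that if a key appears more than once, collectAsMap() keeps only one value for it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark 1.x, harmless later
import org.apache.spark.rdd.RDD

object LookupSketch {
  // Hypothetical helper: pull a small pair RDD onto the driver as a local Map.
  // Only sensible when the whole result fits comfortably in driver memory.
  def toLocalMap(rdd: RDD[(String, Array[String])]): scala.collection.Map[String, Array[String]] =
    rdd.collectAsMap()

  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration.
    val conf = new SparkConf().setAppName("lookup-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Made-up stand-in for the final ~60-entry RDD from the thread.
    val myRdd = sc.parallelize(Seq(
      "a" -> Array("1", "2"),
      "b" -> Array("3")
    ))

    // One Spark job to collect everything; afterwards no jobs are needed.
    val local = toLocalMap(myRdd)

    val param = "b"
    local.get(param) match {
      case Some(values) => println(values.mkString(", "))
      case None         => println(s"no entry for $param")
    }

    // For comparison: lookup() also works, but launches a Spark job on every call.
    val viaLookup: Seq[Array[String]] = myRdd.lookup(param)
    println(viaLookup.map(_.mkString(", ")).mkString("; "))

    sc.stop()
  }
}
```

The one-time collect still pays the cost of a job, but every lookup after that is an ordinary in-memory map access.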

On Tue, Aug 19, 2014 at 9:17 AM, Emmanuel Castanier <[email protected]> wrote:
> Thanks for your answer.
> In my case, that's sad because we have only 60 entries in the final RDD; I was
> thinking it would be fast to get the needed one.
>
>
> On 19 Aug 2014, at 09:58, Sean Owen <[email protected]> wrote:
>
>> You can use the function lookup() to accomplish this too; it may be a
>> bit faster.
>>
>> It will never be efficient like a database lookup since this is
>> implemented by scanning through all of the data. There is no index or
>> anything.
>>
>> On Tue, Aug 19, 2014 at 8:43 AM, Emmanuel Castanier
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> I'm a total newbie with Spark, so my question may be a dumb one.
>>> I tried Spark to compute values; on this side all works perfectly (and it's
>>> fast :) ).
>>>
>>> At the end of the process, I have an RDD with Key(String)/Values(Array
>>> of String); on this I want to get only one entry, like this:
>>>
>>> myRdd.filter(t => t._1.equals(param))
>>>
>>> If I do a collect to get the single "tuple", it takes about 12 seconds
>>> to execute. I imagine that's because Spark is meant to be used differently...
>>>
>>> Best regards,
>>>
>>> Emmanuel
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
