Re: Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
It did the job. Thanks. :)

On 19 August 2014 at 10:20, Sean Owen wrote:
> In that case, why not collectAsMap() and have the whole result as a
> simple Map in memory? Then lookups are trivial. RDDs aren't
> distributed maps.
>
> On Tue, Aug 19, 2014 at 9:17 AM, Emmanuel Castanier wrote:
>> Thanks for your answer. […]

Re: Performance problem on collect

2014-08-19 Thread Sean Owen
In that case, why not collectAsMap() and have the whole result as a simple Map in memory? Then lookups are trivial. RDDs aren't distributed maps.

On Tue, Aug 19, 2014 at 9:17 AM, Emmanuel Castanier wrote:
> Thanks for your answer.
> In my case, that's a shame because we have only 60 entries in the final […]
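Sean's suggestion can be illustrated without a cluster: collectAsMap() pulls the whole pair RDD back to the driver as one ordinary in-memory map, after which each lookup is a plain hash access rather than a Spark job. A minimal local sketch in Python (the sample data is invented for illustration; with a real SparkContext the first step would be something like sc.parallelize(pairs).collectAsMap()):

```python
# Stand-in for the ~60 (key, values) pairs the final RDD holds.
pairs = [
    ("alpha", ["1", "2"]),
    ("beta", ["3"]),
    ("gamma", ["4", "5"]),
]

# collectAsMap()-style step: materialise everything as one dict, once.
as_map = dict(pairs)

# Every subsequent lookup is a constant-time dict access on the driver;
# no scan of the data, no distributed job.
print(as_map["beta"])  # ['3']
```

This is only reasonable when, as in this thread, the entire result fits comfortably in driver memory.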

Re: Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
Thanks for your answer. In my case, that's a shame because we have only 60 entries in the final RDD; I thought it would be fast to get the needed one.

On 19 August 2014 at 09:58, Sean Owen wrote:
> You can use the function lookup() to accomplish this too; it may be a
> bit faster.
>
> It will never be as efficient as a database lookup […]

Re: Performance problem on collect

2014-08-19 Thread Sean Owen
You can use the function lookup() to accomplish this too; it may be a bit faster. It will never be as efficient as a database lookup, since it is implemented by scanning through all of the data; there is no index of any kind.

On Tue, Aug 19, 2014 at 8:43 AM, Emmanuel Castanier wrote:
> Hi all, […]
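The contrast Sean draws can be sketched locally: RDD.lookup(key) has no index, so under the hood it filters every record for a matching key and returns the list of values found. A plain-Python analogue of that scan (hypothetical data, not Spark itself):

```python
pairs = [("alpha", ["1", "2"]), ("beta", ["3"]), ("gamma", ["4", "5"])]

def lookup(records, key):
    """Mimic RDD.lookup(): scan every record, collect values whose key matches."""
    return [v for k, v in records if k == key]

# Each call walks the entire dataset, even when asking for a single key --
# which is why repeated lookup() calls lose to one collectAsMap() + dict access.
print(lookup(pairs, "beta"))  # [['3']]
```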

Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
Hi all,

I'm a total newbie with Spark, so my question may be a dumb one. I tried Spark to compute values, and on that side everything works perfectly (and it's fast :) ). At the end of the process, I have an RDD with Key (String) / Values (Array of String), from which I want to get only one entry, like this: myRdd […]