Thanks for your answer. In my case that's unfortunate, because we have only 60 entries in the final RDD; I thought it would be fast to get the one we need.
On 19 August 2014 at 09:58, Sean Owen <so...@cloudera.com> wrote:
> You can use the function lookup() to accomplish this too; it may be a
> bit faster.
>
> It will never be efficient like a database lookup, since this is
> implemented by scanning through all of the data. There is no index or
> anything.
>
> On Tue, Aug 19, 2014 at 8:43 AM, Emmanuel Castanier
> <emmanuel.castan...@gmail.com> wrote:
>> Hi all,
>>
>> I'm a total newbie on Spark, so my question may be a dumb one.
>> I tried Spark to compute values, and on that side everything works
>> perfectly (and it's fast :) ).
>>
>> At the end of the process, I have an RDD with Key (String) / Values
>> (Array of String), and from it I want to get only one entry, like this:
>>
>> myRdd.filter(t => t._1.equals(param))
>>
>> If I then collect to get the only « tuple », it takes about 12 seconds
>> to execute. I imagine that's because Spark is meant to be used differently...
>>
>> Best regards,
>>
>> Emmanuel
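
For reference, a minimal sketch of the three ways to fetch one key's values from a small pair RDD that come up in this thread: filter + collect, lookup(), and collecting the whole (tiny) RDD to the driver as a local Map. The sample data, the key param, and the app name below are made up for illustration; only the standard RDD API (filter, collect, lookup, collectAsMap) is assumed.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD functions on pre-1.3 Spark

object SingleKeyLookup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("single-key-lookup").setMaster("local[*]"))

    // Hypothetical small pair RDD: String key -> Array[String] values.
    val myRdd = sc.parallelize(Seq(
      ("a", Array("1", "2")),
      ("b", Array("3")),
      ("c", Array("4", "5", "6"))
    ))
    val param = "b"  // hypothetical key we want

    // 1) filter + collect: runs a job that scans every partition.
    val viaFilter: Array[(String, Array[String])] =
      myRdd.filter(t => t._1 == param).collect()

    // 2) lookup(): also runs a job, but returns only the values for the key,
    //    and can restrict the scan to one partition if the RDD has a partitioner.
    val viaLookup: Seq[Array[String]] = myRdd.lookup(param)

    // 3) For ~60 entries queried repeatedly: collect once to a local Map on
    //    the driver, then look keys up with no Spark job at all.
    val local: scala.collection.Map[String, Array[String]] = myRdd.collectAsMap()
    val viaMap: Option[Array[String]] = local.get(param)

    println(viaFilter.map { case (k, v) => s"$k -> ${v.mkString(",")}" }.mkString("; "))
    println(viaLookup.map(_.mkString(",")).mkString("; "))
    println(viaMap.map(_.mkString(",")).getOrElse("not found"))

    sc.stop()
  }
}

The first two launch a Spark job per query, and with only 60 entries the scheduling overhead dominates; the 12 seconds observed may also include recomputing the upstream lineage if the RDD was not cached beforehand. For an RDD this small that is queried more than once, collectAsMap() followed by plain local lookups is usually the simplest fit.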