Re: Performance problem on collect

Sean Owen Tue, 19 Aug 2014 01:22:09 -0700

In that case, why not collectAsMap() and have the whole result as a
simple Map in memory? Then lookups are trivial. RDDs aren't
distributed maps.
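
A minimal sketch of what I mean, not a definitive implementation. The
local master, the app name, the sample data and the object name
LookupSketch are assumptions made up for illustration; only
collectAsMap() and lookup() are the actual calls under discussion.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings in the pair-RDD methods on older Spark releases; harmless on newer ones.
import org.apache.spark.SparkContext._

object LookupSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely so the sketch is self-contained.
    val conf = new SparkConf().setAppName("lookup-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-in for the small final RDD of (String, Array[String]).
    val myRdd = sc.parallelize(Seq(
      "a" -> Array("1", "2"),
      "b" -> Array("3")
    ))

    // One Spark job: bring the whole (small) result to the driver as a local Map.
    val asMap: scala.collection.Map[String, Array[String]] = myRdd.collectAsMap()

    // Every later lookup is a plain in-memory Map access; no Spark job is run.
    val param = "b"
    val hit: Option[Array[String]] = asMap.get(param)
    println(hit.map(_.mkString(",")).getOrElse("<not found>"))

    // For comparison: lookup() returns all values for the key,
    // but it launches a Spark job on every call.
    val viaLookup: Seq[Array[String]] = myRdd.lookup(param)
    println(viaLookup.map(_.mkString(",")).mkString(";"))

    sc.stop()
  }
}
```

This only makes sense while the result comfortably fits in driver
memory, which 60 entries certainly should.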


On Tue, Aug 19, 2014 at 9:17 AM, Emmanuel Castanier
<[email protected]> wrote:
> Thanks for your answer.
> In my case, that’s a pity because we have only 60 entries in the final RDD; I
> thought it would be fast to get the one I need.
>
>
> On 19 Aug 2014, at 09:58, Sean Owen <[email protected]> wrote:
>
>> You can use the function lookup() to accomplish this too; it may be a
>> bit faster.
>>
>> It will never be as efficient as a database lookup, since it is
>> implemented by scanning through all of the data. There is no index or
>> anything.
>>
>> On Tue, Aug 19, 2014 at 8:43 AM, Emmanuel Castanier
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> I’m a total newbie with Spark, so my question may be a dumb one.
>>> I tried Spark to compute values, and on that side everything works
>>> perfectly (and it's fast :) ).
>>>
>>> At the end of the process, I have an RDD of Key (String) / Value (Array
>>> of String) pairs, from which I want to get only one entry, like this:
>>>
>>> myRdd.filter(t => t._1.equals(param))
>>>
>>> If I call collect to get that single tuple, it takes about 12 seconds
>>> to execute. I imagine that’s because Spark is meant to be used differently...
>>>
>>> Best regards,
>>>
>>> Emmanuel
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>

