Daniel and Paolo, thanks for the comments.

Best,
/Shahab
On Wed, Dec 3, 2014 at 3:12 PM, Paolo Platter <paolo.plat...@agilelab.it> wrote:

> Yes,
>
> otherwise you can try:
>
> rdd.cache().count()
>
> and then run your benchmark.
>
> Paolo
>
> *From:* Daniel Darabos <daniel.dara...@lynxanalytics.com>
> *Sent:* Wednesday, December 3, 2014 12:28
> *To:* shahab <shahab.mok...@gmail.com>
> *Cc:* user@spark.apache.org
>
> On Wed, Dec 3, 2014 at 10:52 AM, shahab <shahab.mok...@gmail.com> wrote:
>
>> Hi,
>>
>> I noticed that rdd.cache() does not take effect immediately; because of
>> Spark's lazy evaluation, caching happens only once you perform some
>> map/reduce action. Is this true?
>
> Yes, this is correct.
>
>> If so, how can I force Spark to cache immediately at the cache()
>> statement? I need this for benchmarking, to separate the RDD caching
>> time from the RDD transformation/action processing time.
>
> The typical solution, I think, is to run rdd.foreach(_ => ()) to trigger
> the computation.
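For anyone benchmarking this, here is a minimal, self-contained sketch of
the approach discussed above. The local SparkContext setup and the
synthetic RDD are hypothetical, for illustration only; either of the two
suggested actions (Paolo's count() or Daniel's foreach(_ => ())) serves to
populate the cache before the timed section.

    import org.apache.spark.{SparkConf, SparkContext}

    object CacheBenchmark {
      def main(args: Array[String]): Unit = {
        // Hypothetical local setup, for illustration only.
        val conf = new SparkConf().setAppName("cache-bench").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // A synthetic RDD standing in for whatever is being benchmarked.
        val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

        // cache() only *marks* the RDD for caching; nothing runs yet.
        rdd.cache()

        // Force computation with a cheap action so the cache is populated.
        // rdd.count() (Paolo's suggestion) would work equally well here.
        val t0 = System.nanoTime()
        rdd.foreach(_ => ())
        println(s"caching took ${(System.nanoTime() - t0) / 1e6} ms")

        // Subsequent actions read from the cache, so their timing no longer
        // includes the cost of computing and caching the RDD.
        val t1 = System.nanoTime()
        val n = rdd.count()
        println(s"cached count of $n elements took ${(System.nanoTime() - t1) / 1e6} ms")

        sc.stop()
      }
    }

One caveat: the first timed action measures computing the RDD and writing
it to the cache together, so this separates caching from later actions, not
computation from caching.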