Daniel and Paolo, thanks for the comments.

Best,
/Shahab
On Wed, Dec 3, 2014 at 3:12 PM, Paolo Platter <paolo.plat...@agilelab.it> wrote:

> Yes,
>
> otherwise you can try:
>
> rdd.cache().count()
>
> and then run your benchmark.
>
> Paolo
>
> *From:* Daniel Darabos <daniel.dara...@lynxanalytics.com>
> *Sent:* Wednesday, December 3, 2014 12:28
> *To:* shahab <shahab.mok...@gmail.com>
> *Cc:* user@spark.apache.org
>
> On Wed, Dec 3, 2014 at 10:52 AM, shahab <shahab.mok...@gmail.com> wrote:
>
>> Hi,
>>
>> I noticed that rdd.cache() does not take effect immediately; because of
>> Spark's lazy evaluation, caching happens only once you perform some
>> map/reduce action. Is this true?
>
> Yes, this is correct.
>
>> If so, how can I force Spark to cache immediately at the cache()
>> statement? I need this for benchmarking, to separate the RDD caching
>> time from the RDD transformation/action processing time.
>
> The typical solution, I think, is to run rdd.foreach(_ => ()) to trigger
> the computation.
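For anyone benchmarking this, here is a minimal, self-contained sketch of
the approach discussed above. The local SparkContext setup and the
synthetic RDD are hypothetical, for illustration only; either of the two
suggested actions (Paolo's count() or Daniel's foreach(_ => ())) serves to
populate the cache before the timed section.

    import org.apache.spark.{SparkConf, SparkContext}

    object CacheBenchmark {
      def main(args: Array[String]): Unit = {
        // Hypothetical local setup, for illustration only.
        val conf = new SparkConf().setAppName("cache-bench").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // A synthetic RDD standing in for whatever is being benchmarked.
        val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

        // cache() only *marks* the RDD for caching; nothing runs yet.
        rdd.cache()

        // Force computation with a cheap action so the cache is populated.
        // rdd.count() (Paolo's suggestion) would work equally well here.
        val t0 = System.nanoTime()
        rdd.foreach(_ => ())
        println(s"caching took ${(System.nanoTime() - t0) / 1e6} ms")

        // Subsequent actions read from the cache, so their timing no longer
        // includes the cost of computing and caching the RDD.
        val t1 = System.nanoTime()
        val n = rdd.count()
        println(s"cached count of $n elements took ${(System.nanoTime() - t1) / 1e6} ms")

        sc.stop()
      }
    }

One caveat: the first timed action measures computing the RDD and writing
it to the cache together, so this separates caching from later actions, not
computation from caching.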