On Wed, Dec 3, 2014 at 10:52 AM, shahab <shahab.mok...@gmail.com> wrote:
> Hi,
>
> I noticed that rdd.cache() is not happening immediately rather due to lazy
> feature of Spark, it is happening just at the moment you perform some
> map/reduce actions. Is this true?

Yes, this is correct. cache() only marks the RDD as cacheable; nothing is computed or stored until an action runs.

> If this is the case, how can I enforce Spark to cache immediately at its
> cache() statement? I need this to perform some benchmarking and I need to
> separate rdd caching and rdd transformation/action processing time.

The typical solution, I think, is to run rdd.foreach(_ => ()) right after the cache() call. foreach is an action, so it triggers the computation and materializes the cache, while the no-op closure adds essentially no work of its own.
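For the benchmarking use case, a minimal sketch might look like the following. This is an illustrative local-mode example, not from the original thread: the app name, data, and timing helper are all made up, and the measured numbers will of course vary by machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheBenchmark {
  // Hypothetical helper: time a block and return (result, elapsed ms).
  def timed[A](body: => A): (A, Double) = {
    val t0 = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - t0) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-benchmark").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
    rdd.cache() // lazy: only marks the RDD for caching, computes nothing

    // Materialize the cache with a no-op action and time just that step.
    val (_, cacheMs) = timed { rdd.foreach(_ => ()) }
    println(f"cache materialization: $cacheMs%.1f ms")

    // Subsequent actions read from the cache, so their timings
    // exclude the caching cost measured above.
    val (sum, actionMs) = timed { rdd.map(_.toLong).reduce(_ + _) }
    println(f"sum = $sum, action time: $actionMs%.1f ms")

    sc.stop()
  }
}
```

Separating the two timings this way lets you report the cache-population cost and the cached-read cost independently, which is what the benchmark asked for.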