Re: Doubts regarding Shark

2014-05-13 Thread Mayur Rustagi
The table will be cached, but 10 GB (most likely more) would be on disk. You can check that in the Storage tab of the Shark application. The Java out-of-memory error could be because your worker memory is too low or the memory allocated to Shark is too low.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics
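A minimal sketch of the spill-to-disk behavior described above, assuming a Spark shell with an existing SparkContext `sc`; the input path is hypothetical. With the MEMORY_AND_DISK storage level, partitions that do not fit in memory land on disk, which is what the Storage tab then reports:

    import org.apache.spark.storage.StorageLevel

    // Hypothetical 10 GB input; the path is illustrative only.
    val table = sc.textFile("hdfs:///path/to/10gb-table")

    // MEMORY_AND_DISK keeps the partitions that fit in memory and
    // spills the rest to disk, matching a cached table that is
    // partly on disk in the Storage tab.
    table.persist(StorageLevel.MEMORY_AND_DISK)

    // An action forces materialization; only after this does the
    // Storage tab show how many blocks are in memory versus on disk.
    table.count()

As for raising the memory, the exact knob depends on your setup; in Shark deployments of that era it was typically SPARK_MEM in conf/shark-env.sh, while plain Spark uses spark.executor.memory.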

Re: Doubts regarding Shark

2014-05-12 Thread Nicholas Chammas
To answer your first question: caching in Spark is lazy, meaning that Spark will not actually try to cache the RDD you've targeted until you take some sort of action on that RDD (like a count). That might be why you don't see any error at first.

On Thu, May 8, 2014 at 2:46 AM, vinay Bajaj wrote
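A minimal sketch of that laziness, assuming a Spark shell with an existing SparkContext `sc`; the input path is hypothetical:

    // cache() only marks the RDD for caching; nothing is computed yet,
    // so no out-of-memory error can surface at this point.
    val rdd = sc.textFile("hdfs:///some/input")
    rdd.cache()

    // The first action triggers computation and caching of the
    // partitions; this is where a memory problem would first show up.
    rdd.count()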