The table will be cached, but the 10 GB (most likely more) will end up on disk. You can check that in the Storage tab of the Shark application.
The Java out-of-memory error could be because your worker memory is too low or the memory allocated to Shark is too low.
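As a rough illustration, here is a minimal sketch of where those memory settings are usually raised when you build the Spark context yourself; the app name and the exact values are assumptions, and Shark may pick up equivalent settings from its own configuration files instead:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch, assuming a standalone deployment and that you
// construct your own SparkContext (values below are placeholders).
val conf = new SparkConf()
  .setAppName("shark-memory-example")          // hypothetical app name
  .set("spark.executor.memory", "8g")          // heap per executor/worker JVM
  .set("spark.storage.memoryFraction", "0.6")  // fraction of heap usable for cached tables

val sc = new SparkContext(conf)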
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics
To answer your first question, caching in Spark is lazy, meaning that Spark
will not actually try to cache the RDD you've targeted until you take some
sort of action on that RDD (like a count).
That might be why you don't see any error at first.
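To make the laziness concrete, here is a small sketch assuming an existing SparkContext `sc` and a hypothetical input path:

// cache() only marks the RDD for caching; nothing is computed or
// stored yet, so no memory error can surface at this point.
val rdd = sc.textFile("hdfs:///path/to/table")  // hypothetical path
rdd.cache()

// The first action (e.g. count) triggers the computation and tries to
// materialize the cached partitions; any out-of-memory problem with
// the cache will only show up here or later.
val n = rdd.count()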
On Thu, May 8, 2014 at 2:46 AM, vinay Bajaj wrote: