Thanks, it works now.
-Simon
On Thu, Jun 5, 2014 at 10:47 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> Have you set the persistence level of the RDD to MEMORY_ONLY_SER
> (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence)?
> If you're calling cache, the default persistence level is MEMORY_ONLY, so
> that setting will have no impact.
>
>
> On Thu, Jun 5, 2014 at 4:41 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
>
>> I have a working set larger than available memory, so I am hoping to
>> turn on RDD compression so that I can store more in memory. Strangely, it
>> made no difference: the number of cached partitions, fraction cached, and
>> size in memory all remain the same. Any ideas?
>>
>> I confirmed that RDD compression wasn't on before and was on for the
>> second test:
>>
>> scala> sc.getConf.getAll foreach println
>> ...
>> (spark.rdd.compress,true)
>> ...
>>
>> I haven't tried lzo vs. snappy, but my guess is that either one should
>> provide at least some benefit.
>>
>> Thanks.
>> -Simon
>>
>>
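
A minimal sketch of the fix Nick describes, assuming the Spark 1.x Scala API; the
application name and input path below are placeholders, not taken from the thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // spark.rdd.compress only applies to partitions stored in serialized form.
    val conf = new SparkConf()
      .setAppName("rdd-compression-example")        // hypothetical app name
      .set("spark.rdd.compress", "true")
    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs:///some/large/input")  // hypothetical input path

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), which keeps
    // deserialized Java objects in memory, so the compression setting is ignored.
    // A serialized storage level caches byte arrays, which can then be compressed.
    data.persist(StorageLevel.MEMORY_ONLY_SER)

    data.count()  // materialize the RDD so the compressed partitions get cached

With MEMORY_ONLY_SER (plus compression) the in-memory size reported in the UI
should shrink, at the cost of extra CPU to serialize/decompress on access.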