Thanks, Michael, for the explanation. I had actually tried caching the RDD and registering a table on top of it, but cacheTable performed about 3x better than caching the RDD. Now I know why. Would it be possible, though, to add support for a persistence level to cacheTable itself, as with RDDs?

This may be unrelated, but on the same data set I have to specify more executor memory with cacheTable than I need when caching the RDD, even though the Storage tab of the status web UI shows almost the same memory footprint: 58.3 GB with cacheTable and 59.7 GB with the cached RDD. Is it possible that there is a memory leak, or does cacheTable work differently and therefore require more memory? The difference is 5 GB per executor for a 122 GB data set.

Thanks,
Gurvinder

On 08/01/2014 04:42 AM, Michael Armbrust wrote:
> cacheTable uses a special columnar caching technique that is
> optimized for SchemaRDDs. It is something similar to MEMORY_ONLY_SER
> but not quite. You can specify the persistence level on the
> SchemaRDD itself and register that as a temporary table; however, it
> is likely you will not get as good performance.
>
> On Thu, Jul 31, 2014 at 6:16 AM, Gurvinder Singh
> <gurvinder.si...@uninett.no> wrote:
>
> Hi,
>
> I am wondering how I can specify the persistence level in
> cacheTable, as it takes only the table name as a parameter. It should
> be possible to specify the persistence level.
>
> - Gurvinder