Re: Memory requirement of using Spark

2014-10-24 Thread jian.t
Thanks Akhil. I searched for DISK_AND_MEMORY_SER to figure out how it works, but I cannot find any documentation on it. Do you have a link? If what DISK_AND_MEMORY_SER does is read from and write to disk with some in-memory caching, does that mean the output will be written to disk

Re: Memory requirement of using Spark

2014-10-24 Thread Akhil Das
You can use spark-sql to solve this use case, and you don't need 800G of memory (although if you are caching the whole dataset in memory, then you would need it). You can persist the data with the DISK_AND_MEMORY_SER storage level if you don't want to bring the whole dataset into memory, in this
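For reference, the constant the thread is describing exists in Spark's API as StorageLevel.MEMORY_AND_DISK_SER: partitions are kept in memory as serialized bytes and spilled to disk when they don't fit, so the full dataset never has to be resident in RAM. Below is a minimal sketch against the Spark 1.1-era Scala API of the approach Akhil suggests (persist with that storage level, then query with Spark SQL); the input path, the Event record layout, and the query are placeholders, not anything from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

// Hypothetical record layout; stands in for whatever the 800G dataset holds.
case class Event(id: String, value: Long)

object PersistExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PersistExample"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[case class] -> SchemaRDD

    // Parse the raw input into Events; the path is a placeholder.
    val events = sc.textFile("hdfs:///data/events")
      .map(_.split(","))
      .map(parts => Event(parts(0), parts(1).toLong))

    // Keep partitions as serialized bytes in memory and spill the rest to
    // disk, so the whole dataset does not have to fit in RAM at once.
    events.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // Query the persisted data with Spark SQL.
    events.registerTempTable("events")
    val totals = sqlContext.sql(
      "SELECT id, SUM(value) AS total FROM events GROUP BY id")
    totals.collect().foreach(println)

    sc.stop()
  }
}
```

Note that persisting is about reusing intermediate data across actions; it is separate from writing job output, which would go through something like saveAsTextFile.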