Re: spark on disk executions

2014-08-19 Thread Sean Owen
Spark does not require that data sets fit in memory to begin with. Yes, there's nothing inherently problematic about processing 1TB of data with a lot less than 1TB of cluster memory. You probably want to read: http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
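
As a minimal sketch of the kind of persistence call the linked guide covers (the input path and app name below are placeholders, not from this thread): with MEMORY_AND_DISK, Spark keeps the partitions that fit in memory and spills the rest to local disk; DISK_ONLY bypasses memory caching entirely.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object DiskPersistExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("disk-persist-example")
    val sc = new SparkContext(conf)

    // "hdfs:///data/input" is a placeholder path for illustration.
    val lines = sc.textFile("hdfs:///data/input")

    // Keep what fits in memory, spill the rest to disk.
    // Use StorageLevel.DISK_ONLY to skip memory caching entirely.
    val cached = lines.persist(StorageLevel.MEMORY_AND_DISK)

    println(s"line count: ${cached.count()}")
    sc.stop()
  }
}

Note that persistence levels only matter for data you explicitly cache; a plain pipeline of transformations streams through partitions and never needs the whole data set in memory at once.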

spark on disk executions

2014-08-19 Thread Oleg Ruchovets
Hi, we have ~1TB of data to process, but our cluster doesn't have sufficient memory for such a data set (we have a 5-10 machine cluster). Is it possible to process 1TB of data using ON DISK options with Spark? If yes, where can I read about the configuration for ON DISK execution? Thanks, Oleg