Spark does not require that data sets fit in memory to begin with. Yes, there is nothing inherently problematic about processing 1TB of data with far less than 1TB of cluster memory.
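For illustration, here is a minimal sketch of one way to let an RDD spill to local disk instead of requiring it to fit in memory. The StorageLevel.MEMORY_AND_DISK level is part of the standard Spark API; the input path and the transformation are hypothetical placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object OnDiskExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("on-disk-example")
    val sc = new SparkContext(conf)

    // Hypothetical large input; the path is only a placeholder.
    val lines = sc.textFile("hdfs:///data/big-input")

    // MEMORY_AND_DISK keeps partitions in memory when they fit and
    // spills the remainder to local disk rather than failing.
    val words = lines.flatMap(_.split("\\s+"))
    words.persist(StorageLevel.MEMORY_AND_DISK)

    // Actions stream through partitions, so the whole data set never
    // needs to be resident in memory at once.
    println(words.count())

    sc.stop()
  }
}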
You probably want to read:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence

On Tue, Aug 19, 2014 at 5:38 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> Hi,
> We have ~1TB of data to process, but our cluster doesn't have
> sufficient memory for such a data set (we have a 5-10 machine cluster).
> Is it possible to process 1TB of data using ON DISK options in Spark?
>
> If yes, where can I read about the configuration for ON DISK execution?
>
> Thanks,
> Oleg.