Spark does not require that data sets fit in memory to begin with. Yes, there is nothing inherently problematic about processing 1TB of data with far less than 1TB of cluster memory.
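For illustration, here is a minimal sketch of one way to let an RDD spill to local disk instead of requiring it to fit in memory. The StorageLevel.MEMORY_AND_DISK level is part of the standard Spark API; the input path and the transformation are hypothetical placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object OnDiskExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("on-disk-example")
    val sc = new SparkContext(conf)

    // Hypothetical large input; the path is only a placeholder.
    val lines = sc.textFile("hdfs:///data/big-input")

    // MEMORY_AND_DISK keeps partitions in memory when they fit and
    // spills the remainder to local disk rather than failing.
    val words = lines.flatMap(_.split("\\s+"))
    words.persist(StorageLevel.MEMORY_AND_DISK)

    // Actions stream through partitions, so the whole data set never
    // needs to be resident in memory at once.
    println(words.count())

    sc.stop()
  }
}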
You probably want to read:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence

On Tue, Aug 19, 2014 at 5:38 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> Hi,
> We have ~1TB of data to process, but our cluster doesn't have
> sufficient memory for such a data set (we have a 5-10 machine cluster).
> Is it possible to process 1TB of data using ON DISK options in Spark?
>
> If yes, where can I read about the configuration for ON DISK execution?
>
> Thanks,
> Oleg.