Hi,

Any thoughts on this? Thanks.

-Suren



On Thu, Apr 3, 2014 at 8:27 AM, Surendranauth Hiraman <
suren.hira...@velos.io> wrote:

> Hi,
>
> I know if we call persist with the right options, we can have Spark
> persist an RDD's data on disk.
>
> I am wondering what happens in intermediate operations that could
> conceivably create large collections/Sequences, like GroupBy and shuffling.
>
> Basically, one part of the question is: when is disk used internally?
>
> And is calling persist() on the RDD returned by such transformations what
> lets it know to use disk in those situations? Trying to understand if
> persist() is applied during the transformation or after it.
>
> Thank you.
>
>
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@velos.io
> W: www.velos.io
>
>


