Hi,

Any thoughts on this? Thanks.
-Suren

On Thu, Apr 3, 2014 at 8:27 AM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:

> Hi,
>
> I know that if we call persist() with the right options, we can have Spark
> persist an RDD's data on disk.
>
> I am wondering what happens in intermediate operations that could
> conceivably create large collections/Sequences, like GroupBy and shuffling.
>
> Basically, one part of the question is: when is disk used internally?
>
> And is calling persist() on the RDD returned by such transformations what
> lets it know to use disk in those situations? I am trying to understand
> whether persist() is applied during the transformation or after it.
>
> Thank you.
>
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@velos.io
> W: www.velos.io