ated and laziness is not terribly useful. Ideal for massive
in-memory cluster computing yes - but iterative...? not sure. I have that book
"Functional Programming in Scala" and I hope to read it someday and enrich my
understanding here.
Subject: Re: Modifying an RDD in forEach
From: mohitja...@gmail.com
Ron,
“Appears to be working” might be true when there are no failures. On large
datasets being processed on a large number of machines, failures of several
types (server, network, disk, etc.) can happen. At that time, Spark will not
“know” that you changed the RDD in place and will use whichever version of it
the lineage produces when recomputing lost partitions.
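
To make that concrete, here is a minimal sketch (assuming a spark-shell
session where sc is the SparkContext; the Record class and its values are
hypothetical):

    // Unsafe: mutation inside foreach is invisible to Spark's lineage.
    // The change happens on executor-side copies and is not tracked, so a
    // recomputation after a failure replays the lineage without it.
    case class Record(var value: Int)
    val rdd = sc.parallelize(Seq(Record(1), Record(2)))
    rdd.foreach(r => r.value += 1)

    // Safe: express the change as a transformation. Spark records the
    // function in the lineage and can recompute the result deterministically
    // after failures.
    val updated = rdd.map(r => Record(r.value + 1))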
You'll benefit from viewing Matei's talk at Yahoo on Spark internals and how
it optimizes the execution of iterative jobs.
The simple answer is:
1. Spark doesn't materialize the RDD when you define an iteration; it lazily
captures the transformation functions in the RDD (only the function and
closure are recorded, no data operation actually happens at that point).
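
A small sketch of that laziness (again assuming a spark-shell session with sc
available): the map below records only the function; nothing runs until the
action.

    val nums = sc.parallelize(1 to 1000000)
    // No computation happens here: Spark only records the function and its
    // closure as a step in the RDD's lineage.
    val doubled = nums.map(_ * 2)
    // The action is what triggers the actual distributed execution.
    val total = doubled.reduce(_ + _)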