Re: Modifying an RDD in forEach

2014-12-06 Thread Mohit Jaggi
ated and laziness is not terribly useful. Ideal for > massive in-memory cluster computing, yes, but iterative...? Not sure. I have > that book "Functional Programming in Scala" and I hope to read it someday and > enrich my understanding here. > > Subject: Re: Modifying

RE: Modifying an RDD in forEach

2014-12-06 Thread Ron Ayoub
seful. Ideal for massive in-memory cluster computing, yes, but iterative...? Not sure. I have that book "Functional Programming in Scala" and I hope to read it someday and enrich my understanding here. Subject: Re: Modifying an RDD in forEach From: mohitja...@gmail.com Date

Re: Modifying an RDD in forEach

2014-12-06 Thread Mohit Jaggi
Ron, “appears to be working” might be true when there are no failures. On large datasets being processed on a large number of machines, failures of several types (server, network, disk, etc.) can happen. At that time, Spark will not “know” that you changed the RDD in-place and will use any version

Re: Modifying an RDD in forEach

2014-12-06 Thread Mayur Rustagi
You'll benefit from viewing Matei's talk at Yahoo on Spark internals and how it optimizes execution of iterative jobs. Simple answer is 1. Spark doesn't materialize the RDD when you do an iteration but lazily captures the transformation functions in the RDD (only function and closure, no data operation actu
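
The two points made in this thread (transformations are recorded lazily as functions, and an in-place change is invisible to recovery after a failure) can be illustrated with a toy model. This is a minimal sketch in plain Python and is not Spark's API: `ToyRDD`, `load`, and the `score` field are hypothetical names invented for the illustration, and "failure recovery" is simulated simply by calling `compute()` a second time.

```python
# Toy model of lazy lineage. NOT Spark's API; every name here is hypothetical.
class ToyRDD:
    def __init__(self, source, transforms=()):
        self._source = source                  # function that re-reads the base data
        self._transforms = tuple(transforms)   # recorded functions, not data

    def map(self, fn):
        # Record the function only; nothing is computed at this point.
        return ToyRDD(self._source, self._transforms + (fn,))

    def compute(self):
        # Rebuild the data from the source by replaying the recorded functions,
        # which is roughly what happens when lost data has to be recomputed.
        rows = self._source()
        for fn in self._transforms:
            rows = [fn(r) for r in rows]
        return rows


def load():
    # Stand-in for reading the base dataset; returns fresh rows on every call.
    return [{"id": 1, "score": 0}, {"id": 2, "score": 0}]


base = ToyRDD(load)

# In-place mutation: only this one materialized copy is changed.
first_copy = base.compute()
for row in first_copy:
    row["score"] = 10

# Simulated recovery: the data is rebuilt from the lineage, so the edit is gone.
recomputed = base.compute()

# Transformation: the change is part of the lineage, so a replay reproduces it.
updated = base.map(lambda r: {**r, "score": 10})
replayed = updated.compute()
```

In this sketch `recomputed` still carries the original scores while `replayed` carries the updated ones, which is the reason for expressing the update as a transformation rather than mutating rows inside a `foreach`.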