Thanks!

Yeah, that's what I'm doing so far, but I wanted to see whether it's possible to keep
the tuples inside Spark for fault-tolerance purposes.

-A
From: Mark Hamstra [mailto:[email protected]]
Sent: March-28-14 10:45 AM
To: [email protected]
Subject: Re: function state lost when next RDD is processed

As long as the amount of state being passed is relatively small, it's probably 
easiest to send it back to the driver and to introduce it into RDD 
transformations as the zero value of a fold.
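A minimal sketch of the pattern Mark describes, using plain Python lists as stand-ins for successive RDDs (the batch data and names here are illustrative; in Spark the inner step would be `rdd.fold(zeroValue)(op)`):

```python
# Carry state across batches by collecting the running total at the
# driver and feeding it back in as the zero value of the next fold.
from functools import reduce

batches = [[1, 2, 3], [4, 5], [6]]  # stand-ins for successive RDDs

running_total = 0
for batch in batches:
    # Spark analogue: running_total = rdd.fold(running_total)(_ + _)
    running_total = reduce(lambda acc, x: acc + x, batch, running_total)

print(running_total)  # 21 -- the sum survives across batches
```

Because the zero value travels through the driver between batches, each fold resumes where the previous one left off instead of restarting from 0.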

On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu 
<[email protected]<mailto:[email protected]>> wrote:
I'd like to resurrect this thread since I don't have an answer yet.

From: Adrian Mocanu [mailto:[email protected]]
Sent: March-27-14 10:04 AM
To: [email protected]<mailto:[email protected]>
Subject: function state lost when next RDD is processed

Is there a way to pass a custom function to Spark and run it over the entire
stream? For example, say I have a function that sums values within each RDD and
then across RDDs.

I've tried map, transform, and reduce. They all apply my sum function to a single
RDD; when the next RDD arrives, the function starts from 0 again, so the sum of the
previous RDD is lost.

Does Spark support a way of passing a custom function so that its state is
preserved across RDDs, and not only within a single RDD?
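To make the problem concrete, a small sketch in plain Python (lists stand in for the stream's successive RDDs; this only illustrates the question, not a Spark API):

```python
# Summing each batch independently restarts from 0 every time,
# mirroring what happens when reduce is applied per RDD.
batches = [[1, 2, 3], [4, 5], [6]]  # stand-ins for successive RDDs

per_batch_sums = [sum(batch) for batch in batches]
print(per_batch_sums)  # [6, 9, 6] -- no state carried between batches

# What the question asks for: one sum accumulated across all RDDs.
total_across_batches = sum(per_batch_sums)
print(total_across_batches)  # 21
```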

Thanks
-Adrian

