Re: Maintain state outside rdd

2016-01-27 Thread Krishna
Executing on worker is fine since state I would like to maintain is specific to a partition. Accumulators, being global counters, wont work. On Wednesday, January 27, 2016, Jakob Odersky wrote: > Be careful with mapPartitions though, since it is executed on worker > nodes, you may not see side-e

Re: Maintain state outside rdd

2016-01-27 Thread Jakob Odersky
Be careful with mapPartitions though, since it is executed on worker nodes, you may not see side-effects locally. Is it not possible to represent your state changes as part of your rdd's transformations? I.e. return a tuple containing the modified data and some accumulated state. If that really do

Re: Maintain state outside rdd

2016-01-27 Thread Krishna
mapPartitions(...) seems like a good candidate, since, it's processing over a partition while maintaining state across map(...) calls. On Wed, Jan 27, 2016 at 6:58 PM, Ted Yu wrote: > Initially I thought of using accumulators. > > Since state change can be anything, how about storing state in ex

Re: Maintain state outside rdd

2016-01-27 Thread Ted Yu
Initially I thought of using accumulators. Since state change can be anything, how about storing state in external NoSQL store such as hbase ? On Wed, Jan 27, 2016 at 6:37 PM, Krishna wrote: > Thanks; What I'm looking for is a way to see changes to the state of some > variable during map(..) ph

Re: Maintain state outside rdd

2016-01-27 Thread Krishna
Thanks; What I'm looking for is a way to see changes to the state of some variable during map(..) phase. I simplified the scenario in my example by making row_index() increment "incr" by 1 but in reality, the change to "incr" can be anything. On Wed, Jan 27, 2016 at 6:25 PM, Ted Yu wrote: > Have

Re: Maintain state outside rdd

2016-01-27 Thread Ted Yu
Have you looked at this method ? * Zips this RDD with its element indices. The ordering is first based on the partition index ... def zipWithIndex(): RDD[(T, Long)] = withScope { On Wed, Jan 27, 2016 at 6:03 PM, Krishna wrote: > Hi, > > I've a scenario where I need to maintain state that i