Re: Maintain state outside rdd

Ted Yu Wed, 27 Jan 2016 18:59:01 -0800

Initially I thought of using accumulators.

Since state change can be anything, how about storing state in external
NoSQL store such as hbase ?


On Wed, Jan 27, 2016 at 6:37 PM, Krishna <research...@gmail.com> wrote:

> Thanks; What I'm looking for is a way to see changes to the state of some
> variable during map(..) phase.
> I simplified the scenario in my example by making row_index() increment
> "incr" by 1 but in reality, the change to "incr" can be anything.
>
> On Wed, Jan 27, 2016 at 6:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you looked at this method ?
>>
>>    * Zips this RDD with its element indices. The ordering is first based
>> on the partition index
>> ...
>>   def zipWithIndex(): RDD[(T, Long)] = withScope {
>>
>> On Wed, Jan 27, 2016 at 6:03 PM, Krishna <research...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've a scenario where I need to maintain state that is local to a worker
>>> that can change during map operation. What's the best way to handle this?
>>>
>>> *incr = 0*
>>> *def row_index():*
>>> *  global incr*
>>> *  incr += 1*
>>> *  return incr*
>>>
>>> *out_rdd = inp_rdd.map(lambda x: row_index()).collect()*
>>>
>>> "out_rdd" in this case only contains 1s but I would like it to have
>>> index of each row in "inp_rdd".
>>>
>>
>>
>

Re: Maintain state outside rdd

Reply via email to