Have you looked at this method ?
* Zips this RDD with its element indices. The ordering is first based on
the partition index
...
def zipWithIndex(): RDD[(T, Long)] = withScope {
On Wed, Jan 27, 2016 at 6:03 PM, Krishna <[email protected]> wrote:
> Hi,
>
> I've a scenario where I need to maintain state that is local to a worker
> that can change during map operation. What's the best way to handle this?
>
> *incr = 0*
> *def row_index():*
> * global incr*
> * incr += 1*
> * return incr*
>
> *out_rdd = inp_rdd.map(lambda x: row_index()).collect()*
>
> "out_rdd" in this case only contains 1s but I would like it to have index
> of each row in "inp_rdd".
>