Really, the analog of a Mapper is not map(), but mapPartitions(). Instead of:

rdd.map(yourFunction)

... you can run setup code once per partition before mapping its
records, and (with care) cleanup code after, like so:

rdd.mapPartitions { partition =>
   // Some setup code here (runs once per partition)
   val mapped = partition.map(yourFunction)
   // Careful with cleanup: map is lazy, so code placed here would run
   // before any record is actually processed. Return the mapped iterator.
   mapped
}

You couldn't share state across Mappers, or between Mappers and
Reducers, in Hadoop. (At least there was no direct way.) Same here. But
you can maintain state across the many map calls within a single
partition.
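
For example, here's a minimal sketch (the RDD `lines` of date strings is
hypothetical) where a SimpleDateFormat and a counter are created once
per partition and reused by every map call in that partition, much like
instance variables on a Hadoop Mapper:

import java.text.SimpleDateFormat

val parsed = lines.mapPartitions { partition =>
   // Setup: runs once per partition, not once per record
   val format = new SimpleDateFormat("yyyy-MM-dd")
   var count = 0L
   partition.map { line =>
      count += 1                    // mutable state shared across map calls
      (count, format.parse(line))   // reuse the one formatter instance
   }
}

Reusing a non-thread-safe object like SimpleDateFormat this way is fine,
since each partition is processed by a single task.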

On Wed, Jul 30, 2014 at 6:07 PM, Kevin <kevin.macksa...@gmail.com> wrote:
> Hi,
>
> Is it possible to maintain state inside a Spark map function? With Hadoop
> MapReduce, Mappers and Reducers are classes that can have their own state
> using instance variables. Can this be done with Spark? Are there any
> examples?
>
> Most examples I have seen perform a simple operation on the value passed into
> the map function and then pass it along to the reduce function.
>
> Thanks in advance.
>
> -Kevin
