Really, the analog of a Mapper is not map(), but mapPartitions(). Instead of:
rdd.map(yourFunction)

...you can run setup code before mapping a bunch of records, and cleanup
code after, like so:

rdd.mapPartitions { partition =>
  // Some setup code here
  partition.map(yourFunction)
  // Some cleanup code here
}

You couldn't share state across Mappers, or between Mappers and Reducers,
in Hadoop. (At least there was no direct way.) The same is true here. But
you can maintain state across many map calls within a partition. (A fuller
sketch follows below the quoted message.)

On Wed, Jul 30, 2014 at 6:07 PM, Kevin <kevin.macksa...@gmail.com> wrote:
> Hi,
>
> Is it possible to maintain state inside a Spark map function? With Hadoop
> MapReduce, Mappers and Reducers are classes that can have their own state
> using instance variables. Can this be done with Spark? Are there any
> examples?
>
> Most examples I have seen perform a simple operation on the value passed
> into the map function and then pass it along to the reduce function.
>
> Thanks in advance.
>
> -Kevin
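For completeness, here is a self-contained sketch of the setup/cleanup
pattern above. ExpensiveResource is a hypothetical stand-in for whatever
per-partition state you need (a DB connection, a parser, a counter), and
the local[2] master is just for trying it out. One caveat worth showing:
partition.map is lazy, so the results are materialized before the resource
is closed, otherwise cleanup would run before any record was processed.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical per-partition resource; stands in for a DB connection,
// parser, or any other state you want to set up once per partition.
class ExpensiveResource {
  def transform(s: String): String = s.toUpperCase
  def close(): Unit = ()
}

object MapPartitionsStateExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("map-partitions-state")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(Seq("a", "b", "c", "d"))

    val result = rdd.mapPartitions { partition =>
      // Setup: runs once per partition, not once per record
      val resource = new ExpensiveResource()
      // Materialize before cleanup: partition.map is lazy, so closing
      // the resource first would break records processed later
      val out = partition.map(resource.transform).toList
      // Cleanup: runs once per partition
      resource.close()
      out.iterator
    }

    result.collect().foreach(println)
    sc.stop()
  }
}

Materializing with toList trades memory for safety; if a partition is too
large to buffer, a wrapper iterator that closes the resource once the
underlying iterator is exhausted avoids holding everything in memory.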