On Thu, Jul 31, 2014 at 2:11 AM, Tobias Pfeiffer <t...@preferred.jp> wrote: > rdd.mapPartitions { partition => > // Some setup code here > val result = partition.map(yourfunction) > > // Some cleanup code here > result > }
Yes, I realized that after I hit send. You definitely have to store and return the result from the mapping! > rdd.mapPartitions { partition => > if (!partition.isEmpty) { > > // Some setup code here > partition.map(item => { > val output = yourfunction(item) > if (!partition.hasNext) { > // Some cleanup code here > } > output > }) > } else { > // return an empty Iterator of your return type > } > } Great point, yeah. If you knew the number of values were small you could collect them and process locally, but this is the right general way to do it.