If you’re just relying on the side effects of |setup()| and |cleanup()| then I think this trick is OK and pretty clean.

But if |setup()| returns, say, a DB connection, then the |map(...)| part and |cleanup()| can’t get the connection object.
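For reference, the usual way around this is to keep setup, transformation, and cleanup inside a single |mapPartitions| closure, so the connection stays in scope. A minimal sketch with a hypothetical |Conn| type (no Spark needed here — the function just receives the partition’s Iterator, which is exactly what |mapPartitions| passes in):

```scala
// Hypothetical connection type, for illustration only.
class Conn {
  def query(x: Int): Int = x * 2
  def close(): Unit = ()
}

// The function you would pass to rdd.mapPartitions: the transformation
// and the cleanup both see `conn` because everything is in one closure.
def processPartition(rows: Iterator[Int]): Iterator[Int] = {
  val conn = new Conn()            // setup: runs once per partition
  val out  = rows.map(conn.query)  // use the connection per element
  // Materialize before closing; otherwise the lazy iterator would be
  // consumed only after close(). Fine for modestly sized partitions.
  val materialized = out.toList
  conn.close()                     // cleanup: runs once per partition
  materialized.iterator
}
```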

On 11/14/14 1:20 PM, Jianshi Huang wrote:

So can I write it like this?

rdd.mapPartitions { i => setup(); i }.map(...).mapPartitions { i => cleanup(); i }

That way I don't mix the setup/cleanup into the processing logic, and I can still use map, filter, and other RDD transformations.
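This trick can be demonstrated without Spark, since |mapPartitions| just applies the supplied function to each partition’s iterator. A sketch with a stand-in |mapPartitions| and side-effect-only setup/cleanup (note the caveat in the comments: the partition function runs when the iterator chain is built, so |cleanup()| here fires before the elements are actually pulled through):

```scala
import scala.collection.mutable.ArrayBuffer

val log = ArrayBuffer.empty[String]
def setup(): Unit   = log += "setup"
def cleanup(): Unit = log += "cleanup"

// Stand-in for rdd.mapPartitions: apply f to the partition's iterator.
def mapPartitions[A, B](part: Iterator[A])(f: Iterator[A] => Iterator[B]): Iterator[B] =
  f(part)

// The proposed chain: setup, transform, cleanup, each as a separate step.
// Both side effects run eagerly when the chain is assembled; only the
// .map itself is lazy, deferred until the iterator is consumed.
val result = mapPartitions(
  mapPartitions(Iterator(1, 2, 3)) { i => setup(); i }.map(_ * 10)
) { i => cleanup(); i }.toList
```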

Jianshi

On Fri, Nov 14, 2014 at 12:20 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

    If you’re looking for executor side setup and cleanup functions,
    there ain’t any yet, but you can achieve the same semantics via
    |RDD.mapPartitions|.

    Please check the “setup() and cleanup” section of this blog from
    Cloudera for details:
    
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/

    On 11/14/14 10:44 AM, Dai, Kevin wrote:

    Hi all,

    Is there a setup and cleanup function in Spark, as in Hadoop
    MapReduce, for doing initialization and cleanup work?

    Best Regards,

    Kevin.





--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
