If you’re just relying on the side effect of |setup()| and |cleanup()|,
then I think this trick is OK and pretty clean.
But if |setup()| returns, say, a DB connection, then the |map(...)| part
and |cleanup()| can’t get the connection object.
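A minimal sketch (plain Scala, no Spark, with a hypothetical `Conn` class standing in for a DB connection) of what this suggests: keep |setup()|, the per-record logic, and |cleanup()| inside a single function passed to |mapPartitions|, so the connection object stays in scope for all three.

```scala
object SingleMapPartitions {
  // Hypothetical stand-in for a DB connection.
  class Conn {
    var closed = false
    def query(x: Int): Int = x * 2
    def close(): Unit = closed = true
  }

  // The one function you would hand to rdd.mapPartitions: it owns the
  // whole lifecycle, so the connection is visible to the record logic
  // and to cleanup.
  def processPartition(records: Iterator[Int]): Iterator[Int] = {
    val conn = new Conn()                    // setup(): once per partition
    val out = records.map(conn.query).toList // drain the iterator BEFORE closing
    conn.close()                             // cleanup(): connection still in scope
    out.iterator
  }

  def main(args: Array[String]): Unit = {
    println(processPartition(Iterator(1, 2, 3)).toList) // prints List(2, 4, 6)
  }
}
```

Note the |.toList| before |close()|: because iterators are lazy, the records must be materialized before the connection is closed, at the cost of buffering the partition in memory.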
On 11/14/14 1:20 PM, Jianshi Huang wrote:
So can I write it like this?
rdd.mapPartitions { i => setup(); i }.map(...).mapPartitions { i => cleanup(); i }
So I don't need to mess up the logic and still can use map, filter and
other transformations for RDD.
Jianshi
On Fri, Nov 14, 2014 at 12:20 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
If you’re looking for executor side setup and cleanup functions,
there ain’t any yet, but you can achieve the same semantics via
|RDD.mapPartitions|.
Please check the “setup() and cleanup()” section of this blog post from
Cloudera for details:
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
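A minimal sketch (plain Scala, no Spark) of the |mapPartitions|-based idiom that post describes: run initialization once at the start of each partition and teardown once at the end, wrapping the per-record work in between. The `setup`/`cleanup` bodies here are hypothetical side effects logged for illustration.

```scala
object SetupCleanup {
  // Records what ran, in order, so the once-per-partition behavior is visible.
  val log = scala.collection.mutable.ListBuffer[String]()

  def setup(): Unit = log += "setup"
  def cleanup(): Unit = log += "cleanup"

  // The function you would hand to rdd.mapPartitions.
  def withSetupCleanup(records: Iterator[Int]): Iterator[Int] = {
    setup()                                   // once, before any record
    val out = records.map { x =>
      log += s"record $x"
      x + 1                                   // per-record work
    }.toList                                  // drain the lazy iterator before cleanup
    cleanup()                                 // once, after the last record
    out.iterator
  }
}
```

Calling |SetupCleanup.withSetupCleanup(Iterator(1, 2)).toList| yields |List(2, 3)| with the log reading setup, record 1, record 2, cleanup: the same per-task bracketing that Hadoop’s |setup()|/|cleanup()| gives you.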
On 11/14/14 10:44 AM, Dai, Kevin wrote:
Hi, all
Is there a setup and cleanup function in Spark, as in Hadoop MapReduce,
that does some initialization and cleanup work?
Best Regards,
Kevin.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/