Forked, meaning "different from the driver"? In general Spark will not even
execute your tasks on the same machine as the driver, though the driver can
choose to execute a task locally in some cases.

You are creating non-daemon threads in your function? Your function can and
should clean up after itself: just use try-finally to shut down your pool.
Or consider whether you can simply use daemon threads instead. There's no
separate mechanism; you write this into your function yourself.
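For instance, a minimal sketch of the try-finally approach inside
foreachPartition (the pool size, the example RDD, and the per-element work
are all placeholders for whatever your job actually does):

  import java.util.concurrent.{Executors, TimeUnit}

  val rdd = sc.parallelize(1 to 100)   // stand-in for your Cassandra table RDD

  rdd.foreachPartition { partition =>
    // created on the executor, one pool per task
    val pool = Executors.newFixedThreadPool(4)   // non-daemon threads
    try {
      partition.foreach { element =>
        pool.submit(new Runnable {
          override def run(): Unit = println(element)   // your per-element work
        })
      }
    } finally {
      // always shut the pool down (and wait) so its threads can't keep
      // the executor JVM alive or leak across tasks
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.MINUTES)
    }
  }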

I assume you're looking at something like foreachPartition and mean
'mapper' by way of analogy to MapReduce. That works. But if you really
mean mapPartitions, beware that it is a transformation, not an action, and
is lazily evaluated.
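To make the distinction concrete (again just a sketch, reusing the rdd from
above):

  // mapPartitions is a transformation: nothing executes here, and any
  // try-finally cleanup inside it won't run until an action forces it
  val mapped = rdd.mapPartitions { partition =>
    partition.map(_.toString)   // placeholder transformation
  }
  mapped.count()   // only now do the tasks (and their cleanup) actually run

  // foreachPartition is an action: the tasks run as soon as it's called
  rdd.foreachPartition { partition =>
    partition.foreach(println)
  }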

Also consider not parallelizing manually -- is there really a need for
that? It's much simpler to let Spark manage the parallelism if possible.
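For example (a sketch): instead of a hand-rolled pool, you can usually just
add partitions and let Spark schedule the concurrent tasks for you:

  // more partitions -> more concurrent tasks, up to the cores given to Spark
  rdd.repartition(32).foreachPartition { partition =>
    partition.foreach(println)   // single-threaded per task, parallel across tasks
  }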


On Wed, Feb 18, 2015 at 8:26 AM, Kevin Burton <bur...@spinn3r.com> wrote:

> I want to map over a Cassandra table in Spark but my code that executes
> needs a shutdown() call to return any threads, release file handles, etc.
>
> Will Spark always execute my mappers as a forked process? And if so, how do
> I handle threads preventing the JVM from terminating?
>
> It would be nice if there was a way to clean up after yourself gracefully
> in map jobs but I don’t think that exists right now.
>
