[
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722170#comment-15722170
]
Michael Schmeißer commented on SPARK-650:
-----------------------------------------
A singleton is not really feasible if additional information is required that is
known (or determined) by the driver and therefore needs to be sent to the
executors before initialization can happen. In that case, the options are:
1) use some side channel that is "magically" inferred by the executor,
2) create an empty RDD, repartition it to the number of executors and run
mapPartitions on it,
3) piggy-back on the JavaSerializer to run the initialization before any
function is called, or
4) require every function that may need the resource to initialize it on its
own.
Each of these options has significant drawbacks in my opinion. While option 4
sounds good for most cases, it has some cons which I've described earlier (my
comment from Oct 16) that make it infeasible for our use case. Option 1 might
be possible, but the data flow wouldn't be all that obvious. Right now, we use
a mix of options 2 and 3 (try to determine the number of executors and, if you
can't, hijack the serializer), but really, this is a hack and might break in
future releases of Spark.
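For reference, option 4 usually boils down to a lazily initialized per-JVM
holder, where the driver-side configuration travels to the executors inside
the serialized closure of whatever function calls it. The sketch below is a
plain-Java illustration of that pattern only (no Spark API); the names
ResourceHolder, get and the "config" string are hypothetical stand-ins.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of option 4: every task function calls get(config); the first call
// on a given executor JVM performs the one-time initialization, later calls
// reuse the result. Double-checked locking with a volatile field keeps the
// fast path lock-free.
public class ResourceHolder {
    // Counts initializations, so callers can verify it happens once per JVM.
    static final AtomicInteger initCount = new AtomicInteger();

    private static volatile String resource;

    // "config" stands in for information known only on the driver (e.g. a
    // reporting endpoint); it reaches the executor via closure serialization.
    public static String get(String config) {
        String r = resource;
        if (r == null) {
            synchronized (ResourceHolder.class) {
                if (resource == null) {
                    initCount.incrementAndGet();
                    // Stand-in for the real resource setup.
                    resource = "initialized:" + config;
                }
                r = resource;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Simulates several task invocations landing on the same executor JVM.
        for (int i = 0; i < 3; i++) {
            ResourceHolder.get("endpoint=metrics.example:9999");
        }
        System.out.println("initializations=" + initCount.get());
    }
}
```

The obvious con, as noted above, is that every function that might touch the
resource has to carry the configuration and make this call itself.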
> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Matei Zaharia
> Priority: Minor
>
> Would be useful to configure things like reporting libraries
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)