Something like this:
```
object Model {
  // @transient + lazy: loaded once per executor JVM, not shipped from the driver
  @transient lazy val modelObject = new ModelLoader("model-filename")
  def get(): ModelLoader = modelObject
}

object SparkJob {
  def main(args: Array[String]): Unit = {
    sc.addFile("s3://bucket/path/model-filename")   // sc: the job's SparkContext
    sc.parallelize(…).map(test => {
      // use the lazily loaded model on each record, e.g. Model.get().predict(test)
      Model.get()
    })
  }
}
```
As an alternative, you can ship the model file with
```
spark-submit --files
```
The files will be placed in each executor's working directory, so you can then
load the model from there inside your `map` function.
Behind the scenes this uses the `SparkContext.addFile` method, which you can also call directly:
https://github.com/apache/spark/blob/master/core/src/m
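For reference, a minimal sketch of reading the shipped file inside a task, assuming an existing SparkContext `sc`, a placeholder input collection `testData`, and the hypothetical `ModelLoader` from the first snippet:
```
import org.apache.spark.SparkFiles

sc.parallelize(testData).map { test =>
  // Resolve the executor-local copy of the file shipped with --files / sc.addFile;
  // call SparkFiles.get inside the task so the path is valid on that executor.
  val modelPath = SparkFiles.get("model-filename")
  // ModelLoader is the hypothetical loader from the first snippet; in practice
  // keep it in a @transient lazy val singleton so it loads once per executor.
  val model = new ModelLoader(modelPath)
  // ... score `test` with `model` here
}
```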
Maybe copy the model to each executor's disk and load it from there? Depending
on how you use the data/model, using something like Livy and sharing the same
connection may also help.
Hello,
maybe broadcast can help you here. [1]
You can load the model once on the driver and then broadcast it to the
workers with `bc_model = sc.broadcast(model)`. You can then access the model in
the map function with `bc_model.value`.
Best
Eike
[1]
https://spark.apache.org/docs/latest/api/pytho
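A minimal Scala sketch of the broadcast approach, assuming the model object is serializable (a requirement for broadcasting) and reusing the same hypothetical `ModelLoader`, `sc`, and `testData` as above:
```
// Load the model once on the driver...
val model = new ModelLoader("model-filename")

// ...and ship a read-only copy to every executor.
val bcModel = sc.broadcast(model)

sc.parallelize(testData).map { test =>
  // each task reads the executor-local copy of the broadcast value
  val m = bcModel.value
  // ... score `test` with `m` here
}
```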