Hi Nimrod,
One approach would be to identify the required additional jars and include
them in the Dockerfile (under /opt/spark/jars) for the Spark image;
that approach worked for me.
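For illustration, the Dockerfile change can be as small as this; the jars/
directory in the build context is just a placeholder for wherever you keep
the pre-downloaded jars:

FROM spark:3.5.1-scala2.12-java17-python3-r-ubuntu

# "jars/" is a directory in the Docker build context holding the
# pre-downloaded dependency jars (and their transitive dependencies)
COPY jars/*.jar /opt/spark/jars/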
Alternatively, you could add the packages in the SparkApplication YAML.
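If you go the SparkApplication route (assuming the Kubeflow spark-operator
CRD and an operator version that supports spec.deps.packages), the relevant
part of the yaml would look roughly like this; all names and coordinates
below are placeholders:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-app                                   # placeholder
spec:
  type: Python
  mode: cluster
  sparkVersion: "3.5.1"
  image: my-registry/spark-custom:3.5.1          # placeholder
  mainApplicationFile: local:///opt/app/main.py  # placeholder
  deps:
    packages:
      - org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1   # example only

Note that this still resolves the packages at submit time, much like
--packages, so baking the jars into the image avoids that cost.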
HTH,
Karan Alang
Hi Nimrod,
This is a method I used back in August 2023 (attached) to build the
Dockerfile. It is a year old, but I think it is still valid. In my approach,
using multi-stage builds for the Python dependencies is a good way to keep
the Docker image lean. For Spark JARs, you can use a similar strategy to
ensure that only the jars you actually need end up in the final image.
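As a rough sketch of the multi-stage idea for the Python side (image tags,
paths and package contents are illustrative only, and the Python minor
version in the builder stage should match the one shipped in the Spark
image):

# Stage 1: install the Python dependencies in a throwaway builder image
FROM python:3.10-slim AS python-deps
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: the final image; copy only the installed packages across,
# leaving pip's caches and any build toolchain behind
FROM spark:3.5.1-scala2.12-java17-python3-r-ubuntu
COPY --from=python-deps /install /usr/local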
Herewith a more fleshed out example:
An example of a *build.gradle.kts* file:
plugins {
    id("java")
}

val sparkJarsDir =
    objects.directoryProperty().convention(layout.buildDirectory.dir("sparkJars"))

repositories {
    mavenCentral()
}

val sparkJars: Configuration by configurations.creating
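The snippet above stops before the dependency list and the copy step; one
possible continuation, with the Kafka connector standing in as an example
coordinate only:

dependencies {
    // Everything declared against the sparkJars configuration (plus its
    // transitive dependencies) is what ends up in $SPARK_HOME/jars
    sparkJars("org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")
}

// Resolve the configuration and collect the jars under build/sparkJars,
// ready to be picked up by the Docker build
tasks.register<Copy>("collectSparkJars") {
    from(sparkJars)
    into(sparkJarsDir)
}

Running ./gradlew collectSparkJars before docker build then leaves all the
needed jars, including transitive ones, in a single directory.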
The simplest solution I have found for this is to use Gradle (or Maven, if
you prefer) and list the dependencies that I want copied to
$SPARK_HOME/jars as project dependencies.
Summary of steps to follow:
1. Using your favourite build tool, declare a dependency on your required
packages (the Docker side of this is sketched below).
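Assuming the jars were collected into build/sparkJars as in the Gradle
sketch above, the Docker side reduces to a single COPY:

FROM spark:3.5.1-scala2.12-java17-python3-r-ubuntu

# build/sparkJars is produced by the collectSparkJars task sketched above
COPY build/sparkJars/ /opt/spark/jars/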
Hi all,
I am creating a base Spark image that we are using internally.
We need to add some packages to the base image:
spark:3.5.1-scala2.12-java17-python3-r-ubuntu
Of course, I do not want to start Spark with --packages "...", as that is
not efficient at all; I would like to add the needed jars to the image
itself.