Re: Spark Docker image with added packages

2024-10-15 Thread karan alang
Hi Nimrod, One approach would be to identify the required additional jars and include them in the Dockerfile (path /opt/spark/jars) for the Spark image; that approach worked for me. Alternatively, you might need to add the packages in the SparkApplication YAML. HTH, Karan Alang On Tue, Oct 15, 2024 at …
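
A minimal sketch of the Dockerfile route Karan describes, assuming the base image named later in this thread; the local jars/ directory and its contents are illustrative, not quoted from the thread:

    FROM spark:3.5.1-scala2.12-java17-python3-r-ubuntu

    # The official Spark images run as an unprivileged user, so switch
    # to root just long enough to write into /opt/spark/jars
    USER root
    COPY jars/*.jar /opt/spark/jars/
    # Switch back to the image's unprivileged user (named spark in the
    # official images)
    USER spark

For the SparkApplication alternative, assuming the Kubeflow Spark Operator's CRD, dependencies can be listed under spec.deps (name and coordinate illustrative); note that this resolves packages at submit time rather than baking them into the image:

    apiVersion: sparkoperator.k8s.io/v1beta2
    kind: SparkApplication
    metadata:
      name: my-app
    spec:
      deps:
        packages:
          - org.apache.spark:spark-avro_2.12:3.5.1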

Re: Spark Docker image with added packages

2024-10-15 Thread Mich Talebzadeh
Hi Nimrod, This is a method I used back in August 2023 (attached) to build the Dockerfile. It is a year old, but I think it is still valid. In my approach, using multi-stage builds for Python dependencies is a good way to keep the Docker image lean. For Spark JARs, you can use a similar strategy to ensure …
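
The attachment is not included in the archive; the following is only a sketch of the multi-stage pattern Mich refers to, with assumed base images and an illustrative package list (the builder's Python minor version should match the interpreter in the final image):

    # Builder stage: install Python dependencies into a self-contained venv
    FROM python:3.11-slim AS builder
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    RUN pip install --no-cache-dir pandas pyarrow

    # Final stage: copy only the finished venv, so pip caches and build
    # tools never reach the runtime image
    FROM spark:3.5.1-scala2.12-java17-python3-r-ubuntu
    COPY --from=builder /opt/venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    ENV PYSPARK_PYTHON=/opt/venv/bin/python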

Re: Spark Docker image with added packages

2024-10-15 Thread Damien Hawes
Herewith a more fleshed-out example. An example of a *build.gradle.kts* file:

    plugins {
        id("java")
    }

    val sparkJarsDir = objects.directoryProperty()
        .convention(layout.buildDirectory.dir("sparkJars"))

    repositories {
        mavenCentral()
    }

    val sparkJars: Configuration by configurations.creating
    …
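
The rest of the file is cut off in the archive; one hedged guess at how it might continue, with an illustrative dependency coordinate and an assumed task name:

    dependencies {
        // Whatever packages your jobs need; this coordinate is an example
        sparkJars("org.apache.spark:spark-avro_2.12:3.5.1")
    }

    // Resolving the configuration pulls in transitive dependencies too,
    // so the copied directory is a complete, self-consistent set of jars
    val copySparkJars by tasks.registering(Copy::class) {
        from(sparkJars)
        into(sparkJarsDir)
    }

Running ./gradlew copySparkJars would then leave the jars under build/sparkJars, ready to be copied into the image.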

Re: Spark Docker image with added packages

2024-10-15 Thread Damien Hawes
The simplest solution I have found for this was to use Gradle (or Maven, if you prefer) and list the dependencies that I want copied to $SPARK_HOME/jars as project dependencies. Summary of steps to follow: 1. Using your favourite build tool, declare a dependency on your required packages …
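
Combined with a build file like the build.gradle.kts example above, the final step would be a Dockerfile layer along these lines (the path is an assumption following that example, not quoted from the thread):

    # Copy the jars collected by the build tool into Spark's classpath
    COPY build/sparkJars/*.jar /opt/spark/jars/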

Spark Docker image with added packages

2024-10-15 Thread Nimrod Ofek
Hi all, I am creating a base Spark image that we are using internally. We need to add some packages to the base image spark:3.5.1-scala2.12-java17-python3-r-ubuntu. Of course, I do not want to start Spark with --packages "...", as it is not efficient at all; I would like to add the needed jars to …
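
For context, the flag being avoided resolves artifacts from remote repositories at every submission, e.g. (coordinate and script name illustrative):

    spark-submit \
      --packages org.apache.spark:spark-avro_2.12:3.5.1 \
      app.py

Baking the jars into the image, as the replies above suggest, avoids that per-launch download.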