Re: Spark Docker image with added packages

2024-10-17 Thread Ángel
Creating a custom classloader to load classes from those jars?
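
A minimal sketch of that child-first classloader idea, in Kotlin (illustrative only, not from the thread; the class name and delegation rules are assumptions, and a production version would still delegate JDK and Spark-internal packages to the parent loader):

import java.net.URL
import java.net.URLClassLoader

// Child-first loader: classes found in the extra jars win over same-named classes
// already on Spark's classpath; anything not found falls back to the parent.
class ChildFirstClassLoader(jars: Array<URL>, parent: ClassLoader) : URLClassLoader(jars, parent) {
    override fun loadClass(name: String, resolve: Boolean): Class<*> =
        synchronized(getClassLoadingLock(name)) {
            val c = findLoadedClass(name)
                ?: try {
                    findClass(name)                 // look in the extra jars first
                } catch (e: ClassNotFoundException) {
                    super.loadClass(name, false)    // fall back to the parent (Spark) classpath
                }
            if (resolve) resolveClass(c)
            c
        }
}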

Re: Spark Docker image with added packages

2024-10-17 Thread Damien Hawes
Hi, That's on you as the maintainer of the derived image to ensure that your added dependencies do not conflict with Spark's dependencies. Speaking from experience, there are several ways to achieve this: 1. Ensure you're using packages that ship shaded and relocated dependencies, if possible. 2.
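
For context on point 1, a hedged sketch of what "shaded and relocated" looks like when you have to produce it yourself with the Gradle Shadow plugin (plugin version, Guava version, and the relocation prefix are illustrative, not from the thread):

plugins {
    java
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("com.google.guava:guava:33.0.0-jre")   // illustrative conflicting dependency
}

tasks.shadowJar {
    // Rewrite Guava's packages into a private namespace so the copy bundled in this
    // jar cannot collide with the Guava that ships in /opt/spark/jars.
    relocate("com.google.common", "myorg.shaded.com.google.common")
}

If a dependency already publishes a shaded artifact, consuming that artifact directly is simpler than shading it yourself.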

Re: Spark Docker image with added packages

2024-10-17 Thread Nimrod Ofek
Hi, Thanks all for the replies. I am adding the Spark dev list as well - as I think this might be an issue that needs to be addressed. The options presented here will get the jars - but they don't help us with dependency conflicts... For example - com.google.cloud.bigdataoss:gcs-connector:hado
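
For exactly this case, one hedged suggestion (not from the thread): the GCS connector publishes a shaded artifact that relocates its bundled dependencies, and consuming that jar avoids most clashes with Spark's own jars. The coordinates and classifier below are illustrative - check Maven Central for the release you actually target:

plugins {
    java
}

repositories {
    mavenCentral()
}

dependencies {
    // Hypothetical coordinates; the "shaded" classifier selects the self-contained jar.
    implementation("com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.21:shaded")
}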

Re: Spark Docker image with added packages

2024-10-16 Thread Nimrod Ofek
Hi, Thanks for the reply! How do you make sure you don't have conflicts with dependencies within Spark? Thanks! Nimrod

Re: Spark Docker image with added packages

2024-10-15 Thread karan alang
Hi Nimrod, One approach would be to identify the required additional jars and include them in the Dockerfile (path /opt/spark/jars) for the Spark image - that approach worked for me. Alternatively, you might need to add the packages in the SparkApplication yaml. HTH, Karan Alang
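
A minimal Dockerfile sketch of that first approach (base image tag and jar location are illustrative, not from the thread):

FROM apache/spark:3.5.1

# Jars collected beforehand (by hand, or by a build-tool task like the Gradle
# example further down the thread) land next to Spark's own jars.
COPY jars/*.jar /opt/spark/jars/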

Re: Spark Docker image with added packages

2024-10-15 Thread Mich Talebzadeh
Hi Nimrod, This is a method I used back in August 2023 (attached) to build the Dockerfile. A year old, but I think it is still valid. In my approach, using multi-stage builds for Python dependencies is a good way to keep the Docker image lean. For Spark JARs, you can use a similar strategy to ensur
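
A hedged sketch of that multi-stage idea (image tags, paths, and the requirements file are illustrative; keep the builder's Python minor version aligned with the Python inside your Spark image so compiled wheels stay compatible):

# Stage 1: install the Python dependencies into a standalone directory.
FROM python:3.11-slim AS python-deps
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --target /deps -r /tmp/requirements.txt

# Stage 2: copy only the installed packages into the Spark image, keeping it lean.
FROM apache/spark:3.5.1
COPY --from=python-deps /deps /opt/python-deps
ENV PYTHONPATH=/opt/python-deps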

Re: Spark Docker image with added packages

2024-10-15 Thread Damien Hawes
Herewith a more fleshed out example: An example of a *build.gradle.kts* file:

plugins {
    id("java")
}

val sparkJarsDir = objects.directoryProperty().convention(layout.buildDirectory.dir("sparkJars"))

repositories {
    mavenCentral()
}

val sparkJars: Configuration by configurations.creating
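
The preview cuts off there. One possible continuation (not the original poster's full file, and the Kafka connector coordinate is illustrative) would declare the wanted connectors on the sparkJars configuration and sync the resolved jars into sparkJarsDir, ready to be copied into /opt/spark/jars:

dependencies {
    sparkJars("org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")
}

val collectSparkJars by tasks.registering(Sync::class) {
    from(sparkJars)      // every jar resolved for the sparkJars configuration
    into(sparkJarsDir)   // build/sparkJars by the convention declared above
}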

Re: Spark Docker image with added packages

2024-10-15 Thread Damien Hawes
The simplest solution that I have found for this was to use Gradle (or Maven, if you prefer), and list the dependencies that I want copied to $SPARK_HOME/jars as project dependencies. Summary of steps to follow: 1. Using your favourite build tool, declare a dependency on your required pack