The simplest solution I have found for this is to use Gradle (or Maven, if
you prefer) and declare the packages I want copied into $SPARK_HOME/jars as
ordinary project dependencies.

Summary of steps to follow:

1. Using your favourite build tool, declare dependencies on the packages you
need (a rough build.gradle sketch follows this list).
2. Write your Dockerfile, with or without the Spark binaries inside it.
3. Use your build tool to copy those dependencies, including their transitive
dependencies, into a location inside the Docker build context.
4. In the Dockerfile, copy the dependencies into $SPARK_HOME/jars (see the
Dockerfile sketch further down).
5. Ensure those files have the correct permissions.
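
For illustration, here is a rough build.gradle sketch (Groovy DSL) covering
steps 1 and 3. The configuration name sparkJars, the task name copySparkJars
and the two coordinates are only examples I picked - substitute whatever
packages you actually need:

// build.gradle - resolve the wanted packages (plus their transitive
// dependencies) and copy the jars into build/spark-jars
repositories {
    mavenCentral()
}

configurations {
    sparkJars   // dedicated configuration for the jars that go into the image
}

dependencies {
    // the same coordinates you would otherwise pass to --packages
    sparkJars 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1'
    sparkJars 'org.apache.hadoop:hadoop-aws:3.3.4'
}

tasks.register('copySparkJars', Copy) {
    from configurations.sparkJars
    into layout.buildDirectory.dir('spark-jars')
}

Running ./gradlew copySparkJars then leaves the resolved jars under
build/spark-jars, where the Docker build can pick them up.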

In my opinion, it is pretty easy to do this with Gradle.
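
The matching Dockerfile would look roughly like this - again only a sketch; it
assumes the base image from your mail (which exports SPARK_HOME and runs as
UID 185) and that the jars ended up in build/spark-jars as above:

# Dockerfile
FROM spark:3.5.1-scala2.12-java17-python3-r-ubuntu

USER root

# put the resolved jars on Spark's classpath
COPY build/spark-jars/*.jar ${SPARK_HOME}/jars/

# make sure the non-root spark user can read them
RUN chmod 644 ${SPARK_HOME}/jars/*.jar

# drop back to the user the base image normally runs as
USER 185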

On Tue, 15 Oct 2024 at 15:28, Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> Hi all,
>
> I am creating a base Spark image that we are using internally.
> We need to add some packages to the base image:
> spark:3.5.1-scala2.12-java17-python3-r-ubuntu
>
> Of course I do not want to start Spark with --packages "..." - as that is
> not efficient at all - I would like to add the needed jars to the image.
>
> Ideally, I would add something to my image that pulls in the needed
> packages - something like:
>
> RUN $SPARK_HOME/bin/add-packages "..."
>
> But AFAIK there is no such option.
>
> Other than running Spark to add those packages and then creating the image
> - or always running Spark with --packages "..." - what can I do?
> Is there a way to run just the code that the --packages option runs, without
> running Spark, so I can add the needed dependencies to my image?
>
> I am sure I am not the only one, nor the first, to encounter this...
>
> Thanks!
> Nimrod
>
