> ns here will result in both conflicting.
>
> How can one add packages to their Spark (during the build process of the
> Docker image) without causing unresolved conflicts?
>
> Thanks!
> Nimrod
>
>
> On Tue, Oct 15, 2024 at 6:53 PM Damien Hawes wrote:
>
// Register a Copy task that gathers the resolved jars (the task name here is illustrative).
tasks.register<Copy>("copySparkJars") {
    description = "Copies the appropriate jars to the configured spark jars directory"
    from(sparkJars)
    into(sparkJarsDir)
}
Now, the *Dockerfile*:
FROM spark:3.5.3-scala2.12-java17-ubuntu
USER root
COPY --chown=spark:spark build/sparkJars/* "$SPARK_HOME/jars/"
USER spark
Kind regards,
Damien
The simplest solution that I have found for this was to use Gradle (or Maven,
if you prefer) and list the dependencies that I want copied to
$SPARK_HOME/jars as project dependencies.
Summary of steps to follow:
1. Using your favourite build tool, declare a dependency on your required
packages, as sketched below.
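For illustration, that first step in a Gradle Kotlin DSL build script
(build.gradle.kts) might look like the sketch below. The configuration name
(sparkJars) and the output directory (build/sparkJars) are taken from the Copy
task and Dockerfile shown above; the Kafka connector coordinate is only an
example of "your required packages", not something named in this thread.

repositories {
    mavenCentral()
}

// A dedicated configuration keeps the jars destined for $SPARK_HOME/jars
// separate from the project's own compile and runtime classpaths.
val sparkJars by configurations.creating

dependencies {
    // Example coordinate only; list whatever packages you actually need here.
    sparkJars("org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.3")
}

// The directory the Dockerfile copies from (build/sparkJars).
val sparkJarsDir = layout.buildDirectory.dir("sparkJars")

// The copySparkJars Copy task shown above would follow here.

Running that copy task before docker build then populates build/sparkJars for
the Dockerfile's COPY step.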
Right now, with the structure of your data, it isn't possible.
The rows aren't duplicates of each other. "a" and "b" both exist in the
array. So Spark is correctly performing the join. It looks like you need to
find another way to model this data to get what you want to achieve.
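The data from the original question isn't visible in this digest, so the
following is only an invented sketch (in Kotlin, using plain Spark SQL) of the
behaviour being described: one row whose array column contains both "a" and
"b" legitimately matches two rows of the lookup table.

import org.apache.spark.sql.SparkSession

fun main() {
    val spark = SparkSession.builder().master("local[*]").appName("array-join-demo").getOrCreate()

    // One row whose array column contains both keys.
    spark.sql("SELECT 1 AS id, array('a', 'b') AS keys").createOrReplaceTempView("left_side")

    // A lookup table with one row per key.
    spark.sql("SELECT 'a' AS key, 'alpha' AS value UNION ALL SELECT 'b', 'beta'")
        .createOrReplaceTempView("right_side")

    // Joining on "the key is contained in the array" matches the single
    // left_side row twice: once for 'a' and once for 'b'. The two output rows
    // are distinct matches, not duplicates.
    spark.sql(
        """
        SELECT l.id, l.keys, r.key, r.value
        FROM left_side l
        JOIN right_side r ON array_contains(l.keys, r.key)
        """
    ).show(false)

    spark.stop()
}

The expected output is two rows for id 1, which is the "duplicate-looking"
result described above; getting a single row back means remodelling the data
or the join condition, as suggested.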
Are the values of
Hi folks,
I'm contributing to the OpenLineage project, specifically the Apache Spark
integration. My current focus is on extending the project to support data
lineage extraction for Spark Streaming, beginning with Apache Kafka sources
and sinks.
I've encountered an obstacle when attempting to acc