Where is your Docker image? In the ECR container registry. If you are
going to use EKS, then it needs to be accessible to all nodes of the
cluster.
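For example, a minimal spark-submit against EKS might look like the sketch
below. The account ID, region, repository, tag, namespace, class name and
jar name are all placeholders, not from a real setup:

spark-submit \
  --master k8s://https://<EKS-API-SERVER-ENDPOINT> \
  --deploy-mode cluster \
  --name my-spark-app \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<REPO>:<TAG> \
  --class com.example.MyApp \
  local:///opt/spark/jars/my-app.jar

The local:// scheme tells Spark that the application jar is already inside
the image, which is the whole point of baking it in at build time.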

When you build your Docker image, put your jar under the $SPARK_HOME
directory, then add the lines below to your Dockerfile. Here I am
accessing the Google BigQuery DW from EKS:
# Add a BigQuery connector jar.
ENV SPARK_EXTRA_JARS_DIR=/opt/spark/jars/
ENV SPARK_EXTRA_CLASSPATH='/opt/spark/jars/*'
RUN mkdir -p "${SPARK_EXTRA_JARS_DIR}" \
    && chown spark:spark "${SPARK_EXTRA_JARS_DIR}"
COPY --chown=spark:spark \
    spark-bigquery-with-dependencies_2.12-0.22.2.jar \
    "${SPARK_EXTRA_JARS_DIR}"


HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


View my LinkedIn profile:
https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Mon, 19 Feb 2024 at 13:42, Jagannath Majhi <
jagannath.ma...@cloud.cbnits.com> wrote:

> Dear Spark Community,
>
> I hope this email finds you well. I am reaching out to seek assistance and
> guidance regarding a task I'm currently working on involving Apache Spark.
>
> I have developed a JAR file that contains some Spark applications and
> functionality, and I need to run this JAR file within a Spark cluster.
> However, the JAR file is located in an AWS S3 bucket. I'm facing some
> challenges in configuring Spark to access and execute this JAR file
> directly from the S3 bucket.
>
> I would greatly appreciate any advice, best practices, or pointers on how
> to achieve this integration effectively. Specifically, I'm looking for
> insights on:
>
>    1. Configuring Spark to access and retrieve the JAR file from an AWS
>    S3 bucket.
>    2. Setting up the necessary permissions and authentication mechanisms
>    to ensure seamless access to the S3 bucket.
>    3. Any potential performance considerations or optimizations when
>    running Spark applications with dependencies stored in remote storage like
>    AWS S3.
>
> If anyone in the community has prior experience or knowledge in this area,
> I would be extremely grateful for your guidance. Additionally, if there are
> any relevant resources, documentation, or tutorials that you could
> recommend, it would be incredibly helpful.
>
> Thank you very much for considering my request. I look forward to hearing
> from you and benefiting from the collective expertise of the Spark
> community.
>
> Best regards, Jagannath Majhi
>
