Hi, Amin

In general, the Apache Spark community has received a lot of feedback and has been moving forward to

- Use the latest Hadoop versions, for more bug fixes including CVE fixes.
- Use Hadoop's shaded clients, to minimize dependency issues.

Since the above is not achievable with Hadoop 2 clients, I believe the official answer to question (1) is `No` (especially for your Hadoop 2.7 cluster, whose latest release dates from 2018).
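For reference, the shaded clients mentioned above are the hadoop-client-api and hadoop-client-runtime artifacts that Hadoop 3 publishes. A minimal sbt sketch; the version number here is only illustrative:

    // Hadoop 3 shaded clients: a thin API jar plus one relocated runtime jar,
    // instead of hadoop-client's large transitive dependency tree.
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client-api"     % "3.3.1",
      "org.apache.hadoop" % "hadoop-client-runtime" % "3.3.1" % Runtime
    )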
For the second question, the Apache Spark community has been collaborating with the Apache Hadoop community to make the latest Apache Hadoop 3 clients able to connect to both old and new Hadoop clusters, as well as to public cloud environments. I believe your production jobs should be fine as long as you are not relying on proprietary (i.e., non-Apache Hadoop) features from private vendors. Please report to the Apache Hadoop community, or to us, if you hit unknown compatibility issues.

Bests,
Dongjoon.

On Fri, Apr 8, 2022 at 9:37 PM Amin Borjian <borjianami...@outlook.com> wrote:
>
> From Spark version 3.1.0 onwards, the clients provided for Spark are built
> with Hadoop 3 and published to the Maven repository. Unfortunately, we
> currently use Hadoop 2.7.7 in our infrastructure.
>
> 1) Does Spark plan to publish the Spark client dependencies for Hadoop 2.x?
>
> 2) Are the new Spark clients capable of connecting to a Hadoop 2.x
> cluster? (According to a simple test, the Spark 3.2.1 client had no problem
> with our Hadoop 2.7 cluster, but we wanted to know whether there is any
> guarantee from Spark.)
>
> Thank you very much in advance
>
> Amin Borjian
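P.S. For anyone who wants to repeat the kind of smoke test Amin describes, a minimal sketch follows; the namenode address and path are placeholders you would replace with your own:

    // Minimal smoke test: a Spark 3.x client (built with the Hadoop 3
    // shaded clients) reading a file from an HDFS 2.7 cluster.
    // "namenode:8020" and the path below are placeholders.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hadoop2-compat-smoke-test")
      .getOrCreate()

    val lines = spark.read.textFile("hdfs://namenode:8020/tmp/compat-check.txt")
    println(s"Read ${lines.count()} lines from the Hadoop 2.7 cluster")

    spark.stop()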