Re: Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-11 Thread Hyukjin Kwon
cc @Holden Karau  @DB Tsai  @Imran
Rashid  @Mridul Muralidharan  FYI

On Thu, 9 Dec 2021 at 14:07, angers zhu  wrote:

> Hi all,
>
> Since Spark 3.2, we have supported Hadoop 3.3.1, but the profile name
> is still *hadoop-3.2* (and *hadoop-2.7*), which is no longer accurate.
> So we made a change in https://github.com/apache/spark/pull/34715.
> Starting from Spark 3.3, we use the Hadoop profiles *hadoop-2* and *hadoop-3*,
> and the default profile is *hadoop-3*.
> Profile changes
>
> *hadoop-2.7* changed to *hadoop-2*
> *hadoop-3.2* changed to *hadoop-3*
> Release tar file
>
> Spark 3.3.0 with profile hadoop-3: *spark-3.3.0-bin-hadoop3.tgz*
> Spark 3.3.0 with profile hadoop-2: *spark-3.3.0-bin-hadoop2.tgz*
>
> For Spark 3.2.0, the release tar file was, for example,
> *spark-3.2.0-bin-hadoop3.2.tgz*.
> Pip install option changes
>
> To install PySpark with or without a specific Hadoop version, set the
> PYSPARK_HADOOP_VERSION environment variable as below (Hadoop 3):
>
> PYSPARK_HADOOP_VERSION=3 pip install pyspark
>
> For Hadoop 2:
>
> PYSPARK_HADOOP_VERSION=2 pip install pyspark
>
> Supported values in PYSPARK_HADOOP_VERSION are now:
>
>- without: Spark pre-built with user-provided Apache Hadoop
>- 2: Spark pre-built for Apache Hadoop 2.
>- 3: Spark pre-built for Apache Hadoop 3.3 and later (default)
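>
> A minimal sketch (not part of PySpark itself) of how a PYSPARK_HADOOP_VERSION
> value maps onto the pre-built distribution names listed above; the
> "without-hadoop" suffix and the function name are assumptions for illustration:

```python
import os

# Map each supported PYSPARK_HADOOP_VERSION value to the suffix used in
# the release tar file names (modeled on the names shown above).
_SUPPORTED = {"without": "without-hadoop", "2": "hadoop2", "3": "hadoop3"}

def dist_name(spark_version: str, hadoop_env: str) -> str:
    """Return the pre-built distribution tar name for a given Hadoop choice."""
    if hadoop_env not in _SUPPORTED:
        raise ValueError(f"Unsupported PYSPARK_HADOOP_VERSION: {hadoop_env!r}")
    return f"spark-{spark_version}-bin-{_SUPPORTED[hadoop_env]}.tgz"

# "3" is the default when the environment variable is unset.
print(dist_name("3.3.0", os.environ.get("PYSPARK_HADOOP_VERSION", "3")))
```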
>
> Building Spark and Specifying the Hadoop Version
> 
>
> You can specify the exact version of Hadoop to compile against through the
> hadoop.version property.
> Example:
>
> ./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package
>
> or you can specify the *hadoop-3* profile explicitly:
>
> ./build/mvn -Pyarn -Phadoop-3 -Dhadoop.version=3.3.0 -DskipTests clean package
>
> If you want to build with Hadoop 2.x, enable the *hadoop-2* profile:
>
> ./build/mvn -Phadoop-2 -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
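>
> The two invocations above can be wrapped in a small shell sketch; the
> HADOOP_PROFILE/HADOOP_VERSION variable names and the chosen version numbers
> are assumptions, not part of Spark's build:

```shell
# Pick the build command from a HADOOP_PROFILE variable (default hadoop-3),
# pairing each profile with a representative hadoop.version.
HADOOP_PROFILE="${HADOOP_PROFILE:-hadoop-3}"
case "$HADOOP_PROFILE" in
  hadoop-2) HADOOP_VERSION="2.8.5" ;;
  hadoop-3) HADOOP_VERSION="3.3.0" ;;
  *) echo "unknown profile: $HADOOP_PROFILE (use hadoop-2 or hadoop-3)" >&2
     exit 1 ;;
esac
# Echo rather than execute, so the sketch is safe outside a Spark checkout.
echo ./build/mvn -Pyarn "-P$HADOOP_PROFILE" \
  -Dhadoop.version="$HADOOP_VERSION" -DskipTests clean package
```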
>
> Notes
>
> On the current master branch, Spark will be built against the default
> Hadoop 3 if you continue to pass -Phadoop-2.7 or -Phadoop-3.2,
> because Maven and SBT merely warn about and then ignore non-existent profiles.
> Please change your profiles to -Phadoop-2 or -Phadoop-3.
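>
> For build scripts that still carry the old flags, a hypothetical helper
> (the function name and mapping are illustrations, not Spark tooling) could
> rewrite legacy profile flags so they do not silently fall back to the default:

```python
# Old profile names and their Spark 3.3+ replacements, per the change above.
_LEGACY = {"hadoop-2.7": "hadoop-2", "hadoop-3.2": "hadoop-3"}

def migrate_profiles(args):
    """Rewrite legacy -Phadoop-2.7/-Phadoop-3.2 flags; pass others through."""
    out = []
    for arg in args:
        if arg.startswith("-P") and arg[2:] in _LEGACY:
            out.append("-P" + _LEGACY[arg[2:]])
        else:
            out.append(arg)
    return out

print(migrate_profiles(["-Pyarn", "-Phadoop-2.7", "-DskipTests"]))
```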
>


Re: Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-11 Thread Hyukjin Kwon
and @tgra...@apache.org  too

On Sat, 11 Dec 2021 at 21:38, Hyukjin Kwon  wrote:

> cc @Holden Karau  @DB Tsai  @Imran
> Rashid  @Mridul Muralidharan  FYI
>