Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Xiao Li Tue, 15 Jan 2019 09:44:16 -0800

Hi, Yuming,

Thank you for your contributions! The community aims to reduce Spark's
dependence on Hive. Currently, most Spark users are not using Hive, so these
changes look risky to me.


To support Hadoop 3.x, we just need to resolve this JIRA:
https://issues.apache.org/jira/browse/HIVE-16391

Cheers,

Xiao

On Tue, Jan 15, 2019 at 8:41 AM, Yuming Wang <[email protected]> wrote:

> Dear Spark Developers and Users,
>
> Hyukjin and I plan to upgrade the built-in Hive from 1.2.1-spark2
> <https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2> to 2.3.4
> <https://github.com/apache/hive/releases/tag/rel%2Frelease-2.3.4> to
> solve some critical issues, such as supporting Hadoop 3.x and fixing some
> ORC and Parquet issues. Here is the list:
>
> *Hive issues*:
>
> [SPARK-26332 <https://issues.apache.org/jira/browse/SPARK-26332>][HIVE-10790]
> Spark sql write orc table on viewFS throws exception
>
> [SPARK-25193 <https://issues.apache.org/jira/browse/SPARK-25193>][HIVE-12505]
> insert overwrite doesn't throw exception when drop old data fails
>
> [SPARK-26437 <https://issues.apache.org/jira/browse/SPARK-26437>][HIVE-13083]
> Decimal data becomes bigint to query, unable to query
>
> [SPARK-25919 <https://issues.apache.org/jira/browse/SPARK-25919>][HIVE-11771]
> Date value corrupts when tables are "ParquetHiveSerDe" formatted and target
> table is Partitioned
>
> [SPARK-12014 <https://issues.apache.org/jira/browse/SPARK-12014>][HIVE-11100]
> Spark SQL query containing semicolon is broken in Beeline
>
> *Spark issues*:
>
> [SPARK-23534 <https://issues.apache.org/jira/browse/SPARK-23534>] Spark
> run on Hadoop 3.0.0
>
> [SPARK-20202 <https://issues.apache.org/jira/browse/SPARK-20202>] Remove
> references to org.spark-project.hive
>
> [SPARK-18673 <https://issues.apache.org/jira/browse/SPARK-18673>]
> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
>
> [SPARK-24766 <https://issues.apache.org/jira/browse/SPARK-24766>]
> CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column
> stats in parquet
>
> Because the *hive-thriftserver* module needs extensive changes for this
> upgrade, I split the work into two PRs to make review easier.
>
> The first PR <https://github.com/apache/spark/pull/23552> does not
> include the hive-thriftserver changes, so please ignore any failed tests in
> hive-thriftserver.
>
> The second PR <https://github.com/apache/spark/pull/23553> contains the
> complete set of changes.
>
> I have created a Spark distribution for Apache Hadoop 2.7; you can
> download it from Google Drive
> <https://drive.google.com/open?id=1cq2I8hUTs9F4JkFyvRfdOJ5BlxV0ujgt> or Baidu
> Pan <https://pan.baidu.com/s/1b090Ctuyf1CDYS7c0puBqQ>.
>
> Please help review and test. Thanks.
>
