Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Ryan Blue Tue, 15 Jan 2019 09:53:47 -0800

How do we know that most Spark users are not using Hive? I wouldn't be
surprised either way, but I do want to make sure we aren't making decisions
based on any one person's (or one company's) experience about what "most"
Spark users do.


On Tue, Jan 15, 2019 at 9:44 AM Xiao Li <gatorsm...@gmail.com> wrote:

> Hi, Yuming,
>
> Thank you for your contributions! The community aims to reduce its
> dependence on Hive. Currently, most Spark users are not using Hive. The
> changes look risky to me.
>
> To support Hadoop 3.x, we just need to resolve this JIRA:
> https://issues.apache.org/jira/browse/HIVE-16391
>
> Cheers,
>
> Xiao
>
> Yuming Wang <wgy...@gmail.com> wrote on Tue, Jan 15, 2019 at 8:41 AM:
>
>> Dear Spark Developers and Users,
>>
>>
>>
>> Hyukjin and I plan to upgrade the built-in Hive from 1.2.1-spark2
>> <https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2> to 2.3.4
>> <https://github.com/apache/hive/releases/tag/rel%2Frelease-2.3.4> to
>> solve some critical issues, such as supporting Hadoop 3.x and fixing some
>> ORC and Parquet issues. This is the list:
>>
>> *Hive issues*:
>>
>> [SPARK-26332 <https://issues.apache.org/jira/browse/SPARK-26332>][HIVE-10790]
>> Spark sql write orc table on viewFS throws exception
>>
>> [SPARK-25193 <https://issues.apache.org/jira/browse/SPARK-25193>][HIVE-12505]
>> insert overwrite doesn't throw exception when drop old data fails
>>
>> [SPARK-26437 <https://issues.apache.org/jira/browse/SPARK-26437>][HIVE-13083]
>> Decimal data becomes bigint to query, unable to query
>>
>> [SPARK-25919 <https://issues.apache.org/jira/browse/SPARK-25919>][HIVE-11771]
>> Date value corrupts when tables are "ParquetHiveSerDe" formatted and target
>> table is Partitioned
>>
>> [SPARK-12014 <https://issues.apache.org/jira/browse/SPARK-12014>][HIVE-11100]
>> Spark SQL query containing semicolon is broken in Beeline
>>
>>
>>
>> *Spark issues*:
>>
>> [SPARK-23534 <https://issues.apache.org/jira/browse/SPARK-23534>] Spark
>> run on Hadoop 3.0.0
>>
>> [SPARK-20202 <https://issues.apache.org/jira/browse/SPARK-20202>] Remove
>> references to org.spark-project.hive
>>
>> [SPARK-18673 <https://issues.apache.org/jira/browse/SPARK-18673>]
>> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
>>
>> [SPARK-24766 <https://issues.apache.org/jira/browse/SPARK-24766>]
>> CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column
>> stats in parquet
>>
>>
>>
>>
>>
>> Because this upgrade changes the code of the *hive-thriftserver* module so
>> heavily, I split the work into two PRs for easier review.
>>
>> The first PR <https://github.com/apache/spark/pull/23552> does not
>> contain the hive-thriftserver changes. Please ignore the failed test in
>> hive-thriftserver.
>>
>> The second PR <https://github.com/apache/spark/pull/23553> contains the
>> complete changes.
>>
>>
>>
>> I have created a Spark distribution for Apache Hadoop 2.7; you can
>> download it via Google Drive
>> <https://drive.google.com/open?id=1cq2I8hUTs9F4JkFyvRfdOJ5BlxV0ujgt> or Baidu
>> Pan <https://pan.baidu.com/s/1b090Ctuyf1CDYS7c0puBqQ>.
>>
>> Please help review and test. Thanks.
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
