How do we know that most Spark users are not using Hive? I wouldn't be surprised either way, but I do want to make sure we aren't making decisions based on any one person's (or one company's) experience about what "most" Spark users do.
On Tue, Jan 15, 2019 at 9:44 AM Xiao Li <gatorsm...@gmail.com> wrote:

> Hi, Yuming,
>
> Thank you for your contributions! The community aims at reducing the
> dependence on Hive. Currently, most Spark users are not using Hive. The
> changes look risky to me.
>
> To support Hadoop 3.x, we just need to resolve this JIRA:
> https://issues.apache.org/jira/browse/HIVE-16391
>
> Cheers,
>
> Xiao
>
> Yuming Wang <wgy...@gmail.com> wrote on Tue, Jan 15, 2019 at 8:41 AM:
>
>> Dear Spark Developers and Users,
>>
>> Hyukjin and I plan to upgrade the built-in Hive from 1.2.1-spark2
>> <https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2> to 2.3.4
>> <https://github.com/apache/hive/releases/tag/rel%2Frelease-2.3.4> to
>> solve some critical issues, such as supporting Hadoop 3.x and fixing some
>> ORC and Parquet issues. This is the list:
>>
>> *Hive issues*:
>>
>> [SPARK-26332 <https://issues.apache.org/jira/browse/SPARK-26332>][HIVE-10790]
>> Spark sql write orc table on viewFS throws exception
>>
>> [SPARK-25193 <https://issues.apache.org/jira/browse/SPARK-25193>][HIVE-12505]
>> insert overwrite doesn't throw exception when drop old data fails
>>
>> [SPARK-26437 <https://issues.apache.org/jira/browse/SPARK-26437>][HIVE-13083]
>> Decimal data becomes bigint to query, unable to query
>>
>> [SPARK-25919 <https://issues.apache.org/jira/browse/SPARK-25919>][HIVE-11771]
>> Date value corrupts when tables are "ParquetHiveSerDe" formatted and target
>> table is Partitioned
>>
>> [SPARK-12014 <https://issues.apache.org/jira/browse/SPARK-12014>][HIVE-11100]
>> Spark SQL query containing semicolon is broken in Beeline
>>
>> *Spark issues*:
>>
>> [SPARK-23534 <https://issues.apache.org/jira/browse/SPARK-23534>] Spark
>> run on Hadoop 3.0.0
>>
>> [SPARK-20202 <https://issues.apache.org/jira/browse/SPARK-20202>] Remove
>> references to org.spark-project.hive
>>
>> [SPARK-18673 <https://issues.apache.org/jira/browse/SPARK-18673>]
>> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
>>
>> [SPARK-24766 <https://issues.apache.org/jira/browse/SPARK-24766>]
>> CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column
>> stats in parquet
>>
>> Since the code for the *hive-thriftserver* module has changed too much
>> for this upgrade, I split it into two PRs for easier review.
>>
>> The first PR <https://github.com/apache/spark/pull/23552> does not
>> contain the hive-thriftserver changes. Please ignore the failed test in
>> hive-thriftserver.
>>
>> The second PR <https://github.com/apache/spark/pull/23553> contains the
>> complete changes.
>>
>> I have created a Spark distribution for Apache Hadoop 2.7; you can
>> download it via Google Drive
>> <https://drive.google.com/open?id=1cq2I8hUTs9F4JkFyvRfdOJ5BlxV0ujgt> or Baidu
>> Pan <https://pan.baidu.com/s/1b090Ctuyf1CDYS7c0puBqQ>.
>>
>> Please help review and test. Thanks.
>>
>

-- 
Ryan Blue
Software Engineer
Netflix