Gopal, can you confirm the doc change that Jone Zhang suggests? The second sentence confuses me: "You can choose Spark1.5.0+ which build include the Hive jars."
Thanks. -- Lefty

On Thu, Nov 19, 2015 at 8:33 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:
> I should add that Spark 1.5.0+ uses Hive 1.2.1 by default when you build with -Phive.
>
> So this page
> <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started>
> should be written like below:
> “Note that you must have a version of Spark which does *not* include the
> Hive jars if you use Spark1.4.1 and before, You can choose Spark1.5.0+
> which build include the Hive jars”
>
>
> 2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gop...@apache.org>:
>
>> > I wanted to know why it is necessary to remove the Hive jars from the
>> > Spark build as mentioned on this
>>
>> Because SparkSQL was originally based on Hive and still uses the Hive AST to
>> parse SQL.
>>
>> The org.apache.spark.sql.hive package contains the parser, which has
>> hard references to Hive's internal AST, which is unfortunately
>> auto-generated code (HiveParser.TOK_TABNAME etc.).
>>
>> Every time Hive makes a release, those constants change in value, and they
>> are private API because of the lack of backwards compatibility, which
>> SparkSQL violates.
>>
>> So Hive-on-Spark forces mismatched versions of Hive classes, because it's
>> a circular dependency of Hive(v1) -> Spark -> Hive(v2) due to the basic
>> laws of causality.
>>
>> Spark cannot depend on a version of Hive that is unreleased, and a
>> Hive-on-Spark release cannot depend on a version of Spark that is
>> unreleased.
>>
>> Cheers,
>> Gopal
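For anyone curious about the mechanics behind Gopal's point: below is a minimal, hypothetical Java sketch of why hard references to auto-generated token constants such as HiveParser.TOK_TABNAME break when the Hive version changes. The class names and constant values here are made up for illustration; the real HiveParser is ANTLR-generated and its values differ per release. The sketch relies on standard Java behavior, namely that `public static final int` compile-time constants are inlined into callers, so swapping a different Hive jar onto the runtime classpath does not update code already compiled against another version.

    // SparkSqlAstWalker.java -- illustrative only; not real Hive or Spark code.

    // What ANTLR-style generation produces inside Hive; values are hypothetical
    // and typically shift between releases.
    class HiveParser_v1 {
        public static final int TOK_TABNAME = 654;   // hypothetical value in Hive v1
    }

    class HiveParser_v2 {
        public static final int TOK_TABNAME = 663;   // hypothetical value in Hive v2
    }

    public class SparkSqlAstWalker {
        // Compiled against HiveParser_v1, javac inlines the literal 654 here.
        // Putting a v2 HiveParser jar on the classpath at runtime does NOT
        // change this comparison, so v2 ASTs are silently misread.
        static boolean isTableNameNode(int tokenType) {
            return tokenType == HiveParser_v1.TOK_TABNAME; // effectively "tokenType == 654"
        }

        public static void main(String[] args) {
            // A node produced by the v2 parser carries the v2 token value:
            int nodeTypeFromV2Parser = HiveParser_v2.TOK_TABNAME;
            System.out.println(isTableNameNode(nodeTypeFromV2Parser)); // prints false
        }
    }

This is just a sketch of the version-mismatch problem, not the actual SparkSQL code path, but it shows why a Spark build bundled with one set of Hive jars cannot safely be driven by a different Hive release.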