Gopal, can you confirm the doc change that Jone Zhang suggests? The second sentence confuses me: "You can choose Spark1.5.0+ which build include the Hive jars."
Thanks. -- Lefty

On Thu, Nov 19, 2015 at 8:33 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:
> I should add that Spark 1.5.0+ uses Hive 1.2.1 by default when you build with -Phive.
>
> So this page
> <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started>
> should be written like below:
> “Note that you must have a version of Spark which does *not* include the
> Hive jars if you use Spark1.4.1 and before, You can choose Spark1.5.0+
> which build include the Hive jars”
>
>
> 2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gop...@apache.org>:
>
>> > I wanted to know why it is necessary to remove the Hive jars from the
>> > Spark build as mentioned on this
>>
>> Because SparkSQL was originally based on Hive and still uses the Hive AST to
>> parse SQL.
>>
>> The org.apache.spark.sql.hive package contains the parser, which has
>> hard references to Hive's internal AST, which is unfortunately
>> auto-generated code (HiveParser.TOK_TABNAME etc.).
>>
>> Every time Hive makes a release, those constants change in value, and they
>> are private API because of the lack of backwards compatibility, which
>> SparkSQL violates.
>>
>> So Hive-on-Spark forces mismatched versions of Hive classes, because it's
>> a circular dependency of Hive(v1) -> Spark -> Hive(v2) due to the basic
>> laws of causality.
>>
>> Spark cannot depend on a version of Hive that is unreleased, and a
>> Hive-on-Spark release cannot depend on a version of Spark that is
>> unreleased.
>>
>> Cheers,
>> Gopal
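For anyone curious about the mechanics behind Gopal's point: below is a minimal, hypothetical Java sketch of why hard references to auto-generated token constants such as HiveParser.TOK_TABNAME break when the Hive version changes. The class names and constant values here are made up for illustration; the real HiveParser is ANTLR-generated and its values differ per release. The sketch relies on standard Java behavior, namely that `public static final int` compile-time constants are inlined into callers, so swapping a different Hive jar onto the runtime classpath does not update code already compiled against another version.

    // SparkSqlAstWalker.java -- illustrative only; not real Hive or Spark code.

    // What ANTLR-style generation produces inside Hive; values are hypothetical
    // and typically shift between releases.
    class HiveParser_v1 {
        public static final int TOK_TABNAME = 654;   // hypothetical value in Hive v1
    }

    class HiveParser_v2 {
        public static final int TOK_TABNAME = 663;   // hypothetical value in Hive v2
    }

    public class SparkSqlAstWalker {
        // Compiled against HiveParser_v1, javac inlines the literal 654 here.
        // Putting a v2 HiveParser jar on the classpath at runtime does NOT
        // change this comparison, so v2 ASTs are silently misread.
        static boolean isTableNameNode(int tokenType) {
            return tokenType == HiveParser_v1.TOK_TABNAME; // effectively "tokenType == 654"
        }

        public static void main(String[] args) {
            // A node produced by the v2 parser carries the v2 token value:
            int nodeTypeFromV2Parser = HiveParser_v2.TOK_TABNAME;
            System.out.println(isTableNameNode(nodeTypeFromV2Parser)); // prints false
        }
    }

This is just a sketch of the version-mismatch problem, not the actual SparkSQL code path, but it shows why a Spark build bundled with one set of Hive jars cannot safely be driven by a different Hive release.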