Re: using the Hive SQL parser in Spark

Reynold Xin Fri, 18 Dec 2015 13:32:13 -0800

(Please use reply-all so I see the replies)

Responses inline.



On Fri, Dec 18, 2015 at 1:17 PM, Yin Huai <huaiyin....@gmail.com> wrote:

> Let me add Reynold to the thread.
>
> On Fri, Dec 18, 2015 at 12:36 PM, Gopal Vijayaraghavan <gop...@apache.org>
> wrote:
>
>>
>> >We have looked into various options, and it looks like the best option is
>> >to copy the ANTLR grammar file from Hive into Spark. Because the grammar
>> >file is tightly coupled with Hive's semantic analysis, we need to refactor
>> >some code to use them so it will end up becoming the .g file plus some
>> >coupled code.
>>
>> Is the eventual goal to contribute that fork back into Hive & have Hive
>> devs maintain a compatible parser for SparkSQL?
>>
>> Would that affect Hive's ability to refactor the SQL parser in the future
>> or is this a one-time only deal?
>
>
I am not sure if it is useful at all to port that back to Hive since it has
zero user-facing benefit, and would require Hive devs to spend a lot of
time reviewing the changes. Refactoring like this is always risky for an
established project.


>
>>
>> >parser. From Hive's perspective this does not provide any immediate
>> >benefits. From Spark's perspective, we iterate very quickly so having to
>> >depend on an external component also slows down our development. We also
>> >have some requirements that simply don't apply in other projects (e.g.
>> >being able to parse DataFrame expressions).
>>
>> From that I assume, this involves some form of cut-paste duplication of
>> the code into SparkSQL project with that version diverging away from
>> Hive's.
>
>
That is correct.


>
>>
>> > Thanks a lot for developing this parser, and we will try our best to
>> > contribute back as we fix bugs. I will also make sure we have the proper
>> > acknowledgment when we do this.
>>
>>
>> Under the Apache license, there's no actual restriction against a hostile
>> embrace-extend by copying hive's code verbatim as long as the fork retains
>> license notices.
>>
>> The maintainability concerns are mostly around whether this is intended as
>> an ongoing relationship, including any compatibility commitments from
>> hive-dev@.
>>
>
No commitments needed from Hive. You should update/improve the parser as
you see fit. We do have a pretty comprehensive suite of Hive compatibility
tests (by using the Hive tests directly) to ensure SQL compatibility with
Hive. We will continue running those. We will also try our best to
contribute back bug fixes to the parser.
