Unfortunately it doesn't, because our version of Hive has different syntax
elements, so I need to patch them in (along with a few other minor things).
It would be great if there were a developer API at a somewhat higher
level.
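
To illustrate, here is a rough sketch of the kind of seam I have in mind
(all names are hypothetical; nothing like this exists in Spark today):

  // Hypothetical plugin seam between Spark and the metastore.
  // Spark would compile only against this trait; the concrete Hive (or
  // forked-Hive) implementation would live in a separate Maven module.
  trait MetastoreCatalog {
    def listTables(db: String): Seq[String]
    def tableSchema(db: String, table: String): Seq[(String, String)] // (column, type)
  }

  // Our internal fork would ship its own module implementing the trait,
  // picked up at runtime via configuration instead of being compiled in.
  class InternalHiveCatalog extends MetastoreCatalog {
    def listTables(db: String): Seq[String] = ???                     // query internal metastore
    def tableSchema(db: String, table: String): Seq[(String, String)] = ???
  }

Spark itself would never need to know about InternalHiveCatalog; upgrades
would only have to keep the trait stable.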

On Thu, Aug 13, 2015 at 2:19 PM, Reynold Xin <r...@databricks.com> wrote:

> For Hive, I believe there is already a client interface that can be used
> to build clients for different Hive metastores. That should also work for
> your heavily forked one.
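>
> For what it's worth, a minimal sketch of how that is surfaced today: in
> 1.4+ you can point Spark at a different metastore build via configuration
> (the version and path below are just illustrative):
>
>   import org.apache.spark.SparkConf
>
>   // Tell Spark which metastore version to speak and where its jars live.
>   val conf = new SparkConf()
>     .set("spark.sql.hive.metastore.version", "0.13.1")
>     .set("spark.sql.hive.metastore.jars", "/opt/hive/lib/*")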
>
> For Hadoop, it is definitely a bigger project to refactor. A good way to
> start evaluating this is to list what needs to be changed. Maybe you can
> start by telling us what you need to change for every upgrade? Feel free to
> email me privately if this is sensitive and you don't want to share it on a
> public list.
>
> On Thu, Aug 13, 2015 at 2:01 PM, Thomas Dudziak <tom...@gmail.com> wrote:
>
>> Hi,
>>
>> I have asked this before and didn't receive any comments, but with the
>> impending release of 1.5 I wanted to bring it up again.
>> Right now, Spark is very tightly coupled to OSS Hive & Hadoop, which
>> causes me a lot of work every time there is a new version, because I don't
>> run the OSS Hive/Hadoop versions (and before you ask, I can't).
>>
>> My question is: does Spark need to be so tightly coupled to these two?
>> Or, put differently, would it be possible to introduce a developer API
>> between Spark (up to and including e.g. SQLContext) and Hadoop (for the
>> HDFS bits) and Hive (e.g. HiveContext and beyond), and move the actual
>> Hadoop & Hive dependencies into plugins (e.g. separate Maven modules)?
>> This would let me maintain my own Hive/Hadoop-ish integration with our
>> internal systems without ever having to touch Spark code.
>> I expect it could also allow, for instance, Hadoop vendors to provide
>> their own, more optimized implementations without Spark having to know
>> about them.
>>
>> cheers,
>> Tom
>>
>
>
