Unfortunately it doesn't, because our version of Hive has different syntax elements and thus I need to patch them in (along with a few other minor things). It would be great if there were a developer API at a somewhat higher level.
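
To make that concrete, here is a rough sketch of the kind of surface I have in mind. Everything below is hypothetical (the trait name, the methods, the reflective loading); it is not an existing Spark interface, just an illustration of the level of abstraction I would like to code against:

package org.example.spark.plugin

// Hypothetical SPI: a minimal surface a Hive-ish metastore integration would implement.
// None of these names exist in Spark; they are only meant to illustrate the idea.
trait MetastoreCatalogPlugin {
  // List the tables visible in the given database.
  def listTables(database: String): Seq[String]

  // Return the schema of a table as (columnName, dataTypeString) pairs.
  def tableSchema(database: String, table: String): Seq[(String, String)]

  // Parse a dialect-specific statement into whatever intermediate form Spark consumes,
  // so a fork with different SQL syntax can plug in its own parsing.
  def parseStatement(sql: String): ParsedStatement
}

// Placeholder result type; a real API would presumably hand back a logical plan instead.
case class ParsedStatement(kind: String, raw: String)

object MetastoreCatalogPlugin {
  // Load an implementation reflectively from a config value, so Spark core never has
  // to link against a concrete Hive (or an internal fork of it) at compile time.
  def load(className: String): MetastoreCatalogPlugin =
    Class.forName(className).newInstance().asInstanceOf[MetastoreCatalogPlugin]
}

With something along those lines, each Hadoop/Hive flavour could live in its own maven module implementing the trait, and Spark itself would only depend on the plugin module.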
On Thu, Aug 13, 2015 at 2:19 PM, Reynold Xin <r...@databricks.com> wrote:

> I believe for Hive, there is already a client interface that can be used
> to build clients for different Hive metastores. That should also work for
> your heavily forked one.
>
> For Hadoop, it is definitely a bigger project to refactor. A good way to
> start evaluating this is to list what needs to be changed. Maybe you can
> start by telling us what you need to change for every upgrade? Feel free to
> email me in private if this is sensitive and you don't want to share in a
> public list.
>
>
> On Thu, Aug 13, 2015 at 2:01 PM, Thomas Dudziak <tom...@gmail.com> wrote:
>
>> Hi,
>>
>> I have asked this before but didn't receive any comments, but with the
>> impending release of 1.5 I wanted to bring this up again.
>> Right now, Spark is very tightly coupled with OSS Hive & Hadoop which
>> causes me a lot of work every time there is a new version because I don't
>> run OSS Hive/Hadoop versions (and before you ask, I can't).
>>
>> My question is, does Spark need to be so tightly coupled with these two ?
>> Or put differently, would it be possible to introduce a developer API
>> between Spark (up and including e.g. SqlContext) and Hadoop (for HDFS bits)
>> and Hive (e.g. HiveContext and beyond) and move the actual Hadoop & Hive
>> dependencies into plugins (e.g. separate maven modules)?
>> This would allow me to easily maintain my own Hive/Hadoop-ish integration
>> with our internal systems without ever having to touch Spark code.
>> I expect this could also allow for instance Hadoop vendors to provide
>> their own, more optimized implementations without Spark having to know
>> about them.
>>
>> cheers,
>> Tom
>>
>